[ https://issues.apache.org/jira/browse/SOLR-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175297#comment-16175297 ]
Michael A. Alcorn edited comment on SOLR-11386 at 9/21/17 9:51 PM:
-------------------------------------------------------------------

Thanks for the reply, [~cpoerschke]. After further investigation, I think the issue is the multi-term EFI arguments. The query I thought was working is actually only returning the extracted feature values corresponding to _the first term_. So:

{code}
http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=of fiber channel added couple efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10
{code}

is the same as:

{code}
http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=of efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10
{code}

(i.e., all zeros because "of" is a stop word). And:

{code}
http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=fiber channel added couple of efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10
{code}

is the same as:

{code}
http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=fiber efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10
{code}

This is an example of how we're defining the features:

{code}
{
  "store": "redhat_efi_feature_store",
  "name": "case_description_issue_tfidf",
  "class": "org.apache.solr.ltr.feature.SolrFeature",
  "params": {
    "q": "{!field f=issue_tfidf}${case_description}"
  }
}
{code}

> Extracting learning to rank features fails when word ordering of EFI argument changed.
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-11386
>                 URL: https://issues.apache.org/jira/browse/SOLR-11386
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: contrib - LTR
>    Affects Versions: 6.5.1
>            Reporter: Michael A. Alcorn
>
> I'm getting some extremely strange behavior when trying to extract features for a learning to rank model. The following query incorrectly says all features have zero values:
> {code}
> http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=added couple of fiber channel&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=added couple of fiber channel efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10
> {code}
> But this query, which simply moves the word "added" from the front of the provided text to the back, properly fills in the feature values:
> {code}
> http://gss-test-fusion.usersys.redhat.com:8983/solr/access/query?q=couple of fiber channel added&rq={!ltr model=redhat_efi_model reRankDocs=1 efi.case_summary=the efi.case_description=couple of fiber channel added efi.case_issue=the efi.case_environment=the}&fl=id,score,[features]&rows=10
> {code}
> The explain output for the failing query can be found here:
> https://gist.github.com/manisnesan/18a8f1804f29b1b62ebfae1211f38cc4
> and the explain output for the properly functioning query can be found here:
> https://gist.github.com/manisnesan/47685a561605e2229434b38aed11cc65
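
The first-term-only behavior described in the comment would be consistent with how Solr local params are tokenized: an unquoted value ends at the first whitespace, so a multi-term EFI value generally needs to be wrapped in quotes. The thread does not confirm that this is the cause here, so the following is only a minimal sketch of building the `rq` parameter with quoted EFI values and a URL-encoded query string. The helper names (`quote_local_param`, `build_ltr_rq`, `build_query_string`) are hypothetical and not part of Solr or any client library; the model and EFI names are taken from the queries above.

```python
from urllib.parse import urlencode


def quote_local_param(value: str) -> str:
    """Wrap a local-params value in single quotes.

    Assumption: backslash escaping inside the quoted string is enough to
    protect embedded backslashes and single quotes.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_ltr_rq(model: str, efi: dict, rerank_docs: int = 1) -> str:
    """Build an {!ltr ...} rerank query with every EFI value quoted."""
    parts = [f"model={model}", f"reRankDocs={rerank_docs}"]
    parts += [f"efi.{name}={quote_local_param(text)}" for name, text in efi.items()]
    return "{!ltr " + " ".join(parts) + "}"


def build_query_string(q: str, rq: str, fl: str = "id,score,[features]", rows: int = 10) -> str:
    """URL-encode the request parameters so spaces and braces survive transport."""
    return urlencode({"q": q, "rq": rq, "fl": fl, "rows": rows})


# EFI values mirror the queries in this issue; only the quoting is new.
rq = build_ltr_rq(
    "redhat_efi_model",
    {
        "case_summary": "the",
        "case_description": "fiber channel added couple of",
        "case_issue": "the",
        "case_environment": "the",
    },
)
query_string = build_query_string("couple of fiber channel added", rq)
print(rq)
print(query_string)
```

The resulting query string would be appended to the `/solr/access/query?` endpoint used above. Quoting every EFI value, including single-term ones, keeps the builder uniform and avoids a special case for values that happen to contain whitespace.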