[ https://issues.apache.org/jira/browse/SOLR-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288167#comment-13288167 ]

Jack Krupansky commented on SOLR-3503:
--------------------------------------

It could be tricky, but it could work. Users would have to be made aware of 
how wildcards can interfere or interact with stemming. Testing is essential, 
as is good user documentation on how to navigate the stemming-vs.-wildcards 
minefield.

Unless users actually know what the stemmed term will be, even simple 
trailing wildcards can be tricky, since the stem could be much shorter than 
they expect. For example, a user may query "investment*" when the term 
actually stemmed and indexed might be "invest" for a particular stemmer.
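To make the trailing-wildcard trap concrete, here is a minimal Python sketch. It assumes a hypothetical toy stemmer (not Porter or Snowball) and uses `fnmatch` as a stand-in for expanding a wildcard against the index's term dictionary; all function names are illustrative only.

```python
from fnmatch import fnmatchcase

# Hypothetical toy stemmer: NOT Porter or Snowball. It only strips a few
# English suffixes to mimic how a real stemmer can shorten terms.
SUFFIXES = ("ments", "ment", "ings", "ing", "s")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep a stem of at least 3 characters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(docs):
    """Stemmed terms as they would be stored in the index."""
    return {toy_stem(tok) for doc in docs for tok in doc.lower().split()}


def expand(pattern, terms):
    """Expand a wildcard pattern against indexed terms (a term-dictionary scan)."""
    return sorted(t for t in terms if fnmatchcase(t, pattern))


terms = index_terms(["Investment advice", "investing daily"])
print(sorted(terms))                                # ['advice', 'daily', 'invest']
print(expand("investment*", terms))                 # []: the indexed stem is shorter
print(expand(toy_stem("investment") + "*", terms))  # ['invest']
```

The user's "investment*" finds nothing, even though both documents mention investing; only a pattern built from the stem matches.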

Leading wildcards can sometimes be okay, but that is completely dependent on 
the particular stemmer. For example, "*ment" only works if the stemmer leaves 
the "-ment" suffix on indexed terms.
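The stemmer dependence of a leading wildcard can be sketched in Python by indexing the same words with two hypothetical toy stemmers (neither is a real Solr/Lucene filter) and running "*ment" against each index.

```python
from fnmatch import fnmatchcase


# Two hypothetical toy stemmers; neither is a real Solr/Lucene filter.
def light_stem(word: str) -> str:
    # Strips only a plural "s", like a minimal English stemmer might.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def aggressive_stem(word: str) -> str:
    # Additionally strips "-ment" from longer words, like a heavier stemmer might.
    word = light_stem(word)
    return word[:-4] if word.endswith("ment") and len(word) > 7 else word


def leading_wildcard_hits(pattern, words, stem):
    """Match a pattern against the index produced by the given stemmer."""
    indexed = {stem(w) for w in words}
    return sorted(t for t in indexed if fnmatchcase(t, pattern))


words = ["investments", "payment", "cement"]
print(leading_wildcard_hits("*ment", words, light_stem))       # ['cement', 'investment', 'payment']
print(leading_wildcard_hits("*ment", words, aggressive_stem))  # ['cement', 'payment']
```

Same query, same documents: with the heavier stemmer, "investments" silently drops out because its indexed form is "invest".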

And even simple embedded wildcards can be a real "wild card" (unpredictable), 
once again depending on the specific stemmer. For example, "inve*ment".
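A minimal Python sketch of why "inve*ment" is unpredictable. It assumes a hypothetical toy stemmer and, as a further assumption about what making the stemmer MultiTermAware would amount to, that the stemmer is simply run over the whole pattern text, wildcard included.

```python
from fnmatch import fnmatchcase


def toy_stem(word: str) -> str:
    # Hypothetical suffix stripper (not Porter/Snowball): drops "-ment" or "-ory".
    for suffix in ("ment", "ory"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


indexed = {toy_stem(w) for w in ["investment", "inventory", "invoice"]}
pattern = "inve*ment"

# Raw pattern against the stemmed index: "invest" no longer ends in "ment".
raw_hits = sorted(t for t in indexed if fnmatchcase(t, pattern))

# Assumption: the stemmer is run over the whole pattern text, wildcard included.
stemmed_pattern = toy_stem(pattern)
stemmed_hits = sorted(t for t in indexed if fnmatchcase(t, stemmed_pattern))

print(raw_hits)                       # []
print(stemmed_pattern, stemmed_hits)  # inve* ['invent', 'invest']
```

The raw pattern misses "investment" entirely, while the stemmed pattern finds it but also picks up "inventory", which the user never asked for.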

But I don't think any of those concerns, alone or combined, are any worse than 
the situation we have today.

Still, some robust tests would be needed to persuade me that this improvement 
is actually okay.

Right now, I say go for it, including test examples for various stemmers and 
documentation of the issues users must be aware of (call it "safe wildcards in 
the presence of stemming"). I think the only restriction is that query results 
should be no worse than without this improvement.

Unfortunately, the doc may be stemmer-dependent, and separate tests would be 
needed for each stemmer.
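A per-stemmer regression check along these lines might look like the following Python sketch. The three toy stemmers are hypothetical stand-ins for real ones, and "no worse" is interpreted here, as an assumption, to mean that running the stemmer over the query pattern must never lose a term that the raw pattern matched against the same stemmed index.

```python
from fnmatch import fnmatchcase


# Hypothetical toy stemmers standing in for real ones (Porter, Snowball, ...).
def no_stem(word):
    return word


def plural_stem(word):
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def ment_stem(word):
    word = plural_stem(word)
    return word[:-4] if word.endswith("ment") and len(word) > 7 else word


STEMMERS = {"none": no_stem, "plural": plural_stem, "ment": ment_stem}
WORDS = ["investments", "investment", "payments", "cement"]
PATTERNS = ["invest*", "investment*", "*ment", "pay*s"]


def hits(pattern, indexed):
    return {t for t in indexed if fnmatchcase(t, pattern)}


def lost_matches(stem):
    """Per pattern, indexed terms the raw pattern matched but the stemmed pattern misses."""
    indexed = {stem(w) for w in WORDS}
    report = {}
    for p in PATTERNS:
        lost = hits(p, indexed) - hits(stem(p), indexed)
        if lost:
            report[p] = sorted(lost)
    return report


# For these toy stemmers every report is empty: {}
for name, stem in STEMMERS.items():
    print(name, lost_matches(stem))
```

It checks recall only; it would not flag a stemmed pattern that starts matching unrelated terms, which would need tests of its own.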

The bottom line is to reduce the surprise factor for the user.

As a side note, it would be nice if Solr had a mechanism to return "informative 
notes and warnings" with a query response. For example, "Wildcard term 
inves*ment matches no indexed terms".
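Such a note could be sketched in Python as follows. Solr has no such mechanism today; the `wildcard_query` function and the "matchedTerms" and "warnings" keys are hypothetical, invented only to illustrate the idea.

```python
from fnmatch import fnmatchcase


def wildcard_query(pattern, indexed_terms):
    """Expand a wildcard and attach informative notes to a hypothetical response.

    "matchedTerms" and "warnings" are invented keys, not Solr response fields.
    """
    matched = sorted(t for t in indexed_terms if fnmatchcase(t, pattern))
    response = {"matchedTerms": matched, "warnings": []}
    if not matched:
        response["warnings"].append(
            f"Wildcard term {pattern} matches no indexed terms"
        )
    return response


index = {"invest", "payment", "cement"}  # e.g. terms after stemming
print(wildcard_query("inves*ment", index))
# {'matchedTerms': [], 'warnings': ['Wildcard term inves*ment matches no indexed terms']}
```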

                
> Make SnowballPorterFilterFactory (and other stemmers?) MultiTermAware
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3503
>                 URL: https://issues.apache.org/jira/browse/SOLR-3503
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>             Fix For: 4.0, 5.0
>
>
> It seems to me that all the stemmers could be MultiTermAware. Does anyone 
> know of a reason not to?

