[
https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251930#comment-13251930
]
Jan Høydahl commented on SOLR-3099:
-----------------------------------
With a multiple-field solution the stored part will be duplicated, and to
support highlighting you need to do extra programming to merge the highlights
from each field. It won't give *more* query features, but will work more nicely
together with existing features. I'm working towards support for queries like
{{foo ONEAR/10 "bar"}}, a span query between the two terms where "bar" should
then be matched literally; spans would not work across words in different
fields.
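To give an idea of the extra merging work for highlights, here is a rough Python sketch. It assumes both fields were built from the same stored text, so that character offsets are comparable. The names {{merge_highlights}} and {{apply_highlights}} are made up for illustration and are not Solr APIs.

```python
def merge_highlights(*per_field_spans):
    """Merge (start, end) character-offset spans coming from several fields.

    Assumption: every field was indexed from the same source text, so the
    offsets all refer to the same string. Overlapping or touching spans
    are fused into one.
    """
    spans = sorted(span for field in per_field_spans for span in field)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def apply_highlights(text, spans, pre="<em>", post="</em>"):
    """Wrap each merged span of the text in highlight markers."""
    out, cursor = [], 0
    for start, end in spans:
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "he runs and she runs"
stemmed_field_hits = [(3, 7), (16, 20)]  # hypothetical hits from a stemmed field
exact_field_hits = [(3, 7)]              # hypothetical hits from an unstemmed field
merged = merge_highlights(stemmed_field_hits, exact_field_hits)
highlighted = apply_highlights(text, merged)
```

With a single field carrying both forms, this merge step would not be needed, since all hits already share one set of offsets.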
Instead of assuming that we'd *complicate* analysis as you're afraid of, we
should work on simplifying and refactoring analysis to make it more flexible
and easier to work with, implementing features like this. Other stuff that
could be useful in analysis is a graph structure instead of the current linear
one, to be able to overlay "New York" as a synonym for "NY" on the same position
offset even if they have different numbers of tokens; or a way to attach
metadata to field input, e.g. to signify that the input is pre-tokenized.
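The graph idea can be sketched in a few lines of Python. This is only a toy model: the {{Arc}} class, the node numbering and the {{paths}} helper are assumptions made for illustration, not an existing analysis API. Each token is an arc between two position nodes, so a one-token synonym can span the same distance as a two-token phrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    term: str
    start: int  # position node the token leaves from
    end: int    # position node the token arrives at


def paths(arcs, start, end):
    """Enumerate every term sequence leading from node start to node end.

    Assumes the graph is acyclic (every arc has end > start).
    """
    if start == end:
        return [[]]
    result = []
    for arc in arcs:
        if arc.start == start:
            for rest in paths(arcs, arc.end, end):
                result.append([arc.term] + rest)
    return result


arcs = [
    Arc("flights", 0, 1),
    Arc("to", 1, 2),
    Arc("NY", 2, 4),    # one token spanning two positions
    Arc("New", 2, 3),   # two tokens covering the same span
    Arc("York", 3, 4),
]
readings = paths(arcs, 0, 4)
```

A linear token stream can only say "this token sits at position 2"; the arc's end node is what lets "NY" and "New York" line up so that whatever follows starts at the same node for both readings.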
Also note that at this stage of this issue we're just discussing possible ways
forward; any implementation details are still left to decide...
> Add query operator, index structure, and analyzer for "exact match" searching
> -----------------------------------------------------------------------------
>
> Key: SOLR-3099
> URL: https://issues.apache.org/jira/browse/SOLR-3099
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Mike
> Fix For: 4.0
>
>
> A project I'm working on requires *exact match* searching with stemming
> turned off. The users are accustomed to Sphinx search, and thus expect a
> query like [ =runs ] to return only documents that contain the exact term,
> "runs", and not the stemmed word "run".
> In SOLR-2866, there is similar work, but I believe it is different because it
> uses a huge synonym file rather than storing the original terms directly in
> the index.
> What I'd like instead is two things:
> 1. An analyzer that says, "store the original form of all words in the index
> along with the stemmed variations." If necessary, it's fine if this is simply
> an unstemmed field, but that seems cumbersome schema-wise and
> performance-wise.
> 2. An operator in edismax that allows users to query the exact form of the
> word. Sphinx uses the equals sign (=), and that makes sense logically to me.
> This issue is part of a meta issue, SOLR-3028, that is requesting two other
> operators in edismax (quorum search and word order).
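As a minimal sketch of the two requested pieces, assuming one possible design that has not been decided in this issue: each word is indexed twice at the same position, once stemmed and once in its original form with a marker prefix, and a query starting with "=" looks up only the marked form. The marker scheme, the toy stemmer and all function names here are hypothetical.

```python
from collections import defaultdict

EXACT = "="  # hypothetical marker prefix for unstemmed terms


def toy_stem(word):
    # Stand-in for a real stemmer: strips a single trailing "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def analyze(text):
    """Yield (position, term) pairs: the stem plus the marked original."""
    for pos, word in enumerate(text.lower().split()):
        yield pos, toy_stem(word)
        yield pos, EXACT + word


def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _pos, term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """'=word' matches the exact form only; a bare word matches by stem."""
    query = query.lower()
    if query.startswith(EXACT):
        return set(index.get(query, set()))
    return set(index.get(toy_stem(query), set()))


docs = {1: "he runs daily", 2: "a morning run"}
index = build_index(docs)
exact_hits = search(index, "=runs")
stemmed_hits = search(index, "runs")
```

Here {{=runs}} finds only the document containing the literal "runs", while a bare {{runs}} still matches both documents through the stem. Because both forms share a position in one field, positional queries could mix exact and stemmed terms.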