[
https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251933#comment-13251933
]
Robert Muir commented on SOLR-3099:
-----------------------------------
{quote}
The stored part will be duplicated, and to support highlighting for a multiple
field solution you need to do extra programming to merge the highlights from
each field.
{quote}
Wait, why would you duplicate that? Just store it once.
If the highlighter cannot deal with the fact that foo_unstemmed and foo_stemmed
share the same stored content, kept in only one field (called whatever, I don't
care), then that's a highlighter problem.
It's not something to be worked around by making analyzers more complicated or
screwing up scoring by injecting things.
{quote}
Instead of assuming that we'd complicate analysis as you're afraid of, we
should work on simplifying and refactoring analysis to make it more flexible
and easier to work with, implementing features like this. Other stuff that
could be useful in analysis is a graph structure instead of the current linear
one to be able to overlay "New York" as a synonym for "NY" on the same position
offset even if they have a different number of tokens.
{quote}
Who is making assumptions? This has already happened: it's called
PositionLengthAttribute and is already in 3.6 and trunk...
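
To make the graph idea concrete, here is a rough sketch of how a position
increment plus a position length lets "NY" sit on top of "New York" even though
the token counts differ. This is a toy model in plain Python, not Lucene's API:
the Token tuple and the to_arcs and paths helpers are hypothetical names made up
for illustration only.

```python
from collections import namedtuple

# Hypothetical toy token: pos_inc is the step from the previous token's
# position, pos_len is how many positions the token spans.
Token = namedtuple("Token", "term pos_inc pos_len")


def to_arcs(tokens):
    """Turn a token stream into (start, end, term) arcs of a position graph."""
    arcs = []
    pos = -1  # so a first token with pos_inc=1 lands on position 0
    for tok in tokens:
        pos += tok.pos_inc
        arcs.append((pos, pos + tok.pos_len, tok.term))
    return arcs


def paths(arcs, start, end):
    """Enumerate every term sequence that leads from start to end."""
    if start == end:
        return [[]]
    found = []
    for arc_start, arc_end, term in arcs:
        if arc_start == start:
            for rest in paths(arcs, arc_end, end):
                found.append([term] + rest)
    return found


# "NY" starts at the same position as "New" (pos_inc=0) and spans two
# positions (pos_len=2), so it ends where "York" ends.
tokens = [
    Token("New", 1, 1),
    Token("NY", 0, 2),
    Token("York", 1, 1),
]

arcs = to_arcs(tokens)
all_paths = paths(arcs, 0, 2)
print(arcs)
print(all_paths)
```

With only a position increment, "NY" could be stacked on "New" but there would
be no way to say that it also covers "York"; the length is what makes the stream
a graph rather than a line.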
> Add query operator, index structure, and analyzer for "exact match" searching
> -----------------------------------------------------------------------------
>
> Key: SOLR-3099
> URL: https://issues.apache.org/jira/browse/SOLR-3099
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Mike
> Fix For: 4.0
>
>
> A project I'm working on requires *exact match* searching with stemming
> turned off. The users are accustomed to Sphinx search, and thus expect a
> query like [ =runs ] to return only documents that contain the exact term,
> "runs", and not the stemmed word "run".
> In SOLR-2866, there is similar work, but I believe it is different because it
> uses a huge-synonym file rather than storing the original terms directly in
> the index.
> What I'd like instead is two things:
> 1. An analyzer that says, "store the original form of all words in the index
> along with the stemmed variations." If necessary, it's fine if this is simply
> an unstemmed field, but that seems cumbersome schema-wise and
> performance-wise.
> 2. An operator in edismax that allows users to query the exact form of the
> word. Sphinx uses the equals sign (=), and that makes sense logically to me.
> This issue is part of a meta issue, SOLR-3028, that is requesting two other
> operators in edismax (quorum search and word order).