Erick Erickson created SOLR-7085:
------------------------------------

             Summary: Add a comment to the schema.xml file(s) warning against 
applying analysis chains to the <uniqueKey> field.
                 Key: SOLR-7085
                 URL: https://issues.apache.org/jira/browse/SOLR-7085
             Project: Solr
          Issue Type: Improvement
            Reporter: Erick Erickson
            Assignee: Erick Erickson
            Priority: Minor


If you apply index-time transformations to the <uniqueKey> field, very 
interesting things happen, all of them bad:
1> The doc doesn't get updated.
2> Docs are routed to shards based on the original form of the ID field.

I stopped looking there. There are much bigger fish to fry than trying to 
support an index-time analysis chain on the <uniqueKey>, so a comment in the 
schema.xml seems all that is necessary.

Trying to change this at a code level would be a nightmare, I suspect. Consider 
routing by a secondary field, for instance, and the N+1 other places this would 
pop up.

Limited _query_ time transformations are OK; they just have to match the 
indexing program's transformations. About the only one I'd recommend is 
lowercasing, but others are possible if you're brave.

My "rule of thumb" here is that "anything a human enters in your search app 
should not be case-sensitive when searching", and it can be enforced easily 
enough.
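
As a rough illustration of "match the indexing program's transformations", 
here is a minimal client-side sketch. It is not Solr code: normalize_id, 
build_doc and build_id_query are hypothetical helper names, and the sketch 
assumes the <uniqueKey> field is called "id" and is a plain string type with 
no analysis chain. The point is only that the indexing program and the search 
app call the same function, so the ID Solr sees is already in its final form.

```python
def normalize_id(raw_id: str) -> str:
    """Hypothetical helper: the single transformation applied to every ID.

    Only lowercasing is done here, since that is the one transformation
    recommended above.
    """
    return raw_id.lower()


def build_doc(raw_id: str, title: str) -> dict:
    """Indexing side: normalize the ID *before* the document is sent to Solr."""
    # "id" is assumed to be the <uniqueKey> field name.
    return {"id": normalize_id(raw_id), "title": title}


def build_id_query(user_input: str) -> str:
    """Query side: apply the same normalization to whatever the user typed."""
    # The term query parser takes the value verbatim, so no escaping is needed.
    return "{!term f=id}" + normalize_id(user_input)


if __name__ == "__main__":
    print(build_doc("Doc-ABC", "example"))
    print(build_id_query("DOC-abc"))
```

Because both paths go through normalize_id, lookups are case-insensitive for 
the user while the value stored in the <uniqueKey> field, and the value used 
for routing, stay identical.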



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)