bq: I didn't even think about SolrCloud

Me neither until I was driving away...

Something like that might work, but my personal feeling here is that we're getting into a complex solution for something that people seem to be handling on their own so far. I was just surprised by the behavior because I hadn't really thought it through: I changed the <uniqueKey> field to lowercase in the analysis chain and started seeing duplicates. Ya' learn something new every day, it seems. I find the things I learn when it's embarrassingly public stick in my head better ;)...

I'll add a comment to the schema.xml file(s).

On Thu, Feb 5, 2015 at 7:54 PM, Shawn Heisey <[email protected]> wrote:
> On 2/5/2015 5:24 PM, Erick Erickson wrote:
>> Hmmm, driving away from my client, I got to wondering about routing in
>> SolrCloud. You'd have to apply the analysis chain _before_ you routed
>> on ID, and I have no clue what would happen with things like the !
>> operator in the id field.
>
> I didn't even think about SolrCloud. Fun.
>
>> So to handle my "rule of thumb", which is that anything that a human
>> could possibly enter should _not_ be case sensitive, the <uniqueKey>
>> field needs to be
>> 1> normalized as far as case is concerned at index time
>> 2> have a query-time transformation done to match <1>. So something
>> like this should do it assuming that
>> the indexer took care to uppercase the <uniqueKey>:
>> <fieldType name="eoe_test" class="solr.TextField">
>>   <analyzer type="index">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.UpperCaseFilterFactory" />
>>   </analyzer>
>> </fieldType>
>
> I realize with what I'm saying below that it is outside "typical user"
> land, but it might work. For an advanced user it wouldn't even be all
> that messy. Proceeding into "thinking out loud" territory:
>
> A custom UpdateRequestProcessor could do all the normalization on the
> uniqueKey field at index time.
> If we used that processor in combination
> with a fieldType like the one you outlined above, I think it would
> work. The simple version of that processor would just be a
> case-changing filter.
>
> Getting back to what a typical user wants to happen ... an update
> processor could be included in Solr that figures out the configured
> uniqueKey field and lowercases the input on that field. We could
> provide documentation showing how to insert it into the default update
> chain to allow case-insensitive unique IDs. If somebody needs more
> complicated normalization (perhaps they want to use the ICU folding
> class instead of Java's built-in lowercase capability, or do some really
> wild stuff that's domain-specific), they can write their own processor,
> and maybe even their own analysis component for the query side.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
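The idea in the thread is to normalize the id once, before the add/overwrite decision is made, and to apply the same normalization on the query side. The sketch below is a toy model in plain Java, not Solr's UpdateRequestProcessor API: the `ToyIndex` class, the `lowercaseKey` helper, and the assumption that overwrites are keyed on the exact (unanalyzed) id string are all hypothetical simplifications, chosen only to mirror the duplicates described above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class UniqueKeyNormalizationSketch {

    /** Hypothetical toy "index": a document overwrites another only if their keys are exactly equal. */
    static final class ToyIndex {
        private final Map<String, String> docsById = new LinkedHashMap<>();
        private final UnaryOperator<String> keyNormalizer;

        ToyIndex(UnaryOperator<String> keyNormalizer) {
            this.keyNormalizer = keyNormalizer;
        }

        /** Stand-in for an update-processor step: normalize the id, then add or overwrite. */
        void add(String rawId, String body) {
            docsById.put(keyNormalizer.apply(rawId), body);
        }

        /** Stand-in for query-time analysis: apply the same normalization to the lookup id. */
        String getById(String rawId) {
            return docsById.get(keyNormalizer.apply(rawId));
        }

        int size() {
            return docsById.size();
        }
    }

    /** Hypothetical normalizer: Java's built-in lowercasing, pinned to Locale.ROOT. */
    static String lowercaseKey(String id) {
        return id.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // No normalization before the overwrite check: "ABC-123" and "abc-123" both survive.
        ToyIndex raw = new ToyIndex(UnaryOperator.identity());
        raw.add("ABC-123", "first");
        raw.add("abc-123", "second");
        System.out.println("raw ids: " + raw.size() + " docs"); // 2 docs

        // Normalize first: the second add overwrites the first.
        ToyIndex normalized = new ToyIndex(UniqueKeyNormalizationSketch::lowercaseKey);
        normalized.add("ABC-123", "first");
        normalized.add("abc-123", "second");
        System.out.println("normalized ids: " + normalized.size() + " docs"); // 1 doc
        System.out.println(normalized.getById("Abc-123")); // second
    }
}
```

Using `Locale.ROOT` keeps the lowercasing independent of the JVM's default locale (the classic pitfall is the Turkish dotless i). Anything fancier, such as ICU folding, would replace `lowercaseKey` on both the index and query side.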
