Re: Are docs updated based on comparing the id before analysis?

Erick Erickson Fri, 06 Feb 2015 04:50:04 -0800

bq: I didn't even think about SolrCloud

Me neither until I was driving away...


Something like that might work, but my personal feeling
here is that we're getting into a complex solution for something
that people have been solving on their own so far. I was just surprised by
the behavior b/c I hadn't really thought it through: I changed the
<uniqueKey> field to lowercase in the analysis chain and started
seeing duplicates.

Ya' learn something new every day it seems. I find the things
I learn when it's embarrassingly public stick in my head better ;)...

I'll add a comment to the schema.xml file(s).
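
For anyone who hits the same thing, here is a rough sketch of the
"normalize the id before it ever reaches Solr" approach. It is a toy
model in plain Java, not Solr code: the map only stands in for
"overwrite on exact id match", and the names normalizeId and
countStoredDocs are made up for illustration. It assumes, as this
thread suggests, that the id is compared before analysis.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class UniqueKeyNormalization {

    // Hypothetical helper: fold an id to one canonical case on the client,
    // before the document is sent for indexing.
    static String normalizeId(String rawId) {
        return rawId.toLowerCase(Locale.ROOT);
    }

    // Toy stand-in for an index that overwrites a document only when the
    // incoming id matches an existing id exactly (no analysis applied).
    static int countStoredDocs(String[] ids, boolean normalizeFirst) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String id : ids) {
            String key = normalizeFirst ? normalizeId(id) : id;
            index.put(key, "doc for " + id);
        }
        return index.size();
    }

    public static void main(String[] args) {
        String[] ids = {"ABC-1", "abc-1", "Abc-1"};
        // Raw ids differ only by case, so each one is stored separately.
        System.out.println("raw ids stored: " + countStoredDocs(ids, false));
        // Normalized ids collapse to a single key, so later docs overwrite.
        System.out.println("normalized ids stored: " + countStoredDocs(ids, true));
    }
}
```

The same folding would have to be applied to ids on the query side,
whether in the client or in a query-time analyzer.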

On Thu, Feb 5, 2015 at 7:54 PM, Shawn Heisey <[email protected]> wrote:
> On 2/5/2015 5:24 PM, Erick Erickson wrote:
>> Hmmm, driving away from my client, I got to wondering about routing in
>> SolrCloud. You'd have to apply the analysis chain _before_ you routed
>> on ID, and I have no clue what would happen with things like the !
>> operator in the id field.
>
> I didn't even think about SolrCloud.  Fun.
>
>> So to handle my "rule of thumb", which is that anything that a human
>> could possibly enter should _not_ be case sensitive, the <uniqueKey>
>> field needs to be
>> 1> normalized as far as case is concerned at index time
>> 2> have a query-time transformation done to match <1>. So something
>> like this should do it assuming that
>>     the indexer took care to uppercase the <uniqueKey>:
>>     <fieldType name="eoe_test" class="solr.TextField">
>>       <analyzer type="index">
>>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>>       </analyzer>
>>       <analyzer type="query">
>>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>>         <filter class="solr.UpperCaseFilterFactory" />
>>       </analyzer>
>>     </fieldType>
>
> I realize with what I'm saying below that it is outside "typical user"
> land, but it might work.  For an advanced user it wouldn't even be all
> that messy.  Proceeding into "thinking out loud" territory:
>
> A custom UpdateRequestProcessor could do all the normalization on the
> uniqueKey field at index time.  If we used that processor in combination
> with a fieldType like the one you outlined above, I think it would
> work.  The simple version of that processor would just be a
> case-changing filter.
>
> Getting back to what a typical user wants to happen ... an update
> processor could be included in Solr that figures out the configured
> uniqueKey field and lowercases the input on that field.  We could
> provide documentation showing how to insert it into the default update
> chain to allow case-insensitive unique IDs.  If somebody needs more
> complicated normalization (perhaps they want to use the ICU folding
> class instead of Java's built-in lowercase capability, or do some really
> wild stuff that's domain-specific), they can write their own processor,
> and maybe even their own analysis component for the query side.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

