[GitHub] [lucene-solr] magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

GitBox Tue, 10 Dec 2019 11:56:31 -0800

magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to 
support pre-tokenizer ICU text transformation
URL: https://github.com/apache/lucene-solr/pull/892#issuecomment-564217729
 
 
   > Regarding further automatic performance optimization, I think that is very 
ambitious and maybe too much magic.
   
   Agreed.
   
   The only issue I have with the change you recently posted is the 
randomization of `assumeExternalUnicodeNormalization`. The tests _shouldn't_ 
generally pass with that arg randomly flipped, unless accompanied by a 
complementary, transliterator-specific top-level normalization change. If the 
tests pass with that arg randomized, that's dependent on the specific 
transliterator being used, and I'm concerned that even if the tests happen pass 
as written, randomly flipping that arg in tests conveys the impression that 
that's something one should be able to generally.
   
   Would you object to reverting/removing (from the proposed patch) the 
randomized form of `getTransliterator(String)` in `TestICUTransformCharFilter`? 
Could be replaced with additional test variants that _explicitly_ disable 
bundled normalization (and specifically add corresponding top-level -- i.e., 
`ICUNormalizer2CharFilter` -- normalization if necessary)?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] magibney commented on issue #892: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

Reply via email to