OK, I opened https://issues.apache.org/jira/browse/LUCENE-8265 and will submit a PR soon.

On Sat, Apr 21, 2018 at 3:56 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> +1
>
> Mike
>
> On Fri, Apr 20, 2018, 9:42 AM Michael Sokolov <msoko...@gmail.com> wrote:
>
> > I have a use case that generates some tokens containing punctuation
> > (fractions and other numerical constructs), but I am handling most
> > punctuation with WordDelimiterGraphFilter, which then decomposes those
> > tokens into parts and re-composes, so eg 1/2 becomes {1, 2, 12}. I thought
> > at first that I could mark those tokens as keywords to prevent any future
> > analysis, but I discovered WDGF ignores that.
> >
> > I have a workaround using Arabic numerals as separators instead of
> > punctuation (1/2 -> 1١2) -- these are classified as digits, so WDGF does
> > not split on them --, but someday I would like to support Arabic (or Hindi)
> > language numbers as well, and then this hack will bite me.
> >
> > Does it seem reasonable to update WDGF (and its cousin WDF) to respect
> > KeywordAttribute? I think it can be done with a very small change.
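
For anyone following the thread, here is a rough sketch of the behavior being proposed. It is plain Java and deliberately does not use Lucene classes: `KeywordSplitSketch` and `split` are hypothetical names, the `isKeyword` flag only stands in for what `KeywordAttribute.isKeyword()` would report on a token, and the splitting rule (break on anything that is not a letter or digit, then also emit the concatenation) is a simplification of what WDGF does, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the filter logic; none of these names are Lucene APIs.
final class KeywordSplitSketch {

    // isKeyword plays the role of KeywordAttribute.isKeyword() on the current token.
    static List<String> split(String text, boolean isKeyword) {
        List<String> out = new ArrayList<>();
        if (isKeyword) {
            // Proposed behavior: keyword tokens pass through untouched.
            out.add(text);
            return out;
        }
        StringBuilder part = new StringBuilder();
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                part.appendCodePoint(cp);
            } else {
                // Any other character acts as a delimiter and ends the current part.
                flush(part, joined, out);
            }
            i += Character.charCount(cp);
        }
        flush(part, joined, out);
        // Re-compose: emit the concatenation only if the token was actually split.
        if (out.size() > 1) {
            out.add(joined.toString());
        }
        return out;
    }

    // Moves a non-empty part into the output and appends it to the concatenation.
    private static void flush(StringBuilder part, StringBuilder joined, List<String> out) {
        if (part.length() > 0) {
            out.add(part.toString());
            joined.append(part.toString());
            part.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("1/2", false)); // [1, 2, 12]
        System.out.println(split("1/2", true));  // [1/2]
    }
}
```

With the flag unset, 1/2 comes out as {1, 2, 12} as described in the original message; with it set, the token survives intact, so the digit-separator workaround would no longer be needed.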