+1 Mike
On Fri, Apr 20, 2018, 9:42 AM Michael Sokolov <msoko...@gmail.com> wrote:

> I have a use case that generates some tokens containing punctuation
> (fractions and other numerical constructs), but I am handling most
> punctuation with WordDelimiterGraphFilter, which then decomposes those
> tokens into parts and re-composes them, so that, e.g., 1/2 becomes
> {1, 2, 12}. I thought at first that I could mark those tokens as keywords
> to prevent any further analysis, but I discovered that WDGF ignores
> KeywordAttribute.
>
> I have a workaround that uses Arabic-Indic digits as separators instead
> of punctuation (1/2 -> 1١2). These are classified as digits, so WDGF does
> not split on them. But someday I would like to support Arabic (or Hindi)
> language numbers as well, and then this hack will bite me.
>
> Does it seem reasonable to update WDGF (and its cousin WDF) to respect
> KeywordAttribute? I think it can be done with a very small change.
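For anyone following along, here is a small sketch of why the digit-separator workaround behaves as described. It uses only java.lang.Character, not Lucene; the class and method names (SeparatorCheck, isDecimalDigit, encodeFraction) are made up for illustration, and the idea that WDGF's split decision follows this same Unicode classification is an assumption based on the description in the quoted message, not something checked against the filter's source.

```java
// Hypothetical helper, not part of Lucene: shows how java.lang.Character
// classifies the characters involved in the workaround.
final class SeparatorCheck {
    private SeparatorCheck() {}

    /** True if the code point is in Unicode category Nd (decimal digit). */
    static boolean isDecimalDigit(int codePoint) {
        return Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    /** Replaces '/' with ARABIC-INDIC DIGIT ONE (U+0661), as in the workaround. */
    static String encodeFraction(String token) {
        return token.replace('/', '\u0661');
    }

    public static void main(String[] args) {
        // '/' is punctuation, so a punctuation-based splitter would break on it.
        System.out.println(isDecimalDigit('/'));       // false
        // U+0661 is a decimal digit, so "1١2" looks like one run of digits.
        System.out.println(isDecimalDigit('\u0661'));  // true
        // Every character of the encoded token is a decimal digit.
        System.out.println(
            encodeFraction("1/2").chars().allMatch(SeparatorCheck::isDecimalDigit)); // true
    }
}
```

This also shows the limitation mentioned above: once real Arabic-Indic numbers appear in the input, the separator can no longer be told apart from the data, which is why honoring KeywordAttribute looks like the cleaner fix.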