+1

Mike

On Fri, Apr 20, 2018, 9:42 AM Michael Sokolov <msoko...@gmail.com> wrote:

> I have a use case that generates some tokens containing punctuation
> (fractions and other numerical constructs), but I am handling most
> punctuation with WordDelimiterGraphFilter, which decomposes those
> tokens into parts and then recombines them, so that, e.g., 1/2 becomes
> {1, 2, 12}. I thought at first that I could mark those tokens as keywords
> to prevent any further analysis, but I discovered that WDGF ignores that.
>
> I have a workaround that uses Arabic-Indic digits as separators instead of
> punctuation (1/2 -> 1١2). These are classified as digits, so WDGF does
> not split on them, but someday I would like to support Arabic (or Hindi)
> numerals as well, and then this hack will bite me.
>
> Does it seem reasonable to update WDGF (and its cousin WDF) to respect
> KeywordAttribute? I think it can be done with a very small change.
>
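
To make the proposal concrete, here is a toy sketch in plain Java of the
behavior being asked for. It does not use Lucene and is not the WDGF
implementation: the class KeywordAwareSplitSketch and its split() method are
hypothetical, and the boolean parameter only stands in for what
KeywordAttribute.isKeyword() would report on a real token. The splitting rule
(break on anything that is not a letter or digit, then append the
concatenation) is an assumption chosen to reproduce the {1, 2, 12} example
above; the real filter's output depends on its configuration flags.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordAwareSplitSketch {

    // Hypothetical model of a word-delimiter step that respects a keyword flag.
    // 'keyword' stands in for KeywordAttribute.isKeyword() on a real token.
    static List<String> split(String text, boolean keyword) {
        List<String> out = new ArrayList<>();
        if (keyword) {
            // Proposed behavior: keyword tokens pass through untouched.
            out.add(text);
            return out;
        }
        StringBuilder part = new StringBuilder();
        StringBuilder joined = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                part.append(c);
                joined.append(c);
            } else if (part.length() > 0) {
                // A delimiter ends the current part.
                out.add(part.toString());
                part.setLength(0);
            }
        }
        if (part.length() > 0) {
            out.add(part.toString());
        }
        // Emit the concatenated form only if the token was actually split.
        if (out.size() > 1) {
            out.add(joined.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("1/2", false)); // prints [1, 2, 12]
        System.out.println(split("1/2", true));  // prints [1/2]
        // U+0661 (ARABIC-INDIC DIGIT ONE) counts as a digit, so no split occurs.
        System.out.println(split("1\u06612", false).size()); // prints 1
    }
}
```

The last call shows why the digit-separator workaround holds up in this
model, and also why it would collide with real Arabic-script numbers later.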
