OK, I opened https://issues.apache.org/jira/browse/LUCENE-8265 and will
submit a PR soon.

On Sat, Apr 21, 2018 at 3:56 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> +1
>
> Mike
>
> On Fri, Apr 20, 2018, 9:42 AM Michael Sokolov <msoko...@gmail.com> wrote:
>
> > I have a use case that generates some tokens containing punctuation
> > (fractions and other numerical constructs), but I am handling most
> > punctuation with WordDelimiterGraphFilter, which then decomposes those
> > tokens into parts and re-composes them, so e.g. 1/2 becomes {1, 2, 12}.
> > I thought at first that I could mark those tokens as keywords to prevent
> > any further analysis, but I discovered that WDGF ignores that.
> >
> > I have a workaround using Arabic numerals as separators instead of
> > punctuation (1/2 -> 1١2): these are classified as digits, so WDGF does
> > not split on them. But someday I would like to support Arabic (or Hindi)
> > language numbers as well, and then this hack will bite me.
> >
> > Does it seem reasonable to update WDGF (and its cousin WDF) to respect
> > KeywordAttribute? I think it can be done with a very small change.
> >
>
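
For anyone following the thread, here is a minimal sketch of why the
digit-separator workaround quoted above holds together. It uses only
java.lang.Character, not Lucene, and it assumes that WDGF decides where to
split from the Unicode general category of each character; that is an
assumption about the filter's internals, not something confirmed in this
thread. The class and method names are hypothetical.

```java
// Sketch only: plain JDK, no Lucene. SeparatorCheck, isDecimalDigit and
// hideSlash are hypothetical names used for illustration.
class SeparatorCheck {

    // True if the JDK places the code point in Unicode category Nd
    // (decimal digit number).
    static boolean isDecimalDigit(int codePoint) {
        return Character.getType(codePoint) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // Replace '/' with U+0661 (ARABIC-INDIC DIGIT ONE), as in "1/2 -> 1١2".
    static String hideSlash(String token) {
        return token.replace('/', '\u0661');
    }

    public static void main(String[] args) {
        String hidden = hideSlash("1/2");

        // '/' is punctuation, so a category-based splitter would break on it.
        System.out.println(isDecimalDigit('/'));       // false

        // U+0661 is a decimal digit, so it looks like part of the number.
        System.out.println(isDecimalDigit('\u0661'));  // true

        // After the substitution every character in the token is a digit.
        System.out.println(
            hidden.codePoints().allMatch(SeparatorCheck::isDecimalDigit)); // true
    }
}
```

The same property is what makes the hack fragile: once real Arabic-Indic
digits appear in the input, nothing distinguishes them from the stand-in
separator.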
