Robert Muir created LUCENE-5517:
-----------------------------------

             Summary: stricter parsing for hunspell parseFlag()
                 Key: LUCENE-5517
                 URL: https://issues.apache.org/jira/browse/LUCENE-5517
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
            Reporter: Robert Muir
         Attachments: LUCENE-5517.patch

I was trying to debug why a hunspell dictionary used so much RAM (an updated 
version of the dictionary fixes the bug!). The reason is that the dictionary was 
buggy and didn't declare FLAG num, so each digit was treated as its own flag, 
leading to chaos.
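
To illustrate how a missing FLAG num directive multiplies flags, here is a
minimal sketch. It is not Lucene's actual code; the class and method names are
hypothetical. It assumes hunspell's documented flag encodings: by default every
character of a flag field is one flag, while under FLAG num the field is a list
of comma-separated decimal numbers.

```java
final class FlagSplitDemo {
  // Default hunspell encoding: every character of the field is one flag.
  static char[] parseSimple(String raw) {
    return raw.toCharArray();
  }

  // FLAG num encoding: comma-separated decimal numbers, one flag per number.
  static char[] parseNum(String raw) {
    String[] parts = raw.split(",");
    char[] flags = new char[parts.length];
    for (int i = 0; i < parts.length; i++) {
      flags[i] = (char) Integer.parseInt(parts[i].trim());
    }
    return flags;
  }
}
```

For the flag field `1001,1002`, parseNum returns two flags, while parseSimple
returns nine one-character flags, including the comma.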

In many places in a hunspell file (e.g. an affix rule), only a single flag is 
allowed. But today we don't detect extra flags, we just take the first one.

We should throw an exception here instead of emitting chaos. In these cases the 
dictionary is buggy, in most cases hunspell itself already does this for the 
impacted dictionaries, and in some cases you do in fact get an error from the 
hunspell command line.
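
The difference between the lenient and the strict behavior can be sketched as
follows. This is an illustration under assumptions, not the attached patch:
the class name, the method names, the exception type, and the message are all
hypothetical, and parseFlags here uses the default one-character-per-flag
encoding.

```java
final class StrictFlagDemo {
  // Stand-in for the dictionary's flag parser (default encoding assumed).
  static char[] parseFlags(String raw) {
    return raw.toCharArray();
  }

  // Lenient behavior described above: silently keep only the first flag.
  static char parseFlagLenient(String raw) {
    return parseFlags(raw)[0];
  }

  // Strict behavior proposed above: anything but exactly one flag is an error.
  static char parseFlagStrict(String raw) {
    char[] flags = parseFlags(raw);
    if (flags.length != 1) {
      throw new IllegalArgumentException("expected exactly one flag, got: " + raw);
    }
    return flags[0];
  }
}
```

With the field `12`, parseFlagLenient returns '1' and drops the '2', whereas
parseFlagStrict throws, so the broken dictionary is reported at load time.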




--
This message was sent by Atlassian JIRA
(v6.2#6252)
