[ 
https://issues.apache.org/jira/browse/LUCENE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5824:
--------------------------------

    Attachment: LUCENE-5824.patch

Simple patch and test that encode the flag as (A << 8) + B (and also check
that the values are really within range: they should be two ASCII characters).

This bug currently affects the more complicated dictionaries that use this 
encoding type (Russian, Arabic, Hebrew, etc.).
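
To illustrate the difference between the two encodings, here is a minimal
sketch. It is not the code from the patch: the class and method names are
hypothetical, and the 0xFF bound is an assumption (it is what the 16-bit
packing needs; the comment above only says the characters should be ASCII).

```java
// Illustrative sketch only. LongFlagSketch and its methods are hypothetical
// names, not Lucene's actual hunspell Dictionary code.
final class LongFlagSketch {

  // Scheme described as buggy in the issue: summing the two characters
  // discards their order, so "AB" and "BA" map to the same value.
  static char encodeSum(char a, char b) {
    return (char) (a + b);
  }

  // Scheme described in the comment above: first character in the high byte,
  // second in the low byte. The parentheses matter in Java, because
  // `a << 8 + b` would parse as `a << (8 + b)`.
  static char encodeShift(char a, char b) {
    // Assumed range check: each character must fit in 8 bits so that the
    // combined value fits in 16 bits.
    if (a > 0xFF || b > 0xFF) {
      throw new IllegalArgumentException(
          "long flag characters out of range: " + (int) a + ", " + (int) b);
    }
    return (char) ((a << 8) + b);
  }

  public static void main(String[] args) {
    // Sum encoding: both orders collide.
    System.out.println(encodeSum('A', 'B') == encodeSum('B', 'A'));
    // Shift encoding: the two orders stay distinct.
    System.out.println(encodeShift('A', 'B') != encodeShift('B', 'A'));
  }
}
```

With the sum encoding, an affix rule tagged 'AB' would also match words
flagged 'BA', which is the overgeneration described in the issue.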

> hunspell FLAG LONG implemented incorrectly
> ------------------------------------------
>
>                 Key: LUCENE-5824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5824
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5824.patch
>
>
> If you have more than 256 flags, you run out of 8-bit characters, so you have 
> to use another flag type to get 64k:
> * UTF-8: 16-bit BMP flags
> * long: two-character flags like 'AB'
> * num: decimal numbers like '10234'
> But our implementation for 'long' is wrong: it encodes the flag as 'A+B', 
> which means it can't distinguish between 'AB' and 'BA', and this causes 
> overgeneration.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
