[ 
https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743931#action_12743931
 ] 

Robert Muir commented on LUCENE-1813:
-------------------------------------

Another issue, besides the fact that they are deprecated, is that tag 
characters are outside of the BMP. 

Currently, the reverse filter does not properly reverse characters outside of 
the BMP (it does not recognize a surrogate pair as one character). 
This means characters such as tag characters will be 'reversed' into a trail 
surrogate followed by a lead surrogate (two unpaired surrogates).
But we cannot fix the above, as Lucene wildcard support does not recognize 
code points > U+FFFF as one 'character' either.

If we are going to pick a character other than U+0001, it needs to be inside 
the BMP.
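
To illustrate the surrogate problem described above, here is a minimal 
standalone sketch. It is not the ReverseStringFilter code; the class and 
method names (SurrogateReverseDemo, reverseByChar, reverseByCodePoint) are 
made up for this example, and U+E0001 (LANGUAGE TAG) is used only as a sample 
tag character. It assumes nothing beyond the JDK.

```java
public class SurrogateReverseDemo {

    // Naive reversal over UTF-16 code units (hypothetical helper).
    // A surrogate pair comes out as trail surrogate, then lead surrogate.
    public static String reverseByChar(String s) {
        char[] in = s.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[in.length - 1 - i] = in[i];
        }
        return new String(out);
    }

    // Code-point-aware reversal (hypothetical helper).
    // Walks backwards one code point at a time, so pairs stay intact.
    public static String reverseByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            sb.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ab" followed by U+E0001, which needs two chars in UTF-16.
        String s = "ab" + new String(Character.toChars(0xE0001));

        String naive = reverseByChar(s);
        // The pair is now split and in the wrong order.
        System.out.println(Character.isLowSurrogate(naive.charAt(0)));
        System.out.println(Character.isHighSurrogate(naive.charAt(1)));

        String fixed = reverseByCodePoint(s);
        // The supplementary character survives as one code point.
        System.out.println(fixed.codePointAt(0) == 0xE0001);
    }
}
```

Even with a code-point-aware reversal, the wildcard limitation mentioned 
above would still apply, which is why a marker inside the BMP sidesteps both 
problems.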

> Add option to ReverseStringFilter to mark reversed tokens
> ---------------------------------------------------------
>
>                 Key: LUCENE-1813
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1813
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>             Fix For: 2.9
>
>         Attachments: reverseMark-2.patch, reverseMark.patch
>
>
> This patch implements additional functionality in the filter to "mark" 
> reversed tokens with a special marker character (Unicode 0001). This is 
> useful when indexing both straight and reversed tokens (e.g. to implement 
> efficient leading wildcards search).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

