[
https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744092#action_12744092
]
DM Smith commented on LUCENE-1813:
----------------------------------
I like the idea of a constant that is presented as a default. I suggest that
alternatives be listed in the Javadoc.
Some of my texts use Private Use Area (PUA) code points until Unicode includes
the characters (e.g. Myanmar text), so I'm glad that allowing a choice avoids a
potential conflict there. I think PUA use should be left to the text author.
As my texts are all derived from XML, I like the use of a character that is not
allowed in XML. I think U+0001 is just fine, even if not from a purist's
perspective.
Some of my texts have BIDI markers and while these will be stripped by filters,
I don't think this use is analogous.
> Add option to ReverseStringFilter to mark reversed tokens
> ---------------------------------------------------------
>
> Key: LUCENE-1813
> URL: https://issues.apache.org/jira/browse/LUCENE-1813
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9
> Reporter: Andrzej Bialecki
> Assignee: Robert Muir
> Fix For: 2.9
>
> Attachments: LUCENE-1813.patch, reverseMark-2.patch, reverseMark.patch
>
>
> This patch implements additional functionality in the filter to "mark"
> reversed tokens with a special marker character (U+0001). This is
> useful when indexing both straight and reversed tokens (e.g. to implement
> efficient leading-wildcard search).
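
The marking idea described in the issue can be sketched roughly as follows.
This is a standalone illustration and not the patch itself: the class name,
method names, and the choice to put the marker at the front of the reversed
token are assumptions made for the example, and it does not use the Lucene
API.

```java
// Hypothetical sketch; none of these names come from the LUCENE-1813 patch.
final class ReverseMarkSketch {

    // Assumed default marker: U+0001, a character XML 1.0 does not allow.
    static final char DEFAULT_MARKER = '\u0001';

    // Reverse the token and put the marker in front, so a reversed token
    // can never be confused with a straight token in the same field.
    static String reverseAndMark(String token, char marker) {
        return marker + new StringBuilder(token).reverse().toString();
    }

    // Turn a leading-wildcard pattern such as "*ing" into a prefix that
    // matches marked, reversed tokens.
    static String toMarkedPrefix(String pattern, char marker) {
        if (!pattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading '*': " + pattern);
        }
        return reverseAndMark(pattern.substring(1), marker);
    }

    public static void main(String[] args) {
        String indexed = reverseAndMark("searching", DEFAULT_MARKER);
        String prefix = toMarkedPrefix("*ing", DEFAULT_MARKER);
        // The reversed, marked token matches the rewritten prefix,
        // while the straight token "ing" does not.
        System.out.println(indexed.startsWith(prefix));
        System.out.println("ing".startsWith(prefix));
    }
}
```

Because the marker is a parameter, a text author whose data already uses
U+0001 or a PUA code point could pass a different character instead.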