[ 
https://issues.apache.org/jira/browse/LUCENE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved LUCENE-5564.
------------------------------------

    Resolution: Invalid

First of all, please raise this kind of issue on the user's list before 
opening a JIRA, so others have a chance to comment and you have some 
assurance that the behavior you expect is actually desirable.

In this case, the analyzer isn't much good if it can't compare numbers for 
currency. If the Euro or US dollar sign stays attached, the token isn't a 
number any more, and it's compared lexically. So, for instance, $100 and $20 
would sort (ascending; this affects range queries etc. as well) as
$100
$20

which is clearly wrong. The symbol _will_ be _stored_ if you mark the field 
as stored, so you can get it back; it just won't be part of the token in the 
index.
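
To make the sort-order point above concrete, here is a minimal sketch in 
plain Java (no Lucene involved). The class name CurrencySortSketch and the 
helper amount() are made up for illustration, and the helper assumes every 
value is a single currency symbol followed by a whole number. It only shows 
how the same two values order as strings versus as numbers; it is not how 
Lucene indexes or sorts internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CurrencySortSketch {
    // Hypothetical helper: drops the leading currency symbol and parses the
    // remainder. Assumes input like "$100" (one symbol, then an integer).
    static int amount(String price) {
        return Integer.parseInt(price.substring(1));
    }

    // Sorts the values as plain strings, i.e. character by character.
    static List<String> lexicalOrder(List<String> prices) {
        List<String> sorted = new ArrayList<>(prices);
        Collections.sort(sorted);
        return sorted;
    }

    // Sorts the values by the number that follows the symbol.
    static List<String> numericOrder(List<String> prices) {
        List<String> sorted = new ArrayList<>(prices);
        sorted.sort(Comparator.comparingInt(CurrencySortSketch::amount));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("$20", "$100");
        System.out.println(lexicalOrder(prices)); // [$100, $20]
        System.out.println(numericOrder(prices)); // [$20, $100]
    }
}
```

With the symbol attached, '1' sorts before '2', so $100 lands ahead of $20; 
once the symbol is dropped and the rest is treated as a number, the order is 
the expected one.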

What is it you really want? This seems like an XY problem; you're asking for a 
solution without clearly defining the problem.

Feel free to reopen this if, through discussion on the user's list, you truly 
find that this behavior is unexpected.

> Currency characters are not tokenized
> -------------------------------------
>
>                 Key: LUCENE-5564
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5564
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 3.6.2
>            Reporter: Jerome Lanneluc
>
> It is not possible to have the SmartChineseAnalyzer (nor the 
> StandardAnalyzer) include currency characters (e.g. $ or €) in the token 
> stream.
> For example, the following will output 100 200. I would expect a way to 
> configure the analyzers to output 100$ 200€ instead.
> import java.io.StringReader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> import org.apache.lucene.util.Version;
> public class Test {
>     public static void main(String[] args) throws Exception {
>         Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_36);
>         // new StandardAnalyzer(Version.LUCENE_36);
>         TokenStream stream = analyzer.tokenStream(null, new StringReader("100$ 200€"));
>         // Register the term attribute once, before consuming the stream.
>         CharTermAttribute attr = stream.addAttribute(CharTermAttribute.class);
>         stream.reset();
>         while (stream.incrementToken()) {
>             // Print each token followed by a space.
>             System.out.print(new String(attr.buffer(), 0, attr.length()));
>             System.out.print(' ');
>         }
>         stream.end();
>         stream.close();
>     }
> }



