I'm using Lucene to search address data, and came across an interesting case
where StandardAnalyzer appears not to remove punctuation (a comma). To
illustrate, the following code snippet uses StandardAnalyzer to analyze an
address, printing out each analyzed token. 
The output of the code snippet is:
If the code is altered slightly so the String text is initialized as
follows:
(there's a space between the first comma and the building number), then the
output is as follows:

Based on my understanding, I would expect the output to be the same in both
cases. Is this a known issue, or is my understanding off? It's not a big
deal. It only caught my attention because I have a unit test that asserts
that each token's text is all lower case or alphanumeric. It is easy to work
around, but I thought it worth posting about.
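
For reference, the kind of check that unit test performs could be sketched roughly as below. This is only an illustration, not the actual test: the class name TokenChecks, the method isLowerCaseAlphanumeric, and the sample tokens are all made up here, and it assumes "lower case or alphanumeric" means every character is either a lower-case letter or a digit. It uses only the JDK, so no Lucene classes are involved.

```java
// Hypothetical helper, not part of Lucene: checks the property the unit test
// is described as asserting on each analyzed token.
class TokenChecks {

    /**
     * Returns true if the token is non-empty and every character is either
     * a digit or a lower-case letter (assumed reading of the test's intent).
     */
    static boolean isLowerCaseAlphanumeric(String token) {
        if (token == null || token.isEmpty()) {
            return false;
        }
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean lowerLetter = Character.isLetter(c) && Character.isLowerCase(c);
            if (!Character.isDigit(c) && !lowerLetter) {
                return false; // punctuation, whitespace, or an upper-case letter
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample tokens; the last one carries an embedded comma.
        String[] tokens = {"main", "street", "12", "1,12"};
        for (String token : tokens) {
            System.out.println(token + " -> " + isLowerCaseAlphanumeric(token));
        }
    }
}
```

A token that still contains a comma fails this check, which is how the behaviour described above would show up in such a test.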

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Case-where-StandardAnalyzer-doesn-t-remove-punctuation-tp3848460p3848460.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.