Hi, I am not a Lucene expert by any means, and I am hoping that you may be able to help me with a small problem I currently have.
I am using the Lucene standard analyzer on an address text field. My issue arises when the address contains a comma. For example, given 87,Green Street, the standard analyzer treats the comma as an interconnecting character and retains the single token 87,Green. I presume this is to ensure that numeric values such as 10,000 are kept intact. The problem is that individual searches for 87 or green do not match the token 87,Green.

Should the standard analyzer not check the text on either side of the comma to confirm that both sides are numeric and, if they are not, split the token 87,Green into 87 and Green?

I can wrap the standard analyzer and post-process the tokens it generates to achieve this effect, but I was wondering whether the behaviour above is the standard analyzer 'working as intended'.

Many thanks

--
View this message in context: http://www.nabble.com/Issue-with-Tokenising-with-Standard-Analyzer-and-comma%27s-tp26010057p26010057.html
Sent from the Lucene - General mailing list archive at Nabble.com.
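
P.S. To make the wrapping idea concrete, here is a rough sketch of the splitting rule I have in mind. It is plain Java working on token strings, not code against the Lucene TokenFilter API, so it only illustrates the rule and would still need to be adapted into a real filter. The names CommaTokenSplitter, split and isDigits are my own placeholders, and I am assuming that "numeric" simply means a run of ASCII digits.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaTokenSplitter {

    // Assumption: "numeric" means a non-empty run of ASCII digits.
    static boolean isDigits(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    // Splits a token at each comma unless the pieces on both sides
    // of that comma are numeric, in which case the comma is kept.
    static List<String> split(String token) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String prev = null;
        for (String part : token.split(",")) {
            if (prev == null) {
                current.append(part);
            } else if (isDigits(prev) && isDigits(part)) {
                // Digits on both sides: keep the comma (e.g. 10,000).
                current.append(',').append(part);
            } else {
                // Otherwise emit what we have and start a new token.
                if (current.length() > 0) out.add(current.toString());
                current.setLength(0);
                current.append(part);
            }
            prev = part;
        }
        if (current.length() > 0) out.add(current.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("87,Green")); // prints [87, Green]
        System.out.println(split("10,000"));   // prints [10,000]
    }
}
```

With this rule, 87,Green would be indexed as the two tokens 87 and Green, while 10,000 would remain a single token.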
