You probably don't want to use StandardAnalyzer. Maybe try WhitespaceAnalyzer, but you'll need to enhance your regex a little to deal with punctuation, since WhitespaceAnalyzer may give you tokens like:

5106-7922-9469-8422.

"5106-7922-9469-8422"

etc
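
As a rough illustration of the kind of regex change meant here, the sketch below uses plain java.util.regex (not Lucene's RegExp syntax, which differs in places, so the pattern would need adapting and testing before being passed to RegexpQuery). The class name CardTokenDemo and the method looksLikeMastercard are made up for this example. It accepts a single whitespace-delimited token with optional punctuation around the number and optional dashes between groups of four digits. It does not help with space-separated numbers, since whitespace tokenization would still split those into four tokens.

```java
import java.util.regex.Pattern;

public class CardTokenDemo {
    // Hypothetical token pattern (an assumption, not taken from the thread):
    //   [^0-9]*          optional leading punctuation such as a quote
    //   5[1-5][0-9]{2}   first group: 5, then 1-5, then two more digits
    //   (-?[0-9]{4}){3}  three more groups of four digits, each optionally dash-prefixed
    //   [^0-9]*          optional trailing punctuation such as a period or quote
    private static final Pattern MASTERCARD_TOKEN =
            Pattern.compile("[^0-9]*5[1-5][0-9]{2}(-?[0-9]{4}){3}[^0-9]*");

    // Returns true if the whole token matches the pattern above.
    static boolean looksLikeMastercard(String token) {
        return MASTERCARD_TOKEN.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] tokens = {
            "5106792294698422",
            "5106-7922-9469-8422.",
            "\"5106-7922-9469-8422\"",
            "7922"
        };
        for (String token : tokens) {
            System.out.println(token + " -> " + looksLikeMastercard(token));
        }
    }
}
```

The pattern only checks the shape of the token (16 digits starting with 51 to 55), the same as the original query. It does not validate the number itself.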

-Mike

On 12/15/14 3:45 AM, Valentin Popov wrote:
I need to find MasterCard numbers with a regular expression.

I’m using Query query = new RegexpQuery(new Term("body", "5{1}<1-5>{1}<0-9>{14}"), RegExp.ALL) to search for numbers in the email’s body, and StandardAnalyzer is used for body indexing. A number like 5106792294698422 is indexed as is, so all such MasterCard numbers appear in the search results, but a number like 5106 7922 9469 8422 is indexed as 4 tokens (5106, 7922, 9469, 8422), and similarly for 5106-7922-9469-8422.

Any ideas on how to find a sequence of numbers with spaces, dashes, etc.? Maybe a multi-term regexp search query?


Regards,
Valentin Popov






