What is the #LETTER definition in StandardTokenizer.jj?
I saw:
| <#P: ("_"|"-"|"/"|"."|",") >
| <#HAS_DIGIT: // at least one digit
(<LETTER>|<DIGIT>)*
<DIGIT>
(<LETTER>|<DIGIT>)*
>
Should I remove "_" and recompile the source code?
Sincerely,
Anh Ngo
-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Friday, July 21, 2006 2:49 PM
To: [email protected]
Subject: Re: StandardAnalyzer question
On Friday 21 July 2006 16:16, Ngo, Anh (ISS Southfield) wrote:
> The Lucene 2.0.0 StandardAnalyzer does treat the "_" (underscore) as a
> token. Is there a way I can make StandardAnalyzer not tokenize on
> "_" or any other given characters?
You need to add "_" to the #LETTER definition in StandardTokenizer.jj, then
rebuild StandardTokenizer.java using the appropriate ant task.
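As a rough sketch only: assuming #LETTER in your copy of StandardTokenizer.jj
is a private token defined as a list of Unicode character ranges (the exact
ranges may differ between versions, so check your own source), the change
would be one extra entry for the underscore, along these lines:
| < #LETTER: // unicode letters
[
"\u0041"-"\u005a", // A-Z
"\u005f", // assumed addition: "_" (underscore)
"\u0061"-"\u007a" // a-z
// keep any further ranges from your original file unchanged
]
>
The "\u005f" line is the only new entry; everything else should stay as it is
in your file. The generated StandardTokenizer.java then has to be rebuilt
from the grammar before the change takes effect.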
Regards
Daniel
--
http://www.danielnaber.de
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]