Terry Steichen wrote: > PS: Is this kind of thing (and more importantly, any other similar > design issues) documented any place?
This one is described in the source code, with the comment: // floating point, serial, model numbers, ip addresses, etc. // every other segment must have at least one digit
According to:PSS: What is the simplest way to alter this behavior to one that parses the same regardless of the presence or absence of numeric characters?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/standard/StandardTokenizer.html
"Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer."
You need to copy StandardTokenizer.jj, change its package statement, add some import statements, add a JavaCC task to your build.xml, and, finally, modify the clause following the above comment.
Doug
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
