On 15-Dec-07, at 3:14 PM, Beyer,Nathan wrote:
I have a few fields that use package names and class names and I've been
looking for some suggestions for analyzing these fields.
A few examples -
Text (class name)
- "org.apache.lucene.document.Document"
Queries that would match
- "org.apache" , "org.apache.lucene.document"
Text (class name + method signature)
-- "org.apache.lucene.document.Document#add(Fieldable)"
Queries that would match
-- "org.apache.lucene", "org.apache.lucene.document.Document#add"
Any thoughts on how to approach tokenizing these types of texts?
Perhaps it would help to include some examples of queries you _don't_
want to match. For all the examples above, simply tokenizing
alphanumeric components would suffice.
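
As a rough sketch of what "tokenizing alphanumeric components" could look like, here is some plain Java with no Lucene classes involved. The names AlnumTokens, tokenize and matches are made up for illustration, and the phrase-style check (query tokens must appear consecutively, case-sensitively, in the text tokens) is only an assumption about the matching you want:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class AlnumTokens {

    // Split text into runs of letters and digits; any other character
    // ('.', '#', '(', ')', whitespace, ...) acts as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Assumed matching rule: the query's tokens must occur as a
    // consecutive run somewhere in the text's tokens (case-sensitive).
    static boolean matches(String text, String query) {
        List<String> textTokens = tokenize(text);
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(textTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String text = "org.apache.lucene.document.Document#add(Fieldable)";
        System.out.println(tokenize(text));
        System.out.println(matches(text, "org.apache.lucene"));
        System.out.println(matches(text, "org.apache.lucene.document.Document#add"));
    }
}
```

Note that under this rule a query such as "apache.lucene" also matches, since it does not have to start at the beginning of the name. Whether that is acceptable is exactly the kind of thing the "don't match" examples would settle.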
-Mike