On 15-Dec-07, at 3:14 PM, Beyer,Nathan wrote:

I have a few fields that use package names and class names and I've been
looking for some suggestions for analyzing these fields.

A few examples -

Text (class name)
- "org.apache.lucene.document.Document"
Queries that would match
- "org.apache" , "org.apache.lucene.document"

Text (class name + method signature)
- "org.apache.lucene.document.Document#add(Fieldable)"
Queries that would match
- "org.apache.lucene", "org.apache.lucene.document.Document#add"

Any thoughts on how to approach tokenizing these types of texts?

Perhaps it would help to include some examples of queries you _don't_ want to match. For all the examples above, simply splitting the text into its alphanumeric components (breaking on '.', '#' and the parentheses) would suffice.
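
As a rough illustration of that suggestion, here is a minimal sketch in plain Java. It does not use the Lucene API; the class name ClassNameTokens and the helpers tokenize and matches are made up for this example, and the contiguous-run check is only an approximation of what a phrase query over such tokens would do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of Lucene.
public class ClassNameTokens {

    // Split text into maximal runs of ASCII letters and digits.
    // Everything else ('.', '#', '(', ')') acts as a separator.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // True if the query's tokens appear as a contiguous run in the
    // text's tokens. Assumption: comparison is case-sensitive.
    static boolean matches(String text, String query) {
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(text), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String text = "org.apache.lucene.document.Document#add(Fieldable)";
        System.out.println(tokenize(text));
        System.out.println(matches(text, "org.apache.lucene"));
        System.out.println(matches(text, "org.apache.lucene.document.Document#add"));
    }
}
```

Note that this check is not anchored to the start of the text, so a query such as "apache.lucene" would also match. Whether that is acceptable is exactly the kind of thing a list of queries that should not match would settle.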

-Mike

