On 6/21/13 11:18 AM, Uwe Schindler wrote:
You may also be interested in this talk at Berlin Buzzwords 2013:
http://intrafind.de/tl_files/documents/INTRAFIND_BerlinBuzzwords2013_The-Typed-Index.pdf

Unfortunately the slides are not available.

Uwe

I've been wondering why we seem to handle case- and diacritic-normalization (among other things, like stemming) using multiple fields, when it would really be more compact to index the normalized terms at the same position as their base term in a single field. The missing piece, of course, is how to exclude the normalized terms when you want to. It would be great to have a single text field whose terms reflect a variety of different analysis options, plus the ability to search those terms selectively (by type) at query time. Then you could run, say, a case-sensitive, unstemmed query against the same field as a case-insensitive, stemmed query, and even mix such query terms in a single query with a positional (NEAR or phrase) relationship. Wouldn't that be nice?
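To make the idea concrete, here is a minimal, self-contained Java sketch (plain Java, not Lucene code) of a single field in which the lowercased and accent-folded variants of each word are indexed at the same position as the original word, each tagged with type bits, plus a phrase check that restricts every query term to a chosen set of types. The names `TypedFieldSketch`, `index`, `phraseMatches`, `QueryTerm` and the three type bits are hypothetical. In Lucene the stacking would correspond to emitting extra tokens with a position increment of 0; the per-position type bits would have to be stored somewhere (a payload is one assumed candidate), which this toy simply keeps in a map. Stemming would just be another type bit.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory model of one "typed" field (hypothetical, not a Lucene API):
// every analysis variant of a word sits at the same position as the original.
class TypedFieldSketch {
    // Hypothetical token types, one bit per analysis variant.
    static final int ORIGINAL = 1; // exact surface form
    static final int LOWER = 2;    // lowercased
    static final int FOLDED = 4;   // lowercased, diacritics stripped

    // A query term plus the set of token types it is allowed to match.
    record QueryTerm(String term, int typeMask) {}

    // term -> (position -> OR of the types that produced that term there)
    static Map<String, Map<Integer, Integer>> index(String text) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        String[] words = text.trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            String original = words[pos];
            String lower = original.toLowerCase(Locale.ROOT);
            // NFD splits accented letters into base letter + combining mark;
            // \p{M} then removes the combining marks.
            String folded = Normalizer.normalize(lower, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            // All variants share one position (like a position increment of 0);
            // identical variants collapse into one entry with OR-ed type bits.
            add(index, original, pos, ORIGINAL);
            add(index, lower, pos, LOWER);
            add(index, folded, pos, FOLDED);
        }
        return index;
    }

    private static void add(Map<String, Map<Integer, Integer>> index,
                            String term, int pos, int type) {
        index.computeIfAbsent(term, k -> new HashMap<>()).merge(pos, type, (a, b) -> a | b);
    }

    // Exact phrase: query term i must occur at start + i with an allowed type.
    static boolean phraseMatches(Map<String, Map<Integer, Integer>> index,
                                 List<QueryTerm> query) {
        if (query.isEmpty()) return false;
        Map<Integer, Integer> firstPositions =
                index.getOrDefault(query.get(0).term(), Map.of());
        for (int start : firstPositions.keySet()) {
            boolean matched = true;
            for (int i = 0; i < query.size() && matched; i++) {
                QueryTerm q = query.get(i);
                Integer types = index.getOrDefault(q.term(), Map.of()).get(start + i);
                matched = types != null && (types & q.typeMask()) != 0;
            }
            if (matched) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        var idx = index("Le Caf\u00e9 Noir");
        // Case- and accent-sensitive phrase over the original tokens.
        System.out.println(phraseMatches(idx, List.of(
                new QueryTerm("Caf\u00e9", ORIGINAL), new QueryTerm("Noir", ORIGINAL)))); // true
        // The lowercased variant exists only as LOWER, so an ORIGINAL-only query skips it.
        System.out.println(phraseMatches(idx, List.of(
                new QueryTerm("caf\u00e9", ORIGINAL)))); // false
        // Mixed types in one phrase: accent-folded "cafe" followed by exact "Noir".
        System.out.println(phraseMatches(idx, List.of(
                new QueryTerm("cafe", FOLDED), new QueryTerm("Noir", ORIGINAL)))); // true
    }
}
```

Query terms must already be in the form of the variant they target (e.g. `cafe` for FOLDED); in a real system the query side would run the matching analysis chain for each requested type.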

It sounds like that might be the topic of that talk? I would be interested in the proposed solution, but perhaps it is proprietary? I guess payloads are the only place such type information could be stored, although I'm fuzzy on that. Has anyone contributed such a thing to Lucene?

-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
