On 6/21/13 11:18 AM, Uwe Schindler wrote:
You may also be interested in this talk @ BerlinBuzzwords2013:
http://intrafind.de/tl_files/documents/INTRAFIND_BerlinBuzzwords2013_The-Typed-Index.pdf
Unfortunately the slides are not available.
Uwe
I've been wondering why we seem to handle case- and
diacritic-normalization (among other things, like stemming) using
multiple fields when really it would be more compact to index normalized
terms in the same position as their base term in a single field. The
missing piece, of course, is how to exclude the normalized terms when you
don't want them. Ideally, you would have a single text field with terms
reflecting a variety of analysis options, plus the ability to search
those terms selectively (by type) at query time. Then you could run, say,
a case-sensitive, unstemmed query against the same field as a
case-insensitive, stemmed query, and even mix such query terms within
a single query in a positional (NEAR or phrase) relationship.
Wouldn't that be nice?
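To make the idea concrete, here is a minimal in-memory sketch in plain Java (no Lucene classes). It assumes one possible way to record the type: prefixing each stacked term with a hypothetical marker ("o:" original, "l:" lowercased, "f:" lowercased and diacritic-folded) rather than using payloads. In Lucene itself the stacked variants would be emitted with a position increment of 0, the way synonym filters stack synonyms; the class and method names below are illustrative only, and this is not necessarily what the talk proposes.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical model of a "typed" single field: every position holds the
 * original token plus normalized variants, each tagged with a type prefix.
 */
public class TypedFieldSketch {

    // Hypothetical type markers prepended to each stacked term.
    static final String ORIG = "o:";
    static final String LOWER = "l:";
    static final String FOLDED = "f:";

    /** Removes combining marks after canonical decomposition (e.g. "caf\u00e9" -> "cafe"). */
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Splits on whitespace; each position gets original, lowercased and folded terms. */
    static List<List<String>> analyze(String text) {
        List<List<String>> positions = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token for leading whitespace
            }
            List<String> stacked = new ArrayList<>();
            String lower = token.toLowerCase(Locale.ROOT);
            stacked.add(ORIG + token);
            stacked.add(LOWER + lower);
            stacked.add(FOLDED + fold(lower));
            positions.add(stacked);
        }
        return positions;
    }

    /** Phrase match: the type-prefixed query terms must occur at consecutive positions. */
    static boolean phraseMatches(List<List<String>> doc, List<String> query) {
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean matched = true;
            for (int i = 0; i < query.size(); i++) {
                if (!doc.get(start + i).contains(query.get(i))) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute, escaped to avoid source-encoding issues.
        List<List<String>> doc = analyze("Caf\u00e9 Society");
        // Case-sensitive first term, case- and accent-insensitive second term.
        System.out.println(phraseMatches(doc, List.of(ORIG + "Caf\u00e9", FOLDED + "society"))); // prints true
        // The original-type term must match exactly, so "cafe" fails here.
        System.out.println(phraseMatches(doc, List.of(ORIG + "cafe", LOWER + "society"))); // prints false
    }
}
```

Because the type lives in the term text itself, selecting a variant at query time is just choosing which prefixed term to look up, and positional queries can still mix types freely. The cost is a larger term dictionary, since each variant is a distinct term.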
It sounds like that might be the topic of that talk. I would be
interested in the proposed solution, but perhaps it is proprietary? I
guess payloads are the only place where such type information could be
stored, although I'm fuzzy on that. I wonder whether anyone has
contributed such a thing to Lucene.
-Mike