Chris Hostetter wrote:
the enumeration is in lexicographical order, so "Dell" is nowhere near
"dell" in the enumeration. even if we added a boolean property to Terms
indicating that it's a case insensitive Term, the "seeking" along that
enumeration would be ... less optimal ... than it can be now.
Ah, now I understand!
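
To make the ordering point concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes the term dictionary behaves like a list sorted by code point, which matches Lucene's order for ASCII terms; the sample terms and the helper names seek and scan_ignore_case are made up for illustration.

```python
import bisect

# Hypothetical term dictionary, sorted the way an ordered enumeration would be.
terms = sorted(["dell", "Dell", "apple", "Zebra", "computers", "Computers"])
# Uppercase letters sort before lowercase ones, so the result is:
# ['Computers', 'Dell', 'Zebra', 'apple', 'computers', 'dell']


def seek(term):
    """Exact, case-sensitive lookup: one binary search."""
    i = bisect.bisect_left(terms, term)
    return i if i < len(terms) and terms[i] == term else -1


def scan_ignore_case(term):
    """Case-insensitive lookup: the variants are scattered through the
    enumeration, so this sketch has to walk every term."""
    wanted = term.lower()
    return [t for t in terms if t.lower() == wanted]


print(seek("Dell"), seek("dell"))   # 1 5
print(scan_ignore_case("DELL"))     # ['Dell', 'dell']
```

"Dell" and "dell" end up four slots apart even in this tiny list, which is why a case-insensitive flag on the term alone would not let a lookup seek straight to its matches.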
: > > Let's say, for example, you want to find "Dell" (with a capital "D"), near
: > > "computers" (with or without capitals, i.e. in any case). The problem is
: > > that you would need to use a SpanQuery to find terms near each other; but
: > > if the case-sensitivity required is different for each term, then they
: > > will be in different fields, making the use of SpanQuerys impossible.
i assume by this statement that you are suggesting that you want your
users to be able to say "find me $foo near $bar where $foo must be in the
case i specified but $bar can be in any case" -- is that correct?
Yes, that's exactly what I meant.
in that case Erick's point about indexing both the original case and
some normalized casing at the same term position is the best way to go --
the only downside this has compared to separate fields is that it can
introduce some bias in your tf/idf values ... but that can be eliminated
by prefixing all of your "normalized" terms with some unicode character
that your tokenizer would normally strip off.
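
A rough sketch of that idea, in plain Python rather than real Lucene code: each token is stored twice at the same position, once as written and once lowercased with a marker prefix. The marker character, the whitespace tokenizer, the sample text and the names build_index and near are all assumptions for illustration. In Lucene itself this would presumably be a token filter that emits the second token with a position increment of 0.

```python
from collections import defaultdict

# Assumed marker: any character the tokenizer would never emit on its own.
NORM_MARKER = "\u00b6"


def build_index(text):
    """Map each term to its positions, storing two forms per token."""
    index = defaultdict(list)
    for pos, token in enumerate(text.split()):
        index[token].append(pos)                        # original case
        index[NORM_MARKER + token.lower()].append(pos)  # normalized, same position
    return dict(index)


def near(index, term_a, term_b, max_distance):
    """True if the two terms occur within max_distance positions of each other."""
    positions_a = index.get(term_a, [])
    positions_b = index.get(term_b, [])
    return any(0 < abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


index = build_index("Dell makes computers and the dell is a small valley")

# Case-sensitive "Dell" near case-insensitive "computers", in one field.
print(near(index, "Dell", NORM_MARKER + "computers", 2))   # True
# Lowercase "dell" sits at position 5, too far from "computers" at position 2.
print(near(index, "dell", NORM_MARKER + "computers", 2))   # False
```

Because the normalized form carries the marker, an already-lowercase token such as "dell" is never added twice under the same term, so the original-case term frequencies stay as they would be in a separate field.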
From Erick's reply:
"I suppose something like that might work, but I still think that presenting
a user with matches that sometimes work case sensitive and sometimes
don't would be...er..fraught."
The user would, of course, choose which terms are case-sensitive when
they query, using a modifier in the query language. (I would have to
implement that). It's something my users have asked to be able to do -
in their view, fields are something that should be used for different
content, and case-sensitivity should be an option on *any* field. But
what you have suggested should allow it to work that way, by adding both
versions of the term at the same position.
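
For what it's worth, the query side could be as small as the sketch below. The "=" prefix as the case-sensitive modifier is purely hypothetical (I have not settled on a syntax), and the marker is assumed to be the same character that was put on the normalized terms at index time.

```python
# Assumed to match the marker used when indexing the normalized terms.
NORM_MARKER = "\u00b6"


def to_index_term(query_term):
    """Translate one query term into the term to look up in the index.

    A leading "=" (hypothetical syntax) means "match exactly this case";
    anything else is matched against the normalized form.
    """
    if query_term.startswith("="):
        return query_term[1:]
    return NORM_MARKER + query_term.lower()


def translate(query):
    """Translate a whitespace-separated query, term by term."""
    return [to_index_term(t) for t in query.split()]


# "Dell" must match as typed; "Computers" matches in any case.
print(translate("=Dell Computers") == ["Dell", NORM_MARKER + "computers"])  # True
```

The translated terms all live in the same field, so they can be handed to a single proximity query.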
Thanks guys!
-John