Re: derive tokens from single token

Erik Hatcher Mon, 29 Sep 2003 08:46:35 -0700

On Monday, September 29, 2003, at 11:01 AM, Hackl, Rene wrote:

except that you'll be indexing a ton of
terms I'd guess.  If there is some other way to split these words by
separating by prefix ("hexa", "hepta") and suffix ("alene", "alin") it
would likely be better.  But maybe its not practical to do so.
There'll be at least two indexes, one "normal" one and another bloated one. Dan suggested splitting, too, but, unfortunately, if users search for e.g.

"9-Oxabicyclo[3.3.1]nona-2,6-diene"

they don't want anything else than that substance, as opposed to

"*-Oxabicyclo[3.3.1]nona*"

where they'd be interested in substances from that family - whatever the numbers are.

But consider the same type of thing like a phrase query. If two documents are indexed with a field containing "a b c" and "x b y"

If searching for "b" is done, both documents are returned. If searching for "a b" is done, then only the first document is returned. So I think with a domain aware analyzer, you might be able to split things up into separate terms and then on the querying side of things the same type of analysis would be done Certainly not a trivial thing, and maybe not even the right approach, but it seems that intelligent analysis can make things a lot easier on users and performance for searching. Maybe? Just food for thought.

If you're interested, once I've some hard performance results at hand, I could post them around.

Definitely interested!

Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: derive tokens from single token

Reply via email to