except that you'll be indexing a ton of terms I'd guess. If there is some other way to split these words by separating by prefix ("hexa", "hepta") and suffix ("alene", "alin") it would likely be better. But maybe its not practical to do so.
There'll be at least two indexes, one "normal" one and another bloated one.
Dan suggested splitting, too, but, unfortunately, if users search for e.g.
"9-Oxabicyclo[3.3.1]nona-2,6-diene"
they don't want anything else than that substance, as opposed to
"*-Oxabicyclo[3.3.1]nona*"
where they'd be interested in substances from that family - whatever the
numbers are.
But consider the same type of thing like a phrase query. If two documents are indexed with a field containing "a b c" and "x b y"
If searching for "b" is done, both documents are returned. If searching for "a b" is done, then only the first document is returned. So I think with a domain aware analyzer, you might be able to split things up into separate terms and then on the querying side of things the same type of analysis would be done Certainly not a trivial thing, and maybe not even the right approach, but it seems that intelligent analysis can make things a lot easier on users and performance for searching. Maybe? Just food for thought.
If you're interested, once I've some hard performance results at hand, I
could post them around.
Definitely interested!
Erik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
