In an earlier version, specifying your own fragmentation for the
thesaurus caused it not to work at all. I suspect that has since been
fixed, but it does seem to indicate you don't need to do that. We've
worked with a 1.7 MB thesaurus with around 14,000 terms and had good
performance, but I think the performance will depend to some extent on
the branching factor: if any term maps to a very large number of
synonyms, then queries involving that term will tend to become very
large, and their performance might suffer. Not sure, since we never
really ran into that case in practice. In our case (largish number of
terms, each mapping to a few synonyms), all went well.
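For reference, thesaurus lookup and query expansion in MarkLogic can be sketched roughly like this (the thesaurus document URI and the term are placeholders, and the exact thes:expand signature and options may differ between server versions, so treat this as a sketch rather than a definitive recipe):

```xquery
xquery version "1.0-ml";
import module namespace thes = "http://marklogic.com/xdmp/thesaurus"
  at "/MarkLogic/thesaurus.xqy";

(: Look up the entries for a term in a previously loaded thesaurus.
   "/thesauri/example.xml" is a placeholder URI. :)
let $entries := thes:lookup("/thesauri/example.xml", "heart attack")

(: Expand a word query with those entries. A term that maps to many
   synonyms produces a correspondingly larger or-query, which is the
   branching-factor effect discussed above. :)
return thes:expand(
  cts:word-query("heart attack"),
  $entries,
  (), (), ())
```

The expanded query can then be passed to cts:search as usual; the size of the resulting or-query grows with the number of synonyms for the term.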
-Mike
On 01/04/2010 08:05 AM, Lee, David wrote:
I'm thinking of experimenting with the Thesaurus and spelling
dictionary features of ML.
Before I embark, is there any advice about performance and size issues?
Should I set up fragmentation rules myself or just let the system
handle them?
I am looking at adding about 10,000 terms to the thesaurus and
dictionary ... is this too much?
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general