In an earlier version, specifying your own fragmentation for the thesaurus caused it not to work at all. I suspect that has since been fixed, but it does seem to indicate you don't need to do that. We've worked with a 1.7 MB thesaurus containing about 14,000 terms and had good performance, but I think performance will depend to some extent on the branching factor: if any term maps to a very large number of synonyms, then queries involving that term will tend to become very large, and their performance might suffer. I'm not sure, since we never really ran into that case in practice. In our case (a largish number of terms, each mapping to a few synonyms), all went well.

-Mike

On 01/04/2010 08:05 AM, Lee, David wrote:

I'm thinking of experimenting with the Thesaurus and spelling dictionary features of ML.

Before I embark, is there any advice about performance and size issues?
Should I set up fragmentation rules myself, or just let the system handle them?


I am looking at adding about 10,000 terms to the thesaurus and dictionary ... is this too much?

----------------------------------------

David A. Lee

Senior Principal Software Engineer

Epocrates, Inc.

[email protected] <mailto:[email protected]>

812-482-5224


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
