We have been generating spelling dictionaries using all the "words" in 
our data; this enables us to capture names, for example.  But when the 
dictionary gets large, spell:suggest seems to slow down a lot.  We've 
noticed this with a dictionary containing over 2M words (yes, there 
is a lot of junk in there).  My questions:

1) Is this expected?

2) I've considered limiting the dictionary size by using only words that 
occur more than N times.  But the way we have been building our 
dictionary is:

xdmp:document-insert("/spelling/spelling-dictionary.xml",
   spell:make-dictionary(cts:field-words("body")),
   xdmp:default-permissions(),
   ("http://marklogic.com/xdmp/documents",
    "http://marklogic.com/xdmp/spell"))

and word-lexicons don't include any frequency information.  There must 
be frequency info stored somewhere in ML in order for it to be able to 
make its relevance calculations.  Is that exposed anywhere in the API?  
Is there some other approach that would work here?  A ready-made 
dictionary crafting module perhaps?
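
For illustration, here is a minimal sketch of the "more than N" filter, 
under some loud assumptions: it uses xdmp:estimate over a 
cts:field-word-query as a stand-in for frequency, so it counts matching 
fragments rather than individual occurrences, and the threshold 
$min-count is a hypothetical value.  It is not a confirmed or 
recommended approach, just one way the idea could be expressed:

```xquery
xquery version "1.0-ml";

import module namespace spell = "http://marklogic.com/xdmp/spell"
    at "/MarkLogic/spell.xqy";

(: Hypothetical threshold: keep words that match more than this many
   fragments.  This is a fragment count, not a true occurrence count. :)
declare variable $min-count as xs:integer := 5;

let $words :=
  for $word in cts:field-words("body")
  (: xdmp:estimate resolves the count from the indexes only;
     no documents are fetched or filtered. :)
  where xdmp:estimate(
          cts:search(fn:doc(), cts:field-word-query("body", $word)))
        gt $min-count
  return $word
return
  xdmp:document-insert("/spelling/spelling-dictionary.xml",
    spell:make-dictionary($words),
    xdmp:default-permissions(),
    ("http://marklogic.com/xdmp/documents",
     "http://marklogic.com/xdmp/spell"))
```

This issues one estimate per lexicon word, so with roughly 2M words it 
would probably need to run in batches or as a background task.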

-- 
Michael Sokolov
Engineering Director
www.ifactory.com

