Hi Michael

On 1 February 2011 01:17, Michael Meeks <michael.me...@novell.com> wrote:
> Hi Steve,
> Sure - so; in response to user input I suspect we can take a second to
> parse the thesaurus; we have around 20Mb of text to load for en_US;
> perhaps 32Mb is a reasonable upper-bound; it does seem a lot to parse so
> quickly.

Where it will hurt is if it is not in cache and the user has some
background task running that hits the disk. An example might be on
Windows with virus scanning (or viruses :) ).

> Right. I think we could easily serialize a small skip-list to disk too
> - if we simply store ~8 or ~32 or so indexes into the data - we can
> parse only a fraction of it, and pop that in our home directory. We
> could also drop the MyThes code too as a dependency to manage.

I'm not sure what you mean by a skip list, unless you simply mean a
file similar to the existing .idx, or just a list of offsets for where
the words are, so we can skip loading the whole file. The trouble with
that approach is that readahead will likely pull in the whole file
anyway, as the words aren't generally _that_ far apart in it, so you'll
still do all the IO and just skip a bit of the CPU time.

> > The code using it is in:
> > lingucomponent/source/thesaurus/libnth/nthesimp.cxx
>
>> BTW, if I did that I'd probably do some major surgery on mythes and
>> just use STL because it basically is doing C style memory management
>> and processing and I think I would screw it up if I started messing
>> with it. The only problem with simplifying it with STL constructs is
>> that I would want to change the interface (string vs char *), maybe
>> use STL vectors for the list of synonyms, etc.
>
> Heh; sure.

I've cooled off on this a bit, as performance is slower when using lots
of strings etc. I was able to change the approach to loading the idx to
treat it as one big buffer, and that sped it up considerably too. This
did mean resorting to lots of pointer tomfoolery, but it is easy to
clean up as there are only 3 allocations instead of 100k+.

> I guess we could re-write it inside lingucomponent then (?) but we
> should prolly get a better understanding of how frequently this code is
> called first - is it hooked into from the spell checking code ? or is it
> really just the Tools->Language->Thesaurus ?

It's actually hooked into the right-click menu (probably amongst other
things). The first time you right click on a word, the thesaurus
dictionary for the current locale is loaded before the right-click menu
shows up. After that, the cached thesaurus dictionary is used for
subsequent lookups. If you look in your right-click menu, you'll notice
that a thesaurus list of synonyms shows up (assuming the word is found) :).

Regards,
Steven Butler
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
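
For illustration only, here is a minimal sketch of the "one big buffer" idea mentioned above for loading the .idx. It is not the actual MyThes change. The class name ThesIndex, the struct IdxEntry and the method find are hypothetical, and the index text is assumed to consist of an encoding line, an entry-count line, and then byte-sorted "word|offset" lines. The whole file is kept in a single string and each entry only records positions inside it, so there are two allocations here instead of one per word.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical entry type: positions into the shared buffer, no per-word copy.
struct IdxEntry {
    std::size_t word_pos;  // start of the word inside the buffer
    std::size_t word_len;  // length of the word in bytes
    long dat_offset;       // byte offset of the entry in the .dat file
};

// Hypothetical index class; not the MyThes interface.
class ThesIndex {
public:
    // Takes the complete contents of the index file as one string.
    explicit ThesIndex(std::string idx_text) : buffer_(std::move(idx_text)) {
        const std::string_view all(buffer_);
        std::size_t pos = 0;
        int line_no = 0;
        while (pos < all.size()) {
            std::size_t eol = all.find('\n', pos);
            if (eol == std::string_view::npos) eol = all.size();
            const std::string_view line = all.substr(pos, eol - pos);
            if (line_no == 1) {
                // Assumed: second line holds the entry count, so reserve once.
                entries_.reserve(std::strtoul(buffer_.c_str() + pos, nullptr, 10));
            } else if (line_no > 1) {
                const std::size_t bar = line.find('|');
                if (bar != std::string_view::npos) {
                    // strtol stops at the end of the line (first non-digit).
                    const long off =
                        std::strtol(buffer_.c_str() + pos + bar + 1, nullptr, 10);
                    entries_.push_back({pos, bar, off});
                }
            }
            pos = eol + 1;
            ++line_no;
        }
        // Binary search below relies on the file being sorted bytewise.
        assert(std::is_sorted(entries_.begin(), entries_.end(),
                              [this](const IdxEntry& a, const IdxEntry& b) {
                                  return word_at(a) < word_at(b);
                              }));
    }

    // Returns the .dat offset for the word, or -1 if it is not in the index.
    long find(std::string_view word) const {
        const auto it = std::lower_bound(
            entries_.begin(), entries_.end(), word,
            [this](const IdxEntry& e, std::string_view w) { return word_at(e) < w; });
        if (it != entries_.end() && word_at(*it) == word) return it->dat_offset;
        return -1;
    }

private:
    std::string_view word_at(const IdxEntry& e) const {
        return std::string_view(buffer_).substr(e.word_pos, e.word_len);
    }

    std::string buffer_;             // allocation 1: the whole index text
    std::vector<IdxEntry> entries_;  // allocation 2: flat array of entries
};
```

Storing positions rather than pointers keeps the index valid if the object is copied or moved, at the cost of rebuilding a string_view on each comparison.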