Hi Steve,

On Sat, 2011-01-29 at 21:45 +1000, Steve Butler wrote:
> I haven't had a look at this yet as I thought getting a script to
> analyze the existing thesaurus files would be helpful to get those
> errors looked at.

        Nice work with that :-)

> I thought I would discuss your idea about not using the index at all
> to see what reception it gets, but I think you may also have been
> suggesting a similar thing: are the index files even useful on modern gear?

        I suspect the index files are mostly useless (personally).

> I can populate the en_US index in memory from the .dat file with the
> C++ code in 0.287 s after dropping all cache, and 0.188s when the
> cache is hot.

        Sure - so, in response to user input I suspect we can take a
second to parse the thesaurus; we have around 20 MB of text to load for
en_US, and perhaps 32 MB is a reasonable upper bound. It does seem a lot
to parse so quickly.

> I do admit that my desktop is pretty quick though, with 4 cores, SATA
> II drives etc.

        Sure - but it will only use one of these ;-)

> If the thesaurus is only loaded when the user pops it up, then
> couldn't mythes be taught to generate its own in-memory index
> from the dictionary and not bother with an index file at all?

        Right. I think we could easily serialize a small skip-list to
disk too, and pop that in our home directory - if we simply store ~8 or
~32 or so indexes into the data, we can parse only a fraction of it.

        We could also drop the MyThes code as a dependency to manage.
The code using it is in:

        lingucomponent/source/thesaurus/libnth/nthesimp.cxx

> BTW, if I did that I'd probably do some major surgery on mythes and
> just use STL because it basically is doing C style memory management
> and processing and I think I would screw it up if I started messing
> with it. The only problem with simplifying it with STL constructs is
> that I would want to change the interface (string vs char *), maybe
> use STL vectors for the list of synonyms, etc.

        Heh; sure.

> By this stage it's not looking much like mythes anymore ...

        I guess we could re-write it inside lingucomponent then (?)
but we should probably get a better understanding of how frequently this
code is called first - is it hooked into from the spell checking code?
Or is it really just the Tools->Language->Thesaurus?

        Thanks!

                Michael.

--
 michael.me...@novell.com  <><, Pseudo Engineer, itinerant idiot
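
P.S. To make the "no index file" idea above a bit more concrete, here is
a minimal sketch of building the index in memory straight from the .dat
file using STL types. It is not the MyThes code and has not been run
against the real dictionaries. It assumes the usual MyThes .dat layout:
an encoding name on the first line, then for each entry a "word|N"
header followed by N meaning lines of the form "(pos)|syn1|syn2|...".
All names here (ThesIndex, Entry, build_index, lookup, split_bar) are
made up for illustration.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types and names; none of this is the MyThes interface.
struct Entry {
    std::streamoff offset;  // byte offset of the first meaning line
    long meanings;          // number of meaning lines that follow
};
using ThesIndex = std::unordered_map<std::string, Entry>;

// Split "a|b|c" into {"a", "b", "c"}.
static std::vector<std::string> split_bar(const std::string& line) {
    std::vector<std::string> out;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, '|')) out.push_back(field);
    return out;
}

// Walk the whole .dat stream once, remembering where each entry's
// meaning lines start. Assumed layout: encoding line, then "word|N"
// headers each followed by N meaning lines.
ThesIndex build_index(std::istream& dat) {
    ThesIndex index;
    std::string line;
    if (!std::getline(dat, line)) return index;  // line 1: encoding name
    while (std::getline(dat, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        const std::size_t bar = line.rfind('|');
        if (bar == std::string::npos) continue;  // not an entry header
        const long count = std::strtol(line.c_str() + bar + 1, nullptr, 10);
        const std::streamoff here = dat.tellg();
        index[line.substr(0, bar)] = Entry{here, count};
        // Skip over the meaning lines; they are re-read on lookup.
        std::string skipped;
        for (long i = 0; i < count && std::getline(dat, skipped); ++i) {}
    }
    return index;
}

// Seek to the entry and parse only its meaning lines.
// Each meaning is returned as {"(pos)", "syn1", "syn2", ...}.
std::vector<std::vector<std::string>> lookup(std::istream& dat,
                                             const ThesIndex& index,
                                             const std::string& word) {
    std::vector<std::vector<std::string>> result;
    const auto it = index.find(word);
    if (it == index.end()) return result;
    dat.clear();  // the indexing pass leaves the stream at EOF
    dat.seekg(it->second.offset, std::ios::beg);
    std::string line;
    for (long i = 0; i < it->second.meanings && std::getline(dat, line); ++i) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        result.push_back(split_bar(line));
    }
    return result;
}
```

With a real dictionary one would presumably open the .dat file as a
std::ifstream in binary mode (so the byte offsets stay valid), call
build_index once when the thesaurus is first popped up, and then call
lookup per query. Storing only every Nth entry in the index would give
the sparse, skip-list-like variant mentioned above, at the cost of a
short linear scan per lookup.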