On 2013-10-20 11:16, Daniel Naber wrote:
> On 2013-10-19 21:00, Daniel Naber wrote:
>
>> When I re-generate the dict files with the new Java class
>> (POSDictionaryBuilder), the result has its order changed, and some
>> French disambiguation tests seem to break because of that. Didn't we
>> have that problem before? How did we solve it?
>
> Another issue: what's special about the Polish synth dictionary? When I
> build it using SynthDictionaryBuilderTest (which exports polish.dict and
> builds a new synth dict from that), I get a dict file that's 36MB large.
> I guess there's something to consider that's not mentioned on
> http://wiki.languagetool.org/developing-a-tagger-dictionary#toc8?
Not really. I use cfsa2 encoding and exclude some tags (including negated forms, which are all regular and are added by the Polish synthesizer based on a simple rule). See what I wrote on our wiki:

"Note: it might be helpful to remove all forms from the synthesizer dict where POS tags indicate 'unknown form', 'foreign word' etc., as they only take space. Probably nobody will ever use them. It is also advisable to remove all archaic forms of main verbs (see the English src/main/resources/org/languagetool/resource/en/filter-archaic.txt for an example of what you might want to exclude)."

I exclude negated forms, all depreciative forms (they are mostly archaic) and a category of non-inflected words, and it works pretty well. You might want to add a parameter to your builder to enable filtering on POS tags.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
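A POS-tag filter parameter of the kind suggested above could look roughly like the following minimal Java sketch. It assumes the exported dictionary is a tab-separated list of inflected form, lemma and POS tag, one entry per line, and that it is filtered before the synthesizer dictionary is built. The class name PosTagFilter, its methods and the example tag pattern (matching "neg" or "depr" as a tag component) are hypothetical illustrations, not part of the LanguageTool API or the actual Polish tagset definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: drops dictionary-export entries whose POS tag matches
// an exclusion regex, so they never reach the synthesizer dictionary.
// Assumed input format: "inflected<TAB>lemma<TAB>tag".
final class PosTagFilter {

    private final Pattern excludedTags;

    PosTagFilter(String excludedTagRegex) {
        this.excludedTags = Pattern.compile(excludedTagRegex);
    }

    // Returns true if the entry should stay in the synthesizer input.
    boolean keep(String line) {
        String[] parts = line.split("\t");
        if (parts.length < 3) {
            return true; // not a form/lemma/tag triple, leave it untouched
        }
        // Only the tag column is tested, never the form or the lemma.
        return !excludedTags.matcher(parts[2]).find();
    }

    // Returns the entries that survive filtering, in their original order.
    List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (keep(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example exclusion (assumption): "neg" or "depr" as a whole tag component.
        PosTagFilter filter = new PosTagFilter("(^|[:.])(neg|depr)([:.]|$)");
        List<String> entries = Arrays.asList(
                "dom\tdom\tsubst:sg:nom:m3",
                "niedobry\tdobry\tadj:sg:nom:m:pos:neg",
                "profesory\tprofesor\tsubst:pl:nom:m1:depr");
        for (String entry : filter.filter(entries)) {
            System.out.println(entry);
        }
    }
}
```

Matching only the tag column keeps the filter independent of the word forms themselves, and anchoring the pattern on tag separators avoids accidentally dropping tags that merely contain "neg" as a substring. The same exclusion list could equally be applied with a text filter over the exported file; a builder parameter just keeps it in one place.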
