On 2013-10-20 11:16, Daniel Naber wrote:
> On 2013-10-19 21:00, Daniel Naber wrote:
>
>> When I re-generate the dict files with the new Java class
>> (POSDictionaryBuilder), the result has its order changed, and some
>> French disambiguation tests seem to break because of that. Didn't we
>> have that problem before? How did we solve it?
>
> Another issue: what's special about the Polish synth dictionary? When I
> build it using SynthDictionaryBuilderTest (which exports polish.dict and
> builds a new synth dict from that), I get a dict file that's 36MB large.
> I guess there's something to consider that's not mentioned on
> http://wiki.languagetool.org/developing-a-tagger-dictionary#toc8?

Not really. I use cfsa2 encoding and exclude some tags (including 
negated forms, which are all regular and are added by the Polish 
synthesizer based on a simple rule).

See what I wrote on our wiki:

"Note: it might be helpful to remove all forms from the synthesizer dict 
where POS tags indicate "unknown form", "foreign word" etc., as they 
only take up space. Probably nobody will ever use them. It is also 
advisable to remove all archaic forms of main verbs (see the English 
src/main/resources/org/languagetool/resource/en/filter-archaic.txt for 
an example of what you might want to exclude)."

I exclude negated forms, all depreciative forms (they are mostly 
archaic) and a category of non-inflected words, and it works pretty 
well. You might want to add a parameter to your builder to include 
filtering on POS tags.
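
A rough sketch of what such a filter could look like, in case it helps. 
Everything here is an assumption on my side: PosTagFilter and filter() are 
hypothetical names (not existing LanguageTool classes), the input is 
assumed to be the exported text dump with one tab-separated 
"form, lemma, POS tag" entry per line, and the tags in the example are 
only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LanguageTool: drops dictionary entries
// whose POS tag matches an exclusion pattern before the synth dict is built.
class PosTagFilter {

    // Assumes each line is "inflectedForm<TAB>lemma<TAB>posTag".
    // Lines with fewer than three columns are skipped as malformed.
    static List<String> filter(List<String> lines, Pattern excludedTags) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue;
            }
            // Keep the entry only if its tag does not match the exclusion pattern.
            if (!excludedTags.matcher(cols[2]).find()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative pattern: exclude tags with a "neg" or "depr" segment,
        // where segments are separated by ':' and readings by '+'.
        Pattern excluded = Pattern.compile("(^|[:+])(neg|depr)([:+]|$)");
        List<String> input = Arrays.asList(
            "pieczenie\tpiec\tger:sg:nom:n2:imperf:aff",
            "niepieczenie\tpiec\tger:sg:nom:n2:imperf:neg",
            "profesory\tprofesor\tdepr:pl:nom:m2");
        for (String line : filter(input, excluded)) {
            System.out.println(line);
        }
    }
}
```

The pattern matches whole tag segments only, so a segment like "imperf" 
is not hit by accident. In a real builder the pattern would presumably 
come from a command-line parameter rather than being hard-coded.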

Regards,
Marcin
