On 2013-08-06 18:37, Daniel Naber wrote:
> On 06.08.2013 17:39, Marcin Miłkowski wrote:
>
>> (1) dictionaries should not be developed manually but generated from
>> some database system (I know we do develop dictionaries manually for
>> some languages but this should not be encouraged);
>
> I agree. Trying lexeme_forge for German is on my TODO list. Please post
> tips about this if you can.

lexeme_forge requires quite a bit of manual configuration right now (in 
particular for patterns of inflection, which are handled by several 
templates). But I think it's almost usable.

>
>> (2) text files will inevitably consume more memory than the finite
>> state dictionary -- and the tagger will be slower;
>
> The data is put in a HashMap, so it takes memory but it will be fast.
> The plain text dictionary shouldn't contain more than a few thousand
> words or so.

Right.
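
To illustrate the HashMap approach mentioned above, here is a minimal sketch in Java. It is not LanguageTool's actual code: the class name PlainTextDictionary, the load and lookup methods, and the tab-separated "wordform, lemma, POS tag" line format are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a small plain-text dictionary held in a HashMap. */
class PlainTextDictionary {

  // word form -> all "lemma/tag" readings found for that form
  private final Map<String, List<String>> entries = new HashMap<>();

  /**
   * Reads lines of the assumed form "wordform<TAB>lemma<TAB>postag".
   * Empty lines, lines starting with '#', and malformed lines are skipped.
   */
  static PlainTextDictionary load(BufferedReader reader) throws IOException {
    PlainTextDictionary dict = new PlainTextDictionary();
    String line;
    while ((line = reader.readLine()) != null) {
      line = line.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      String[] parts = line.split("\t");
      if (parts.length != 3) {
        continue; // not a well-formed entry
      }
      dict.entries
          .computeIfAbsent(parts[0], key -> new ArrayList<>())
          .add(parts[1] + "/" + parts[2]);
    }
    return dict;
  }

  /** Returns all readings for a word form, or an empty list if unknown. */
  List<String> lookup(String wordForm) {
    return entries.getOrDefault(wordForm, Collections.emptyList());
  }

  int size() {
    return entries.size();
  }
}
```

Every entry lives in memory as Java objects, which is where the extra memory goes, but each lookup is a single hash access. With only a few thousand words that trade-off should be acceptable.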

>
>> BTW, after reading about all the problems with github, I'm beginning
>> to be skeptical if this is really worth the trouble. SVN at sf.net works
>> most of the time anyway.
>
> It was a lot of trial-and-error work, but all major problems are solved
> now. I'm basically just waiting for at least one person to clone the code
> (https://github.com/danielnaber/languagetool-test) and say that
> everything looks fine.

What about the large binary files, though? Maybe we should handle them 
in some other way?

Regards,
Marcin

>
> Regards
>    Daniel
>


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
