On 2013-08-06 18:37, Daniel Naber wrote:
> On 06.08.2013 17:39, Marcin Miłkowski wrote:
>
>> (1) dictionaries should not be developed manually but generated from
>> some database system (I know we do develop dictionaries manually for
>> some languages but this should not be encouraged);
>
> I agree. Trying lexeme_forge for German is on my TODO list. Please post
> tips about this if you can.
lexeme_forge requires quite a bit of manual configuration right now (in
particular for patterns of inflection, which are handled by several
templates). But I think it's almost usable.

>
>> (2) text files will inevitably consume more memory than the finite
>> state dictionary -- and the tagger will be slower;
>
> The data is put in a HashMap, so it takes memory but it will be fast.
> The plain text dictionary shouldn't contain more than a few thousands
> words or so.

Right.

>
>> BTW, after reading about all the problems with github, I'm beginning
>> to be skeptical if this is really worth the trouble. SVN at sf.net
>> works most of the time anyway.
>
> It was a lot of trial'n'error work but all major problems are solved
> now. I'm basically just waiting that at least one person clones the code
> (https://github.com/danielnaber/languagetool-test) and says that
> everything looks fine.

What about large binary files, again? Maybe we should use them in some
other way?

Regards,
Marcin

>
> Regards
> Daniel
>

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel