On 2014-10-17 17:22, Phil Ritchie wrote:

Hi Phil,

> * Would we typically have to build a dictionary from scratch?

For most languages you'll find something Open Source that you can build on. If there's a hunspell dictionary, you can use that almost directly for spell checking. If there's a list of mappings from inflected forms to their base forms and their part-of-speech tags (e.g. "better: good/ADJ"), you can turn that into the format LT needs.

> * Is there a certain amount that can be leveraged because it's common
> - i.e. tokenization for Latin languages?

Yes, tokenization is there, as are rules for common issues like missing spaces after certain punctuation.

> * Do all features for a language have to be there at the get go: e.g.
> dictionary, POS tagger, tokenizer, ...

You can add them one by one. Actually, for testing you can start without anything: just remove the rules from the grammar.xml of an existing language and add your new rules there (obviously choosing a language with similar tokenization). As long as you don't have a part-of-speech dictionary, these rules cannot refer to part-of-speech tags.

In case you have huge amounts of language data, you might also consider a data-centric approach, as described at http://wiki.languagetool.org/finding-errors-using-big-data

> Do you think that a language could be bootstrapped in say, two
> full-time weeks (80 hours)?

Absolutely.

Regards
 Daniel
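
To illustrate the conversion of a mapping list such as "better: good/ADJ": below is a minimal Python sketch, not an official LT tool. It assumes the input has one "form: lemma/TAG" entry per line, exactly as in that example, and that the target is a tab-separated "inflected form, base form, POS tag" text file. That target layout is an assumption here, so please check the LT documentation on building dictionaries for the exact format. The names parse_mapping and to_tab_separated are made up for this example.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed input format, following the example "better: good/ADJ".
LINE_RE = re.compile(
    r"^\s*(?P<form>[^:]+?)\s*:\s*(?P<lemma>[^/]+?)\s*/\s*(?P<tag>\S+)\s*$"
)


def parse_mapping(line: str) -> Tuple[str, str, str]:
    """Split one 'form: lemma/TAG' line into (form, lemma, tag)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized mapping line: {line!r}")
    return match.group("form"), match.group("lemma"), match.group("tag")


def to_tab_separated(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'form<TAB>lemma<TAB>tag' rows, skipping blank and '#' lines.

    The three-column tab-separated output is an assumption about the
    target format, not something confirmed in this mail.
    """
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        form, lemma, tag = parse_mapping(line)
        yield f"{form}\t{lemma}\t{tag}"


if __name__ == "__main__":
    sample = ["better: good/ADJ", "went: go/VERB"]
    for row in to_tab_separated(sample):
        print(row)
```

Lines that do not match the assumed pattern raise a ValueError instead of being silently dropped, so problems in the source list show up early.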