On 2014-10-17 17:22, Phil Ritchie wrote:

Hi Phil,

>       * Would we typically have to build a dictionary from scratch?

For most languages you'll find something open source that you can build 
on. If there's a hunspell dictionary, you can use that almost directly 
for spell checking. If there's a list of mappings from inflected forms 
to their base forms and their part-of-speech tags (e.g. "better: 
good/ADJ"), you can turn that into the format LT needs.
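
To illustrate the conversion step, here is a minimal Python sketch. It 
assumes the source list has one "inflected: base/TAG" entry per line, 
and that the target is a tab-separated file with the columns inflected 
form, base form, tag. That column order is my assumption here, so please 
check it against the LT dictionary documentation. The function names 
are made up for this example.

```python
def convert_line(line):
    """Turn one 'better: good/ADJ' style line into 'better<TAB>good<TAB>ADJ'."""
    inflected, _, rest = line.partition(":")
    # Split on the last slash so a base form containing '/' survives.
    lemma, _, tag = rest.strip().rpartition("/")
    inflected, lemma, tag = inflected.strip(), lemma.strip(), tag.strip()
    if not inflected or not lemma or not tag:
        raise ValueError(f"unexpected mapping line: {line!r}")
    # Assumed target layout: inflected form, base form, POS tag.
    return "\t".join((inflected, lemma, tag))


def convert(lines):
    """Convert all non-empty lines, keeping their order."""
    return [convert_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ["better: good/ADJ", "went: go/VERB"]
    print("\n".join(convert(sample)))
```

Malformed lines raise an error instead of being skipped silently, so 
gaps in the source list show up early.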

>       * Is there a certain amount that can be leveraged because it's common
> - i.e. tokenization for Latin languages?

Yes, tokenization is there, as are rules for common issues like missing 
spaces after certain punctuation.

>       * Do all features for a language have to be there at the get go: e.g.
> dictionary, POS tagger, tokenizer, ...

You can add them one by one. Actually, for testing you can start without 
anything: just remove the rules from the grammar.xml of an existing 
language and add your new rules there (obviously choosing a language 
with similar tokenization). These rules cannot refer to part-of-speech 
tags until you have a part-of-speech dictionary.
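
As a rough sketch of that workflow, the Python snippet below empties 
the categories of a small grammar.xml-like document and adds one rule 
that matches plain tokens only, so it needs no part-of-speech data. The 
sample XML, the rule id, and the function name are invented for this 
example. A real grammar.xml has a DOCTYPE and entity references that 
this simple parser may not handle, so editing the file by hand is 
probably easier in practice.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for an existing language's grammar.xml (hypothetical content).
SAMPLE = """<rules lang="xx">
  <category id="MISC" name="Misc">
    <rule id="OLD_RULE" name="old">
      <pattern><token>foo</token></pattern>
      <message>old message</message>
    </rule>
  </category>
</rules>"""


def strip_and_add(xml_text, rule_id, tokens, message):
    """Remove all existing rules, then add one token-only rule."""
    root = ET.fromstring(xml_text)
    for category in root.findall(".//category"):
        for child in list(category):
            if child.tag in ("rule", "rulegroup"):
                category.remove(child)
    category = root.find(".//category")
    if category is None:
        raise ValueError("no <category> element found")
    rule = ET.SubElement(category, "rule", id=rule_id, name=rule_id)
    pattern = ET.SubElement(rule, "pattern")
    for text in tokens:
        # Plain token text only, no postag attribute.
        ET.SubElement(pattern, "token").text = text
    ET.SubElement(rule, "message").text = message
    return root


if __name__ == "__main__":
    new_root = strip_and_add(SAMPLE, "NEW_RULE", ["teh"], "Possible typo.")
    print(ET.tostring(new_root, encoding="unicode"))
```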

In case you have huge amounts of language data you might also consider a 
data-centric approach, as described at 
http://wiki.languagetool.org/finding-errors-using-big-data

> Do you think that a language could be bootstrapped in say, two
> full-time weeks (80 hours)?

Absolutely.

Regards
  Daniel

