Gerard Meijssen wrote:
> Hoi,
> One of the most important things that is needed for adding languages to a
> technology like this is having a sufficiently sized corpus. For general
> availability, the expected quality is quite high. To me this seems to be
> one reason why Google has not added more languages. Another reason why
> many corpora are not big enough is the problem of identifying the
> language a text is written in. A few years ago I learned that only a
> small percentage of Internet content carries metadata for the language
> used, and that something like 75% of that metadata is actually wrong...
>
> Given that Google actually supports MediaWiki, it may be that they are
> willing to support our languages. The problem, however, is that many of
> our languages have illegal and even wrong codes. The consequence is that
> it is not straightforward to just support our "language". This issue will
> not be resolved because people are under the impression that the
> "community" has the final word about the names of our languages. This is
> naive as well as problematic because it makes it harder to argue that
> Google should support our languages.
> Thanks,
>       GerardM
Your old ISO code hobby horse ;-) I guess that, if Google wanted to, they
would be able to recognize the languages of our projects, just as all our
users do.

> One of the most important things that is needed for adding languages to a
> technology like this is having a sufficiently sized corpus.
Yes, that was basically my main question: what counts as sufficient? How
many pages or MB of text? At least the order of magnitude.

Marcus Buck

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
