Gerard Meijssen wrote:
> Hoi,
> One of the most important things that is needed for adding languages to a
> technology like this is having a sufficiently sized corpus. For general
> availability, the expectation for the quality is quite high. To me this
> seems to be one reason why Google did not add more languages. Another reason
> why many corpora are not big enough is because of the problem of identifying
> a text for the language it is written in. When you consider that a few years
> ago I learned that only a small percentage of Internet content has the
> metadata for the language that is used.. When you then consider that
> something like 75% is actually wrong...
>
> Given that Google actually supports MediaWiki, it may be that they are
> willing to support our language. The problem however is that many of our
> language have illegal and even wrong codes. The consequence is that it is
> not obvious to just support our "language". This issue will not be resolved
> because people are under the impression that the "community" has the final
> word about the names of our languages. This is naive as well as problematic
> because it prevents the ease of the argument for Google to support our
> languages..
> Thanks,
> GerardM

Your old ISO code hobby horse ;-) I guess that if Google wanted to, they would be able to recognize the languages of our projects, just as all our users do.

> One of the most important things that is needed for adding languages to a
> technology like this is having a sufficiently sized corpus.

Yes, that was basically my main question: what counts as sufficient? How many pages or MB of text? At least the order of magnitude.

Marcus Buck