Hi,
The proper use of language codes is indeed a recurring theme. Calling it a
hobby horse gives the impression that it has no real-world application.
It does have a real-world application, and one of the problems with
language is that it is truly hard to recognise languages confidently.
Suggesting that Google can do it simply because of its size is too easy. I
am sure they would have done so if they could.
Thanks,
      GerardM

2009/6/15 Marcus Buck <m...@marcusbuck.org>

> Gerard Meijssen wrote:
> > Hi,
> > One of the most important things needed for adding languages to a
> > technology like this is a sufficiently large corpus. For general
> > availability, the expectations for quality are quite high. To me this
> > seems to be one reason why Google has not added more languages. Another
> > reason why many corpora are not big enough is the problem of identifying
> > the language a text is written in. A few years ago I learned that only a
> > small percentage of Internet content carries metadata for the language
> > that is used, and that something like 75% of that metadata is actually
> > wrong...
> >
> > Given that Google actually supports MediaWiki, it may be that they are
> > willing to support our languages. The problem, however, is that many of
> > our languages have invalid and even wrong codes. The consequence is that
> > it is not straightforward to simply support our "language". This issue
> > will not be resolved, because people are under the impression that the
> > "community" has the final word about the names of our languages. This is
> > naive as well as problematic, because it weakens the argument for Google
> > to support our languages.
> > Thanks,
> >       GerardM
> Your old ISO code hobby horse ;-) I guess that if Google wanted to, they
> would be able to recognize the languages of our projects, just as all our
> users do.
>
> > One of the most important things that is needed for adding languages to a
> > technology like this is having a sufficiently sized corpus.
> Yes, that was basically my main question: what counts as sufficient? How
> many pages or MB of text? At least the order of magnitude.
>
> Marcus Buck
>
> _______________________________________________
> foundation-l mailing list
> foundation-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>