Gerard Meijssen wrote:
> Hoi,
> The proper use of language codes is indeed a recurring theme. Calling it a
> hobby horse gives the impression that it does not have a real world
> application. It does have a real world application and one of the problems
> with language is that it is truly hard to recognise languages confidently.
> Suggesting that Google can because of its size is too easy. I am sure they
> would have if they could.
> Thanks,
>       GerardM
>   
Let's assume Google wants to build an Alemannic translation tool. They
are searching for an Alemannic text corpus. Will they fail to find the
Alemannic Wikipedia because 'als' stands for a form of Albanian? I don't
think so.

Don't get me wrong: I am _pro_ the use of correct codes, and I reject
the opinion that projects have the right to decide to stick to a wrong
code. But I also reject switching projects to codes that don't match
the project ('gsw', for example, is no proper substitute for 'als'),
and I reject code switches that harm the projects (which means the old
code has to remain a redirect to the new code for at least several
years).
Most importantly, I think the question of ISO codes is unrelated to
Google's operations. If Google wants to use Wikipedia content to improve
their tools, it should be really easy for them to do the code mapping
(e.g. 'no'->'nb').
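
To show how little work such a mapping would be for a consumer of the
content, here is a minimal sketch in Python. The table name, the
function name and every entry other than 'no'->'nb' are my own
illustrative assumptions, not anything Google or Wikimedia is known to
use.

```python
# Hypothetical lookup table: Wikipedia subdomain code -> code the
# consumer wants to use internally. Only 'no' -> 'nb' is taken from
# this message; the other entries are illustrative assumptions.
WIKI_TO_TARGET = {
    "no": "nb",
    "zh-yue": "yue",
    "zh-min-nan": "nan",
}


def map_wiki_code(wiki_code: str) -> str:
    """Return the consumer-side code for a Wikipedia subdomain code.

    Codes without an entry in the table are passed through unchanged
    (lowercased), on the assumption that they are already acceptable.
    """
    normalized = wiki_code.lower()
    return WIKI_TO_TARGET.get(normalized, normalized)


if __name__ == "__main__":
    for code in ("no", "de", "zh-yue"):
        print(code, "->", map_wiki_code(code))
```

A consumer that disagrees with a project's code only has to add one
line to such a table; the project itself does not need to move.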


So does anybody know how big a corpus must be to be helpful to Google?

Marcus Buck

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
