Hi Apertiumers, Fran:

> On Fri, 14 May 2010 at 13:38 +0200, Mikel L. Forcada wrote:
>
>> (1) "What is the difference between Google Translate and Apertium?
>> Google Translate build systems that work but they don't know why,
>> whereas in Apertium we build systems that don't work but we know why."
>> (a retake of the usual joke on Natural Language Processing and
>> Computational Linguistics).
>>
>
> :D
>
> Perhaps we should get that put on the tshirt ;)
>

Too long, isn't it?

>> (2) I wonder if they are using any of Apertium or Matxin to do some
>> morphological preprocessing...
>>

[The mystery lingers...]

>> (3)
>>
>>>> Yep, especially considering that after talking with Mike Galvez (Google)
>>>> and Ofis ar Brezhoneg, I have been sending them data for Breton. Mike
>>>> has told me that any data I send them will be returned as TMX.
>>>>
>>>> It was a hard decision -- making ourselves less relevant one pair at a
>>>> time -- but as many people have told me, getting on Google Translate is
>>>> a real point of pride for speakers of smaller languages. And the
>>>> language should always come first.
>>>>
>>>>
>> Sorry, Fran, but isn't this the same as effectively collaborating with a
>> company that does closed-source MT? You won't get the code to their
>> system,
>>
>
> I don't expect their system is anything special from a coding point of
> view.
>

I'm sure it is. Efficient "decoding", clever disk storage for "frayze"
tables and probabilities, distributed computing, efficient factored
models, alignment templates...

>> and you won't have access to the enormous amount of corpora they
>> have access to.
>>

That's true.

>
> Mike has said that he will send me back in TMX format anything that I
> send them. Thus, the value added is not having to process and align all
> that bilingual text myself -- something I probably would not have time
> to do.

What are the licensing terms on those TMX files? Could we distribute
them?

> In the case of Breton, I doubt whether they have any
> substantially more enormous corpora than what we have.
>

What amazes me is their choice of languages. They do seem to want to
cater for small languages (cymraeg, gaeilge, euskara...). Perhaps they
are just adding whatever they can get their hands on.

>
>> So, wouldn't these language communities actually be "taking pride" in
>> becoming dependent on Google? Wouldn't these language communities be
>> effectively giving up on actually understanding how their languages work
>> so that they can build technologies of their own for them?
>>
>
> It's up to them if they want to give up,

Them? We are in some way part of these communities, aren't we? I give
myself every chance to participate, and you do too, Fran. And one of our
motivations is actually to contribute to their success. This is why I am
paying 10€ an hour for my ranganna Gaeilge (and painstakingly learning,
by the way, with great help from Kevin's Gramadóir and, yes, Google
Translate) while I am in Ireland, and why I embraced Catalan when I
moved to Alacant.

These communities also rely on our judgment, as experts, on what is good
or bad for their future in language technologies. And we should try to
make sure they understand the issues and the alternatives. We
Apertiumers have our share of that responsibility.

It sometimes feels like too much responsibility for me: I am always
reminded of the parable of the talents
(http://en.wikipedia.org/wiki/Parable_of_the_Talents, Matthew 25:14-30
<http://www.biblegateway.com/bible?passage=Matthew%2025:14-30;&version=NIV;>),
[remember Gema?]: I might not have been given five talents, but I
wouldn't want to be thrown where there is weeping and gnashing of teeth,
even if I am not a Christian...

I know choices are hard sometimes, and that all the issues here are more
or less known to the main people involved, but I think it is a good idea
that we stop and think every now and then.

> Google sets the benchmark
> pretty high. It's a similar challenge that I'd set to linguists -- if
> linguistics works, then make better MT.
>

I'm not sure I understood this correctly.

>
>> I'm sure your collaboration with Google is well-meant, but I think we
>> should be very careful about the way we facilitate Google's moves
>> towards generating translation technology monopolies for small languages.
>>
>
> They were going to do it anyway, there are two main differences, 1) This
> way it happens faster -- which doesn't really change much, 2) We get
> access to the data as opposed to them hoarding it for themselves.
>
> I already told you how Google's MT blitzkrieg gets me down, but I don't
> really see much other option. Keep the unprocessed data to myself, where
> it isn't useful for anyone, or let Google process it and get it back in
> a useable form.
>

The hard choices...

> This is probably going to be the main struggle for the next ten years.
> If researchers don't come together and pool their resources and efforts
> then they'll probably just be picked off one by one.
>

Scary. Very scary.

> Sorry if this email seems too rambling,
>

It was very clarifying. It made me think - which, in itself, is A Good
Thing, so thanks!

Mikel

------------------------------------------------------------------------------
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
