Re: [Apertium-stuff] Google Translate: Basque ALPHA

Jimmy O'Regan Fri, 14 May 2010 08:42:49 -0700

On 14 May 2010 14:14, Mikel L. Forcada <[email protected]> wrote:
> Hi Apertiumers, Fran:
>> El dv 14 de 05 de 2010 a les 13:38 +0200, en/na Mikel L. Forcada va
>> escriure:
>>
>>> (1) "What is the difference between Google Translate and Apertium?
>>> Google Translate builds systems that work but they don't know why,
>>> whereas in Apertium we build systems that don't work but we know why."
>>> (a retake of the usual joke on Natural Language Processing and
>>> Computational Linguistics).
>>>
>>
>> :D
>>
>> Perhaps we should get that put on the tshirt ;)
>>
> Too long, isn't it?
>>> (2) I wonder if they are using any of Apertium or Matxin to do some
>>> morphological preprocessing...
>>>
> [The mystery lingers...]
>>> (3)
>>>
>>>>> Yep, especially considering that after talking with Mike Galvez (Google)
>>>>> and Ofis ar Brezhoneg, I have been sending them data for Breton. Mike
>>>>> has told me that any data I send them will be returned as TMX.
>>>>>
>>>>> It was a hard decision -- making ourselves less relevant one pair at a
>>>>> time -- but as many people have told me, getting on Google Translate is
>>>>> a real point of pride for speakers of smaller languages. And the
>>>>> language should always come first.
>>>>>
>>>>>
>>> Sorry, Fran, but isn't this the same as effectively collaborating with a
>>> company that does closed-source MT? You won't get the code to their
>>> system,
>>>
>>
>> I don't expect their system is anything special from a coding point of
>> view.
>>
> I'm sure it is. Efficient "decoding", clever disk storage for "phrase"
> tables and probabilities, distributed computing, efficient factored
> models, alignment templates...


Well, most of those things come down to being Map/Reduce, GFS and/or
BigTable-based implementations of the usual stuff... and those are three
of the big proprietary pieces of Google's infrastructure. The best you
could hope for is that they publish a paper and someone else
implements an open source version on top of, say, Hadoop. Which seems
to be what's happening... FWIW, CMU's work on distributed SMT is being
done using the IBM/Google cluster, so even if they're not directly
open sourcing their SMT software, they're making some sort of a
tangible contribution towards an open source version.
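
To make "Map/Reduce-based implementations of the usual stuff" a bit more
concrete, here is a minimal single-machine sketch of phrase-pair counting
written in map/reduce style. It is only an illustration: the toy data and
the names map_phase, reduce_phase and phrase_table are hypothetical and are
not taken from Google's, CMU's or anyone else's actual system.

```python
from collections import defaultdict

# Toy extracted phrase pairs (hypothetical data, for illustration only).
EXTRACTED = [
    ("etxe", "house"),
    ("etxe", "home"),
    ("etxe", "house"),
    ("kale", "street"),
]


def map_phase(pairs):
    """Emit ((source, target), 1) for every extracted phrase pair."""
    for src, tgt in pairs:
        yield (src, tgt), 1


def reduce_phase(mapped):
    """Sum the values per key, as a reducer would after the shuffle step."""
    counts = defaultdict(int)
    for key, value in mapped:
        counts[key] += value
    return dict(counts)


def phrase_table(counts):
    """Relative-frequency estimate of p(target | source) from pair counts."""
    totals = defaultdict(int)
    for (src, _tgt), c in counts.items():
        totals[src] += c
    return {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}


counts = reduce_phase(map_phase(EXTRACTED))
table = phrase_table(counts)
```

On a real cluster the map and reduce steps would run on many machines over
sharded input; the point of the sketch is only that the per-key counting is
the same.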


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

------------------------------------------------------------------------------

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
