On Fri, 14 May 2010 at 13:38 +0200, Mikel L. Forcada wrote:
> (1) "What is the difference between Google Translate and Apertium?
> Google Translate builds systems that work but they don't know why,
> whereas in Apertium we build systems that don't work but we know why."
> (a retake of the usual joke on Natural Language Processing and 
> Computational Linguistics).

:D 

Perhaps we should get that put on the T-shirt ;)

> (2) I wonder if they are using any of Apertium or Matxin to do some 
> morphological preprocessing...
> 
> (3)
> >> Yep, especially considering that after talking with Mike Galvez (Google)
> >> and Ofis ar Brezhoneg, I have been sending them data for Breton. Mike
> >> has told me that any data I send them will be returned as TMX.
> >>
> >> It was a hard decision -- making ourselves less relevant one pair at a
> >> time -- but as many people have told me, getting on Google Translate is
> >> a real point of pride for speakers of smaller languages. And the
> >> language should always come first.
> >>     
> Sorry, Fran, but isn't this the same as effectively collaborating with a 
> company that does closed-source MT? You won't get the code to their 
> system, 

I don't expect their system is anything special from a coding point of
view. 

> and you won't have access to the enormous amount of corpora they 
> have access to.

Mike has said that he will send me back in TMX format anything that I
send them. Thus, the value added is not having to process and align all
that bilingual text myself -- something I probably would not have time
to do. In the case of Breton, I doubt they have corpora substantially
larger than what we have.

> So, wouldn't these language communities actually be "taking pride" in 
> becoming dependent on Google? Wouldn't these language communities be 
> effectively giving up on actually understanding how their languages work 
> so that they can build technologies of their own for them?

It's up to them if they want to give up; Google sets the benchmark
pretty high. It's similar to the challenge I'd set for linguists -- if
linguistics works, then make better MT.

> I'm sure your collaboration with Google is well-meant, but I think we 
> should be very careful about the way we facilitate Google's moves 
> towards generating translation technology monopolies for small languages.

They were going to do it anyway. There are two main differences: 1) this
way it happens faster -- which doesn't really change much; 2) we get
access to the data, as opposed to them hoarding it for themselves.

I already told you how Google's MT blitzkrieg gets me down, but I don't
really see much of an alternative: either I keep the unprocessed data to
myself, where it isn't useful to anyone, or I let Google process it and
get it back in a usable form.

This is probably going to be the main struggle for the next ten years.
If researchers don't come together and pool their resources and efforts
then they'll probably just be picked off one by one.

Sorry if this email seems too rambling,

Fran


------------------------------------------------------------------------------

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
