Let me offer some insight. I'm not affiliated with Google, but
I'm a polyglot and I understand how these systems work.

Google Translate works really well when you know how to use it and
what kinds of texts it can translate. Basically, the more colloquial
something is, the harder it is to translate, for reasons I explain
below.

The translation service uses statistical methods based on parallel
texts. In other words, there's a huge database of sentences stored
with their translations next to them. Most of this database pairs
English with some other language. The bulk of the parallel-text
documents are reports, news, government translations, technical
material, patents, and so on. For some languages, like Icelandic,
there are far fewer parallel texts than for more widely used
languages.

The downside is that when the sentence you're requesting never
appeared in any report (especially sentences with "you" and "I" and
"could you tell me where") and is generally only spoken, Google's
algorithm kicks in, grabbing the most common sentence structure and
replacing the vocabulary. I believe a future add-on might be a
cross-check against actual usage in the blogosphere, since that is
closer to spoken language. Google is also already working on
transcribing the spoken language in YouTube videos, which would be
even more representative than the blogosphere. It will take some time
for this to mature, however.

If you translate between two non-English languages, like German and
Polish, the text ends up going through English first, which may erase
some of the verb conjugations that exist in those two languages. Some
languages, like Belarusian and Ukrainian, go through Russian as the
intermediate language instead of English, so the verb conjugations and
declensions stay intact (though this is not the case between Russian
and Polish).
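
To make the intermediate-language point concrete, here is a toy sketch
in Python. The word tables are hypothetical and have nothing to do
with Google's real data, and it uses formal vs. informal "you" rather
than verb endings, because that is the simplest distinction that
German and Polish both mark and English does not.

```python
# Toy illustration only: these tables are hypothetical, not real
# translation data and not how Google stores anything.

# German distinguishes informal "du" from formal "Sie";
# English collapses both into "you".
GERMAN_TO_ENGLISH = {"du": "you", "Sie": "you"}

# Going from English, the system has to pick one Polish form,
# because "you" no longer says which one was meant.
ENGLISH_TO_POLISH = {"you": "ty"}


def pivot_translate(word, source_to_pivot, pivot_to_target):
    """Translate a word by going through an intermediate language."""
    pivot_word = source_to_pivot[word]
    return pivot_to_target[pivot_word]


informal = pivot_translate("du", GERMAN_TO_ENGLISH, ENGLISH_TO_POLISH)
formal = pivot_translate("Sie", GERMAN_TO_ENGLISH, ENGLISH_TO_POLISH)

# Both inputs come out identical: the distinction was lost in the pivot.
print(informal, formal)
```

A pivot language that marks the same distinctions as the source and
target would not lose this information.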

Getting back to your Icelandic question. Many people have contributed
parallel texts directly to Google's database. Google rarely builds the
contents of these databases itself, but rather gathers them from
users in any way it can. If you go in there and look around, you can
see there are some 300 languages you can provide parallel texts for.
Once enough of a database is built up, Google can release a language
in beta. These texts can contain errors or even deliberate jokes. You
can imagine that some kid sitting in Iceland is having fun inputting a
whole book of jokes in both English and Icelandic, and that ends up as
the only source of such sentences in Google's database. When you type
an exact match, Google spits out the exact match for you.
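
Here is a rough sketch of that behavior in Python, just to show the
idea. Everything in it is an assumption for illustration: the tables,
the function names, and the word-by-word fallback are my own toy
stand-ins, not Google's actual algorithm. The one sentence pair is
taken from the example in your post.

```python
# Toy model only: hypothetical data and logic, not Google's pipeline.

# Contributed sentence pairs. This one is the joke entry from the thread.
PARALLEL_TEXTS = {
    "where is the bathroom": "Talarðu ensku?",
}

# Tiny word-level vocabulary used when no full sentence matches.
VOCABULARY = {"where": "hvar", "is": "er"}


def normalize(sentence):
    """Lowercase and drop surrounding whitespace and end punctuation."""
    return sentence.strip().rstrip("?!.").strip().lower()


def translate(sentence):
    """Return (translation, method): exact match first, then fallback."""
    key = normalize(sentence)
    if key in PARALLEL_TEXTS:
        # An exact match wins, whatever its quality.
        return PARALLEL_TEXTS[key], "exact"
    # No match: substitute known words and leave unknown ones as typed.
    words = [VOCABULARY.get(word, word) for word in key.split()]
    return " ".join(words), "fallback"


print(translate("Where is the bathroom?"))
print(translate("Where is"))
```

With a single contributed pair for a sentence, that pair decides the
output, while shorter or different inputs take the fallback path and
look fine.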




On Apr 13, 6:37 am, "[email address]" wrote:
> Someone at Google is having fun playing with the translations returned
> from English->Icelandic.  I've run into a number of things for which
> the only possible explanation is someone hard-coding bogus answers
> in.  Here's what I found first:
>
> "Where" returns "Hvar" ("Where")
> "Where is" returns "Hvar er" ("Where is")
> "Where is the" returns "Hvar er" ("Where is")
> "Where is the bathroom?" returns "Talarðu ensku?" ("Do you speak
> English?", from að tala, to speak, and the accusative form of enska,
> "English")
>
> Har-di-har-har, Google.  You can ask other "Where is" questions and it
> translates them correctly, but they hard-coded "Where is the bathroom"
> to a joke answer.
>
> Want more?  Here's a more blatant example.
>
> "My" returns "My" (the personal possessive pronoun is a postposition
> in Icelandic)
> "My hovercraft" returns "Láttu mig" ("Let me", from að láta).
> "My hovercraft is full of eels" returns "Láttu mig í
> friði!" (basically, "Leave me in peace!")
>
> There's absolutely no way that's an accident.  I mean, they even added
> in an exclamation point!  And there's no way that Google has a
> dictionary where "Láttu" means "Hovercraft".
>
> Want more?  I keep finding these things.  Someone at Google apparently
> thinks this service is a big joke.  Accidental translation mistakes
> are one thing, but deliberate ones are an entirely different story
> altogether, and these are clearly not accidental.
