Another intriguing translation of non-word

Harald Korneliussen Thu, 31 Dec 2009 05:58:58 -0800


I translated a phrase involving the word "movie" from English to
Norwegian and saw something odd in the instant translation. I had to
backspace a little to confirm it: "movie" is translated as "film",
correctly, but the truncated "movi" is translated as "filmdel",
meaning "movie part".

I'm pretty sure it didn't pick that up from parallel texts containing
the word "movi"! As far as I can see, that's usually a stock market
abbreviation for some movie-related business.

It seems Google Translate uses word fragments, or lower-level features
than words. In one way, it isn't surprising that the translator can
chop words up into parts likely to have independent meaning - since you
can construct arbitrarily long compound words in many languages, this
would be a necessity for doing well with them. What is surprising is
that the splitting appears to be automatic and not constructed by hand -
I had assumed that part of the work GT did when adding a new language
was writing a tokenizer to split up compound words and word parts likely
to have grammatical significance (such as the -s at the end of English
words).
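
To make the hand-built kind of splitter I had in mind concrete, here is
a minimal sketch. It is only an illustration of dictionary-based
compound splitting, not a claim about what Google Translate actually
does. The function name split_compound and the tiny Norwegian
vocabulary are made up for the example.

```python
def split_compound(word, vocab):
    """Split `word` into pieces that all appear in `vocab`.

    Tries the longest matching prefix first and backtracks if the
    remainder cannot be split. Returns a list of pieces, or None if
    no full split exists.
    """
    if not word:
        return []  # nothing left to split: success
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in vocab:
            rest = split_compound(word[end:], vocab)
            if rest is not None:
                return [prefix] + rest
    return None


# Hypothetical toy vocabulary, chosen only for this example.
VOCAB = {"film", "del", "stjerne", "musikk"}

print(split_compound("filmdel", VOCAB))      # ['film', 'del']
print(split_compound("filmstjerne", VOCAB))  # ['film', 'stjerne']
print(split_compound("movi", VOCAB))         # None
```

A splitter like this can take "filmdel" apart, but it has nothing to say
about a fragment such as "movi" that is not in its word list, which is
why the behaviour above looks learned rather than hand-written.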

Another impressive thing is that the suggested translation of "movi"
looks so eminently reasonable :-)
