On Mar 30, 3:18 pm, neshomir wrote:
> I won't be so cruel as to ask why that isn't the easiest thing in the
> world, or something similar, but I will say that if Croatian and
> Serbian (Latin) are two different languages, then it's not O.K. that
> all Serbian (Latin) pages are recognized as Croatian. I mean, what is
> one to think on seeing that, e.g., the official Serbian Government
> website is in Serbian and Croatian (depending on whether it's in
> Cyrillic or Latin)? That the official languages here are Serbian and
> Croatian?

That is not so surprising, since the language used to be called
"Serbo-Croatian". I don't know how large the differences are, but I
notice that it doesn't matter much whether you ask for Serbian or
Croatian on either of those pages (Latin or Cyrillic): the translation
is similar, and of similar quality. So the differences can't be that
large.

> So no matter if it's an easy or difficult thing to do, it should be
> done, or is it just me?

Google has open-sourced the system it uses to identify the language of
a text. Apparently it counts the relative frequency of four-letter
word fragments and compares those frequencies with tables derived from
corpus text. Serbian written in Latin script probably cannot be told
apart from Croatian with this technique, but you could try improving
it yourself.
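
To make the idea concrete, here is a rough Python sketch of that kind
of frequency comparison as I understand it. It is not Google's code:
the function names, the underscore padding at word boundaries, the
cosine-similarity scoring and the two tiny "corpora" are all my own
assumptions, chosen only for illustration. Real tables would be built
from far more text.

```python
import math
from collections import Counter


def quadgram_profile(text, n=4):
    """Return relative frequencies of character n-grams, counted per word."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # assumption: mark word boundaries with "_"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency tables."""
    dot = sum(v * q.get(gram, 0.0) for gram, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def identify(text, profiles):
    """Return the best-scoring label and the score for every label."""
    sample = quadgram_profile(text)
    scores = {label: cosine_similarity(sample, table)
              for label, table in profiles.items()}
    return max(scores, key=scores.get), scores


# Toy "corpus" tables (hypothetical, a handful of words each). They only
# capture one spelling difference; they say nothing about the real languages.
TOY_PROFILES = {
    "variant_e": quadgram_profile("lepo vreme mleko reka"),
    "variant_ije": quadgram_profile("lijepo vrijeme mlijeko rijeka"),
}

best, scores = identify("mlijeko", TOY_PROFILES)
print(best, scores)
```

With tables this small the result rests on a few fragments. That is
the crux of the problem: when two corpora share most of their
four-letter fragments, their scores end up close together and the
label becomes unreliable.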

What do you propose Google do? Label all Cyrillic pages "Serbian" and
all Latin ones "Serbian or Croatian"? I bet the Croats won't be happy
with that.
