On Mar 30, 3:18 pm, neshomir wrote:
> I won't be so cruel to ask why isn't that the easiest thing in the
> world, or something similar, but will say that if there are Croatian
> and Serbian(Latin), as two different languages then its not O.K. that
> all Serbian(Latin) pages are recognized as Croatian. I mean, what can
> one think, when sees that i.e. official Serbian Government website is
> in Serbia, and Croatian (depends if it's in Cyrillic or Latin). That
> here official languages are Serbian and Croatian?
It's not so surprising, since the language used to be called "Serbo-Croatian". I don't know how large the differences are, but I notice that it doesn't matter much whether you ask for Serbian or Croatian as the source language on either of those pages (Latin or Cyrillic): the translation is similar, and of similar quality. So the differences can't be that large.

> So no matter if it's an easy or difficult thing to do, it should be
> done, or it's just me?

Google have open-sourced the system that they use to identify the language of a text. Apparently it counts the relative frequency of four-letter word fragments and compares these frequencies with tables derived from corpus text. Serbian (Latin) probably can't be distinguished from Croatian with this technique, but you could try improving it yourself.

What do you propose Google do? Label all Cyrillic pages "Serbian" and all Latin ones "Serbian or Croatian"? I bet the Croats won't be happy with that.
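
To make the four-letter-fragment idea concrete, here is a rough Python sketch of that kind of detector. It is not Google's code: the function names (`ngram_profile`, `cosine_similarity`, `guess_language`), the cosine comparison, and the one-sentence "corpora" are all my own assumptions for illustration. A real system would use tables built from large corpora.

```python
from collections import Counter
import math


def ngram_profile(text, n=4):
    """Relative frequencies of overlapping character n-grams in text."""
    text = " ".join(text.lower().split())  # lower-case, collapse whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency tables."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def guess_language(text, profiles, n=4):
    """Return the label whose reference profile is closest to the text."""
    sample = ngram_profile(text, n)
    return max(profiles, key=lambda lang: cosine_similarity(sample, profiles[lang]))


# Toy reference tables (assumption: one made-up sentence per language,
# far too small for real use; they only show the mechanics).
profiles = {
    "sr-Latn": ngram_profile("mleko je lepo a vreme je lepo"),
    "hr": ngram_profile("mlijeko je lijepo a vrijeme je lijepo"),
}

print(guess_language("lepo vreme", profiles))
```

The toy sentences are chosen so that they differ in every content word. In ordinary text most four-letter fragments would be shared between the two languages, which is presumably why the frequency tables end up so close and the detector struggles to separate them.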
