Thanks, Fran, for that explanation of Google's poor results. I was
wondering what might be going on. Especially since their Bulgarian works
well.
- Dan
On Thu, Feb 9, 2012 at 2:34 AM, Francis Tyers <[email protected]> wrote:
> On Wed, 08 Feb 2012 at 20:26 -0500, Dan Loehr wrote:
> > Many thanks, Fran. I won't be able to download and test the new
> > version (apertium-mk-en-0.1.1.tar.gz) for a day or two. But I did
> > want to reply right away and say thank you.
> >
> > You also asked for feedback on the quality. You are probably already
> > aware that it does very well compared to Google Translate. Your
> > online platform at apertium.org provides this translation of a section
> > from the Macedonian version of the UN Declaration of Human Rights:
> >
> > Since the recognition on врoдeнoтo dignity, and on the equal and
> > нeoтуѓиви authentic on all members on the humanity are тeмeлитe on the
> > freedom, the justice and the peace in the world;
> >
> > And here's Google Translate's translation of the same passage:
> >
> > A great priznavanjeto Following the vrodenoto dostoinstvo, also in
> > case of ednakvite and neotugjivi prava Following the all outdoor
> > chlenovi Following the choveshtvoto everything temelite Following the
> > slobodata, pravdata and mirot vo svetot;
> >
> > Here's the UN's English version (available at
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=eng)
> >
> > Whereas recognition of the inherent dignity and of the equal and
> > inalienable rights of all members of the human family is the
> > foundation of freedom, justice and peace in the world,
> >
> > And here's the actual section that was translated (available at
> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mkj):
> >
> > Бидejќи признaвaњeтo нa врoдeнoтo дoстoинствo, и нa eднaквитe и
> > нeoтуѓиви прaвa нa ситe члeнoви нa чoвeштвoтo сe тeмeлитe нa
> > слoбoдaтa, прaвдaтa и мирoт вo свeтoт;
> >
> > So for 8-10 days' work, I'd say you've done quite well!
> >
> > Thanks again,
>
> Hmm, the poor result from Google is surprising and leads me to think
> there is something else at play here. I'm sure they have the same corpus
> I was working with (SETimes). I would also be surprised if they haven't
> used the UDHR in their training corpus too.
>
> I just checked and the Macedonian input (from the UDHR) is full of Latin
> characters, e.g. Latin 'o' instead of Cyrillic 'о', and likewise for
> 'e' and 'a'.
>
> If we replace them with their Cyrillic counterparts, Google gets a much
> better result:
>
> --
>
> Бидеjќи признавањето на вроденото достоинство, и на еднаквите и
> неотуѓиви права на сите членови на човештвото се темелите на слободата,
> правдата и мирот во светот;
>
> Since they recognizing the inherent dignity and equal and inalienable
> rights of all members of the human family is the foundation of freedom,
> justice and peace in the world;
>
> --
>
> So, if you want a free/rule-based system then Apertium is probably what
> you're looking for. And we'd definitely welcome further feedback and
> development. Otherwise, if you want to make a vanilla SMT system, use
> the SETimes corpus and make sure you sanitise your input on the
> Macedonian side for unexpected Latin characters (in Apertium we have an
> option to do it in the dictionary compilation stage).
>
> Best regards,
>
> Fran
>
> PS. I'm really surprised Google isn't doing this for languages using
> Cyrillic. Latin characters pop up not only in Macedonian (sometimes
> from bad keyboard layouts, sometimes from bad OCR software) but also
> in other languages with Cyrillic-based scripts, such as Chuvash and
> Komi.
>
>
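
The input sanitising Fran recommends for the Macedonian side could be
sketched roughly as follows. This is only an illustration, not Apertium's
dictionary-compilation option: the function names and the homoglyph table
are hypothetical, and the table covers just a few lowercase look-alikes.
It rewrites a Latin letter only when the word also contains Cyrillic
letters, so genuinely Latin words are left alone.

```python
import re

# Hypothetical table: Latin letters mapped to look-alike Cyrillic letters
# used in Macedonian. Code points are spelled out so the two scripts
# cannot be confused when reading the source.
HOMOGLYPHS = str.maketrans({
    "a": "\u0430",  # Cyrillic a
    "e": "\u0435",  # Cyrillic ie
    "o": "\u043e",  # Cyrillic o
    "j": "\u0458",  # Cyrillic je
    "c": "\u0441",  # Cyrillic es
    "p": "\u0440",  # Cyrillic er
    "x": "\u0445",  # Cyrillic ha
    "s": "\u0455",  # Cyrillic dze
})

CYRILLIC = re.compile(r"[\u0400-\u04FF]")
LATIN = re.compile(r"[A-Za-z]")
WORD = re.compile(r"\w+")


def has_mixed_script(text):
    """Return True if any word contains both Cyrillic and Latin letters."""
    return any(
        CYRILLIC.search(word) and LATIN.search(word)
        for word in WORD.findall(text)
    )


def _fix_word(match):
    word = match.group(0)
    # Only touch words that already contain Cyrillic letters.
    if CYRILLIC.search(word):
        return word.translate(HOMOGLYPHS)
    return word


def sanitise(text):
    """Replace Latin look-alikes inside otherwise Cyrillic words."""
    return WORD.sub(_fix_word, text)


if __name__ == "__main__":
    # A Cyrillic word with Latin 'o' and 'a' mixed in.
    sample = "\u0432o\u0434a"
    print(has_mixed_script(sample))
    print(has_mixed_script(sanitise(sample)))
```

One limitation of this sketch: a word made up entirely of Latin
look-alikes contains no Cyrillic letter, so it is not detected or fixed.
Catching those would need extra context, such as the script of the
neighbouring words.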
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff