Fran, I didn't want to forget to close the loop. apertium-mk-en-0.1.1
compiles and runs just fine. Thanks again for updating it.
- Dan
On Thu, Feb 9, 2012 at 8:50 AM, Dan Loehr <[email protected]> wrote:
> Thanks, Fran, for that explanation of Google's poor results. I was
> wondering what might be going on. Especially since their Bulgarian works
> well.
>
> - Dan
> On Thu, Feb 9, 2012 at 2:34 AM, Francis Tyers <[email protected]> wrote:
>
>> On Wed, 08 Feb 2012 at 20:26 -0500, Dan Loehr wrote:
>> > Many thanks, Fran. I won't be able to download and test the new
>> > version (apertium-mk-en-0.1.1.tar.gz) for a day or two. But I did
>> > want to reply right away and say thank you.
>> >
>> > You also asked for feedback on the quality. You are probably already
>> > aware that it does very well compared to Google Translate. Your
>> > online platform at apertium.org provides this translation of a section
>> > from the Macedonian version of the UN Declaration of Human Rights:
>> >
>> > Since the recognition on врoдeнoтo dignity, and on the equal and
>> > нeoтуѓиви authentic on all members on the humanity are тeмeлитe on the
>> > freedom, the justice and the peace in the world;
>> >
>> > And here's Google Translate's translation of the same passage:
>> >
>> > A great priznavanjeto Following the vrodenoto dostoinstvo, also in
>> > case of ednakvite and neotugjivi prava Following the all outdoor
>> > chlenovi Following the choveshtvoto everything temelite Following the
>> > slobodata, pravdata and mirot vo svetot;
>> >
>> > Here's the UN's English version (available at
>> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=eng)
>> >
>> > Whereas recognition of the inherent dignity and of the equal and
>> > inalienable rights of all members of the human family is the
>> > foundation of freedom, justice and peace in the world,
>> >
>> > And here's the actual section translated (available at
>> > http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=mkj):
>> >
>> > Бидejќи признaвaњeтo нa врoдeнoтo дoстoинствo, и нa eднaквитe и
>> > нeoтуѓиви прaвa нa ситe члeнoви нa чoвeштвoтo сe тeмeлитe нa
>> > слoбoдaтa, прaвдaтa и мирoт вo свeтoт;
>> >
>> > So for 8-10 days' work, I'd say you've done quite well!
>> >
>> > Thanks again,
>>
>> Hmm, the poor result from Google is surprising and leads me to think
>> there is something else at play here. I'm sure they have the same corpus
>> I was working with, 'SETimes'. I would also be surprised if they haven't
>> used the UDHR in their training corpus too.
>>
>> I just checked and the Macedonian input (from the UDHR) is full of Latin
>> characters, e.g. Latin 'o' instead of Cyrillic 'о', and likewise for
>> 'e' and 'a'.
>>
>> If we replace them with their Cyrillic counterparts, Google gets a much
>> better result:
>>
>> --
>>
>> Бидеjќи признавањето на вроденото достоинство, и на еднаквите и
>> неотуѓиви права на сите членови на човештвото се темелите на слободата,
>> правдата и мирот во светот;
>>
>> Since they recognizing the inherent dignity and equal and inalienable
>> rights of all members of the human family is the foundation of freedom,
>> justice and peace in the world;
>>
>> --
>>
>> So, if you want a free/rule-based system then Apertium is probably what
>> you're looking for. And we'd definitely welcome further feedback and
>> development. Otherwise, if you want to make a vanilla SMT system, use
>> the SETimes corpus and make sure you sanitise your input on the
>> Macedonian side for unexpected Latin characters (in Apertium we have an
>> option to do it in the dictionary compilation stage).
>>
>> Best regards,
>>
>> Fran
>>
>> PS. I'm really surprised Google isn't doing this for languages using
>> Cyrillic. Latin characters don't just pop up in Macedonian (sometimes
>> from bad keyboard layouts, sometimes from bad OCR software), but also
>> in other languages with Cyrillic-based scripts, such as Chuvash and
>> Komi.
>>
>>
>
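For anyone following the suggestion above to sanitise the Macedonian side before training or translating, here is a minimal sketch in Python. It is not Apertium's dictionary-compilation option and not anything Google does; the look-alike table, the function name `sanitise`, and the rule "only touch tokens that already contain a Cyrillic letter" are all assumptions made for illustration.

```python
import re

# Hypothetical look-alike table (lowercase Latin -> Cyrillic), written with
# escapes so the two scripts cannot be confused when reading the source.
_PAIRS = {
    "a": "\u0430",  # Cyrillic a
    "e": "\u0435",  # Cyrillic ie
    "o": "\u043e",  # Cyrillic o
    "c": "\u0441",  # Cyrillic es
    "p": "\u0440",  # Cyrillic er
    "x": "\u0445",  # Cyrillic ha
    "y": "\u0443",  # Cyrillic u
    "j": "\u0458",  # Cyrillic je (used in Macedonian)
    "s": "\u0455",  # Cyrillic dze (used in Macedonian)
}

# Add the uppercase pairs and build a translation table.
_TABLE = dict(_PAIRS)
_TABLE.update({lat.upper(): cyr.upper() for lat, cyr in _PAIRS.items()})
LATIN_TO_CYRILLIC = str.maketrans(_TABLE)

CYRILLIC = re.compile(r"[\u0400-\u04FF]")  # basic Cyrillic block
TOKEN = re.compile(r"\S+")                 # whitespace-delimited tokens


def sanitise(text: str) -> str:
    """Replace Latin look-alikes, but only inside tokens that already
    contain at least one Cyrillic letter, so URLs and genuinely Latin
    words are left alone."""
    def fix(match: re.Match) -> str:
        token = match.group(0)
        if CYRILLIC.search(token):
            return token.translate(LATIN_TO_CYRILLIC)
        return token
    return TOKEN.sub(fix, text)


if __name__ == "__main__":
    # A word with Latin 'o' and 'e' mixed into otherwise Cyrillic text.
    mixed = "\u0432\u0440o\u0434e\u043d\u043e\u0442\u043e"
    print(sanitise(mixed))
```

The per-token check is a design choice, not a requirement: a token made up entirely of Latin look-alikes would slip through, so a real pipeline would probably also want a language-level check.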
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff