> What kind of lexical coverage do Google/Yandex have ?

This text shows about 98% coverage for Google and 97% coverage for Yandex,
based on words left untranslated.
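
A minimal sketch of how a figure like this could be computed, assuming
"untranslated" means a source word that reappears verbatim in the output.
The function name and the tokenization are illustrative assumptions, not
the procedure actually used for the numbers above:

```python
import re


def naive_coverage(source: str, translation: str) -> float:
    """Share of source tokens that do NOT reappear verbatim in the output.

    Assumption: a source word found unchanged in the translation was left
    untranslated. Words that are legitimately identical in both languages
    (names, cognates) are therefore counted as untranslated too.
    """
    def tokenize(text: str) -> list:
        return re.findall(r"\w+", text.lower())

    source_tokens = tokenize(source)
    if not source_tokens:
        return 1.0  # nothing to translate, so nothing was missed
    output_vocabulary = set(tokenize(translation))
    untranslated = sum(1 for word in source_tokens if word in output_vocabulary)
    return 1 - untranslated / len(source_tokens)
```

Because shared names and cognates are penalized, a count like this
understates real coverage; filtering proper nouns by hand would give a
fairer number.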

> What kind of effort/work do you think needs to be done to approach Google's
> quality?
> What would you say the main needs are now ?

Coverage is now fairly good, even for proper nouns, thanks to the work done
during GSoC. The transfer rules, however, have not changed much, so the
next step would be to improve and expand them to support more patterns
(word-order changes in English questions, for example, are very frequent
and not yet supported). The tagger also needs more work, so that the rules
are applied in every case where they should match.

These needs are closely tied to approaching Google's translation quality.
Because Google Translate is corpus-based, it has a lot of information about
different types of texts, and it can handle even a single text that mixes
several styles. Apertium, however, needs explicit knowledge of every
possible pattern and style, and that knowledge is reflected in neither the
corpus used for tagger training nor the transfer rules. Hence, while it
works well for what it "knows" about, it produces odd results as soon as it
is given something different.

One of my main goals for the near future is to rewrite everything related
to verbs to take advantage of three-stage transfer. Most of the current
rules have seen only minor modifications since the switch from one-stage to
three-stage transfer and apply only to very specific patterns; a good
rewrite should bring noticeable improvements.

2018-03-12 12:21 GMT+01:00 Francis Tyers <fty...@prompsit.com>:

> On 2018-03-12 12:10, Marc Riera Irigoyen wrote:
>>> Have you done any evaluation ? How does it compare to other systems (and
>>> the old system too) ? :)
>> The pair works fairly well with encyclopedia-like texts, and has a
>> good Wikipedia coverage (92% for English and 87% for Catalan). The
>> reference translation (an English article on Greece not used during
>> development) shows a WER/PER of 51%/35%, better than the old pair's
>> 56%/40% with the same text. Yandex is slightly better than Apertium,
>> with 56%/34%, and Google stands with the best results (43%/26%). I
>> have not really evaluated translations from Catalan (most of the
>> development has taken place in the other direction), but it should be
>> more or less the same as the old pair.
> Good to know that we are approaching the quality of Yandex! :)
> What kind of effort/work do you think needs to be done to approach Google's
> quality?
> What kind of lexical coverage do Google/Yandex have ?
>> While the pair still needs a lot of work and love, the rewrite has
>> eased development. With good taggers on both sides, trained with
>> diverse texts (including dialogues to reflect oral language
>> constructions), as well as a reorganization/rewrite of the transfer
>> rules (inherited from the messy old pair), we should have a very
>> decent and useful language pair.
> What would you say the main needs are now ?
> Fran
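
For reference, a minimal sketch of the two metrics quoted above, assuming
the usual word-level definitions: WER as word edit distance divided by
reference length, and PER as the same idea with word order ignored. The
function names are illustrative, and this is not necessarily the exact
tool or tokenization behind the reported figures:

```python
from collections import Counter


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two word sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        current = [i]
        for j, hyp_word in enumerate(hyp, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate; assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection: words matched regardless of their position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

PER is never higher than WER for the same pair of texts, which is
consistent with every PER figure above being lower than its WER.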


*Marc Riera Irigoyen*
Freelance Translator EN/JA>CA/ES

(+34) 652 492 008
Apertium-stuff mailing list
