On 2020-04-20 19:14, Francis Tyers wrote:
On 2020-04-20 19:05, Tanmai Khanna wrote:
Hey guys,
When I proposed the modification to the Apertium stream format
earlier, it was rightly pointed out to be a bit premature and not
coupled with adequate justification. In preparation for my project, I
have tried to document the modification thoroughly, so that it is clear
what it looks like, what the benefits are, why there would be no
regression, and what changes would be needed in each module of the
pipeline.

Here is the document:
http://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format

For my proposal and an explanation of why this is needed (as part of
eliminating dictionary trimming), see:
http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming

One benefit of the new secondary tags that I'd like to point out is
that they will give us the ability to attach information to a Lexical
Unit that is NOT A PART OF A PRE-DEFINED FIXED LIST, unlike primary tags
(the tags we already have in the stream). While a lot of information
can, and should, be provided in primary tags, information like surface
forms or markup tags cannot be enumerated in a pre-defined list, and
hence cannot be put in the current stream as primary tags.

This modification would open an avenue, both now and in the future, to
include information like this without touching the already immensely
useful primary tags. Given that ensuring backwards compatibility will
be a priority in this project, I believe these documents are sufficient
as a proof of concept for this modification.

I invite your comments, complaints and feedback regarding this
modification.

Thanks and Regards,
Tanmai Khanna


Thanks for this.

What I'm still missing is a real translation/linguistic motivation, in
terms of which language pair will benefit from this, and how. Or does
this give us the ability to create new language pairs that we cannot
currently create? What level of improvement will this give us?

I can see that a lot of work has been put in, but how does this work
correspond to improved translation quality or widened access to machine
translation?

Fran


Another way of putting this is that it looks like a technical solution
in search of a problem, rather than a problem description in search
of a solution.

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
