Re: [Apertium-stuff] New Occitan-French release

Hèctor Alòs i Font Fri, 04 Nov 2022 00:23:03 -0700

Tino Didriksen <m...@tinodidriksen.com> wrote on Thu, 3 Nov 2022 at
15:58:
>
> On Tue, 1 Nov 2022 at 11:45, Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:
>>
>> Hèctor Alòs i Font <hectoralos-re5jqeeqqe8avxtiumw...@public.gmane.org>
>> wrote:
>>
>> > As for your proposal, I do not yet have sufficient knowledge of CG to fully
>> > understand it. My idea would be to make a first pass through a whole text
>> > to understand if enunciatives are used in it (for example, recognising
>> > other, more infrequent, but more easily recognisable enunciatives). In the
>> > solution you propose, it seems that this knowledge is acquired
>> > progressively, as sentences are translated. I fear that "que" is so messy
>> > that at least the first sentences of a text would have the same problems as
>> > we have now when we translate a Gascon text without enunciatives.
>>
>> That should be possible too, though I'm not sure how feasible it is to
>> get CG to go that far into a text. By default, CG keeps a context of two
>> windows, but that's configurable. It should be possible (perhaps with
>> minor modifications to cg-proc) to read a bunch of sentences and use
>> Window Spanning tests https://visl.sdu.dk/cg3/single/#test-spanning
>>
>> Tino, have you tried looking ahead several paragraphs, are there any
>> downsides? This should be a fairly simple rule file.
>
>
> The max I've seen in production is 9 windows, but there is no hard limit. 
> Just have to be careful of spanning tests, as they are going to look ahead 
> for every active window. A multi-pass system will perform better, and for 
> this particular task I'd say multi-pass is the correct approach.
>


So I thought, but then:

1) We need a first CG process that finds out whether the text has
enunciatives. It should probably return 0 or 1 somehow, but how?
2) Depending on the result, we will have two slightly different pipes,
but how? Should the syntax of modes.xml be extended with a kind of
"if-else"?
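
To make the two steps above concrete, here is a minimal sketch in Python
rather than in CG or modes.xml. A first pass scans the whole text for
easily recognisable enunciatives and returns 0 or 1, and a wrapper then
picks one of two pipes. The marker tokens, the threshold and both mode
names are hypothetical placeholders, not real Gascon data or existing
Apertium modes.

```python
import re

# Hypothetical placeholders: the real set would hold the rarer, easily
# recognisable enunciatives. These tokens are not actual Gascon data.
MARKERS = frozenset({"marker_a", "marker_b"})

# Hypothetical mode names, not existing Apertium modes.
MODE_WITH = "gascon_with_enunciatives-fr"
MODE_WITHOUT = "gascon_without_enunciatives-fr"


def has_enunciatives(text, markers=MARKERS, min_hits=2):
    """First pass over the whole text.

    Returns 1 if at least `min_hits` marker tokens occur, else 0.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = sum(1 for token in tokens if token in markers)
    return 1 if hits >= min_hits else 0


def choose_mode(text, markers=MARKERS, min_hits=2):
    """Second step: select the pipe from the result of the first pass."""
    if has_enunciatives(text, markers, min_hits):
        return MODE_WITH
    return MODE_WITHOUT
```

The selected name would then be handed to whatever runs the pipe, so the
"if-else" lives in a small wrapper instead of in modes.xml. Whether that
is preferable to extending the modes.xml syntax remains an open question.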

More generally, it would be desirable to have a first step that
recognises which variety of Occitan we are translating from.
Currently, we force the user to say whether they are translating from
Languedocien (called "Occitan" in Apertium and "Occitan Languedocien"
in the translator of the Congrès Permanent de la Lenga Occitana). A
user does not necessarily know this. When there are two possibilities,
this is not much of a problem: try one and, if it doesn't work too
well, try the other. With four or more variants, it will be less
obvious. For now, though, the question is how to differentiate between
two Gascon "flavours".
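
A rough sketch of what such a variety-recognition step could look like,
assuming each variety can be given a set of characteristic tokens. The
function name and the marker sets are hypothetical. The function scores
every variety by counting its markers in the whole text and declines to
answer when there is no evidence or a tie.

```python
import re
from collections import Counter


def guess_variety(text, markers_by_variety):
    """Return the best-scoring variety name, or None if undecided.

    `markers_by_variety` maps a variety name to an iterable of lowercase
    marker tokens (hypothetical data supplied by the caller).
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    scores = {
        variety: sum(counts[marker] for marker in markers)
        for variety, markers in markers_by_variety.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no marker of any variety was seen
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top candidates
    return ranked[0][0]
```

Returning None rather than a guess leaves room to fall back on asking the
user, as is done today.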

Hèctor


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
