It turns out that what disappears is the last token in the Apertium sense,
whether it is a word or punctuation: just the final unit, such as ^./.<sent>$ or
^word/lemma<pos><tag1><tag2>$
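
One way to narrow this down is to check that the two tr steps are lossless by
themselves, with no Apertium tools in between. The following is only a minimal
sketch under that assumption: the roundtrip function and the sample text are
hypothetical and are not part of the pipeline quoted below.

```shell
# Hypothetical sanity check: run sample lines through the newline -> NUL -> newline
# conversion alone, without lt-proc or cg-proc in between.
roundtrip() {
    tr -d '\0' | tr '\n' '\0' | tr '\0' '\n'
}

# Made-up sample text standing in for a few corpus lines.
sample=$(printf 'first line\nsecond line\nthird line\n')
result=$(printf '%s\n' "$sample" | roundtrip)

if [ "$sample" = "$result" ]; then
    echo "tr round trip is lossless"
else
    echo "tr round trip changed the text"
fi
```

If the text survives this round trip unchanged, the missing token is more likely
to be lost in one of the stages between the two tr calls than in tr itself.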

On Wed, 7 Nov 2018 at 19:02, mansur <6688...@gmail.com> wrote:

> Hello!
>
> It doesn't work for me:
> ><px3sp><nom>+да<cnjcoo>$ ^бит/бит<mod_ass>$
> _
> ^ул/бул<v><tv><imp><p2><sg>$^,/,<cm>$ ^театраль/театраль<adj>$
> ^жест/*жест$ ^ясап/яса<v><tv><gna_perf>$
> ^-/-<guio>$ ^Синнән/Син<prn><pers><p2><sg><abl>$
> ^сорап/сора<v><tv><prc_perf>$ ^торырмын/тор<vaux><fut><p1><sg>$
> ^Барлык/Барлык<det><qnt>$ ^иптәшләрдән/иптәш<n><pl><abl>$
> ^кул/кул<n><sg><nom>+и<cop><aor><p3><sg>$ ^куйды/куй<v><tv><ifi><p3><sg>$
> ^рам да/рам<n><sg><nom>+да<cnjcoo>$ ^тикшерү/тикшерү<n><sg><attr>$
> ^органнарына/орган<n><pl><px3sp><dat>$
> _
> _
> _
>
> The problems are wherever the _ symbol appears, and there are 3 extra
> newlines at the end. Almost every line loses its last character or even
> whole words (it should read "рам да тикшерү органнарына тапшыра").
>
> By the way, the pipeline I used:
>         tr '\n' '\0' |
>         apertium-destxt -n |
>         lt-proc -z -w 'apertium-tat/tat.automorf.bin' |
>         cg-proc -z 'apertium-tat/tat.rlx.bin' |
>         cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' |
>         tr '\0' '\n' |
>         apertium-retxt |
>
> Replacing these 'tr' commands with the earlier recommendations from Fran
> gives the correct output.
>
> On Tue, 6 Nov 2018 at 22:45, Francis Tyers <
> fty...@prompsit.com> wrote:
>
>> On 2018-11-06 20:36, Kevin Brubeck Unhammer wrote:
>> > Francis Tyers <fty...@prompsit.com> wrote:
>> >
>> >> Yes it does. It will put a sentence boundary after every word, meaning
>> >> that you won't get reliable tagger output. Apertium as far as I know
>> >> has no way to treat sentences as a sequence of lines. This is because
>> >> of how the format handling works.
>> >>
>> >> I think it would really be an excellent feature though. Perhaps a
>> >> GitHub issue? I do however think it would involve messing with quite a
>> >> bit of the pipeline.
>> >
>> > However, we *should* treat NUL as hard separators – if we don't,
>> > apertium-apy (and thus www.apertium.org) will risk sending output meant
>> > for person1 to person2. (I have an inkling there might still be bugs in
>> > apertium-transfer related to this.)
>> >
>> > Anyway, if we at least handle NULs correctly in lt-proc and cg-proc,
>> > you could turn line breaks into NULs (first deleting any existing NULs
>> > in the corpus) and tag with the -z option to lt-/cg-proc:
>> >
>> >     cat corpus.txt                                   \
>> >     | tr -d '\0'                                     \
>> >     | tr '\n' '\0'                                   \
>> >     | apertium-deshtml -n                            \
>> >     | lt-proc -z -w 'apertium-tat/tat.automorf.bin'  \
>> >     | cg-proc -z 'apertium-tat/tat.rlx.bin'          \
>> >     | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' \
>> >     | tr '\0' '\n'                                   \
>> >     | apertium-rehtml-noent
>> >
>> > … finally turning NULs back into newlines.
>> >
>> > With apertium-nob, this doesn't seem to run slower than without -z, and
>> > doesn't merge lines in my test corpus.
>> >
>>
>> Ooh, this is great, we should probably put this on the wiki!
>>
>> F.
>>
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>