Here are some examples of Apertium's tagger messing up line breaks.

Original:
Китаплар да, кешеләр
дә кайтты.

Аңа ярдәм
итәргә кирәк.

Output lines where partial merging occurred:
^Китаплар да/Китап<n><pl><nom>+да<cnjcoo>$^,/,<cm>$ ^кешеләр
дә/кеше<n><pl><nom>+да<cnjcoo>$
^кайтты/кайт<v><tv><ifi><p3><sg>$^./.<sent>$

^Аңа/Ул<prn><dem><dat>$ ^ярдәм итәргә/ярдәм ит<v><tv><inf>$
^кирәк/кирәк<n><sg><nom>+и<cop><aor><p3><sg>$^./.<sent>$

Such cases are very difficult to find in a big corpus.
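
One way to hunt for them might be a small script like the one below. It is only a sketch: it assumes the output is plain Apertium stream format (units written as ^...$ with backslash escapes) and that superblanks contain no stray ^ or $. The names LEXICAL_UNIT and find_multiline_units are made up for illustration. It only catches the first kind of case above, where the line break ends up inside a unit; the second kind, where the break has moved, would need a comparison against the input lines.

```python
import re

# One lexical unit ^...$; a backslash escapes the character after it.
LEXICAL_UNIT = re.compile(r"\^((?:\\.|[^$\\])*)\$", re.DOTALL)


def find_multiline_units(text):
    """Return (line_number, unit) pairs for units that contain a line break."""
    hits = []
    for match in LEXICAL_UNIT.finditer(text):
        if "\n" in match.group(1):
            # 1-based number of the line on which the unit starts.
            line_number = text.count("\n", 0, match.start()) + 1
            hits.append((line_number, match.group(0)))
    return hits


# The first output example from this message.
sample = (
    "^Китаплар да/Китап<n><pl><nom>+да<cnjcoo>$^,/,<cm>$ ^кешеләр\n"
    "дә/кеше<n><pl><nom>+да<cnjcoo>$\n"
    "^кайтты/кайт<v><tv><ifi><p3><sg>$^./.<sent>$\n"
)

for line_number, unit in find_multiline_units(sample):
    print(line_number, repr(unit))
```

For a whole corpus, the same function could be run on the text read from the tagger's output file.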

Best!
Mansur

On Wed, 7 Nov 2018 at 22:27, Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:

> mansur <6688...@gmail.com> wrote:
>
> > It turned out that the last token (in the Apertium sense) disappears,
> > no matter whether it is a word or punctuation: just the last part,
> > like ^./.<sent>$ or ^word/lemma<pos><tag1><tag2>$
>
> Hm, yeah it seems the NUL needs to go after the `]' on each linebreak
> (that's how apy does it). Something like a sed 's/^]/]\x00/' after
> deformatting might work better.
>
> I'm not sure how to avoid the final three NULs at end-of-file, though
> they're easy enough to postprocess out.
>
> I'd still like to see a minimal test case where the regular pipeline
> merges lines though, lt-proc and cg-proc really shouldn't do that
> (unless you do things like REMCOHORT in CG).
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
