Here are some examples of Apertium's tagger messing with lines.

Original:

Китаплар да, кешеләр дә кайтты.
Аңа ярдәм итәргә кирәк.

Output lines where partial merging occurred:

^Китаплар да/Китап<n><pl><nom>+да<cnjcoo>$^,/,<cm>$ ^кешеләр дә/кеше<n><pl><nom>+да<cnjcoo>$ ^кайтты/кайт<v><tv><ifi><p3><sg>$^./.<sent>$
^Аңа/Ул<prn><dem><dat>$ ^ярдәм итәргә/ярдәм ит<v><tv><inf>$ ^кирәк/кирәк<n><sg><nom>+и<cop><aor><p3><sg>$^./.<sent>$

It is very difficult to find such cases in the big corpus.

Best,
Mansur

On Wed, 7 Nov 2018 at 22:27, Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:

> mansur <6688...@gmail.com> wrote:
>
> > It turned out that the last token (in the Apertium sense) disappears,
> > whether it is a word or punctuation: just the last part, such as
> > ^./.<sent>$ or ^word/lemma<pos><tag1><tag2>$
>
> Hm, yeah, it seems the NUL needs to go after the `]' on each linebreak
> (that's how apy does it). Something like a sed 's/^]/]\x00/' after
> deformatting might work better.
>
> I'm not sure how to avoid the final three NULs at end-of-file, though
> they're easy enough to postprocess out.
>
> I'd still like to see a minimal test case where the regular pipeline
> merges lines, though; lt-proc and cg-proc really shouldn't do that
> (unless you do things like REMCOHORT in CG).
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
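A minimal Python sketch of the NUL workaround discussed in this thread, under stated assumptions. It assumes the deformatter wraps each line break in a superblank, so that every following line of deformatted text begins with `]` (this is what the sed expression relies on). `add_null_flush` mirrors that sed command, `split_flushed` drops the empty chunks left by the extra NULs at end of file, and `surface_forms` pulls the surface forms out of one line of stream output so you can check whether the last lexical unit survived. All three function names are hypothetical helpers, not part of Apertium or apy, and the regex ignores escaped `^` and `$`.

```python
from __future__ import annotations

import re

# One Apertium stream lexical unit: ^surface/analysis...$
# Simplification (assumption): escaped \^ and \$ inside units are not handled.
LU_RE = re.compile(r"\^([^$]*)\$")


def add_null_flush(deformatted: str) -> str:
    # Mirrors: sed 's/^]/]\x00/'
    # For each line that starts with "]", put a NUL right after that "]".
    lines = deformatted.split("\n")
    return "\n".join(
        "]\x00" + line[1:] if line.startswith("]") else line
        for line in lines
    )


def split_flushed(output: str) -> list[str]:
    # Split null-flushed output into chunks, then drop trailing chunks
    # that are empty or whitespace-only (left by the extra NULs at EOF).
    chunks = output.split("\x00")
    while chunks and chunks[-1].strip() == "":
        chunks.pop()
    return chunks


def surface_forms(stream_line: str) -> list[str]:
    # Surface form = text before the first "/" in each ^...$ unit.
    return [unit.split("/", 1)[0] for unit in LU_RE.findall(stream_line)]
```

For example, comparing `surface_forms(line)[-1]` with the last token of the corresponding input line would flag lines where the final unit (such as `^./.<sent>$`) went missing.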