Francis Tyers <fty...@prompsit.com> wrote:
[...]
>>> That would be a good feature, but wouldn't get past the issue of the
>>> tagger/cg. E.g. if we do that then the tagger can't take into account
>>> context.
>>
>> Isn't that the whole point? (Ie. treat each line as completely
>> independent, no context.)
>
> I don't think so, I think Mansur wants the tagger to disambiguate according
> to the context, but have it in line-by-line output, like TreeTagger or UDpipe
> etc.

Well, it's only lt-proc doing the moving, so just move the NUL-deletion
before cg-proc:

    cat corpus.txt \
      | tr -d '\0' \
      | apertium-deshtml -n \
      | sed 's/\[$/[][/; s/^]/]\x00/' \
      | lt-proc -z -w 'tat.automorf.bin' \
      | tr -d '\0' \
      | cg-proc -z 'tat.rlx.bin' \
      | cg-proc -z -w -1 'dev/mansur.bin' \
      | apertium-rehtml-noent

Now only lt-proc should treat end-of-line as a stream delimiter.
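
If it helps to see the effect of the second tr in isolation, here is a rough sketch that needs none of the Apertium tools. mark_lines and count_nuls are hypothetical helpers made up for this illustration; mark_lines only imitates "one NUL after every line" and is not a model of what lt-proc -z actually does internally.

```shell
#!/bin/sh
# Hypothetical stand-in (assumption, not lt-proc): emit every input
# line followed by a NUL byte, i.e. one delimiter per line.
mark_lines() {
  while IFS= read -r line; do
    printf '%s\n\0' "$line"
  done
}

# Hypothetical helper: count the NUL bytes arriving on stdin.
count_nuls() {
  tr -cd '\0' | wc -c | tr -d ' '
}

# NUL count right after the line-marking stage.
printf 'first line\nsecond line\n' | mark_lines | count_nuls

# NUL count after tr -d '\0': this is what the next stage would see.
printf 'first line\nsecond line\n' | mark_lines | tr -d '\0' | count_nuls
```

The second count is what matters for the pipeline above: once the NULs are deleted, whatever reads the stream next has no per-line delimiters left to act on and gets the text as one continuous stream.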
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff