Hi, 

I came across some bad space handling in apertium-postchunk. Notably,
two spaces in a row were treated as two separate blanks, so that if
you had

    ^chunk{^word<tag>$  ^word<tag>$^word<tag>$}

and you tried outputting 

    chunk pos="1"
    b pos="1" 
    chunk pos="2"
    b pos="2" 
    chunk pos="3"

it would become (each of the two spaces is used up as its own blank)

    ^chunk{^word<tag>$ ^word<tag>$ ^word<tag>$}

Also, escaped characters and other non-alphabetic characters occurring
between words (stuff like \^ or ") were not output.
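
To make the expected behaviour concrete, here is a minimal Python sketch.
It is not the apertium-postchunk code (which is C++) and not the patch
itself; `split_chunk` is a made-up helper name. It assumes that everything
between two lexical units, whether a run of spaces or escaped and
non-alphabetic characters, belongs to a single blank.

```python
def split_chunk(contents):
    """Split the text inside chunk{...} into lexical units and blanks.

    Hypothetical helper for illustration only. blanks[k-1] holds all text
    between word k and word k+1; text before the first word or after the
    last word is ignored in this sketch.
    """
    words, blanks = [], []
    blank = ""
    i, n = 0, len(contents)
    while i < n:
        c = contents[i]
        if c == "\\" and i + 1 < n:
            # Escaped character: keep the backslash and what it escapes.
            blank += contents[i:i + 2]
            i += 2
        elif c == "^":
            # Start of a lexical unit: scan to the closing unescaped '$'.
            j = i + 1
            while j < n and contents[j] != "$":
                j += 2 if contents[j] == "\\" else 1
            if words:
                # Everything collected since the previous word is ONE blank.
                blanks.append(blank)
            blank = ""
            words.append(contents[i:j + 1])
            i = j + 1
        else:
            # Spaces, quotes and other superblank material.
            blank += c
            i += 1
    return words, blanks


words, blanks = split_chunk("^word<tag>$  ^word<tag>$^word<tag>$")
# Re-interleaving words and blanks reproduces the original spacing.
rebuilt = words[0] + "".join(b + w for b, w in zip(blanks, words[1:]))
```

With the input from the example above, blank 1 is the whole two-space run
and blank 2 is empty, so the rebuilt string equals the input.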

I added a patch to
http://bugs.apertium.org/cgi-bin/bugzilla/show_bug.cgi?id=89 where part
of the problem had already been reported. I'd be happy if someone could
test whether it works and can be committed.


On a related note, Gabriel Gregori Manzano's vm-for-transfer-cpp already
handles double spaces correctly, but doesn't handle escaped chars yet
(https://github.com/ggm/vm-for-transfer-cpp/issues/9). Although there
are still some issues with it, I'd recommend that everyone working on
transfer try apertium-transfervm-compiler; it can provide a lot of
helpful feedback (for example, if you've declared the wrong number of
parameters to a macro …).


best regards,
Kevin Brubeck Unhammer


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
