[Apertium-stuff] ACX format and combining characters

Francis Tyers Tue, 02 Apr 2013 14:06:30 -0700

Hey all!

I'm not sure if this is a bug or a feature, but when I try to make
accented characters in Cyrillic (e.g. with combining characters: о́ ы́
у́ etc.) equivalent to their non-combined variants (e.g. о ы у) in an
ACX file,[1] the unaccented characters are not treated as equivalent to
the accented ones.


1.

  <char value="ы́">
    <equiv-char value="ы"/>
  </char>

Looking at the transducer, it seems that this makes some kind of sense
from an encoding point of view because о and combining ´ are treated as
separate 'characters', but linguistically it might be less clear. 
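
To illustrate the encoding point, here is a minimal Python sketch (not
lttoolbox code; the helper name code_points is made up for this example).
It shows that an accented Cyrillic vowel typed with a combining acute is
two code points, and that NFC normalisation cannot merge them because
Unicode has no precomposed form for these letters.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# CYRILLIC SMALL LETTER YERU followed by COMBINING ACUTE ACCENT
accented = "\u044b\u0301"
plain = "\u044b"

# What looks like one letter is a sequence of two code points.
print(code_points(accented))
print(code_points(plain))

# NFC leaves the sequence unchanged: there is no single precomposed
# code point for this letter plus acute.
composed = unicodedata.normalize("NFC", accented)
print(composed == accented)

# The second code point is a combining mark (non-zero combining class).
print(unicodedata.combining(accented[1]) != 0)
```

So anything that handles the value of <char> one code point at a time
will see the base letter and the accent separately.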

I suspect that this cannot be dealt with in a clean way (without
having some special code in lttoolbox to deal with combining
characters). But I thought it merited an email to the list in case
anyone else comes up against this.

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
