> > I also found a few characters reversed with your routine so this past
> > weekend I combined the functionality of the two routines (shape_arabic and
> > arabjoin) and created a third. The routine also does the following:
> >
> > SOURCE:
> > <Arabic1> <Latin1> <Arabic2> <Latin2> <Arabic3>
> >
> > RESULT:
> > <3cibarA> <Latin2> <2cibarA> <Latin1> <1cibarA>
>
> I'm not a fan of arabjoin and I think it is your source of problems.
> Dump it and use fribidi instead.

Hello, Chris and Nadim,

I have no unsolvable problems with arabjoin. I do not know fribidi, but
will have a look at Arabeyes.

I would like to stress the striking 'perl-way' implementation of the
shaping algorithm, which in the code at http://czyborra.com/ reads

    @uchar = # UTF-8 character chunks
        /([\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+)/g;

    # We walk through the line of text and do contextual analysis:

    for ($i = $[; $i <= $#uchar; $i = $j)
    {
        for ($b=$uchar[$j=$i]; $transparent{$c=$uchar[++$j]};){};

        # The following assignment is the heart of the algorithm.
        # It reduces the Arabic joining algorithm described on
        # pages 6-24 to 6-26 of the Arabic character block description
        # in the Unicode 2.0 Standard to four lines of Perl:

        $uchar[$i] = $a && $final{$c} && $medial{$b}
            || $final{$c} && $initial{$b}
            || $a && $final{$b}
            || $isolated{$b}
            || $b;

        $a = $initial{$b} && $final{$c};
    }

[To avoid 'undefined' warnings, you might use something like

    for ($b=$uchar[$j=$i]; $transparent{$c=$uchar[++$j]||''};){};

in the code above.]

The rest of the script either gets the Unicode data or deals with
ligatures, which may be omitted except for the compulsory lam+alif ones.

The problem you might face is that the data in the file are in UTF-8, so
you will need to perform conversions like

    use Encode;

    $internal_perl_representation = decode 'utf8', $arabjoin_data;

    # or if taking arabjoin.pl as is

    $expected_by_arabjoin = encode 'utf8', $in_perl_internal_utf8;

while keeping your Arabic strings in Perl's internal representation.

If there are new implementations of the shaping algorithm, I think they
should be evaluated against this arabjoin.pl script/algorithm by Roman
Czyborra, using e.g. the Benchmark module for the developer's tests.

If anything needs improvement in arabjoin, it is the programming
interface, which should be clearer, and the storage of the Unicode data,
which should be 'file-encoding-independent', simply using string
interpolation and the \x{...} construct or so.
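
To illustrate that last point, here is a minimal sketch of what such an interface might look like. It is not arabjoin's actual interface: the function name shape_sketch and the variable names are hypothetical, and the tables cover only alef, beh and lam plus three vowel marks, written with \x{...} so that the source file contains no encoded Arabic at all. The loop body follows the assignment quoted above, operating on Perl characters instead of UTF-8 byte chunks.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical, deliberately tiny joining tables (alef, beh, lam only).
# Values are code points from Arabic Presentation Forms-B.
my %isolated = ("\x{0627}" => "\x{FE8D}", "\x{0628}" => "\x{FE8F}", "\x{0644}" => "\x{FEDD}");
my %final    = ("\x{0627}" => "\x{FE8E}", "\x{0628}" => "\x{FE90}", "\x{0644}" => "\x{FEDE}");
my %initial  = ("\x{0628}" => "\x{FE91}", "\x{0644}" => "\x{FEDF}");   # alef never joins leftwards
my %medial   = ("\x{0628}" => "\x{FE92}", "\x{0644}" => "\x{FEE0}");

# Vowel marks that do not interrupt joining (fatha, damma, kasra).
my %transparent = map { $_ => 1 } "\x{064E}", "\x{064F}", "\x{0650}";

# Takes a string in Perl's internal representation, returns it shaped.
sub shape_sketch {
    my ($text) = @_;
    my @uchar = split //, $text;
    my $prev_joins = '';            # plays the role of $a in arabjoin
    my $j;
    for (my $i = 0; $i < @uchar; $i = $j) {
        my $cur = $uchar[$i];
        # Find the next non-transparent character, if any.
        $j = $i + 1;
        $j++ while $j < @uchar && $transparent{ $uchar[$j] };
        my $next = $j < @uchar ? $uchar[$j] : '';

        $uchar[$i] =
               $prev_joins && $final{$next} && $medial{$cur}
            || $final{$next} && $initial{$cur}
            || $prev_joins && $final{$cur}
            || $isolated{$cur}
            || $cur;

        $prev_joins = $initial{$cur} && $final{$next};
    }
    return join '', @uchar;
}

# Usage: UTF-8 octets in, UTF-8 octets out; shaping happens on characters.
my $bytes  = "\xD8\xA8\xD8\xA7\xD8\xA8";      # beh, alef, beh as UTF-8
my $shaped = shape_sketch(decode('UTF-8', $bytes));
print encode('UTF-8', $shaped), "\n";
```

Keeping the decode/encode calls at the edges means the shaping routine itself never needs to know how the input file was encoded. Ligatures, including the compulsory lam+alif ones, are left out of this sketch.
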
Also in need of improvement are the optionality/scope of the
non-compulsory ligatures and the efficiency of their handling. The
shaping itself is solved excellently.

So, if any modules are to appear on CPAN, I would like them to address
the above issues, so that they are really reusable.

Thanks,

Otakar Smrz

