Re: intelligent lexically encoding

2005-09-07 Thread Dan Kogai

On Sep 08, 2005, at 11:22, Jerzy Giergiel wrote:
> Sorry for bugging people here with a trivial question. I need to
> convert from MacRoman encoding to ASCII (7-bit). The Encode package
> simply replaces out-of-range characters with a question mark. I
> need something intelligent, lexically speaking. For example, aacute
> should be converted to a. Any suggestions?


Maybe you need to implement your own fallback method.
FYI, Encode already has the following fallback methods:

  use Encode;
  $ascii = encode("ascii", $utf8, $fallback);

  where, depending on $fallback, á (U+00E1) becomes:

  Encode::FB_PERLQQ    \x{00E1}
  Encode::FB_HTMLCREF  &#225;
  Encode::FB_XMLCREF   &#xe1;
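For comparison, here is a small sketch (not from the original thread) that runs one accented string through the three fallback modes listed above. The sample string "caf\x{e1}" is made up for illustration, and a reasonably recent Encode is assumed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "caf" followed by U+00E1 (a-acute), as a Perl character string.
my $str = "caf\x{e1}";

my @fallbacks = (
    [ FB_PERLQQ   => Encode::FB_PERLQQ()   ],
    [ FB_HTMLCREF => Encode::FB_HTMLCREF() ],
    [ FB_XMLCREF  => Encode::FB_XMLCREF()  ],
);

my %result;
for my $fb (@fallbacks) {
    my ($name, $check) = @$fb;
    # encode() may modify its source in place depending on CHECK,
    # so hand it a fresh copy each time.
    my $copy = $str;
    $result{$name} = encode('ascii', $copy, $check);
    print "$name: $result{$name}\n";
}
```

None of these strip the accent; they only make the lost character visible in an escaped form.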
  

If any of those will do, go ahead and use it. If not, you have to
go like this:


$ascii = $utf8;
$ascii =~ s/([^\x00-\x7f])/your_own_fallback($1)/eg;
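A minimal sketch of what your_own_fallback() might look like, assuming the core Unicode::Normalize module is available (it ships with Perl 5.8 and later). The helper is hypothetical, not part of Encode: it decomposes each non-ASCII character with NFD, removes the nonspacing combining marks (\p{Mn}), and returns "?" when nothing ASCII is left.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical fallback: NFD splits an accented letter into its base
# letter plus combining marks (U+00E8 -> "e" + U+0300); drop the marks
# and keep the result if it is pure ASCII, otherwise return "?".
sub your_own_fallback {
    my ($char) = @_;
    (my $base = NFD($char)) =~ s/\p{Mn}//g;
    return $base =~ /\A[\x00-\x7f]+\z/ ? $base : '?';
}

# "Creme brulee a la carte" with its accents, written as \x{...} escapes.
my $utf8  = "Cr\x{e8}me br\x{fb}l\x{e9}e \x{e0} la carte";
my $ascii = $utf8;
$ascii =~ s/([^\x00-\x7f])/your_own_fallback($1)/eg;
print "$ascii\n";    # Creme brulee a la carte
```

Letters with no canonical decomposition, such as ß, æ or ø, still come out as "?"; only those few would need an explicit table.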

Hope that helps.

Dan the Encode Maintainer




Re: intelligent lexically encoding

2005-09-07 Thread Jerzy Giergiel
Neither of those fallbacks is OK. I want á converted to the
accent-stripped version of itself, i.e. a. The second solution isn't
very helpful either: it's basically a tr replacement table, which is
not much fun to write when the majority of the upper 128 characters
need to be converted. There's got to be a simpler and more elegant
solution. Thanks anyway.
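The same normalization idea also works on the whole string at once, with no per-character table. A sketch, assuming the input is raw MacRoman bytes and using a hypothetical helper name macroman_to_ascii(): decode the bytes, decompose with NFD, strip combining marks, and let encode() substitute "?" for whatever is still outside ASCII.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: convert MacRoman bytes to 7-bit ASCII,
# stripping accents instead of replacing accented letters with "?".
sub macroman_to_ascii {
    my ($bytes) = @_;

    # Bytes -> Perl character string.
    my $chars = decode('MacRoman', $bytes);

    # Decompose (U+00E1 becomes "a" followed by U+0301 COMBINING
    # ACUTE ACCENT), then delete every nonspacing combining mark.
    $chars = NFD($chars);
    $chars =~ s/\p{Mn}//g;

    # Anything still outside ASCII (e.g. German sharp s) becomes "?",
    # which is encode()'s default substitution for ASCII.
    return encode('ascii', $chars);
}

# In MacRoman, 0x8E is e-acute and 0x88 is a-grave.
print macroman_to_ascii("caf\x8E \x88 la carte"), "\n";    # cafe a la carte
```

Swapping NFD for NFKD would additionally fold compatibility characters, for example the MacRoman ligature ﬁ into "fi" and ™ into "TM", at the cost of rewriting more of the text.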


