On Sat, Nov 17, 2001 at 09:13:08AM +0100, Philip Newton wrote:
> On Sat, 17 Nov 2001 00:06:32 -0500, in perl.fwp you wrote:
> 
> > We cheat a little by assuming that anything between space and ~ is
> > printable
>            , which is unlikely to work on EBCDIC.

PostgreSQL doesn't support EBCDIC.  It does support EUC_*, Unicode,
Mule, Latin[1-5], KOI8, WIN and ALT.  I'm not going to get into
multi-byte character sets; neither does POSIX::isprint().

Anyhow, for <INSERT WEIRD CHARACTER SET HERE> you can always do
something like:

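    use POSIX qw(isprint);   # note: removed from POSIX in Perl 5.24

    # Locale-independent ranges of characters known to be printable,
    # keyed by character set.  Only ASCII is real; the rest are jokes.
    our %Printable = (
        ASCII       => '\x20-\x7E',
        EBCDIC      => 'ib-m',
        INTERGALACTIC_CHARACTER_SET_OF_LUV => '\xxx-\ooo'
    );
    # ASCII's printable range is a safe subset of every Latin-N set;
    # anything outside it still falls through to isprint().
    $Printable{"LATIN$_"} = $Printable{ASCII} for 1..6;

    my $gp_set = $Printable{$Curr_Char_Set};

    # Only characters outside $gp_set are checked with isprint();
    # unprintable ones are replaced via %U2P (escape table built elsewhere).
    sub u2p_dw_cached_isprint {
        my($str) = shift;
        $str =~ s/([^$gp_set])/isprint($1) ? $1 : $U2P{$1}/oge;
        return $str;
    }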

The whole thing hinges on having a known, locale-independent set of
printable characters, so that only the characters outside that set
have to be run through isprint(), rather than every character every
time.
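A minimal, self-contained sketch of the same fast path that runs on a
modern Perl.  Assumptions, all hypothetical: slow_isprint() stands in
for POSIX::isprint() using the [:print:] class, %U2P maps every byte to
a backslash-octal escape, and $slow_calls exists only to count how many
characters needed the slow check.

```perl
use strict;
use warnings;

# Hypothetical stand-in for POSIX::isprint(): true if the single
# character passed in belongs to the POSIX [:print:] class.
sub slow_isprint {
    my ($c) = @_;
    return $c =~ /\A[[:print:]]\z/;
}

# Hypothetical escape table: every byte maps to a backslash-octal escape.
my %U2P;
$U2P{ chr $_ } = sprintf '\\%03o', $_ for 0 .. 255;

# Known-printable, locale-independent range; characters in it never
# reach slow_isprint().
my $gp_set = '\x20-\x7E';

my $slow_calls = 0;    # how many characters needed the slow check

sub escape_unprintable {
    my ($str) = @_;
    $str =~ s{([^$gp_set])}{
        my $c = $1;
        $slow_calls++;
        slow_isprint($c) ? $c : $U2P{$c};
    }ge;
    return $str;
}

my $out = escape_unprintable("ab\tc\x{01}d");
print "$out\n";          # prints ab\011c\001d
print "$slow_calls\n";   # prints 2
```

Only the tab and the \x01 reach slow_isprint(); the letters are matched
by the cheap character class and left alone, which is the saving
described above.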


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>    http://www.pobox.com/~schwern/
Perl Quality Assurance      <[EMAIL PROTECTED]>         Kwalitee Is Job One
MERV GRIFFIN!
