On Sat, Nov 17, 2001 at 09:13:08AM +0100, Philip Newton wrote:
> On Sat, 17 Nov 2001 00:06:32 -0500, in perl.fwp you wrote:
>
> > We cheat a little by assuming that anything between space and ~ is
> > printable
> , which is unlikely to work on EBCDIC.

PostgreSQL doesn't support EBCDIC. It does support EUC_*, Unicode,
Mule, Latin[1-5], KOI8, WIN and ALT. I'm not going to get into
multi-byte character sets; POSIX::isprint() doesn't handle them either.

Anyhow, for <INSERT WEIRD CHARACTER SET HERE> you can always do
something like:
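
    use POSIX qw(isprint);

    # Character ranges known to be printable regardless of locale.
    %Printable = (
        ASCII  => '\x20-\x7E',
        EBCDIC => 'ib-m',                                   # placeholder
        INTERGALACTIC_CHARACTER_SET_OF_LUV => '\xxx-\ooo',  # likewise
    );
    $Printable{"LATIN$_"} = $Printable{ASCII} for 1..5;

    my $gp_set = $Printable{$Curr_Char_Set};

    sub u2p_dw_cached_isprint {
        my($str) = shift;
        # Only characters outside the known-printable set reach isprint().
        # /o compiles the pattern once, so $gp_set must not change later.
        $str =~ s/([^$gp_set])/isprint($1) ? $1 : $U2P{$1}/oge;
        return $str;
    }
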
The whole thing hinges on having a known set of locale-independent
printable characters, so that only the characters outside that set
have to go through isprint() every time.
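
To make the fast path visible, here is a minimal self-contained sketch
of the same idea. Everything in it is a stand-in, not the code from
this thread: %U2P is a hypothetical table mapping each byte to a \xNN
escape, is_printable() is a hypothetical replacement for
POSIX::isprint() built on the [[:print:]] class (POSIX::isprint() was
removed in Perl 5.24), and $slow_calls exists only to count how often
the slow check runs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real %U2P table: every byte maps to a
# literal \xNN escape.
my %U2P = map { chr($_) => sprintf('\\x%02X', $_) } 0 .. 255;

# Hypothetical stand-in for POSIX::isprint().
sub is_printable {
    my ($ch) = @_;
    return $ch =~ /\A[[:print:]]\z/ ? 1 : 0;
}

# Characters in this range are assumed printable without asking, so the
# pattern matches only what falls outside it.
my $not_known_printable = qr/[^\x20-\x7E]/;

my $slow_calls = 0;    # counts trips through the slow path

sub escape_unprintable {
    my ($str) = @_;
    $str =~ s{($not_known_printable)}{
        my $ch = $1;
        $slow_calls++;
        is_printable($ch) ? $ch : $U2P{$ch};
    }ge;
    return $str;
}

my $out = escape_unprintable("tab\there\x00");
print "$out\n";                             # tab\x09here\x00
print "slow-path checks: $slow_calls\n";    # slow-path checks: 2
```

Of the nine input characters, only the tab and the NUL reach
is_printable(); the other seven are skipped by the character class.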
--
Michael G. Schwern <[EMAIL PROTECTED]> http://www.pobox.com/~schwern/
Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One
MERV GRIFFIN!