This is a review of the "plperl encoding issues" patch.

Your database uses one encoding and passes data to perl in that same encoding, 
which perl is not prepared for (it assumes UTF-8).  This patch makes sure data 
is converted to UTF-8 before it's passed to plperl, then converts the response 
from UTF-8 back to the database encoding for storage.

My test:

ptest2=# create database ptest2 encoding 'EUC_JP' template template0;

I created a simple perl function that reverses the string.  I don't know Japanese so I 
found a tattoo website that had sayings in Japanese... I picked: "I am awesome".

create or replace function preverse(x text) returns text as $$
        my $tmp = reverse($_[0]);
        return $tmp;
$$ LANGUAGE plperl;

Before the patch:

ptest2=# select preverse('私はよだれを垂らす');

(1 row)

It is also possible to generate invalid characters.  This function pulls off 
the last character in the string... assuming it's UTF-8:

create or replace function plastchar(x text) returns text as $$
        my $tmp = substr($_[0], -1);
        return $tmp;
$$ LANGUAGE plperl;

ptest2=# select plastchar('私はよだれを垂らす');

ERROR:  invalid byte sequence for encoding "EUC_JP": 0xb9
CONTEXT:  PL/Perl function "plastchar"

Because the string was not UTF-8, perl got confused and returned an invalid 
byte sequence.
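
The 0xb9 in the error above matches what you get if the last *byte*, rather 
than the last character, is cut off the EUC_JP data.  A rough sketch of that, 
outside the database and in Python only so the bytes are easy to inspect (this 
is not what plperl runs; the helper name is_valid_euc_jp is made up for 
illustration):

```python
# Illustration only: byte-wise "last character" on EUC_JP data,
# reproduced outside PostgreSQL. Names here are hypothetical.
text = "私はよだれを垂らす"
raw = text.encode("euc_jp")       # the bytes an EUC_JP database would store

last_byte = raw[-1:]              # what a byte-oriented substr(..., -1) yields
print(last_byte.hex())            # b9


def is_valid_euc_jp(data: bytes) -> bool:
    """Return True if data decodes cleanly as EUC_JP."""
    try:
        data.decode("euc_jp")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_euc_jp(last_byte))         # False: half of a two-byte character

# Character-wise, the last character is two bytes long in EUC_JP.
last_char = raw.decode("euc_jp")[-1]
print(last_char.encode("euc_jp").hex())   # a4b9
```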

After the patch:
The exact same plperl functions work fine:

ptest2=# select preverse('私はよだれを垂らす');

(1 row)

ptest2=# select plastchar('私はよだれを垂らす');

(1 row)

This is a bug fix, not a performance patch.  However, as noted by the author, many 
encodings will be very UTF-8'ish and the overhead will be very small.  For 
those encodings that do need to be converted, you'd need to do the same conversion 
inside your perl function anyway before you could use the data.  The processing 
has just moved from inside your perl func to inside PG.
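
To make "the same conversion" concrete, here is a minimal sketch of the 
decode / work on characters / encode round trip, again in Python rather than 
perl and purely as an illustration.  The function name and the db_encoding 
parameter are assumptions made up for this example; nothing here is taken 
from the patch itself.

```python
# Illustration only: the decode -> operate -> encode round trip that a
# function would otherwise have to do by hand. Names are hypothetical.
def reverse_in_db_encoding(raw: bytes, db_encoding: str = "euc_jp") -> bytes:
    chars = raw.decode(db_encoding)        # database bytes -> characters
    reversed_chars = chars[::-1]           # reverse by character, not by byte
    return reversed_chars.encode(db_encoding)  # characters -> database bytes


original = "私はよだれを垂らす"
result = reverse_in_db_encoding(original.encode("euc_jp"))

# The result is still valid EUC_JP and is the character-wise reversal.
print(result.decode("euc_jp") == original[::-1])   # True
```

With the patch, the decode and encode steps happen in PG on the way in and 
out, so the function body only has to do the middle step.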

The Patch:
Applies cleanly to git head as of January 15 2011.  PG built with 
--enable-cassert and --enable-debug seems to run fine with no errors.

I don't think regression tests cover plperl, so it's understandable that there 
are no tests in the patch.

There are no manual updates in the patch either, and I think there should be.  I 
think it should be made clear that data (varchar, text, etc., but not bytea) will 
be passed to perl as UTF-8, regardless of database encoding.  Also that "use utf8;" 
is always loaded and in 

Code Review:
I am not qualified.  Looking through the patch, I'm reminded of the old saying: "Any 
sufficiently advanced perl XS code is indistinguishable from magic"  :-)

Other Remarks:
- Yes I know... it was a joke.
- I sure hope this posts to the news group ok
- My terminal (konsole) had a hard time displaying Japanese, so I used psql's 
\i and \o to read/write files that kwrite showed/encoded correctly via EUC_JP

Looks good.  Looks needed.  Needs manual updates.
