-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> When reading file names with e.g. Umlauts from a directory, either via
> readdir() or glob() and storing them in a db these strings are not
> correctly returned from the DB. This does not appear when the strings are
> ordinary Perl Strings.

I'm pretty sure this is because of a known limitation in Perl: file names
returned by readdir(), glob() and the like come back as raw bytes and are
not decoded as UTF-8, even when that is what they contain. To illustrate,
I modified the original script a bit and added:

use utf8;
use Data::Peek;

Then I took a look at the same file name, once written directly in
the script and once as returned by glob. Note the difference via
DDump($file):

SV = PV(0x8c0f0e8) at 0x8cd6430
  REFCNT = 2
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x8cd9bf8 "./files/K\303\266ln"\0 [UTF8 "./files/K\x{f6}ln"]
  CUR = 14
  LEN = 16

SV = PV(0x8c0f0d0) at 0x8cd62c8
  REFCNT = 2
  FLAGS = (POK,pPOK)
  PV = 0x8cd9c38 "./files/K\303\266ln"\0
  CUR = 13
  LEN = 16

The first one, which Perl recognizes as a UTF8 string, goes into
and comes out of the database just fine. The second (via glob)
does not. Ideally Perl would be smart enough to set UTF8 on for
such filenames, but it does not. I'm not sure there is anything
DBD::Pg could sensibly do. One solution to the problem at hand
may be to simply decode the string yourself before handing it
off to the database, like so:

utf8::decode($file);

(Use utf8::decode here rather than utf8::upgrade: the latter treats
each byte as a separate Latin-1 character, so the two bytes of the
umlaut would become two characters instead of one.)
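
For what it's worth, here is a minimal sketch of that decoding step, assuming the file system stores its names as UTF-8. The helper name decode_filename is made up for illustration and is not part of Perl or DBD::Pg, and the byte string below just stands in for what glob() would hand back:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (an assumption, not part of DBD::Pg): turn the raw
# bytes of a file name into a Perl character string, assuming the bytes
# are UTF-8. FB_CROAK makes decode() die on malformed input; LEAVE_SRC
# keeps decode() from modifying its input.
sub decode_filename {
    my ($bytes) = @_;
    return Encode::decode('UTF-8', $bytes,
        Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# The name as a character string (o-umlaut is U+00F6), which is what a
# literal in a "use utf8" script amounts to.
my $literal = "./files/K\x{f6}ln";

# The same name as raw UTF-8 bytes, standing in for glob() output.
my $from_glob = "./files/K\xC3\xB6ln";

my $decoded = decode_filename($from_glob);

print "raw bytes match literal: ",
    ($from_glob eq $literal ? "yes" : "no"), "\n";
print "decoded matches literal: ",
    ($decoded eq $literal ? "yes" : "no"), "\n";
```

In a real script you would pass each name from glob() or readdir() through something like this before binding it as a query parameter. If the names might not be valid UTF-8, wrap the call in eval so one bad name does not kill the whole run.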

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201308292202
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlIf/VcACgkQvJuQZxSWSshmSQCg7//0IBH3+GeBtmM6PHIRw9qO
F6IAnA0ylRdrgh8xplMwNTn3h+Iqvi7J
=yPxj
-----END PGP SIGNATURE-----

