Re: What to do with UTF-8 data?

2003-09-10 Thread Steve Hay
Chuck Fox wrote:

Steve,

I am a Sybase DBA, but in situations like this, I have declared the 
column on the table to be varbinary or binary and stored the data 
directly without conversion.  Don't know if MySql supports this datatype. 
One can declare a column VARCHAR(n) BINARY for a similar thing to 
Sybase's VARBINARY(n), but unfortunately it makes no difference.  If I 
store the data directly without conversion into such a column then, as 
far as I can make out, each character's bytes get stored exactly as if I 
had converted to bytes beforehand.  And when I retrieve the data (again 
without conversion) I just get octet sequences into my Perl scalars - 
not flagged, UTF-8 character strings as I would like.

This is all exactly the same as what I get with a VARCHAR(n) column.
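To make the distinction concrete, here is a minimal sketch (Perl 5.8+, core Encode module; the sample value "café" is made up) of the difference between the octet string that comes back from an unconverted fetch and the flagged character string I would like to get:

```perl
use strict;
use warnings;
use Encode ();

# What a fetch from an unconverted column yields: raw UTF-8 octets.
my $octets = "caf\xC3\xA9";    # "café" as 5 bytes, UTF8 flag off

# What I would like to get back instead: a flagged character string.
my $chars = Encode::decode_utf8($octets);

printf "octets: length %d, flagged %s\n",
    length($octets), Encode::is_utf8($octets) ? 'yes' : 'no';
printf "chars:  length %d, flagged %s\n",
    length($chars), Encode::is_utf8($chars) ? 'yes' : 'no';
```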

- Steve



Re: What to do with UTF-8 data?

2003-09-10 Thread Peter J. Holzer
On 2003-09-10 08:33:03 +0100, Steve Hay wrote:
 Chuck Fox wrote:
 I am a Sybase DBA, but in situations like this, I have declared the 
 column on the table to be varbinary or binary and stored the data 
 directly without conversion.  Don't know if MySql supports this datatype. 
 
 One can declare a column VARCHAR(n) BINARY for a similar thing to 
 Sybase's VARBINARY(n), but unfortunately it makes no difference.  If I 
 store the data directly without conversion into such a column then, as 
 far as I can make out, each character's bytes get stored exactly as if I 
 had converted to bytes beforehand.  And when I retrieve the data (again 
 without conversion) I just get octet sequences into my Perl scalars - 
 not flagged, UTF-8 character strings as I would like.
 
 This is all exactly the same as what I get with a VARCHAR(n) column.

Yes, but the difference is that with a binary column the database knows
that the data is not character data in a charset it knows. For example
the MySQL manual states:

  Values in CHAR and VARCHAR columns are sorted and compared in
  case-insensitive fashion, unless the BINARY attribute was specified
  when the table was created.

This doesn't work with UTF-8 data, of course, because the individual
bytes of a multi-byte character are not whole characters and hence don't
have an uppercase or lowercase equivalent. So two strings which should
compare equal generally won't, and sometimes strings which should not
compare equal will. If you just tell the database this is binary data,
not character data, it won't try to do case conversions on it and it
(hopefully) will stop you doing them in SQL code (you can and must do
them in perl).
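A minimal sketch of doing the case-insensitive comparison in perl instead, assuming the column holds UTF-8 bytes. The helper name utf8_eq_ci is made up for illustration. It decodes both values to characters first, so lc() works on whole characters rather than on individual bytes (on newer perls, fc() would be the more thorough choice):

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: compare two UTF-8 byte strings case-insensitively.
# Decoding first means lc() sees whole characters, not single bytes.
sub utf8_eq_ci {
    my ($left, $right) = @_;
    return lc(decode_utf8($left)) eq lc(decode_utf8($right));
}

my $upper = "\xC3\x89COLE";    # "ÉCOLE" as UTF-8 bytes
my $lower = "\xC3\xA9cole";    # "école" as UTF-8 bytes
print utf8_eq_ci($upper, $lower) ? "equal\n" : "different\n";
```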

hp

Disclaimer: I haven't actually used MySQL for some time; this is general
advice, not targeted at the specific way MySQL compares strings.

-- 
   _  | Peter J. Holzer  | Our universe would be sadly
|_|_) | Sysadmin WSR / LUGA  | insignificant if it did not hold
| |   | [EMAIL PROTECTED]| new problems for every generation.
__/   | http://www.hjp.at/   |  -- Seneca, naturales quaestiones




Re: What to do with UTF-8 data?

2003-09-10 Thread Bart Lateur
On Wed, 10 Sep 2003 08:33:03 +0100, Steve Hay wrote:

And when I retrieve the data (again 
without conversion) I just get octet sequences into my Perl scalars - 
not flagged, UTF-8 character strings as I would like.

If you're *sure* that this is UTF-8 and only perl doesn't flag it as such,
you can set the flag yourself. In perl 5.8.x, you can use one of the
functions documented near the bottom of the Encode module's documentation,
in this case _utf8_on(STRING). That's an in-place modifying function, and
it isn't exported by default, so you use it like

Encode::_utf8_on($string_that_should_be_utf8);

For perl 5.6.x (and likely for 5.8 too), you can achieve the same effect
by using pack() this way:

$flagged_as_utf8 = pack "U0a*", $string_that_should_be_utf8;


Earlier perls than 5.6 don't have this UTF8 flag, nor do they accept the
U template.
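A small sketch contrasting the flag-flipping approach with decoding on perl 5.8+ (the sample bytes are made up). _utf8_on is called fully qualified because Encode doesn't export it by default. It only flips the flag without checking that the bytes really are valid UTF-8, whereas decode_utf8 validates them (by default, malformed sequences are replaced with U+FFFD), so decode_utf8 is probably the safer choice when you are not completely sure:

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "\xC3\xA9t\xC3\xA9";    # "été" as UTF-8 bytes

# Option 1: flip the UTF8 flag in place (no validation).
my $flagged = $bytes;               # copy, since _utf8_on modifies its argument
Encode::_utf8_on($flagged);

# Option 2: decode, which checks the bytes are well-formed UTF-8.
my $decoded = Encode::decode_utf8($bytes);

print $flagged eq $decoded ? "same string\n" : "different\n";
print length($decoded), " characters\n";
```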

-- 
Bart.


Re: What to do with UTF-8 data?

2003-09-10 Thread Steve Hay
Bart Lateur wrote:

On Wed, 10 Sep 2003 08:33:03 +0100, Steve Hay wrote:

 

And when I retrieve the data (again 
without conversion) I just get octet sequences into my Perl scalars - 
not flagged, UTF-8 character strings as I would like.
   

If you're *sure* that this is UTF-8, only perl doesn't flag it as such,
you can set the flag yourself. In perl 5.8.x, you can use the Encode
module, one of the functions documented near the bottom, here the
_utf8_on(STRING). That's an inplace modifying function, so you use it
like
	_utf8_on($string_that_should_be_utf8);

Yep, that's an alternative to the $now_is_utf8 = 
Encode::decode_utf8($should_be_utf8) call that I described in my 
original posting.

But the question was: How can I arrange for such conversions to be 
performed automatically by DBI whenever it receives or returns data?

- Steve



Re: Problem with proxy server upgrade and storable

2003-09-10 Thread Tim Bunce
Old versions of Storable were more fussy about version mismatch
than newer ones. So I think on this occasion you do need to upgrade
Storable on clients and servers.

Tim.

On Tue, Sep 09, 2003 at 05:30:43PM -0700, Douglas Smith wrote:
 
 Hello All-
 
 I manage a DBI proxy server that gives people worldwide access to a local
 database, and I would like to update the proxy server to
 use perl 5.8.0 and the latest DBI and related modules.
 
 But after I have done this on a test server, and try to connect from an
 existing client, I get this error:
 
 Cannot log in to DBI::ProxyServer: Storable binary image v2.6 more recent than I am 
 (v2.4)
 at blib/lib/Storable.pm (autosplit into blib/lib/auto/Storable/thaw.al) line 355
 
 (I had also upgraded the Storable module along with DBI.)
 
 Does this mean I have to insist that all remote clients upgrade
 right away to be able to keep using the server?  Shouldn't Storable be
 backward compatible enough that this is not needed?  That is, shouldn't
 a client with an older version of Storable be able to use a server with
 a newer version of Storable?
 
 I hate to make everyone who wants to keep using it go and track
 down the versions of various modules and re-install and upgrade.  I was
 hoping to upgrade to support the use of newer versions of perl and 
 modules without insisting that current older versions that are working 
 also need to upgrade.
 
 Am I stuck, is there a way around this?
 
 Douglas
 
 -- 
 ---
 Douglas A. Smith  [EMAIL PROTECTED]
 Office: Bld 280, Rm 157   (650)926-2369
 ---


Re: Can't locate object method TIEHASH when used in Safe compartment

2003-09-10 Thread Tim Bunce
I wouldn't hold out much hope of getting DBI and Safe to work
together like that. Safe is pretty much a failed experiment.  Still
useful in some cases but generally painful to use and always difficult
to prove how much 'safety' you end up with.

If you really want to press on then at least upgrade to perl 5.8.1
(release candidate 5 should be available any day now), and don't
expect much help from others as few people have delved into the
depths of Safe.  The [EMAIL PROTECTED] list is probably the
best place.

Good luck.

Tim.

On Tue, Sep 09, 2003 at 07:44:07PM -0700, Peter wrote:
 Is this a bug or something else?  Can someone please give me some 
 direction about where to look or go to solve this problem?
 
 The error message is:
 Can't locate object method "TIEHASH" via package "DBI::st" at 
 /usr/lib/perl5/site_perl/5.6.1/i386-linux/DBI.pm line 1053.
 
 I managed to reproduce the error in this simple test program:
 #!/usr/bin/perl -w
 use strict;
 use integer;
 
 use DBI;
 use Safe;
 
 our $dbh = DBI->connect("dbi:Pg:dbname=mydb", 'username', 'password');
 $dbh->{AutoCommit} = 0; # Don't commit changes to the db until we're done
 $dbh->{RaiseError} = 1; # die if there's a db error (thereby preventing
                         # us from writing partial data to the db)
 
 our $safe = new Safe;
 $safe->share_from('main', ['$dbh']);
 
 our $safecode = <<"END";
 \$dbh->prepare("SELECT * FROM mytable");
 END
 
 $safe->reval($safecode, 1);
 die $@ if $@;
 
 exit;
 
 
 Thanks in advance,
 
 Peter
 


Re: What to do with UTF-8 data?

2003-09-10 Thread Bart Lateur
On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:

But the question was: How can I arrange for such conversions to be 
performed automatically by DBI whenever it receives or returns data?

Well, there are two options: either the database stores a flag somewhere
indicating that a string is in UTF-8, or you have to add that
information yourself. For the latter, I don't know if it'll actually
work, but it seems like an appropriate way to do it: add a BOM (byte
order mark) at the start of the string.

http://www.unicode.org/unicode/faq/utf_bom.html#22 (and below)
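For what it's worth, checking for such a marker on the way out of the database might look like the sketch below. It assumes the application itself prepended the UTF-8 encoding of U+FEFF (the bytes EF BB BF) when storing the value; decode_if_bom is a hypothetical name, and as far as I know nothing in DBI or DBD::mysql does this for you:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: if a fetched value starts with the UTF-8 BOM,
# strip it and decode the rest; otherwise return the value unchanged.
sub decode_if_bom {
    my ($value) = @_;               # copy, so the caller's value is untouched
    return $value unless defined $value;
    if ($value =~ s/^\xEF\xBB\xBF//) {
        return decode_utf8($value);
    }
    return $value;
}

my $stored = "\xEF\xBB\xBFcaf\xC3\xA9";    # BOM + "café" as UTF-8 bytes
my $text   = decode_if_bom($stored);
print length($text), "\n";
```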

-- 
Bart.


Re: What to do with UTF-8 data?

2003-09-10 Thread Steve Hay
Bart Lateur wrote:

On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:

 

But the question was: How can I arrange for such conversions to be 
performed automatically by DBI whenever it receives or returns data?
   

Well, there are two options: either the database stores a flag somewhere
indicating that a string is in UTF-8, or you have to add that
information yourself. For the latter, I don't know if it'll actually
work, but it seems like an appropriate way to do it: add a BOM (byte
order mark) at the start of the string.

I don't think MySQL 3.x stores any flag to indicate that a string is 
UTF8, and even if it did I'm not aware of anything in DBI or DBD-mysql 
that would make use of it, e.g. to decode data flagged in such a way 
into Perl's internal format.

Adding a BOM myself to the string seems to have problems of its own (see 
http://www.unicode.org/unicode/faq/utf_bom.html#27), and again I'm not 
aware of DBI / DBD-mysql having anything in them that would make use of 
such a BOM.  Please correct me if I'm wrong - that could be just the 
sort of thing that I'm looking for here.

- Steve




Re: Can't locate object method TIEHASH when used in Safe compartment

2003-09-10 Thread Peter
Thanks a lot for your timely response Tim.  I think at this point I've 
given up on using Safe but when I have more time I may try to do it by 
isolating all the DBI code outside of Safe and see how that goes.

Regards, Peter

Re: What to do with UTF-8 data?

2003-09-10 Thread Peter J. Holzer
On 2003-09-10 10:40:29 +0100, Steve Hay wrote:
 But the question was: How can I arrange for such conversions to be 
 performed automatically by DBI whenever it receives or returns data?

You could subclass DBI or DBD::mysql and replace all methods with
wrappers which perform the conversion. I'm not convinced that this is a
good idea, though. I'd rather try to do the conversion in an
application-specific layer above DBI.
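A minimal sketch of such a layer, assuming every column comes back as UTF-8 bytes. fetchrow_array_utf8 is a made-up name, not part of DBI; it works with any object that has a fetchrow_array method:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical application-level helper: fetch one row and decode every
# defined column from UTF-8 bytes to Perl characters. NULLs stay undef.
sub fetchrow_array_utf8 {
    my ($sth) = @_;
    my @row = $sth->fetchrow_array
        or return;    # no more rows
    return map { defined $_ ? decode_utf8($_) : undef } @row;
}

# Usage with a real statement handle:
#   while (my ($foo, $bar) = fetchrow_array_utf8($sth)) { ... }
```

Keeping the conversion in one small function like this avoids touching DBI or the driver at all, at the cost of having to call it instead of the plain fetch method.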

hp



Re: What to do with UTF-8 data?

2003-09-10 Thread Peter J. Holzer
On 2003-09-10 12:14:25 +0200, Bart Lateur wrote:
 On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:
 
 But the question was: How can I arrange for such conversions to be 
 performed automatically by DBI whenever it receives or returns data?
 
 Well, there are two options: either the database stores a flag somewhere
 indicating that a string is in UTF-8, or you have to add that
 information yourself. For the latter, I don't know if it'll actually
 work, but it seems like an appropriate way to do it: add a BOM (byte
 order mark) at the start of the string.

That doesn't help Steve. He already knows that the data is UTF-8, he
doesn't need the marker to distinguish between UTF-8 and Latin-X. 

His problem is that when he selects from the database, he has to
manually convert from utf-8 to perl-internal:

use Encode qw(decode_utf8);

while (my ($foo, $bar) = $sth->fetchrow_array()) {
    $foo = decode_utf8($foo);
    $bar = decode_utf8($bar);

    # do something with foo and bar
}

and he wants the decode step to happen automatically.
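The opposite direction, for data going into the database, can be handled the same way by encoding character strings to UTF-8 bytes before binding them. A minimal sketch with a hypothetical helper name, assuming the column simply stores whatever bytes it is given:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: turn Perl character strings into UTF-8 bytes
# before they are passed to $sth->execute. undef (NULL) is left alone.
sub utf8_bind_values {
    return map { defined $_ ? encode_utf8($_) : undef } @_;
}

# Usage:
#   $sth->execute(utf8_bind_values($foo, $bar));
```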

Since MySQL 4.1 does support UTF-8: Is it possible to upgrade from MySQL
3.23 to 4.1? 

hp



Re: What to do with UTF-8 data?

2003-09-10 Thread Tim Bunce
On Wed, Sep 10, 2003 at 11:42:23AM +0100, Steve Hay wrote:
 Bart Lateur wrote:
 
 On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:
 
 But the question was: How can I arrange for such conversions to be 
 performed automatically by DBI whenever it receives or returns data?
 
 Well, there are two options: either the database stores a flag somewhere
 indicating that a string is in UTF-8, or you have to add that
 information yourself. For the latter, I don't know if it'll actually
 work, but it seems like an appropriate way to do it: add a BOM (byte
 order mark) at the start of the string.

 I don't think MySQL 3.x stores any flag to indicate that a string is 
 UTF8, and even if it did I'm not aware of anything in DBI or DBD-mysql 
 that would make use of it, e.g. to decode data flagged in such a way 
 into Perl's internal format.
 
 Adding a BOM myself to the string seems to have problems of its own (see 
 http://www.unicode.org/unicode/faq/utf_bom.html#27), and again I'm not 
 aware of DBI / DBD-mysql having anything in them that would make use of 
 such a BOM.  Please correct me if I'm wrong - that could be just the 
 sort of thing that I'm looking for here.

Basically it should be the job of the drivers to set the utf8 flag on
data being retrieved if it is utf8. I believe that the new mysql v4.1
protocol does provide information about the character set of each column.
DBD::mysql can use that.

For people stuck with older versions of mysql, a driver private
option could be used to indicate that all char fields are utf8,
or have some way of indicating that per-column, such as

$sth->bind_col(1, undef, { mysql_charset => 'utf8' });
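Until a driver offers something like that, a per-column equivalent can be approximated in application code. This sketch assumes rows are fetched with fetchrow_hashref and that the caller knows which columns hold UTF-8; decode_columns and the column names are hypothetical, and mysql_charset above is only a proposal, not something the driver is known to support:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: decode only the named columns of a row hashref
# (e.g. one returned by $sth->fetchrow_hashref), leaving the others alone.
sub decode_columns {
    my ($row, @utf8_columns) = @_;
    for my $col (@utf8_columns) {
        $row->{$col} = decode_utf8($row->{$col})
            if defined $row->{$col};
    }
    return $row;
}

# Usage:
#   while (my $row = $sth->fetchrow_hashref) {
#       decode_columns($row, qw(title body));
#       ...
#   }
```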

Tim.


Re: [PATCH] Three DBD::Oracle Makefile.PL bugs on HP-UX/Oracle 9i

2003-09-10 Thread Tim Bunce
On Wed, Sep 10, 2003 at 12:18:55PM +0200, Jean-Louis Leroy wrote:
 I have encountered three problems with the latest (1.14) release of
 DBD::Oracle on HP-UX 11 with Oracle 9.2 and perl 5.6.1.

Thanks. I'd be grateful if you could rework the patch over my
current development version of Makefile.PL, which I've attached.

Tim.


Makefile.PL.gz
Description: application/gunzip


Re: What to do with UTF-8 data?

2003-09-10 Thread Steve Hay
Peter J. Holzer wrote:

Since MySQL 4.1 does support UTF-8: Is it possible to upgrade from MySQL
3.23 to 4.1? 

I might be able to upgrade to 4.0 (in fact, I really ought to...), but I 
don't fancy 4.1 just yet -- it's still an alpha release :-(

- Steve



Re: What to do with UTF-8 data?

2003-09-10 Thread Steve Hay
Peter J. Holzer wrote:

On 2003-09-10 10:40:29 +0100, Steve Hay wrote:
 

But the question was: How can I arrange for such conversions to be 
performed automatically by DBI whenever it receives or returns data?
   

You could subclass DBI or DBD::mysql and replace all methods with
wrappers which perform the conversion. I'm not convinced that this is a
good idea, though. I'd rather try to do the conversion in an
application-specific layer above DBI.

Sub-classing DBI wouldn't help me since I'm using Class::DBI, which won't 
know to use my DBI sub-class.

Sub-classing DBD-mysql might be more feasible, though, since I specify 
what driver to use.

Actually, I had hoped to find some appropriate hooks in Class::DBI, but 
I don't see any.  There are select and before_set triggers, but the 
latter are per-column, which would be a real pain to set up.

- Steve



Re: What to do with UTF-8 data?

2003-09-10 Thread Steve Hay
Tim Bunce wrote:

On Wed, Sep 10, 2003 at 11:42:23AM +0100, Steve Hay wrote:
 

Bart Lateur wrote:

   

On Wed, 10 Sep 2003 10:40:29 +0100, Steve Hay wrote:

 

But the question was: How can I arrange for such conversions to be 
performed automatically by DBI whenever it receives or returns data?
   

Well, there are two options: either the database stores a flag somewhere
indicating that a string is in UTF-8, or you have to add that
information yourself. For the latter, I don't know if it'll actually
work, but it seems like an appropriate way to do it: add a BOM (byte
order mark) at the start of the string.
 

I don't think MySQL 3.x stores any flag to indicate that a string is 
UTF8, and even if it did I'm not aware of anything in DBI or DBD-mysql 
that would make use of it, e.g. to decode data flagged in such a way 
into Perl's internal format.

Adding a BOM myself to the string seems to have problems of its own (see 
http://www.unicode.org/unicode/faq/utf_bom.html#27), and again I'm not 
aware of DBI / DBD-mysql having anything in them that would make use of 
such a BOM.  Please correct me if I'm wrong - that could be just the 
sort of thing that I'm looking for here.
   

Basically it should be the job of the drivers to set the utf8 flag on
data being retrieved if it is utf8. I believe that the new mysql v4.1
protocol does provide information about the character set of each column.
DBD::mysql can use that.

Ah.  In that case, I should get onto the DBD-mysql people to look for 
assistance.  I was thinking that DBI itself would be adding some kind of 
UTF-8 support.

For people stuck with older versions of mysql, a driver private
option could be used to indicate that all char fields are utf8,
or have some way of indicating that per-column, such as
	$sth->bind_col(1, undef, { mysql_charset => 'utf8' });

OK, I'll pass this suggestion on to the DBD-mysql maintainer(s).

Thanks,
- Steve


Trying to reach sybperl list, no luck

2003-09-10 Thread Matthew . Persico
Sorry for the intrusion, but can someone familiar with Sybase please reply? The
emails to the sybperl list and owner keep bouncing back.

Thanks

--
Subject: Compiling Sybperl 2.15 and Sybase 12.5, 64 bit client

There is a 50-50 chance that this is my problem, i.e., a busted installation, but
I'll throw this out there just in case.

1) While compiling Sybperl 2.15 with the Sybase 12.5 64-bit client, the config.pl
script couldn't determine the CTLib client version because it runs the
strings command on libct.a, and we only had a libct.so file. If that's a busted
installation, never mind. If not, then maybe config.pl should read

my $version = `strings $lib/libct.a $lib/libct.so`;


2) Installation couldn't find libbrk because it is named libbrk_64. Is that a
busted installation (no libbrk link or no explicit file with that name), or a
bad configuration check?
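For both items, config.pl could look for whichever file actually exists instead of assuming a single name. This is only a sketch, not sybperl's actual config.pl code; find_lib is a hypothetical name, and the candidate list (.a, .so, and a _64 suffix) is an assumption based on the file names mentioned above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first existing library file for $base
# in $dir, trying static and shared variants and a "_64" suffix.
sub find_lib {
    my ($dir, $base) = @_;
    for my $name ("lib$base.a", "lib$base.so",
                  "lib${base}_64", "lib${base}_64.a", "lib${base}_64.so") {
        my $path = "$dir/$name";
        return $path if -e $path;
    }
    return;    # nothing found
}

# Usage (then run strings on whichever file was found):
#   my $ctlib = find_lib($lib, 'ct') or die "no libct found in $lib\n";
#   my $version = `strings $ctlib`;
```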

Note that DBD-Sybase compiled and tested with no complaints. Weird, since they
both use CTLib, yes?

Disclosure Note: I've bcc'ed the colleague I am working with so he's in the loop
but does not have his email harvested.
--
Matthew Persico
Vice President
Lazard
30 Rockefeller Plaza
New York, NY 10020, USA
Phone Number: 212 632 8255
Fax Number: 212 332 5904
Email: [EMAIL PROTECTED]


Help: Require journal references to software designed to use DBI similar

2003-09-10 Thread Ron Savage
Hi Folks

I've been asked by my university's committee to supply such journal references for a 
master's proposal.

If you have any ideas, please forward privately.

TIA.
--
Cheers
Ron Savage, [EMAIL PROTECTED] on 11/09/2003
http://savage.net.au/index.html