Re: UTF-8 flags (again)

2004-09-08 Thread David Wheeler
On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:
I was thinking of doing at least 0 on that list for DBI 1.44.
I'd especially like to do
  $dbh->{SetUTF8} = 2;
And be done with it.
I'll take a look. Patches welcome, of course!
Hey, if I knew any C...I can paste these from Encode.xs, at least:
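SV *
_utf8_on(sv)
SV *sv
CODE:
{
    if (SvPOK(sv)) {
        /* Return the previous state of the UTF-8 flag, then turn it on. */
        SV *rsv = newSViv(SvUTF8(sv));
        RETVAL = rsv;
        SvUTF8_on(sv);
    } else {
        /* Not a string value: return undef. */
        RETVAL = &PL_sv_undef;
    }
}
OUTPUT:
    RETVAL

SV *
_utf8_off(sv)
SV *sv
CODE:
{
    if (SvPOK(sv)) {
        /* Return the previous state of the UTF-8 flag, then turn it off. */
        SV *rsv = newSViv(SvUTF8(sv));
        RETVAL = rsv;
        SvUTF8_off(sv);
    } else {
        /* Not a string value: return undef. */
        RETVAL = &PL_sv_undef;
    }
}
OUTPUT:
    RETVAL
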
PS: I assume that if I do:
  my $data = $utf8_data;
where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on. Is
that correct?
Yes.
Great, I figured as much. Thanks!
David


Re: UTF-8 flags (again)

2004-09-08 Thread Tim Bunce
On Wed, Sep 08, 2004 at 09:15:36AM -0700, David Wheeler wrote:
 On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:
 
 I was thinking of doing at least 0 on that list for DBI 1.44.
 
 I'd especially like to do
 
   $dbh->{SetUTF8} = 2;
 
 And be done with it.
 
 I'll take a look. Patches welcome, of course!
 
 Hey, if I knew any C...I can paste these from Encode.xs, at least:

That's the trivial bit :) The fiddly bit is handling the SetUTF8 attribute
(and corresponding bit flags to make it fast enough).

But thanks anyway :)

Tim.


Re: UTF-8 flags (again)

2004-09-08 Thread David Wheeler
On Sep 8, 2004, at 12:58 PM, Tim Bunce wrote:
That's the trivial bit :) The fiddly bit is handling the SetUTF8 attribute
(and corresponding bit flags to make it fast enough).

But thanks anyway :)
Ah well, sorry I can't be more help...
Regards,
David


Re: UTF-8 flags (again)

2004-08-08 Thread Tim Bunce
On Sun, Aug 08, 2004 at 06:15:39PM +0100, Matt Sergeant wrote:
 On 8 Aug 2004, at 17:35, David Wheeler wrote:
 
 On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:
 
 i.e. for every fetch call, you need to do:
 
   SvUTF8_off(AvARRAY(av)[i]);
 
 Now, people using your DBD can decide to upgrade the variable if they 
 wish to, but most people who don't need to will be unaffected.

Or, more generally, explicitly call either SvUTF8_off or SvUTF8_on as
appropriate, but be sure to call one of them for each field.

Meanwhile I think it would be wise for the DBI to explicitly do SvUTF8_off
on the elements of the internal row buffer before each row is fetched.
That would avoid the utf8 flag 'leaking' from one row to the next.
I'll do that for DBI 1.44.
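
To illustrate the leak described above, here is a toy model in plain C. It is only a sketch and does not use the Perl API: struct field, field_set and row_reset_flags are hypothetical names standing in for an SV in the reused row buffer, an sv_set-style assignment, and the proposed per-fetch SvUTF8_off pass.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for one field of a row buffer that is reused for every fetch. */
struct field {
    char value[64];
    bool utf8;      /* models the SvUTF8 flag */
};

/* Copy new bytes into the field but leave the flag untouched, modelling
 * the flag persistence across sv_set calls described in the thread. */
static void field_set(struct field *f, const char *bytes)
{
    strncpy(f->value, bytes, sizeof f->value - 1);
    f->value[sizeof f->value - 1] = '\0';
}

/* Clear the flag on every field; intended to run before each row is fetched. */
static void row_reset_flags(struct field *row, size_t n)
{
    for (size_t i = 0; i < n; i++)
        row[i].utf8 = false;
}
```

In this model, if row 1 stores UTF-8 text and sets the flag, and row 2 then stores Latin-1 bytes with field_set alone, the field still claims to be UTF-8. Calling row_reset_flags before each fetch removes that stale state.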

 I think that this is fine as long as there's an easy way to upgrade 
 the variable. I could use Encode::_utf8_on(), but that seems like more 
 overhead than is necessary unless I've loaded Encode for some other 
 use already. Perhaps there could be a module or even a DBI method that 
 does the equivalent?
 
   # Pseudocode:
   sub utf8_on { SvUTF8_on($_[0]) }
 
 Certainly fairly easy to export that from the DBI.

I'll do that (and utf8_off) for DBI 1.44.

 Tim and I talked about long term plans for this, where the user might 
 specify in advance which columns he'd like UTF-8 turned on for, or some 
 (I thought horrible) heuristic method where the DBD automagically 
 decides to turn on the flag if it detects data that it can turn into 
 UTF-8 - but that sounds like a world of pain to me.

Sure, but some apps/drivers may need the choice.

I'm thinking in terms of something like $sth->{SetUTF8}->[$index] = $mode

0: Force SvUTF8_off regardless
undef: Do nothing (leave it up to the driver)
1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
2: Force SvUTF8_on regardless

(with a way to set it via bind_col as well)

And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.

Umm, it's just dawned on me that the persistence of the utf8 flag
across sv_set functions means I could implement all but 1 in DBI v1.
(Option 1 requires looking at the value that's just been set, and
that's not simple/efficient for DBI v1.)
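
A minimal sketch of the four modes listed above, in standalone C rather than against the Perl API. The enum and function names are hypothetical, and "well-formed" is assumed here to mean strict UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF); Perl's own internal check may be more lenient. It also shows why mode 1 is the expensive one: it is the only mode that has to scan the bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed SetUTF8 values; not DBI constants. */
enum utf8_mode {
    UTF8_MODE_UNSET = -1,  /* undef: do nothing, leave it to the driver */
    UTF8_MODE_OFF   = 0,   /* force the flag off */
    UTF8_MODE_CHECK = 1,   /* flag on only if the bytes are well-formed */
    UTF8_MODE_ON    = 2    /* force the flag on */
};

/* Return true if buf[0..len) is strictly well-formed UTF-8. */
static bool is_well_formed_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point and smallest legal value */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation or invalid lead byte */

        if (len - i - 1 < need) return false;   /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return false;                      /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        if (cp > 0x10FFFF) return false;                 /* out of range */
        i += need + 1;
    }
    return true;
}

/* Return the flag state a field should end up with under the given mode. */
static bool apply_utf8_mode(enum utf8_mode mode, bool current_flag,
                            const unsigned char *buf, size_t len)
{
    switch (mode) {
    case UTF8_MODE_OFF:   return false;
    case UTF8_MODE_CHECK: return is_well_formed_utf8(buf, len);
    case UTF8_MODE_ON:    return true;
    case UTF8_MODE_UNSET:
    default:              return current_flag;
    }
}
```

Note that under this reading of mode 1, plain ASCII counts as well-formed and would get the flag turned on, which is harmless but may not be what every driver wants.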

 Better IMHO would be an extension to bind_col - it should be trivial to 
 add an attribute in there. The downside being that not many people use 
 bind_col.

Those that need to control utf8 settings need to make code changes anyway.

Tim.