Re: APR::Table lost UTF8 flag

Boris Zentner Fri, 24 Sep 2004 00:16:42 -0700

Hi,

On 24.09.2004 at 02:37, Stas Bekman wrote:

Boris Zentner wrote:
Hi,
On 24.09.2004 at 01:50, Stas Bekman wrote:
Boris Zentner wrote:
Hi, whenever I store something in an APR::Table, I get different data back. Even with a recent mod_perl I use
APR::Table stores plain strings. It doesn't know what type of string it has stored. If you store utf8 data you need to tell perl that the string is utf8, by calling Encode::encode. It's exactly the same as getting some UTF8 data from the client: you need to explicitly decode it to make Perl see it as UTF8 data.
I do understand that the data has no UTF8 flag when I receive it, but when I store data I get it back wrong. Also, the UTF8 flag does not belong to the whole table; it belongs to the string. In the example I already have a string holding UTF-8 data ($utf8). The problem is that this string loses that information, and afterwards I cannot tell whether it is a plain string or one that lost its flag. If I have to record the UTF8 flags somewhere else, why should I use APR::Table at all? I could store the data in a Perl array or hash instead, which does not lose it. In my case I am not sure I know which data is UTF-8 and which is not, and double-encoded data is simply wrong.

In my real case, I get the data from libapreq2 in raw format and want to convert some of it in place to UTF-8. Then the rest of my application can rely on libapreq's functions. But if the flag is lost, my output is wrong. If I have to copy the data out of libapreq and track it myself, why should I use libapreq2 or any other module that uses APR::Table?

To finish, a little quiz: here are four numbers, two are octal and the others are decimal. Which are the octals? ;-) 2 55 26 17
I don't think you can always guess correctly, though I think Encode has some functions that try to do that. Why do you use APR::Table and not pnotes or a Perl hash? It's possible that there is a problem and that we could handle data going into and out of the table in a better way, but your data will need to be converted back and forth every time you need it. Isn't that a waste?
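
To make the guessing problem concrete, here is a minimal sketch using only the core Encode module. The byte string is an illustrative value, not data from this thread. It shows that the same bytes decode without error under both interpretations, so nothing in the bytes themselves says which one was meant:

```perl
use strict;
use warnings;
use Encode ();

# The same two bytes, with no flag attached to say what they mean.
my $bytes = "\xC3\xA4";

# LEAVE_SRC keeps decode() from modifying $bytes when a CHECK value is given.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Read as UTF-8 they are one character: U+00E4.
my $as_utf8 = Encode::decode('UTF-8', $bytes, $check);

# Read as ISO-8859-1 they are two characters: U+00C3 and U+00A4.
my $as_latin1 = Encode::decode('iso-8859-1', $bytes, $check);

printf "as UTF-8:      %d character(s)\n", length $as_utf8;
printf "as ISO-8859-1: %d character(s)\n", length $as_latin1;
```

Both calls succeed, which is the same situation as the octal quiz: without out-of-band information, either reading is plausible.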

No, it is a big win. I do not want to convert back and forth; I just want to get back what I put in. I convert it once, and then the data can be used as often as you like without any further conversion.

I use the CGI-like param methods from libapreq2, which use APR::Table. In the next step the data is converted once:

 my $apr_table = $apr->params;
 $_ = Encode::decode( 'iso-8859-1', $_ ) for ( values %$apr_table );
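
For comparison, a sketch of the same one-time conversion on a plain Perl hash, which keeps the decoded character string as it is. The hash `%params` and its contents are hypothetical stand-ins for the request parameters:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the request parameters: raw ISO-8859-1 bytes.
my %params = ( city => "Z\xFCrich", note => "plain ascii" );

# Decode every value once, in place; values() aliases the hash elements.
$_ = Encode::decode('iso-8859-1', $_) for values %params;

# Report what the hash hands back for each key.
for my $key (sort keys %params) {
    printf "%s: flag=%d length=%d\n",
        $key, (utf8::is_utf8($params{$key}) ? 1 : 0), length $params{$key};
}
```

The value stored under `city` comes back as a six-character string with its UTF8 flag still set, which is the behaviour wanted from the table.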

After this simplified step, everything except binary data is in UTF-8 and needs no further conversion. Now this $apr object is passed, out of my control, to a user module:

$utf_string .= $apr->param('parameter'); # at this point the parameter is upgraded to UTF-8 a second time by perl, and the result is wrong
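
The effect can be reproduced without mod_perl. This is a minimal sketch under one stated assumption: the table hands back the UTF-8 bytes of the stored string with the UTF8 flag off. That behaviour is simulated here with Encode::encode rather than with a real APR::Table, and the variable names are illustrative:

```perl
use strict;
use warnings;
use Encode ();

# A character string as produced by the one-time decode step.
my $chars = Encode::decode('iso-8859-1', "\xE4");   # U+00E4, one character

# Assumption: simulate a store that returns only the raw buffer,
# that is, the UTF-8 bytes C3 A4 without the UTF8 flag.
my $returned = Encode::encode('UTF-8', $chars);

# Appending to a flagged string makes perl upgrade $returned as if it
# were ISO-8859-1, so each byte becomes a character of its own.
my $utf_string = "\x{263A}";                         # any flagged string
$utf_string .= $returned;
printf "after append: %d characters\n", length $utf_string;   # 3, not 2

# If the caller knows the value is UTF-8, decoding on the way out repairs it.
my $fixed = "\x{263A}" . Encode::decode('UTF-8', $returned);
printf "after decode: %d characters\n", length $fixed;        # 2
```

The repair only works when the caller knows which values were decoded earlier, which is exactly the bookkeeping the user module should not have to do.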

So my options are:

- do not use libapreq2 anymore
- force the user to know which data is UTF-8
- subclass libapreq2 and use a CGI object to hold the data. This makes me wonder why I should use libapreq2 at all, if it ends up slower than CGI because of what I need to get the flags correct.

Frankly, seeing the nightmare happening at p5p, I don't think we even want to venture into trying to handle UTF-8 internally on behalf of the user. I hope that we could stay out of it.

I know about this quite well; I live in Europe, with some characters outside US-ASCII. I regularly get a headache from perl's auto-conversion. But perl gets it right most of the time nowadays; far more errors come from the loss of the UTF8 flag.

--
Boris


