Hi,
On 26.09.2004 at 02:09, Stas Bekman wrote:
Boris Zentner wrote: [...]
Maybe explicitly setting the utf8 flag will work as a temporary solution for you? If all your data is stored in utf8, then it will be perfectly fine.
No, this will not work in my case.
Below is the revision of your test program that does what you want. (If you didn't know, you can use APR:: classes outside of Apache, so you can run this program from the command line):
This approach will not work for me, since the get part is out of my control, and I do not know which of the vars in the table are already utf8 at get time.
My current cure is to copy the table to an object that works like APR::Table, at the cost of copying all the data even if it is not used.
Understood.
Another idea to get the encoding right is:
All data in the table gets the right encoding (some data is utf8, some is not).
You mean, what I suggested?
No,
<quote>
It's possible that if APR::Table sees a UTF8 flag on set(), it should encode it back to the utf8 string before storing it, since as you've pointed out there is no way to restore the UTF8 flag afterwards. But also as you've pointed out, it's a huge waste of resources, since you will need to decode it again when you get() it and before you can use it.
</quote>
No, I disagree, there is no waste of resources. If the utf8 flag is not set, APR::Table has no more to do than it does currently. And if the flag is set, the conversion is desired.
I think it makes sense, but as I've shown in the previous example, the setting of the utf8 flag may do the trick, in which case they won't
Sure, setting the flag makes sense, but it is not usable if some data is in utf8 and some is not.
- The user has to store the flag elsewhere just to know where to set the flag.
- The user has to write extra code to set the flag.
That's a waste of resources and invites errors.
want to have the cycles wasted to convert the data back and forth. Since APR::Table will have to do that and then users will need to decode it back, I think we should defer that to users. They should hand a utf8 string to APR::Table, and free APR::Table from doing work that will be a waste for many.
I think this is completely wrong. Since perl converts data to utf8 on its own, the result is wrong whenever a utf8 string is used together with a utf8 string whose flag information was lost. The reason is that the string with the lost information is upgraded to utf8 a second time (on the fly).
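To make the double upgrade concrete, here is a rough analogue in Python rather than Perl. It is an illustration only, not code from this thread: `store_and_get` is a hypothetical helper that models a table keeping the UTF-8 bytes of a flagged string and handing them back without the flag, so that each byte is later read as one Latin-1 character.

```python
def store_and_get(text):
    """Hypothetical model of set() followed by get() on a bytes-only table.

    set(): only the UTF-8 bytes of the string are kept.
    get(): the bytes come back with no flag, so the consumer reads
           every byte as a single Latin-1 character.
    """
    return text.encode("utf-8").decode("latin-1")


original = "\u00e4"                    # one character: a-umlaut
from_table = store_and_get(original)   # flag information is lost here

# Mixing a properly flagged string with the one from the table forces
# the second string to be "upgraded" again when the result is written out.
combined = original + from_table
wire = combined.encode("utf-8")

good_part = wire[:2]   # bytes of the intact string
bad_part = wire[2:]    # bytes of the string that went through the table
```

The intact string is written as two bytes, while the one that passed through the table comes out as four: it has been encoded twice.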
Require all operations on table objects to have a cleared utf8 flag even if the data is in utf8.
You mean, when creating and returning a perl variable to the user? It never has utf8 at the moment, since we don't try to set it. Or did you mean something else?
Yes, something else. The idea is to let newer perls act like 5.6.0, since this is the last release where all data is binary and APR::Table and friends work correctly. Sure, I know APR::Table is not alone with this problem: HTML::Parser, DBD::SQLite, DBD::Pg and TT have it too.
I mean, if I can get every data provider to clear the utf8 flag (or ignore it), then perl does not try to upgrade and the result is as expected (I know length, chr, ord, regexes and so on do not work correctly then). But this also only works if I never make a mistake. If any data comes back with the utf8 flag, all results are broken.
Note that this approach is not currently used, nor is it a desired solution; it is just possible in my case and it works with any data source.
In that case, I can use APR::Table and all other ill-formed data providers. But if any data source such as a database or a scalar has the utf8 flag set, all data is busted again. So I have to fight against perl's auto conversion (use bytes is not helpful since I do not control all sources). And I fear I will lose somewhere.
I don't know. Maybe you can convince Joe that Apreq's APR::Table subclass could handle that utf8'ness.
That would be best for all users with charsets outside of us-ascii.
In any case APR::Table is not designed for anything but storing and returning strings. The only problem that you are correct about is that it silently accepts PVs with UTF8 flag on and then returns the new variable, w/o it. For example we could croak if we see a perl PV with UTF8 on. But then as mentioned before for some people it'll be fine for them to have the utf8 flag set manually on retrieval.
That is not an option, since the user has to know when to set the flag, which basically means storing the flag elsewhere in the code or program flow. What we agree on, as I understand it, is that the information is lost and cannot be regenerated later. I cannot copy, rename or move anything in the table without copying or modifying the other data store for the flag.
At the moment I can't see any better solution than leaving it as it is. Unless someone convinces me otherwise.
I try hard; my main argument is that APR::Table can only store bytes < 128 reliably for any perl > 5.6.0.
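The "< 128" boundary can be checked mechanically. The sketch below is again a Python analogue, not Perl and not code from this thread; `survives_flag_loss` is a hypothetical helper. It assumes the table keeps the UTF-8 bytes of a flagged string and the reader later treats each byte as a Latin-1 character.

```python
def survives_flag_loss(code_point):
    """Return True if a single character is unchanged after its UTF-8
    bytes are read back as Latin-1 (the assumed effect of losing the flag)."""
    ch = chr(code_point)
    return ch.encode("utf-8").decode("latin-1") == ch


# Check every code point that fits in one byte of Latin-1.
safe = [cp for cp in range(256) if survives_flag_loss(cp)]
unsafe = [cp for cp in range(256) if not survives_flag_loss(cp)]
```

Under this assumption only the US-ASCII range survives, because those characters have the same single-byte form in both encodings.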
-- Boris