On 2010-10-19 at 23:45, Paul McNett wrote:
>> Knowing Unicode is essential, no excuses. (Even if you're probably a
>> far better programmer than me in every other regard.)
Sorry for being rude.
I struggled with encodings myself too often and am glad if I don't
have to cope with low-level database stuff...
(e.g. I had to handle texts from different authors on different OSs
with different browsers that were entered in a PHP web interface,
mostly copy-pasted from MS Word, i.e. including strange formatting,
even in plain text input elements; and even if the database was set to
UTF-8, there was a lot of crap in different encodings in there; I got
the data via PHP-UTF-encoded JSON, which just didn't work with
characters outside Latin-1, some &#1234; entities etc.; PHP sucks...)
> And I never remember if we want
> to decode() or encode(), because they both seem like code to me.
You can decode a byte string to unicode.
You can encode a unicode object as a byte string.
Remember, a unicode object is not a UTF-8 encoded string!
In a unicode object "Ä" is one character, in UTF-8 it's two bytes.
(Internally a unicode object is stored as UCS-2 or UCS-4, depending on
how Python was built, AFAIK, but that doesn't matter.
In Python 3 unicode objects replace the strings, and I read that this
makes handling low-level encoding stuff like in HTTP libraries and
database interfaces really hard.)
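
A minimal sketch of the decode/encode distinction. It is written for Python 3 (an assumption; the thread is about Python 2), where `str` plays the role of the unicode object and `bytes` the role of the byte string:

```python
# Python 3 naming: bytes = byte string, str = unicode object.
text = "\u00c4"                         # "Ä" as one character

utf8_bytes = text.encode("utf-8")       # encode: unicode -> bytes
latin1_bytes = text.encode("latin-1")   # same character, different bytes

print(len(text))        # 1
print(utf8_bytes)       # b'\xc3\x84'
print(latin1_bytes)     # b'\xc4'

round_trip = utf8_bytes.decode("utf-8")  # decode: bytes -> unicode
print(round_trip == text)                # True
```

So: bytes come in from outside and get decoded; text goes out and gets encoded.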
> So now I have the app not crashing with that data field, however I
> don't get child opening records for that order and instead get the
> console output:
>
> {{{
> 2010-10-19 14:18:11 - ERROR - Error fetching records: Could not decode to UTF-8 column 'special_instructions' with text 'Route for Crank 6 ½” to 12” x 2 ½” high x ½” deep, Handle 13 ½” to 15” x 1” wide x ½” deep'
> }}}
I don't know if that's your case, but it might be that the 2-4 bytes
that make up some UTF-8 character were themselves accidentally encoded
again, e.g. if UTF-8 data was handled as Latin-1.
Hard to explain, hard to resolve...
I mean: If I have a text in UTF-8 and read it as Latin-1, it contains
those sets of "strange characters". If I save that file as UTF-8 and
read it again as UTF-8, I get "strange characters" back, of course.
(And if you read it in Latin-1, there are even more "strange
characters". And if you save that as UTF-8...) Been there, cursed
that. And try to revert that, esp. if the text was edited in between
and some characters are correctly encoded...
(I can't remember how I solved it - after string.decode('utf-8') I
have a unicode object and can't decode that again...)
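
One way out of the "can't decode that again" corner, as a sketch only: encode the garbled text back to Latin-1 to recover the original bytes, then decode those as UTF-8. The helper name `undo_double_encoding` is made up for illustration, and the code assumes Python 3 (`str` is the unicode object there):

```python
def undo_double_encoding(text):
    """Hypothetical helper: reverse one round of 'UTF-8 read as Latin-1'."""
    try:
        # Latin-1 maps code points 0-255 straight to bytes 0-255, so this
        # recovers the original bytes, which are then decoded properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not cleanly double-encoded: leave the text untouched.
        return text

original = "M\u00fcller"                               # "Müller"
garbled = original.encode("utf-8").decode("latin-1")   # UTF-8 read as Latin-1
print(garbled)                         # MÃ¼ller
print(undo_double_encoding(garbled))   # Müller
```

This only works on a string as a whole. If the text was edited in between and mixes correct and garbled characters, the round trip fails and the helper returns its input unchanged, so such text would have to be repaired piece by piece.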
Ok, since you got the right characters on your Mac, as you said in the
other mail, I don't think that's your problem.
And with Dabo you can't get the typical web form problems.
> Now at least dCursorMixin.execute() seems to be working correctly, but
> somewhere on saving to the database (which expects utf-8) we are
> failing to decode from whatever encoding the text was in and to encode
> into utf-8 for saving to the database. Else, why were non-UTF8 chars
> saved?
>
> Incidentally, Jeff's suggestion of setting the sqlite connection's
> con.text_factory = str makes the requery() work without error, only a
> warning that it fell back to latin-1 because utf-8 didn't work. I need
> to study more of our code to work out what the difference is, but
> because of this I'm wondering about always setting text_factory to str
> since it seems to do the right thing for me.
Hm, I think that means that you are saving UTF-8 bytes in Latin-1
encoding now.
That will probably work as long as you always go the same way and don't
e.g. try to work with the sqlite3 CLI or SQLiteMan or the like.
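
To make the fallback idea concrete, here is a rough sketch of a `text_factory` that tries UTF-8 first and falls back to Latin-1. This is not Dabo's actual code; the function `decode_with_fallback` and the table name `orders` are invented for the example, and it assumes Python 3, where a callable `text_factory` receives `bytes`:

```python
import sqlite3

def decode_with_fallback(raw):
    """Hypothetical text_factory: try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 never fails to decode, but it may be the wrong guess.
        return raw.decode("latin-1")

con = sqlite3.connect(":memory:")
con.text_factory = decode_with_fallback
con.execute("CREATE TABLE orders (special_instructions TEXT)")

# CAST(? AS TEXT) stores the given bytes unchanged as TEXT, even when
# they are not valid UTF-8, which imitates the broken rows.
good = "2 \u00bd".encode("utf-8")     # valid UTF-8
bad = "2 \u00bd".encode("latin-1")    # Latin-1 bytes in a UTF-8 database
con.execute("INSERT INTO orders VALUES (CAST(? AS TEXT))", (good,))
con.execute("INSERT INTO orders VALUES (CAST(? AS TEXT))", (bad,))

rows = [r[0] for r in con.execute(
    "SELECT special_instructions FROM orders ORDER BY rowid")]
print(rows)
```

Both rows come back readable here, but only because the fallback guess happens to match how the bad row was written. Other tools reading the same file still see invalid UTF-8.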
Did you check whether the bytes of your faulty strings are valid UTF-8
for the right Unicode code points? Maybe there are some invisible
characters in there?
It doesn't help if I copy the strings from your mail, of course.
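
A small sketch for that kind of check, assuming Python 3 and using a made-up helper name `describe`: it lists every character with its code point and Unicode name, so invisible or unexpected characters show up, and then prints the UTF-8 bytes.

```python
import unicodedata

def describe(text):
    """Hypothetical helper: (character, code point, Unicode name) per character."""
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

sample = "6 \u00bd\u201d"   # '6 ½”' as in the log output above
for ch, code, name in describe(sample):
    # repr() makes control and invisible characters visible as escapes.
    print(repr(ch), code, name)

print(sample.encode("utf-8"))   # b'6 \xc2\xbd\xe2\x80\x9d'
```

Control characters have no Unicode name, so they show up as `<unnamed>` in this listing.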
Greetlings from Lake Constance!
Hraban
---
http://www.fiee.net
https://www.cacert.org (I'm an assurer)