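Now that I have some breakfast in me: to be clear, it appears that the UTF-8 byte stream is being interpreted as Latin1 and then converted to Unicode, so each byte of ç (C3 A7) becomes its own character (Ã and §) and is written back out as C3 83 C2 A7.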
Marc

On 16/03/2009, at 6:25 AM, Marc Boschma wrote:

> excuse the typo:
> On 16/03/2009, at 6:23 AM, Marc Boschma wrote:
>
>> Just looking at http://jeppesn.dk/utf-8.html , I found the
>> following lines:
>>
>> Character  Latin1  Unicode  UTF-8  Latin1
>>            code                    interpr.
>> ç          E7      00 E7    C3 A7  Ã§
>>
>> Ã is C38C, § is C2 A7
> Ã is C383
>> So it appears that somewhere there is a translation to Latin 1
>> going on.
>>
>> Hopefully that helps some what...
>>
>> Regards,
>>
>> Marc
>>
>> On 16/03/2009, at 1:08 AM, Derek Chen-Becker wrote:
>>
>>> This is really interesting. I've narrowed it down to something on
>>> form submission. The database shows gibberish, too, and if I
>>> manually enter the correct value in the DB it works fine on
>>> display. If I print the UTF-8 byte values of the string I get from
>>> the browser for my description when I submit a cedilla (ç), I see:
>>>
>>> INFO - Submitted desc bytes = c3 83 c2 a7
>>>
>>> A cedilla is c3 a7 in UTF-8, so I'm not sure where the "83 c2" is
>>> coming from. I googled around a bit and I found other people
>>> having the same issue but it wasn't clear in those posts what the
>>> cause was. I did a packet capture just as a sanity check, and
>>> here's what I got:
>>>
>>> POST / HTTP/1.1
>>> ... headers here ...
>>>
>>> F956759623045OFT
>>> =
>>> true
>>> &F956759623046BU5
>>> =1&F9567596230472LR=2009%2F03%2F18&F956759623048IZR=
>>> %C3%A7&F956759623049S3E=3&F956759623050E25=test
>>>
>>> As you can see, the (url encoded) value of the F956759623048IZR
>>> field (description) is %C3%A7, so something isn't properly
>>> converting that. Helpers.urlDecode seems to be working properly:
>>>
>>> scala> Helpers.urlDecode("F956759623048IZR=%C3%A7")
>>> res1: java.lang.String = F956759623048IZR=ç
>>>
>>> So I have no idea where this is coming from. All I know is that
>>> between the actual POST and when my submit function is called,
>>> something is tweaking the string.
>>> I'm going to dig some more, but
>>> I wanted to post this in case it triggers any thoughts out there.
>>>
>>> Derek
>>>
>>> PS - I just found this:
>>>
>>> http://mail-archives.apache.org/mod_mbox/struts-dev/200604.mbox/%3c3769847.1145910729808.javamail.j...@brutus%3e
>>>
>>> May be related?
>>>
>>> On Sun, Mar 15, 2009 at 7:26 AM, Derek Chen-Becker
>>> <dchenbec...@gmail.com> wrote:
>>> OK, I can replicate this in our PocketChange app (also going
>>> against a PostgreSQL DB). Let me dig a bit.
>>>
>>> Derek
>>>
>>>
>>> On Sun, Mar 15, 2009 at 3:58 AM, Charles F. Munat <c...@munat.com>
>>> wrote:
>>>
>>> This might help, but I don't think I was clear. I have an online
>>> form. My clients enter text into it. Their text has characters
>>> like a c with a cedilla. That text gets saved into a PostgreSQL
>>> database (UTF-8) varchar field via JPA/Hibernate.
>>>
>>> Then I pull it back out and dump it into a template, and it comes
>>> out gibberish. If I try using &ccedil; instead, I get &amp;ccedil;
>>> back out.
>>>
>>> Here is what I have:
>>>
>>> "name" -> SHtml.text(thing.name, thing.name = _, ("size", "40"))
>>>
>>> If I enter "cachaça" in the field, I get cachaÃ§a back out. The
>>> weird thing is that sometimes when I copy and paste text from
>>> another document into the form, it works. But if I use the
>>> keyboard, it fails every time.
>>>
>>> I'll play around with this. Thanks.
>>>
>>> Chas.
>>>
>>> Derek Chen-Becker wrote:
>>> > Oops, forgot scala.xml.Unparsed, too:
>>> >
>>> > scala> val m = <span>a{ scala.xml.Unparsed("&ccedil;") }b</span>
>>> > m: scala.xml.Elem = <span>a&ccedil;b</span>
>>> >
>>> > That one might be what you're looking for.
>>> >
>>> > Derek
>>> >
>>> > On Sat, Mar 14, 2009 at 9:57 PM, Derek Chen-Becker
>>> > <dchenbec...@gmail.com <mailto:dchenbec...@gmail.com>> wrote:
>>> >
>>> > I think it depends on how you're embedding them in the XML:
>>> >
>>> > scala> val m = <span>a&ccedil;b</span>
>>> > m: scala.xml.Elem = <span>a&ccedil;b</span>
>>> >
>>> > scala> val m = <span>a{"&ccedil;"}b</span>
>>> > m: scala.xml.Elem = <span>a&amp;ccedil;b</span>
>>> >
>>> > scala> val m = <span>a{"ç"}b</span>
>>> > m: scala.xml.Elem = <span>açb</span>
>>> >
>>> > That last one was input using dead keys (alt+,) on my linux (USA
>>> > International with dead keys) layout. Let me know if this doesn't
>>> > help; if not, could you send the code/template that's having
>>> > issues?
>>> >
>>> > Derek
>>> >
>>> >
>>> > On Sat, Mar 14, 2009 at 6:36 PM, Charles F. Munat <c...@munat.com
>>> > <mailto:c...@munat.com>> wrote:
>>> >
>>> >
>>> > I have a site that uses a lot of "special" characters (a
>>> > remarkably biased description, since there is nothing "special"
>>> > about accented characters to the people who use them daily). In
>>> > particular, I need the c with cedilla and the n with the tilde.
>>> >
>>> > These characters are being input to a database (UTF-8) via an
>>> > online form, then spit back out onto the page.
>>> >
>>> > It's a fucking disaster. Apparently, everything goes through the
>>> > xml parser, which is great, except when I try to enter these as
>>> > entity references, such as &ccedil;, the parser changes & to
>>> > &amp; and I get the literal &ccedil; back out again.
>>> >
>>> > When I type ç using the keyboard (or copy and paste it from a
>>> > page or a text editor), I get gibberish.
>>> >
>>> > Anyone know the trick to getting around this? I need everything
>>> > from e acute to e grave to trademark and registered trademark
>>> > symbols, and I need to enter them this way.
>>> >
>>> > Thanks for any help.
>>> > If I can get this to work, I'll add an explanation
>>> > to the wiki.
>>> >
>>> > Chas.
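
For anyone wanting to check the diagnosis in this thread: the bytes Derek logged (c3 83 c2 a7) are exactly what you get when the URL-encoded value %C3%A7 is decoded as Latin1 instead of UTF-8 and the result is then written out as UTF-8. Below is a minimal Scala sketch using only JDK calls (java.net.URLDecoder and String.getBytes). It does not show where in Lift or the servlet container the mis-decoding actually happens; the object name Mojibake and the helpers hex, misdecode and repair are made up for illustration.

```scala
import java.net.URLDecoder
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object Mojibake {
  // Hypothetical helper (not part of Lift): the UTF-8 bytes of a string as hex.
  def hex(s: String): String =
    s.getBytes(UTF_8).map(b => f"${b & 0xff}%02x").mkString(" ")

  // URL-decode with the wrong charset: each UTF-8 byte becomes one Latin1 character.
  def misdecode(encoded: String): String =
    URLDecoder.decode(encoded, "ISO-8859-1")

  // Undo the damage: get the original bytes back via Latin1, then decode them as UTF-8.
  def repair(garbled: String): String =
    new String(garbled.getBytes(ISO_8859_1), UTF_8)

  def main(args: Array[String]): Unit = {
    val right = URLDecoder.decode("%C3%A7", "UTF-8") // the intended "ç"
    val wrong = misdecode("%C3%A7")                  // "Ã§", two characters
    println(hex(right))             // c3 a7
    println(hex(wrong))             // c3 83 c2 a7
    println(repair(wrong) == right) // true
  }
}
```

The repair step only works while no bytes have been lost along the way, so it is a diagnostic aid rather than a fix; the real fix is making sure the request parameters are decoded as UTF-8 in the first place.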