Just looking at http://jeppesn.dk/utf-8.html , I found the following
lines:
Character  Latin1 code  Unicode  UTF-8  Latin1 interpr.
ç          E7           00 E7    C3 A7  Ã§
Ã is C3 83 in UTF-8, and § is C2 A7.
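So it appears that somewhere the UTF-8 bytes C3 A7 are being read as
Latin 1 (giving Ã§) and then encoded as UTF-8 a second time, which would
produce the c3 83 c2 a7 you are seeing.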
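
To check that theory against the bytes in the log, here is a small sketch
that simulates the suspected mistake outside of Lift. The object and method
names (DoubleEncodingSketch, misreadAsLatin1, repair) are made up for
illustration and are not part of Lift or Helpers; it only uses the standard
java.lang.String and java.nio.charset APIs, and it says nothing about where
in the request handling the misreading actually happens.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper object, for illustration only (not a Lift API).
object DoubleEncodingSketch {
  // Render bytes in the same style as the log line in the thread, e.g. "c3 a7".
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b & 0xff)).mkString(" ")

  // Simulate the suspected bug: take the UTF-8 bytes of a string
  // and decode them as if they were Latin 1.
  def misreadAsLatin1(s: String): String =
    new String(s.getBytes(UTF_8), ISO_8859_1)

  // Undo one round of that mistake. Only meaningful for text that
  // really was mangled this way.
  def repair(s: String): String =
    new String(s.getBytes(ISO_8859_1), UTF_8)

  def main(args: Array[String]): Unit = {
    // c with cedilla, written as an escape so the source file encoding does not matter
    val cedilla = "\u00e7"
    val mangled = misreadAsLatin1(cedilla)

    println(hex(cedilla.getBytes(UTF_8))) // c3 a7
    println(hex(mangled.getBytes(UTF_8))) // c3 83 c2 a7
    println(repair(mangled) == cedilla)   // true
  }
}
```

The second line of output matches the "Submitted desc bytes" from the log,
which is why I suspect a Latin 1 decode of the UTF-8 form data rather than
anything in the database or the template.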
Hopefully that helps somewhat...
Regards,
Marc
On 16/03/2009, at 1:08 AM, Derek Chen-Becker wrote:
> This is really interesting. I've narrowed it down to something on
> form submission. The database shows gibberish, too, and if I
> manually enter the correct value in the DB it works fine on display.
> If I print the UTF-8 byte values of the string I get from the
> browser for my description when I submit a cedilla (ç), I see:
>
> INFO - Submitted desc bytes = c3 83 c2 a7
>
> A cedilla is c3 a7 in UTF-8, so I'm not sure where the "83 c2" is
> coming from. I googled around a bit and I found other people having
> the same issue but it wasn't clear in those posts what the cause
> was. I did a packet capture just as a sanity check, and here's what
> I got:
>
> POST / HTTP/1.1
> ... headers here ...
>
> F956759623045OFT=true&F956759623046BU5=1&F9567596230472LR=2009%2F03%2F18&F956759623048IZR=%C3%A7&F956759623049S3E=3&F956759623050E25=test
>
> As you can see, the (url encoded) value of the F956759623048IZR
> field (description) is %C3%A7, so something isn't properly
> converting that. Helpers.urlDecode seems to be working properly:
>
> scala> Helpers.urlDecode("F956759623048IZR=%C3%A7")
> res1: java.lang.String = F956759623048IZR=ç
>
> So I have no idea where this is coming from. All I know is that
> between the actual POST and when my submit function is called,
> something is tweaking the string. I'm going to dig some more, but I
> wanted to post this in case it triggers any thoughts out there.
>
> Derek
>
> PS - I just found this:
>
> http://mail-archives.apache.org/mod_mbox/struts-dev/200604.mbox/%3c3769847.1145910729808.javamail.j...@brutus%3e
>
> May be related?
>
> On Sun, Mar 15, 2009 at 7:26 AM, Derek Chen-Becker wrote:
> OK, I can replicate this in our PocketChange app (also going against
> a PostgreSQL DB). Let me dig a bit.
>
> Derek
>
>
> On Sun, Mar 15, 2009 at 3:58 AM, Charles F. Munat wrote:
>
> This might help, but I don't think I was clear. I have an online form.
> My clients enter text into it. Their text has characters like a c with a
> cedilla. That text gets saved into a PostgreSQL database (UTF-8) varchar
> field via JPA/Hibernate.
>
> Then I pull it back out and dump it into a template, and it comes out
> gibberish. If I try using &ccedil; instead, I get the literal &ccedil;
> back out.
>
> Here is what I have:
>
> "name" -> SHtml.text(thing.name, thing.name = _, ("size", "40"))
>
> If I enter "cachaça" in the field, I get cachaÃ§a back out. The weird
> thing is that sometimes when I copy and paste text from another document
> into the form, it works. But if I use the keyboard, it fails every time.
>
> I'll play around with this. Thanks.
>
> Chas.
>
> Derek Chen-Becker wrote:
> > Oops, forgot scala.xml.Unparsed, too:
> >
> > scala> val m = <span>a{ scala.xml.Unparsed("&ccedil;") }b</span>
> > m: scala.xml.Elem = <span>a&ccedil;b</span>
> >
> > That one might be what you're looking for.
> >
> > Derek
> >
> > On Sat, Mar 14, 2009 at 9:57 PM, Derek Chen-Becker wrote:
> >
> > I think it depends on how you're embedding them in the XML:
> >
> > scala> val m = <span>a&ccedil;b</span>
> > m: scala.xml.Elem = <span>a&ccedil;b</span>
> >
> > scala> val m = <span>a{"&ccedil;"}b</span>
> > m: scala.xml.Elem = <span>a&amp;ccedil;b</span>
> >
> > scala> val m = <span>a{"ç"}b</span>
> > m: scala.xml.Elem = <span>açb</span>
> >
> > That last one was input using dead keys (alt+,) on my linux (USA
> > International with dead keys) layout. Let me know if this doesn't
> > help; if not, could you send the code/template that's having issues?
> >
> > Derek
> >
> >
> > On Sat, Mar 14, 2009 at 6:36 PM, Charles F. Munat wrote:
> >
> >
> > I have a site that uses a lot of "special" characters (a remarkably
> > biased description, since there is nothing "special" about accented
> > characters to the people who use them daily). In particular, I need the
> > c with cedilla and the n with the tilde.
> >
> > These characters are being input to a database (UTF-8) via an online
> > form, then spit back out onto the page.
> >
> > It's a fucking disaster. Apparently, everything goes through the xml
> > parser, which is great, except when I try to enter these as entity
> > references, such as &ccedil;, the parser changes & to &amp; and I get
> > the literal &ccedil; back out again.
> >
> > When I type ç using the keyboard (or copy and paste it from a page or a
> > text editor), I get gibberish.
> >
> > Anyone know the trick to getting around this? I need everything from e
> > acute to e grave to trademark and registered trademark symbols, and I
> > need to enter them this way.
> >
> > Thanks for any help. If I can get this to work, I'll add an explanation
> > to the wiki.
> >
> > Chas.