The Scala XML syntax automatically converts any "&" in embedded strings to
"&amp;". You have to put the string inside a scala.xml.Unparsed node to
prevent that from happening.

Derek

On Sun, Mar 15, 2009 at 1:59 PM, Charles F. Munat <[email protected]> wrote:

>
> That was my thinking. It doesn't explain why &ccedil; in gets changed to
> &amp;ccedil;, but it explains why ç in becomes Ã§ out. So I think there
> are two separate issues here.
>
> The ç can be created in two different ways in UTF-8. One is the single
> "c with a cedilla" character. The second is a c character followed by a
> cedilla character. I am not sure how UTF-8 indicates that these two
> characters should be displayed as one. Neither am I sure that this has
> anything to do with the problem. Maybe it is simply that something is
> assuming Latin1 input even though the input is UTF-8.
>
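
On the two forms of ç: UTF-8 itself does not mark anything as "display as one". It only encodes code points. The combining is a Unicode property of U+0327, and the renderer composes it with the preceding letter. A minimal sketch using only JDK classes (the object and method names are made up for illustration) shows the two forms and how NFC normalization folds one into the other:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.text.Normalizer

object CedillaForms {
  // Hex dump of a string's UTF-8 encoding.
  def utf8Hex(s: String): String =
    s.getBytes(UTF_8).map(b => f"${b & 0xff}%02x").mkString(" ")

  val precomposed = "\u00e7"  // U+00E7 LATIN SMALL LETTER C WITH CEDILLA
  val decomposed  = "c\u0327" // 'c' followed by U+0327 COMBINING CEDILLA

  // NFC folds the two-code-point form into the single code point.
  def nfc(s: String): String = Normalizer.normalize(s, Normalizer.Form.NFC)

  def main(args: Array[String]): Unit = {
    println(utf8Hex(precomposed))           // c3 a7
    println(utf8Hex(decomposed))            // 63 cc a7
    println(precomposed == decomposed)      // false
    println(nfc(decomposed) == precomposed) // true
  }
}
```

Neither form encodes to c3 83 c2 a7, so the composed/decomposed distinction is probably not what is going on here.
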
> It is definitely on the front end, because it is stored in the database
> as Ã§.
>
> When I use &ccedil; instead, the problem is that it is *not* converted
> to ç as it goes into the database, and then on the way out the XML
> interpreter does not recognize it as a character entity reference and so
> converts the & to &amp;.
>
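
If the entity form has to be accepted as input, one option is to turn it into the real character before it is stored, so the XML layer never sees an ampersand to escape. This is only a rough sketch: EntityDecode and its six-entry table are hypothetical, and a real table would need every named entity you care about.

```scala
import java.util.regex.Matcher

object EntityDecode {
  // Hypothetical, deliberately tiny table of named entities.
  private val named = Map(
    "ccedil" -> "\u00e7", "ntilde" -> "\u00f1",
    "eacute" -> "\u00e9", "egrave" -> "\u00e8",
    "reg"    -> "\u00ae", "trade"  -> "\u2122"
  )

  private val Entity = "&([A-Za-z]+);".r

  // Replace known named entities with their characters; leave
  // anything not in the table exactly as it was typed.
  def decode(s: String): String =
    Entity.replaceAllIn(s, m =>
      Matcher.quoteReplacement(named.getOrElse(m.group(1), m.matched)))
}
```

With this, EntityDecode.decode("cacha&ccedil;a") yields the string with a real ç, and an entity missing from the table, such as &amp;, passes through unchanged.
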
> Chas.
>
> Marc Boschma wrote:
> > Now that I have some breakfast in me: to be clear, it appears that the UTF-8
> > byte stream is being interpreted as Latin1 and then converted to Unicode...
> >
> > Marc
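
That reading can be checked with nothing but the JDK. The sketch below (Mojibake and its method names are invented for illustration) encodes ç as UTF-8, decodes those bytes as Latin1, and re-encodes the result as UTF-8. It reproduces the four bytes from the log further down. Where in the stack the wrong decode happens is still an open question.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

object Mojibake {
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  // Decode UTF-8 bytes with the wrong charset (Latin1).
  def misdecode(s: String): String =
    new String(s.getBytes(UTF_8), ISO_8859_1)

  // Undo exactly one wrong decode. Not safe on text that is already correct.
  def repair(s: String): String =
    new String(s.getBytes(ISO_8859_1), UTF_8)

  def main(args: Array[String]): Unit = {
    val broken = misdecode("\u00e7")
    println(hex("\u00e7".getBytes(UTF_8))) // c3 a7
    println(hex(broken.getBytes(UTF_8)))   // c3 83 c2 a7
    println(repair(broken) == "\u00e7")    // true
  }
}
```

The repair function is only a diagnostic aid. The real fix is to make whatever decodes the request use UTF-8 in the first place.
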
> > On 16/03/2009, at 6:25 AM, Marc Boschma wrote:
> >
> >> excuse the typo:
> >> On 16/03/2009, at 6:23 AM, Marc Boschma wrote:
> >>
> >>> Just looking at http://jeppesn.dk/utf-8.html , I found the following
> >>> lines:
> >>> Character   Latin1   Unicode code   UTF-8   Latin1 interpr.
> >>> ç           E7       00 E7          C3 A7   Ã§
> >>> Ã is C38C, § is C2 A7
> >> Ã is C383
> >>> So it appears that somewhere there is a translation to Latin 1 going on.
> >>> Hopefully that helps somewhat...
> >>> Regards,
> >>> Marc
> >>>
> >>> On 16/03/2009, at 1:08 AM, Derek Chen-Becker wrote:
> >>>
> >>>> This is really interesting. I've narrowed it down to something on
> >>>> form submission. The database shows gibberish, too, and if I
> >>>> manually enter the correct value in the DB it works fine on display.
> >>>> If I print the UTF-8 byte values of the string I get from the
> >>>> browser for my description when I submit a cedilla (ç), I see:
> >>>>
> >>>> INFO - Submitted desc bytes = c3 83 c2 a7
> >>>>
> >>>> A cedilla is c3 a7 in UTF-8, so I'm not sure where the "83 c2" is
> >>>> coming from. I googled around a bit and I found other people having
> >>>> the same issue but it wasn't clear in those posts what the cause
> >>>> was. I did a packet capture just as a sanity check, and here's what
> >>>> I got:
> >>>>
> >>>> POST / HTTP/1.1
> >>>> ... headers here ...
> >>>>
> >>>>
> >>>> F956759623045OFT=true&F956759623046BU5=1&F9567596230472LR=2009%2F03%2F18&F956759623048IZR=%C3%A7&F956759623049S3E=3&F956759623050E25=test
> >>>>
> >>>> As you can see, the (url encoded) value of the F956759623048IZR
> >>>> field (description) is %C3%A7, so something isn't properly
> >>>> converting that. Helpers.urlDecode seems to be working properly:
> >>>>
> >>>> scala> Helpers.urlDecode("F956759623048IZR=%C3%A7")
> >>>> res1: java.lang.String = F956759623048IZR=ç
> >>>>
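
For comparison, the JDK's own URL decoder gives the right or wrong answer depending purely on the charset it is handed. This is a sketch, not a claim about what the container or Lift actually does (FormDecode is a made-up name), but it shows that decoding the same %C3%A7 as ISO-8859-1 is enough to produce the gibberish:

```scala
import java.net.URLDecoder

object FormDecode {
  // Thin wrapper so the charset choice is explicit at the call site.
  def decode(raw: String, charset: String): String =
    URLDecoder.decode(raw, charset)

  def main(args: Array[String]): Unit = {
    val good = decode("%C3%A7", "UTF-8")
    val bad  = decode("%C3%A7", "ISO-8859-1")
    println(good == "\u00e7")       // true: one code point, c with cedilla
    println(bad == "\u00c3\u00a7")  // true: two code points, the gibberish pair
  }
}
```

So a correct urlDecode result in the REPL does not rule out a second decoder earlier in the request path that assumes Latin1.
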
> >>>> So I have no idea where this is coming from. All I know is that
> >>>> between the actual POST and when my submit function is called,
> >>>> something is tweaking the string. I'm going to dig some more, but I
> >>>> wanted to post this in case it triggers any thoughts out there.
> >>>>
> >>>> Derek
> >>>>
> >>>> PS - I just found this:
> >>>>
> >>>>
> >>>> http://mail-archives.apache.org/mod_mbox/struts-dev/200604.mbox/%3c3769847.1145910729808.javamail.j...@brutus%3e
> >>>>
> >>>> May be related?
> >>>>
> >>>> On Sun, Mar 15, 2009 at 7:26 AM, Derek Chen-Becker
> >>>> <[email protected]> wrote:
> >>>>
> >>>>     OK, I can replicate this in our PocketChange app (also going
> >>>>     against a PostgreSQL DB). Let me dig a bit.
> >>>>
> >>>>     Derek
> >>>>
> >>>>
> >>>>     On Sun, Mar 15, 2009 at 3:58 AM, Charles F. Munat
> >>>>     <[email protected]> wrote:
> >>>>
> >>>>
> >>>>         This might help, but I don't think I was clear. I have an online form.
> >>>>         My clients enter text into it. Their text has characters like a c with a
> >>>>         cedilla. That text gets saved into a PostgreSQL database (UTF-8) varchar
> >>>>         field via JPA/Hibernate.
> >>>>
> >>>>         Then I pull it back out and dump it into a template, and it comes out
> >>>>         gibberish. If I try using &ccedil; instead, I get &amp;ccedil; back out.
> >>>>
> >>>>         Here is what I have:
> >>>>
> >>>>         "name" -> SHtml.text(thing.name, thing.name = _, ("size", "40"))
> >>>>
> >>>>         If I enter "cachaça" in the field, I get cachaÃ§a back out. The weird
> >>>>         thing is that sometimes when I copy and paste text from another document
> >>>>         into the form, it works. But if I use the keyboard, it fails every time.
> >>>>
> >>>>         I'll play around with this. Thanks.
> >>>>
> >>>>         Chas.
> >>>>
> >>>>         Derek Chen-Becker wrote:
> >>>>         > Oops, forgot scala.xml.Unparsed, too:
> >>>>         >
> >>>>         > scala> val m = <span>a{ scala.xml.Unparsed("&ccedil;") }b</span>
> >>>>         > m: scala.xml.Elem = <span>a&ccedil;b</span>
> >>>>         >
> >>>>         > That one might be what you're looking for.
> >>>>         >
> >>>>         > Derek
> >>>>         >
> >>>>         > On Sat, Mar 14, 2009 at 9:57 PM, Derek Chen-Becker
> >>>>         > <[email protected]> wrote:
> >>>>         >
> >>>>         >     I think it depends on how you're embedding them in the XML:
> >>>>         >
> >>>>         >     scala> val m = <span>a&ccedil;b</span>
> >>>>         >     m: scala.xml.Elem = <span>a&ccedil;b</span>
> >>>>         >
> >>>>         >     scala> val m = <span>a{"&ccedil;"}b</span>
> >>>>         >     m: scala.xml.Elem = <span>a&amp;ccedil;b</span>
> >>>>         >
> >>>>         >     scala> val m = <span>a{"ç"}b</span>
> >>>>         >     m: scala.xml.Elem = <span>açb</span>
> >>>>         >
> >>>>         >     That last one was input using dead keys (alt+,) on my Linux (USA
> >>>>         >     International with dead keys) layout. Let me know if this doesn't
> >>>>         >     help; if not, could you send the code/template that's having issues?
> >>>>         >
> >>>>         >     Derek
> >>>>         >
> >>>>         >
> >>>>         >     On Sat, Mar 14, 2009 at 6:36 PM, Charles F. Munat
> >>>>         >     <[email protected]> wrote:
> >>>>         >
> >>>>         >
> >>>>         >         I have a site that uses a lot of "special" characters (a remarkably
> >>>>         >         biased description, since there is nothing "special" about accented
> >>>>         >         characters to the people who use them daily). In particular, I need the
> >>>>         >         c with cedilla and the n with the tilde.
> >>>>         >
> >>>>         >         These characters are being input to a database (UTF-8) via an online
> >>>>         >         form, then spit back out onto the page.
> >>>>         >
> >>>>         >         It's a fucking disaster. Apparently, everything goes through the XML
> >>>>         >         parser, which is great, except when I try to enter these as entity
> >>>>         >         references, such as &ccedil;, the parser changes & to &amp; and I get
> >>>>         >         the literal &ccedil; back out again.
> >>>>         >
> >>>>         >         When I type ç using the keyboard (or copy and paste it from a page or a
> >>>>         >         text editor), I get gibberish.
> >>>>         >
> >>>>         >         Anyone know the trick to getting around this? I need everything from e
> >>>>         >         acute to e grave to trademark and registered trademark symbols, and I
> >>>>         >         need to enter them this way.
> >>>>         >
> >>>>         >         Thanks for any help. If I can get this to work, I'll add an explanation
> >>>>         >         to the wiki.
> >>>>         >
> >>>>         >         Chas.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Lift" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/liftweb?hl=en
-~----------~----~----~----~------~----~------~--~---
