I just went back and tried changing it in the database itself, and that worked fine. So now I have a workaround, but it's one that creates a huge amount of work for me... :-(
Chas.

Charles F. Munat wrote:
> Oh, sorry, Derek. My bad. I didn't mean to imply that you were saying
> that the situation was optimal. I understood where you were coming from.
> Actually, I wasn't really addressing your comment after my first
> sentence. I should have made that clear. Haven't had my coffee yet...
>
> This is kind of important to me. I have a site that is sponsored by some
> big liquor companies. Many of them are European, and then the Brazilian
> ones are all selling cachaça. Eliminating accents and changing ç to c
> does not make them happy, which does not make my client happy. And I
> can't explain to them why I can't help it because their sites all work
> fine with ç. So I spent more than 40 hours this week, mostly between
> midnight and 6 AM, inputting data that my client could have input
> themselves because I didn't want them to have to deal with this problem.
> That was above and beyond the 40+ hours I spent programming.
>
> Now I have to go back and change all those after we figure this out. So
> it's a pretty major issue for me at the moment.
>
> I'm thinking that as a workaround, I can go change things directly in
> the database and see if that helps. Ugh. That's gonna mean another week
> of no sleep.
>
> Can you point me to the spot in Lift code where this all happens? I'd
> love to be part of the solution instead of just the guy who points
> things out.
>
> Chas.
>
> Derek Chen-Becker wrote:
>> Sorry, I'm not suggesting that this is the appropriate method for users;
>> they should just be able to type. I was just trying to explain why the
>> "&" is getting expanded. I think that the current behavior is not really
>> what anyone wants, and hopefully we can fix it in a transparent manner.
>>
>> Derek
>>
>> On Sun, Mar 15, 2009 at 2:38 PM, Charles F. Munat wrote:
>>
>> Unfortunately, there is no easy way to do that with user input.
>> But the use of character entity references is problematic in itself. I
>> can't teach all my site's users all the references they will need, nor
>> is it really reasonable to expect, for example, an international group
>> of users to have to hand code every accented character.
>>
>> There must be a way to input UTF-8 and have it come out properly. I've
>> set the keyboard on my Mac to U.S. Extended, which makes everything
>> UTF-8. I note that *most* of the keyboards available for the Mac are
>> UTF-8 (though the default U.S. keyboard is Roman, and there are many
>> European keyboards that are Roman or Cyrillic).
>>
>> Ideally, Lift would recognize the character encoding and act
>> appropriately. (I'd be happy to convert everything to UTF-8.) Another
>> possibility, much less preferred but at least workable, would be to add
>> the ability for the user to select the character encoding (they could
>> use trial and error if they weren't sure).
>>
>> But the upshot is that someone with a keyboard set to UTF-8 (which
>> includes much of the world) should be able to use that keyboard and have
>> it come out the same way it went in. I have no idea how to accomplish
>> this, however, as I don't know how that part of Lift works.
>>
>> Chas.
>>
>> Derek Chen-Becker wrote:
>> > The scala XML syntax automatically converts any "&" in embedded strings
>> > to "&amp;". You have to put the string inside a scala.xml.Unparsed node
>> > to prevent that from happening.
>> >
>> > Derek
>> >
>> > On Sun, Mar 15, 2009 at 1:59 PM, Charles F. Munat wrote:
>> >
>> > That was my thinking. It doesn't explain why &ccedil; in gets changed to
>> > &amp;ccedil;, but it explains why ç in becomes Ã§ out. So I think there
>> > are two separate issues here.
>> >
>> > The ç can be created in two different ways in UTF-8. One is the single
>> > "c with a cedilla" character.
>> > The second is a c character followed by a cedilla character. I am not
>> > sure how UTF-8 indicates that these two characters should be displayed
>> > as one. Neither am I sure that this has anything to do with the problem.
>> > Maybe it is simply that something is assuming Latin1 input even though
>> > the input is UTF-8.
>> >
>> > It is definitely on the front end, because it is stored in the database
>> > as Ã§.
>> >
>> > When I use &ccedil; instead, the problem is that it is *not* converted
>> > to ç as it goes into the database, and then on the way out the XML
>> > interpreter does not recognize it as a character entity reference and so
>> > converts the & to &amp;.
>> >
>> > Chas.
>> >
>> > Marc Boschma wrote:
>> > > Now I have some breakfast in me, to be clear it appears that UTF-8 byte
>> > > stream is being interpreted as Latin1 and then converted to unicode...
>> > >
>> > > Marc
>> > > On 16/03/2009, at 6:25 AM, Marc Boschma wrote:
>> > >
>> > >> excuse the typo:
>> > >> On 16/03/2009, at 6:23 AM, Marc Boschma wrote:
>> > >>
>> > >>> Just looking at http://jeppesn.dk/utf-8.html , I found the following
>> > >>> lines:
>> > >>> Character  Latin1 code  Unicode  UTF-8  Latin1 interpr.
>> > >>> ç          E7           00 E7    C3 A7  Ã§
>> > >>> Ã is C38C, § is C2 A7
>> > >> Ã is C383
>> > >>> So it appears that somewhere there is a translation to Latin 1 going on.
>> > >>> Hopefully that helps somewhat...
>> > >>> Regards,
>> > >>> Marc
>> > >>>
>> > >>> On 16/03/2009, at 1:08 AM, Derek Chen-Becker wrote:
>> > >>>
>> > >>>> This is really interesting. I've narrowed it down to something on
>> > >>>> form submission. The database shows gibberish, too, and if I
>> > >>>> manually enter the correct value in the DB it works fine on display.
>> > >>>> If I print the UTF-8 byte values of the string I get from the
>> > >>>> browser for my description when I submit a cedilla (ç), I see:
>> > >>>>
>> > >>>> INFO - Submitted desc bytes = c3 83 c2 a7
>> > >>>>
>> > >>>> A cedilla is c3 a7 in UTF-8, so I'm not sure where the "83 c2" is
>> > >>>> coming from. I googled around a bit and I found other people having
>> > >>>> the same issue but it wasn't clear in those posts what the cause
>> > >>>> was. I did a packet capture just as a sanity check, and here's what
>> > >>>> I got:
>> > >>>>
>> > >>>> POST / HTTP/1.1
>> > >>>> ... headers here ...
>> > >>>>
>> > >>>> F956759623045OFT=true&F956759623046BU5=1&F9567596230472LR=2009%2F03%2F18&F956759623048IZR=%C3%A7&F956759623049S3E=3&F956759623050E25=test
>> > >>>>
>> > >>>> As you can see, the (url encoded) value of the F956759623048IZR
>> > >>>> field (description) is %C3%A7, so something isn't properly
>> > >>>> converting that. Helpers.urlDecode seems to be working properly:
>> > >>>>
>> > >>>> scala> Helpers.urlDecode("F956759623048IZR=%C3%A7")
>> > >>>> res1: java.lang.String = F956759623048IZR=ç
>> > >>>>
>> > >>>> So I have no idea where this is coming from. All I know is that
>> > >>>> between the actual POST and when my submit function is called,
>> > >>>> something is tweaking the string. I'm going to dig some more, but I
>> > >>>> wanted to post this in case it triggers any thoughts out there.
>> > >>>>
>> > >>>> Derek
>> > >>>>
>> > >>>> PS - I just found this:
>> > >>>>
>> > >>>> http://mail-archives.apache.org/mod_mbox/struts-dev/200604.mbox/%3c3769847.1145910729808.javamail.j...@brutus%3e
>> > >>>>
>> > >>>> May be related?
>> > >>>>
>> > >>>> On Sun, Mar 15, 2009 at 7:26 AM, Derek Chen-Becker wrote:
>> > >>>>
>> > >>>> OK, I can replicate this in our PocketChange app (also going
>> > >>>> against a PostgreSQL DB). Let me dig a bit.
>> > >>>>
>> > >>>> Derek
>> > >>>>
>> > >>>> On Sun, Mar 15, 2009 at 3:58 AM, Charles F. Munat wrote:
>> > >>>>
>> > >>>> This might help, but I don't think I was clear. I have an online
>> > >>>> form. My clients enter text into it. Their text has characters
>> > >>>> like a c with a cedilla. That text gets saved into a PostgreSQL
>> > >>>> database (UTF-8) varchar field via JPA/Hibernate.
>> > >>>>
>> > >>>> Then I pull it back out and dump it into a template, and it comes
>> > >>>> out gibberish. If I try using &ccedil; instead, I get &amp;ccedil;
>> > >>>> back out.
>> > >>>>
>> > >>>> Here is what I have:
>> > >>>>
>> > >>>> "name" -> SHtml.text(thing.name, thing.name = _, ("size", "40"))
>> > >>>>
>> > >>>> If I enter "cachaça" in the field, I get cachaÃ§a back out. The
>> > >>>> weird thing is that sometimes when I copy and paste text from
>> > >>>> another document into the form, it works. But if I use the
>> > >>>> keyboard, it fails every time.
>> > >>>>
>> > >>>> I'll play around with this. Thanks.
>> > >>>>
>> > >>>> Chas.
>> > >>>>
>> > >>>> Derek Chen-Becker wrote:
>> > >>>> > Oops, forgot scala.xml.Unparsed, too:
>> > >>>> >
>> > >>>> > scala> val m = <span>a{ scala.xml.Unparsed("&ccedil;") }b</span>
>> > >>>> > m: scala.xml.Elem = <span>a&ccedil;b</span>
>> > >>>> >
>> > >>>> > That one might be what you're looking for.
>> > >>>> >
>> > >>>> > Derek
>> > >>>> >
>> > >>>> > On Sat, Mar 14, 2009 at 9:57 PM, Derek Chen-Becker wrote:
>> > >>>> >
>> > >>>> > I think it depends on how you're embedding them in the XML:
>> > >>>> >
>> > >>>> > scala> val m = <span>a&ccedil;b</span>
>> > >>>> > m: scala.xml.Elem = <span>a&ccedil;b</span>
>> > >>>> >
>> > >>>> > scala> val m = <span>a{"&ccedil;"}b</span>
>> > >>>> > m: scala.xml.Elem = <span>a&amp;ccedil;b</span>
>> > >>>> >
>> > >>>> > scala> val m = <span>a{"ç"}b</span>
>> > >>>> > m: scala.xml.Elem = <span>açb</span>
>> > >>>> >
>> > >>>> > That last one was input using dead keys (alt+,) on my linux (USA
>> > >>>> > International with dead keys) layout. Let me know if this doesn't
>> > >>>> > help; if not, could you send the code/template that's having
>> > >>>> > issues?
>> > >>>> >
>> > >>>> > Derek
>> > >>>> >
>> > >>>> > On Sat, Mar 14, 2009 at 6:36 PM, Charles F.
>> > >>>> > Munat wrote:
>> > >>>> >
>> > >>>> > I have a site that uses a lot of "special" characters (a remarkably
>> > >>>> > biased description, since there is nothing "special" about accented
>> > >>>> > characters to the people who use them daily). In particular, I
>> > >>>> > need the c with cedilla and the n with the tilde.
>> > >>>> >
>> > >>>> > These characters are being input to a database (UTF-8) via an
>> > >>>> > online form, then spit back out onto the page.
>> > >>>> >
>> > >>>> > It's a fucking disaster. Apparently, everything goes through the
>> > >>>> > xml parser, which is great, except when I try to enter these as
>> > >>>> > entity references, such as &ccedil;, the parser changes & to &amp;
>> > >>>> > and I get the literal &ccedil; back out again.
>> > >>>> >
>> > >>>> > When I type ç using the keyboard (or copy and paste it from a
>> > >>>> > page or a text editor), I get gibberish.
>> > >>>> >
>> > >>>> > Anyone know the trick to getting around this? I need everything
>> > >>>> > from e acute to e grave to trademark and registered trademark
>> > >>>> > symbols, and I need to enter them this way.
>> > >>>> >
>> > >>>> > Thanks for any help. If I can get this to work, I'll add an
>> > >>>> > explanation to the wiki.
>> > >>>> >
>> > >>>> > Chas.
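For anyone finding this thread in the archive: the bytes Derek logged (c3 83 c2 a7) are consistent with Marc's reading, namely that the UTF-8 bytes for ç (c3 a7) get decoded as Latin-1 somewhere and the resulting two characters are then encoded as UTF-8 again. Below is a minimal sketch that reproduces those byte values using only the JDK's java.nio.charset classes. The object and method names (MojibakeDemo, hex, misdecode, repair) are made up for illustration. This is not Lift code and it does not show where in the request path the wrong decoding actually happens.

```scala
import java.nio.charset.StandardCharsets.{ISO_8859_1, UTF_8}

// Hypothetical helper object, for illustration only (not part of Lift).
object MojibakeDemo {
  // Render bytes as space-separated lowercase hex, e.g. "c3 a7".
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString(" ")

  // Simulate the suspected bug: take the UTF-8 bytes of a string
  // and decode them as if they were Latin-1.
  def misdecode(s: String): String =
    new String(s.getBytes(UTF_8), ISO_8859_1)

  // Reverse the damage: encode back to Latin-1 bytes, decode as UTF-8.
  // Assumes the input really was mis-decoded exactly once.
  def repair(s: String): String =
    new String(s.getBytes(ISO_8859_1), UTF_8)

  def main(args: Array[String]): Unit = {
    val typed  = "\u00e7"          // c with cedilla, as typed in the form
    val stored = misdecode(typed)  // two characters: U+00C3 and U+00A7

    println(hex(typed.getBytes(UTF_8)))   // c3 a7
    println(hex(stored.getBytes(UTF_8)))  // c3 83 c2 a7
    println(repair(stored) == typed)      // true
  }
}
```

Something like repair could in principle be run over rows that are known to be damaged instead of retyping them by hand, but it will corrupt any string that was stored correctly, so treat it as a sketch and try it on a copy of the data first.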
