I believe I was able to solve the problem. The real problem was that I couldn't use unsupported JRE functions in my transfer object, so I had to pull the encoding/decoding process out of that object and stick it in a utility function. From there I keep the text as a String from the browser all the way to the server and on to the persistence layer. There I do a kind of "switcheroo": I take the string, convert it to a byte array, and stuff it into the database blob. I do the exact opposite on the way out of the database: gather the bytes, run them through the util function, and stuff the resulting string into the transfer object.

One thing that still stands out to me is that the encoding/decoding process uses the 'windows-1258' charset. Using this to encode and decode preserves the mangled characters when one pastes from a Word doc. Most, if not all, folks using this webapp will be coming from Windows machines, so I think I am safe with the 'hardcoded' (blasphemy, I know) windows-1258 encoding.
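For anyone who finds this thread later, here is a rough sketch of the kind of server-side utility described above. It is not the exact code from the app: the class name `BlobTextCodec` and its method names are made up for illustration, and the charset is passed in rather than hardcoded. The charset from this post could be obtained with `Charset.forName("windows-1258")`, assuming the JRE on the server ships that charset. The last helper lists code points, as suggested earlier in the thread, to check what the browser actually sent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical server-side helper; names are illustrative, not from the real app.
final class BlobTextCodec {
    private BlobTextCodec() {}

    // String -> bytes, for writing into the BLOB column.
    // Characters the charset cannot represent are replaced (typically with '?').
    static byte[] toBytes(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // bytes -> String, for filling the transfer object on the way out.
    // Must use the same charset that was used when the bytes were written.
    static String fromBytes(byte[] blob, Charset charset) {
        return new String(blob, charset);
    }

    // Lists every code point as U+XXXX so the received string can be inspected.
    static String describeCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }
}
```

With `StandardCharsets.UTF_8` any string survives the round trip through the blob. With `StandardCharsets.ISO_8859_1`, Word's curly quotes (U+201C, U+201D) have no mapping and come back as `?`, which matches the mangling discussed earlier in the thread.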
Thanks to everyone for their input, it helped me arrive at my solution.

On May 6, 11:23 am, David Given <[email protected]> wrote:
> On 06/05/10 16:08, undertow wrote:
>
> > Thank you for confirming what i had suspected i would need to do. So
> > the idea is, user enters a bunch of text into a textarea via typing it
> > all in or cut and paste from somewhere (like Word, ugh and its mangled
> > characters). when time comes to ship that text off to the server i
> > would then pluck the string out of the textarea stick it in a transfer
> > object of sorts. (this is where i am a little fuzzy) I would then
> > take the input string do a getBytes() on it and then push that array
> > of bytes into a blob. would i need to get the bytes with an encoding
> > argument?
>
> I believe so. GWT ought to get the string from the browser in UTF-16 ---
> as that's what Strings are defined to be. You can then ship it back to
> the server, as a String, and it should Just Work. Then you get to do the
> charset conversion on the server.
>
> > e.g. txt.getBytes("ISO-8859-1"). This method seems to work
> > ok, but if user had pasted from ms word into the text box things still
> > come out mangled.
>
> I'm quite willing to believe that there are web browser bugs with all
> this. It may be worth verifying that GWT is actually getting a valid
> string from the browser (by going through and listing all the codepoints
> in the string).
>
> In addition, if Word is using all kinds of whacky non-ISO-8859-1
> characters such as unbreaking spaces and quotation marks, then
> getBytes() might be replacing them with ? signs --- how is it being mangled?
--
You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-web-toolkit?hl=en.
