Jeremy Quinn wrote:
I am trying to solve a nasty request transcoding bug that I found
while working on CForms.
Join the club! I discovered character encoding problems two days ago in
a project based on Cocoon 2.1.x. I tried to fight it yesterday and gave up.
You work with 2.1?? I am shocked :)
Stay cool, it's only because this project is going to be migrated to 2.2. Actually, Mavenization and
migration to 2.2 are my main job here.
What about you? Have you been won over to Cocoon 2.2 yet? Do you have it running, and can you
develop on top of it?
A change like this, while simplifying our codebase, could cause utter
havoc for users ..... I don't know whether Unicode really is a practical
superset of every other possible encoding.
Sorry, I do not think I know enough about this either.
OK. Anyway, just for the record, here is what Wikipedia says[1] about UTF-8:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode.
It is able to represent any character in the Unicode standard, yet the initial encoding of byte
codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it
is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters
are stored or streamed.
So it can represent anything from Unicode. Let's have a look at Unicode[2]
itself:
In computing, Unicode is an industry standard allowing computers to consistently represent and
manipulate text expressed in most of the world's writing systems. Developed in tandem with the
Universal Character Set standard and published in book form as The Unicode Standard, Unicode
consists of a repertoire of more than 100,000 characters [...]
If Unicode covers more than 100,000 characters, then I guess anyone would have a hard time finding a
character that Unicode cannot represent.
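For a concrete string, the claim is easy to check with a round trip. Below is a minimal sketch using only the JDK; the class name EncodingRoundTrip is made up for illustration, and StandardCharsets assumes Java 7 or later. It encodes the text to bytes and decodes it back, once with UTF-8 and once with ISO-8859-1, which has no mapping for the Polish letters used here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Encodes the text to bytes in the given charset, then decodes it back.
    // Characters the charset cannot represent are replaced during encoding,
    // so the result differs from the input when the charset is too small.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "swiatlo" with Polish diacritics, written with Unicode escapes so the
        // example does not depend on the encoding of the source file.
        String text = "\u015Bwiat\u0142o";

        boolean utf8Ok = roundTrip(text, StandardCharsets.UTF_8).equals(text);
        boolean latin1Ok = roundTrip(text, StandardCharsets.ISO_8859_1).equals(text);

        System.out.println("UTF-8 round trip preserved text: " + utf8Ok);
        System.out.println("ISO-8859-1 round trip preserved text: " + latin1Ok);
    }
}
```

The UTF-8 round trip should preserve the text, while the ISO-8859-1 one should not, which is the practical difference at stake here.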
Yes, I was expecting that.
Upgrading CForms upload widget is on my long list ..... I guess you just
bumped it forward a few places :)
Nice. :)
Still, I'm interested in your work on CForms, especially when it comes to the /server/ side, where I
feel quite comfortable. Even though I'm busy with my work here at my company and have some other Cocoon
stuff to do, I would like to support you in your effort.
I see only two small obstacles:
1. As I saw at ApacheCon, you have some nice work on your computer. The problem is
that if you keep it on your computer, nobody can test it and eventually help you with this
stuff. Is there any reason not to commit the work you already have to some public place?
Otherwise any collaboration is rather difficult.
2. I prefer to work with C2.2 (trunk) because it's simpler than 2.1 and it's much easier to
develop/test anything there. Is there any chance that you will switch your work to trunk?
There is even a bug report about this issue:
https://issues.apache.org/jira/browse/COCOON-1917
Another interesting option would be to replace our own handling of
multipart requests with commons-upload code, see:
https://issues.apache.org/jira/browse/COCOON-1325
What do you think about the last proposal?
I need a bit of time to dig into this .....
Now I'm going to test the fix you proposed...
I've tested it (combined with the fix from COCOON-1917) and on the server side everything looks correct
now. The only problem is that the browser sometimes does not behave correctly.
I noticed that sometimes, when I enter non-Latin characters into the text field, they get escaped by
the browser.
So when I enter something like:
światło
the browser posts the following value to the server:
&#347;wiat&#322;o
(additionally, there is a parameter: dojo.transport=xmlhttp)
Since I don't know how these things are handled on the client side I'm not sure
how to fix it.
Any ideas?
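Until the client side is sorted out, one possible server-side workaround can be sketched as follows. It assumes the escaping takes the form of decimal numeric character references (e.g. &#347; for ś), which I have not confirmed for every browser. NcrDecoder is a hypothetical helper, not part of Cocoon or CForms, and it cannot tell a browser-generated escape from one the user typed literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrDecoder {
    // Matches decimal numeric character references such as &#347;
    private static final Pattern NCR = Pattern.compile("&#(\\d{1,7});");

    // Replaces every decimal numeric character reference in the value
    // with the character it denotes; everything else is copied as-is.
    public static String decode(String value) {
        Matcher m = NCR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave out-of-range references untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String posted = "&#347;wiat&#322;o";
        // Compare against the expected text using Unicode escapes.
        System.out.println(decode(posted).equals("\u015Bwiat\u0142o"));
    }
}
```

This only papers over the symptom; making sure the form is submitted as UTF-8 in the first place would be the cleaner fix.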
Many thanks!
You're welcome!
--
Grzegorz Kossakowski