On 25 Jul 2008, at 13:54, Grzegorz Kossakowski wrote:

Jeremy Quinn wrote:
I am trying to solve a nasty request transcoding bug, that I found while working on CForms.

Join the club! Discovered character encoding problems two days ago in a project based on Cocoon 2.1.x. Tried to fight it yesterday and gave up.
You work with 2.1 ?? I am shocked :)

Stay cool, it's only because this project is going to be migrated to 2.2. Actually Mavenization and migration to 2.2 is my main job here.

:)

What about you? Have you been won over to Cocoon 2.2 yet? Have you got it running, and can you develop on top of it?

I still have all of the notes and the builds we did (thanks!).
But I am still doing the work in 2.1, as (if I remember properly) we did not manage to make a build that would edit live at the level of the cforms block itself. Correct me if I am wrong, but it seems easier to set up 2.1 so that edits made to the built-in resources of the block are immediately live without re-building.

A change like this, while simplifying our codebase, could cause utter havoc for users ..... I don't know if Unicode really is a practical superset of every other possible encoding.
Sorry, I do not think I know enough about this either.

Ok. Anyway just for record what wikipedia says[1] about UTF-8:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.

So it can represent anything from Unicode. Let's have a look at Unicode[2] itself: In computing, Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a repertoire of more than 100,000 characters [...]

If Unicode can handle more than 100,000 characters, then I guess anyone will have a hard time finding a character that Unicode cannot encode correctly.
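
To make the "variable-length" and "backwards compatible with ASCII" points concrete, here is a minimal sketch that prints the UTF-8 bytes of three characters. It assumes a JavaScript runtime that provides the standard TextEncoder (current browsers and Node.js); the helper name utf8Hex is made up for this illustration.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// utf8Hex is a hypothetical helper: it returns the UTF-8 bytes of `text`
// as space-separated upper-case hex pairs.
function utf8Hex(text) {
  return Array.from(encoder.encode(text))
    .map((byte) => byte.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

console.log(utf8Hex("A"));      // 1 byte, identical to ASCII: 41
console.log(utf8Hex("\u015B")); // U+015B (s with acute), 2 bytes: C5 9B
console.log(utf8Hex("\u20AC")); // U+20AC (Euro sign), 3 bytes: E2 82 AC
```

The ASCII letter keeps its single-byte value, while the other two characters take two and three bytes respectively.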

Yes, I know that is the 'party line' about unicode :)
But TBH, I don't know if it really covers every possible obscure case.

Yes, I was expecting that.
Upgrading CForms upload widget is on my long list ..... I guess you just bumped it forward a few places :)

Nice. :)

... but I am still bogged down with subtle differences in format interpretation between Java and Dojo, with validating number fields, it's a minefield ...... blog entry half written ;)

Still, I'm interested in your work on CForms, especially when it comes to the /server/ side, where I feel quite comfortable. Even though I'm busy with my work here at my company and have some other Cocoon stuff to do, I would like to support you in your effort.

Great !

I see only two small obstacles:
1. As I saw at ApacheCon, you have some nice work on your computer. The problem is that if you keep it on your computer, nobody can test it and eventually help you with this stuff. Is there any reason not to commit the work you already have to some public place?

There are a few problems that have stopped me doing this so far :
1) too lazy (so far) to set up and maintain some kind of branch/sandbox ;)
2) I cannot commit anything to head yet, because lots of stuff is still completely broken and/or still has to be re-written to the new APIs. The work has already taken me several months, and there are several more to go ..... it is unpredictable how much longer this will take, I'd mess up Cocoon's release cycles .....

Otherwise any collaboration is rather difficult.

Agreed.
What would you propose?
The work involves having two or three custom blocks, forms and ajax (atm, I have dojotoolkit as a block). If you are serious about getting involved, I'd be prepared to make the extra effort to collaborate.

2. I prefer to work with C2.2 (trunk) because it's simpler than 2.1 and it's much easier to develop/test anything here. Any chances that you will switch with your work to trunk?

You find 2.2 simpler, I find 2.1 simpler :)
If we could find the right way to collaborate, you can work on 2.2-specific issues, and I can work on 2.1.

One of the major problems with 2.2 is the loss of the 'system pipelines' that in 2.1 provide a set of static URIs for loading cforms and dojo resources; coupled to the fact that /someone/ misunderstanding dojo APIs thought it necessary to introduce a resource-path for use by cforms widgets client-side.

I can hopefully help you over-come these problems.

This is the current JS Loader for 2.1.12-dev :
<script src="/_cocoon/resources/dojotoolkit/dojo/dojo.js" type="text/javascript" djConfig="isDebug: true, locale: 'en_GB', parseOnLoad: true"></script>
<script type="text/javascript">
dojo.require("dojo.parser");
dojo.registerModulePath("cocoon.forms", "../../forms/js");
dojo.registerModulePath("cocoon.ajax", "../../ajax/js");
dojo.require("cocoon.forms.common");
dojo.addOnLoad(cocoon.forms.callOnLoadHandlers);
</script>

(ignoring paths to css for now ....)

We have a system pipeline "/_cocoon/resources/ .... " which is used as a prefix to load dojo from the dojotoolkit block.

Then we register two modules, forms and ajax, using a path that is relative to where dojo was loaded from.

One point that was missed by the /someone/ above, was that once a module is registered, you can get a url to it like this :

var imgSrc = dojo.moduleUrl("cocoon.forms","images/blah.png");

i.e. it is not necessary to provide it specifically to the client as it is currently done : cocoon.resourcesUri = "<xsl:value-of select="$resources-uri"/>"
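
The path arithmetic behind this can be sketched without Dojo at all. The block below is a toy stand-in, not Dojo code: createRegistry and the localhost origin are invented for the illustration, and it assumes only what is described above, namely that a registered module path is resolved relative to the directory dojo.js was loaded from.

```javascript
// Toy stand-in for the path arithmetic only; this is NOT Dojo's implementation.
// ORIGIN is a hypothetical placeholder, needed only because the standard URL
// class requires an absolute base.
const ORIGIN = "http://localhost";

// createRegistry is a hypothetical helper. `baseUrl` is the directory that
// dojo.js was loaded from.
function createRegistry(baseUrl) {
  const modulePaths = new Map();
  return {
    registerModulePath(name, relativePath) {
      modulePaths.set(name, relativePath);
    },
    moduleUrl(name, resource) {
      const relativePath = modulePaths.get(name);
      if (relativePath === undefined) {
        throw new Error("Unregistered module: " + name);
      }
      // Treat the module path as a directory, then resolve the resource in it.
      const dir = relativePath.endsWith("/") ? relativePath : relativePath + "/";
      const moduleDir = new URL(dir, ORIGIN + baseUrl);
      return new URL(resource, moduleDir).pathname;
    },
  };
}

const registry = createRegistry("/_cocoon/resources/dojotoolkit/dojo/");
registry.registerModulePath("cocoon.forms", "../../forms/js");
console.log(registry.moduleUrl("cocoon.forms", "images/blah.png"));
// prints /_cocoon/resources/forms/js/images/blah.png
```

With the 2.1 loader shown earlier, the image URL falls out of the registered module path, so no separate resources URI has to be handed to the client.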

But TBH, except for a few exceptions like custom data-source urls (dynamic selectionlists etc.) there should be no need to reference anything like this ..... templates should be embedded in widgets, images used in widgets should be loaded via css (where relative references work internally) etc. etc.

So, the system path is not available in 2.2. The dojotoolkit, forms and ajax blocks could have any URI. So we need a standard way for an application block to tell its form-rendering pipeline the paths to these blocks. Presumably this should be the responsibility of the application's sitemap.

It should not be necessary to re-write any URIs (!!).

Furthermore, this provision of paths to blocks needs to take into account the fact that in production people will most likely want to do stuff like :
1) acquire dojo from CDNs like AOL, Google etc.
2) build custom minimised JS libs to support their apps
3) load their own custom modules, override css etc.
4) lots of stuff we have not thought of yet ;)

ATM, while I am developing cforms, my dojotoolkit block is a special build, everything uncompressed, unpackaged, etc. with like 180 sets of locales etc. etc. Some complex forms are loading over 100 separate assets.

The modularity of dojo (and the use of dojo.require) means that only what is needed by a page is loaded, which is great. But in production, you will want to heavily reduce the number of files ..... especially the 404s you get 'hunting' the locale tree. It is a bit of a contradiction .....

I have not really begun to think seriously about how this should be done yet.

If we could collaborate on a way to cleanly solve this, so that ideally the basic technique is the same for both 2.1 and 2.2, that would be really useful for me :)

There is even a bug report about this issue:
https://issues.apache.org/jira/browse/COCOON-1917

Another interesting option would be to replace our own handling of multipart requests with commons-upload code, see:
https://issues.apache.org/jira/browse/COCOON-1325

What do you think about the last proposal?
I need a bit of time to dig into this .....
Now I'm going to test the fix you proposed...

I've tested it (combined with the fix from COCOON-1917) and on the server side everything looks correct now.

Great !!!

The only problem is that the browser sometimes does not behave correctly.

I noticed that sometimes when I enter non-Latin characters into a text field, they get escaped by the browser.

So when I enter something like:
światło

the browser posts the following value to the server:
&#347;wiat&#322;o

Yes, I see this a lot.
I also see UTF-8 encoding like this : %E2%82%AC (which is the 3-byte encoding for the Euro symbol).

I have not found this encoding to be a problem.
What problem does this cause you?
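
For reference, the two escaped forms mentioned above can be turned back into text as in the sketch below. It only illustrates what the two notations mean and does not show how Cocoon or CForms handle them; decodeNumericReferences is a hypothetical helper, while decodeURIComponent is the standard JavaScript function.

```javascript
// decodeNumericReferences is a hypothetical helper: it replaces decimal
// (&#347;) and hexadecimal (&#x15B;) numeric character references with the
// characters they name. Named entities such as &amp; are left untouched.
function decodeNumericReferences(text) {
  return text
    .replace(/&#x([0-9a-fA-F]+);/g, (match, hex) =>
      String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#([0-9]+);/g, (match, dec) =>
      String.fromCodePoint(parseInt(dec, 10)));
}

// The value posted by the browser in the example above.
console.log(decodeNumericReferences("&#347;wiat&#322;o")); // światło

// Percent-encoded UTF-8 bytes, decoded with the standard function.
console.log(decodeURIComponent("%E2%82%AC")); // the Euro sign, U+20AC
```

The percent-encoded form is ordinary URL encoding of the UTF-8 bytes, whereas the numeric references replace the characters themselves and stay in that form until something decodes them explicitly.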

(additionally there is parameter: dojo.transport=xmlhttp)

This is one of the standard parameters that CForms has to add to form submits.

CForms uses 3 different transports, depending on context:

1) ajax-off : normal whole page submit
2) ajax-on  : xmlhttp
3) ajax-on + form contains a 'file' field : iframe-transport

Unfortunately, the response to each of these needs to be serialized differently, hence the need for a very complicated sitemap for cforms and this special parameter.
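
The three cases listed above amount to a small decision table, sketched here. This is not the CForms implementation: chooseTransport and the labels "page" and "iframe" are invented for the illustration, and only "xmlhttp" appears in the thread (as the value of dojo.transport).

```javascript
// Hypothetical sketch of the decision described above; not CForms code.
// ajaxEnabled:  whether the form is submitted via ajax at all
// hasFileField: whether the form contains a 'file' field
function chooseTransport(ajaxEnabled, hasFileField) {
  if (!ajaxEnabled) {
    return "page"; // 1) ajax-off: normal whole page submit
  }
  // 3) ajax-on with a file field uses the iframe transport,
  // 2) ajax-on otherwise uses xmlhttp
  return hasFileField ? "iframe" : "xmlhttp";
}

console.log(chooseTransport(false, true));  // page
console.log(chooseTransport(true, false));  // xmlhttp
console.log(chooseTransport(true, true));   // iframe
```

Since each outcome needs a differently serialized response, the server has to be told which one was used, which is the role of the extra request parameter.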

Since I don't know how these things are handled on the client side I'm not sure how to fix it.

Any ideas?

I need more details of what problem it causes ....

Thanks

regards Jeremy
