I've submitted a patch to Caja which allows HTML entities in URIs during ResolveUriStage.
http://codereview.appspot.com/1945041/show

Thanks,
Chirag

On Thu, Aug 5, 2010 at 9:13 AM, Gagandeep singh <[email protected]> wrote:
> +fargo, zhoresh
>
> Hi Chirag
>
> I had similar doubts some time back, when I thought that proxied /
> concatenated URLs emitted by Shindig should not have &amp; in them.
> However, somewhere <http://htmlhelp.com/tools/validator/problems.html> I
> read that when emitting HTML content, & should be escaped to &amp;.
> Basically, this acts as a marker for the browser so that it treats it
> correctly as "&".
>
> I think what browsers (like Chrome) end up doing when they read a URL from
> the HTML is that they unescape it once. You might have seen this behavior in
> Chrome when you load an escaped URL and it unescapes it once (FF does not
> change the URL location, but it might be doing the same thing).
>
> However, the problem you describe seems to be a Caja parsing problem. RFC
> 3986 describes the syntax of a URI, but the transport mechanism here is
> HTML. And HTML says that whenever you want the user agent to interpret the
> symbol "&", you need to return "&amp;".
>
> Also, I don't know if this is not a bug in Shindig with Neko :)
> In Shindig, DOM parsing and serialization is avoided by maintaining
> contentString, contentBytes and document. This bug might very well appear in
> Shindig as well.
>
> John and Ziv can provide more info.
>
> On Wed, Aug 4, 2010 at 11:22 PM, Chirag Shah <[email protected]> wrote:
>
>> Hey,
>>
>> For a gadget that includes the content-rewrite feature, ConcatVisitor
>> will eventually concat the source URLs and escape the "&" symbol with
>> "&amp;".
>> Since Apache Shindig uses the NekoSimplifiedHtmlParser, these entities
>> are preserved.
>>
>> Now over in CajaContentRewriter, Shindig passes the DOM over to
>> DefaultGadgetRewriter#rewriteContent and Caja reparses this DOM.
>>
>> DefaultGadgetRewriter#compileGadget line 211 from caja-r4135-src.jar
>> compiler.addInput(AncestorChain.instance(new Dom(content)), baseUri);
>>
>> When Caja reparses this DOM, the source URL in our script node gets
>> URL encoded (RFC 3986), and the "&amp;" gets rewritten to "&amp%3b"
>>
>> This finally causes Caja to log an error saying "Unable to cajole
>> gadget" during RewriteHtmlStage.
>>
>> This looks like one of those differences between Neko and Caja in
>> parsing HTML documents, but I'm not entirely sure which one is at
>> fault.
>>
>> Example gadget:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <Module>
>>   <ModulePrefs title="">
>>     <Require feature="content-rewrite"><Param
>> name="include-urls">.*</Param></Require>
>>     <Require feature="caja"/>
>>   </ModulePrefs>
>>   <Content type="html">
>>     <![CDATA[<script type="text/javascript"
>> src="http://chiarg.com/test.js"></script>]]>
>>   </Content>
>> </Module>
>>
>> Error:
>> INFO: Unable to cajole gadget:
>> com.google.caja.opensocial.GadgetRewriteException: Gadget has compile
>> errors
>> Checkpoint: LegacyNamespaceFixupStage at T+0.146571 seconds
>> Checkpoint: ResolveUriStage at T+0.158071 seconds
>> Checkpoint: RewriteHtmlStage at T+0.161643 seconds
>>
>> http://localhost:8080/gadgets/concat?container=default&amp%3bgadget=http%3A%2F%2Fchiarg.com%2Fopensocial%2Fcaja.xml&amp%3bdebug=0&amp%3bnocache=1&amp%3btype=js&amp%3b1=http%3A%2F%2Fchiarg.com%2Ftest.js:1+1
>> - 2: Unexpected token <
>>
>
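
For anyone reading this thread later, here is a rough sketch of the failure mode described above. It is not Caja's or Shindig's actual code: the class and both helper methods are hypothetical, and the semicolon escaping is only a stand-in for whatever URI normalization happens on the attribute value. The point it illustrates is that an entity-escaped "&" survives intact only if the entity is decoded before the URI is percent-encoded.

```java
// Hypothetical illustration only; none of these names exist in Caja or Shindig.
class EntityUriSketch {

    // Decodes just the ampersand entity, which is the only one relevant here.
    // A real HTML parser would decode all character references.
    public static String decodeAmpEntity(String attrValue) {
        return attrValue.replace("&amp;", "&");
    }

    // Stand-in for a URI normalization step that percent-encodes ';'.
    public static String escapeSemicolons(String uri) {
        return uri.replace(";", "%3b");
    }

    public static void main(String[] args) {
        String raw = "http://localhost:8080/gadgets/concat?container=default&amp;debug=0";

        // Encoding the still-escaped attribute value mangles the entity.
        System.out.println(escapeSemicolons(raw));

        // Decoding the entity first leaves a plain query-string separator.
        System.out.println(escapeSemicolons(decodeAmpEntity(raw)));
    }
}
```

The first line printed contains "&amp%3bdebug=0", matching the broken concat URL in the error above; the second contains "&debug=0".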
