Looks nice. Thanks for making this change, Chirag.

On Wed, Aug 11, 2010 at 4:56 AM, Chirag Shah <[email protected]> wrote:
> I've submitted a patch to Caja which allows HTML entities in URIs
> during ResolveUriStage.
>
> http://codereview.appspot.com/1945041/show
>
> Thanks,
> Chirag
>
> On Thu, Aug 5, 2010 at 9:13 AM, Gagandeep singh <[email protected]>
> wrote:
> > +fargo, zhoresh
> >
> > Hi Chirag,
> >
> > I had similar doubts some time back, when I thought that the proxied /
> > concatenated URLs emitted by Shindig should not have &amp; in them.
> > However, I read somewhere <http://htmlhelp.com/tools/validator/problems.html>
> > that when emitting HTML content, & should be escaped to &amp;.
> > Basically, this acts as a marker for the browser so that it treats it
> > correctly as "&".
> >
> > I think what browsers (like Chrome) end up doing when they read a URL from
> > the HTML is that they unescape it once. You might have seen this behavior in
> > Chrome when you load an escaped URL and it unescapes it once (FF does not
> > change the URL in the location bar, but it might be doing the same thing).
> >
> > However, the problem you describe seems to be a Caja parsing problem. RFC
> > 3986 describes the syntax of a URI, but the transport mechanism here is
> > HTML, and HTML says that whenever you want the user agent to interpret the
> > symbol "&", you need to emit "&amp;".
> >
> > Also, I'm not sure this isn't a bug in Shindig with Neko as well :)
> > In Shindig, DOM parsing and serialization is avoided by maintaining
> > contentString, contentBytes and document, so this bug might very well
> > appear in Shindig too.
> >
> > John and Ziv can provide more info.
> >
> > On Wed, Aug 4, 2010 at 11:22 PM, Chirag Shah <[email protected]>
> > wrote:
> >
> >> Hey,
> >>
> >> For a gadget that includes the content-rewrite feature, ConcatVisitor
> >> will eventually concatenate the source URLs and escape the "&" symbol as
> >> "&amp;".
> >> Since Apache Shindig uses the NekoSimplifiedHtmlParser, these entities
> >> are preserved.
> >>
> >> Now over in CajaContentRewriter, Shindig passes the DOM over to
> >> DefaultGadgetRewriter#rewriteContent and Caja reparses this DOM.
> >>
> >> DefaultGadgetRewriter#compileGadget, line 211 in caja-r4135-src.jar:
> >>   compiler.addInput(AncestorChain.instance(new Dom(content)), baseUri);
> >>
> >> When Caja reparses this DOM, the source URL in our script node gets
> >> URL-encoded (RFC 3986), and the "&amp;" gets rewritten to "&amp%3b".
> >>
> >> This finally causes Caja to log an error saying "Unable to cajole
> >> gadget" during RewriteHtmlStage.
> >>
> >> This looks like one of those differences between Neko and Caja in
> >> parsing HTML documents, but I'm not entirely sure which one is at
> >> fault.
> >>
> >> Example gadget:
> >> <?xml version="1.0" encoding="UTF-8"?>
> >> <Module>
> >>   <ModulePrefs title="">
> >>     <Require feature="content-rewrite"><Param
> >>       name="include-urls">.*</Param></Require>
> >>     <Require feature="caja"/>
> >>   </ModulePrefs>
> >>   <Content type="html">
> >>     <![CDATA[<script type="text/javascript"
> >>       src="http://chiarg.com/test.js"></script>]]>
> >>   </Content>
> >> </Module>
> >>
> >> Error:
> >> INFO: Unable to cajole gadget:
> >> com.google.caja.opensocial.GadgetRewriteException: Gadget has compile
> >> errors
> >> Checkpoint: LegacyNamespaceFixupStage at T+0.146571 seconds
> >> Checkpoint: ResolveUriStage at T+0.158071 seconds
> >> Checkpoint: RewriteHtmlStage at T+0.161643 seconds
> >>
> >> http://localhost:8080/gadgets/concat?container=default&amp%3bgadget=http%3A%2F%2Fchiarg.com%2Fopensocial%2Fcaja.xml&amp%3bdebug=0&amp%3bnocache=1&amp%3btype=js&amp%3b1=http%3A%2F%2Fchiarg.com%2Ftest.js:1+1
> >> - 2: Unexpected token <
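The escaping rule Gagan describes and the failure Chirag reports can be sketched in a few lines of Java. This is an illustration under stated assumptions, not Shindig's or Caja's code. `escapeAttr`, `unescapeAttr`, `hypotheticalNormalize` and `paramNames` are hypothetical helpers. `hypotheticalNormalize` simply percent-encodes `;` in the raw attribute text to reproduce the `&amp%3b` seen in the log; that this is roughly what happens inside Caja is an assumption. The concat URL is the one from the error message, with the entities decoded.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityInUriDemo {

    // Escapes characters that must not appear raw inside a double-quoted HTML
    // attribute. '&' is replaced first so the entities produced later are not re-escaped.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // References understood by this sketch: four named entities plus decimal and
    // hex numeric references. A real HTML parser knows many more named entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(amp|lt|gt|quot|#[0-9]+|#[xX][0-9a-fA-F]+);");

    // Decodes character references exactly once, as an HTML parser does when it
    // reads an attribute value: "&amp;amp;" becomes "&amp;", not "&".
    static String unescapeAttr(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = m.group(1);
            String text;
            switch (ref) {
                case "amp":  text = "&";  break;
                case "lt":   text = "<";  break;
                case "gt":   text = ">";  break;
                case "quot": text = "\""; break;
                default: {
                    boolean hex = ref.charAt(1) == 'x' || ref.charAt(1) == 'X';
                    int cp = hex ? Integer.parseInt(ref.substring(2), 16)
                                 : Integer.parseInt(ref.substring(1));
                    text = new String(Character.toChars(cp));
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    // HYPOTHETICAL stand-in for the failing path: treats the raw, still-escaped
    // attribute text as a URI and percent-encodes ';'. Not Caja's actual code.
    static String hypotheticalNormalize(String rawAttr) {
        return rawAttr.replace(";", "%3b");
    }

    // Splits a raw (still percent-encoded) query string into parameter names.
    static List<String> paramNames(String rawQuery) {
        List<String> names = new ArrayList<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            names.add(eq < 0 ? pair : pair.substring(0, eq));
        }
        return names;
    }

    public static void main(String[] args) {
        String concatUrl = "http://localhost:8080/gadgets/concat?container=default"
                + "&gadget=http%3A%2F%2Fchiarg.com%2Fopensocial%2Fcaja.xml"
                + "&debug=0&nocache=1&type=js&1=http%3A%2F%2Fchiarg.com%2Ftest.js";
        URI gadgetBase = URI.create("http://chiarg.com/opensocial/caja.xml");

        // What the serialized src="..." attribute contains after escaping.
        String attr = escapeAttr(concatUrl);

        // Broken: entity text leaks into the URI before it is resolved.
        URI broken = gadgetBase.resolve(hypotheticalNormalize(attr));
        // Fixed: decode the attribute value once, then resolve it as a URI.
        URI fixed = gadgetBase.resolve(unescapeAttr(attr));

        System.out.println(paramNames(broken.getRawQuery()));
        System.out.println(paramNames(fixed.getRawQuery()));
    }
}
```

Running `main` prints `[container, amp%3bgadget, amp%3bdebug, amp%3bnocache, amp%3btype, amp%3b1]` and then `[container, gadget, debug, nocache, type, 1]`. In the broken case the concat servlet would see parameter names it does not recognize. Decoding the attribute exactly once before treating the value as a URI is consistent with the intent of the patch that allows HTML entities in URIs during ResolveUriStage, though the patch itself may be implemented differently.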
