Looks nice. Thanks for making this change, Chirag.

On Wed, Aug 11, 2010 at 4:56 AM, Chirag Shah <[email protected]> wrote:

> I've submitted a patch to Caja which allows HTML entities in URIs
> during ResolveUriStage.
>
> http://codereview.appspot.com/1945041/show
>
> Thanks,
> Chirag
>
> On Thu, Aug 5, 2010 at 9:13 AM, Gagandeep singh <[email protected]> wrote:
> > +fargo, zhoresh
> >
> > Hi Chirag
> >
> > I had similar doubts some time back, when I thought that proxied /
> > concatenated URLs emitted by Shindig should not have &amp; in them.
> > However, I read somewhere <http://htmlhelp.com/tools/validator/problems.html>
> > that when emitting HTML content, & should be escaped to &amp;.
> > Basically, this acts as a marker for the browser so that it correctly
> > treats it as "&".
> >
> > I think what browsers (like Chrome) end up doing when they read a URL from
> > the HTML is unescaping it once. You might have seen this behavior in
> > Chrome when you load an escaped URL and it unescapes it once (FF does not
> > change the URL in the location bar, but it might be doing the same thing).
> >
> > However, the problem you describe seems to be a Caja parsing problem. RFC
> > 3986 describes the syntax of a URI, but the transport mechanism here is
> > HTML, and HTML says that whenever you want the user agent to interpret the
> > symbol "&", you need to return "&amp;".
> >
> > Also, I'm not sure this isn't a bug in Shindig with Neko as well :)
> > In Shindig, DOM parsing and serialization are avoided by maintaining
> > contentString, contentBytes and document, so this bug might very well
> > appear in Shindig too.
> >
> > John and Ziv can provide more info.
> >
> > On Wed, Aug 4, 2010 at 11:22 PM, Chirag Shah <[email protected]> wrote:
> >
> >> Hey,
> >>
> >> For a gadget that includes the content-rewrite feature, ConcatVisitor
> >> will eventually concatenate the source URLs and escape the "&" symbol
> >> as "&amp;".
> >> Since Apache Shindig uses the NekoSimplifiedHtmlParser, these entities
> >> are preserved.
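For illustration, a hypothetical Java sketch of how a concat URL like the one in the error log could be assembled: each parameter value is percent-encoded with `URLEncoder`, the pairs are joined with "&", and the result is HTML-escaped before being written into a script `src` attribute. This is not ConcatVisitor's actual code; the class and method names are made up, and the attribute escaping handles only "&", assuming the base URL contains no quotes or angle brackets.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: build a concat URL, then escape it for an HTML attribute. */
public final class ConcatUrlSketch {
    private ConcatUrlSketch() {}

    /** Appends percent-encoded key=value pairs to base, joined with '&'. */
    public static String buildQueryUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    /** Escapes '&' as "&amp;" so the URL can sit inside src="...". */
    public static String toSrcAttribute(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps parameters in insertion order, matching the log's order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("container", "default");
        params.put("gadget", "http://chiarg.com/opensocial/caja.xml");
        params.put("debug", "0");
        params.put("nocache", "1");
        params.put("type", "js");
        params.put("1", "http://chiarg.com/test.js");
        String url = buildQueryUrl("http://localhost:8080/gadgets/concat", params);
        System.out.println(url);                 // what the browser should fetch
        System.out.println(toSrcAttribute(url)); // what appears in the serialized HTML
    }
}
```

Note that there are two separate encoding layers here: percent-encoding inside each parameter value (URI syntax) and entity-escaping of "&" for the HTML attribute. A consumer has to undo the HTML layer before treating the text as a URI.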
> >>
> >> Now, over in CajaContentRewriter, Shindig passes the DOM to
> >> DefaultGadgetRewriter#rewriteContent, and Caja reparses this DOM.
> >>
> >> DefaultGadgetRewriter#compileGadget, line 211 of caja-r4135-src.jar:
> >>    compiler.addInput(AncestorChain.instance(new Dom(content)), baseUri);
> >>
> >> When Caja reparses this DOM, the source URL in our script node gets
> >> percent-encoded (RFC 3986), and "&amp;" gets rewritten to "&amp%3b".
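A Java sketch of the ordering problem, under one assumption inferred only from the logged URL (not from Caja's source): the URI normalizer percent-encodes ";" as "%3b". Applied to the raw, still entity-escaped attribute text, it turns "&amp;" into "&amp%3b"; decoding HTML entities first, which appears to be the intent of allowing HTML entities in URIs during ResolveUriStage, yields a clean query string. `normalizeUri` and `decodeAmp` are hypothetical helpers, not Caja APIs.

```java
/**
 * Hypothetical sketch of the failure mode. Assumption (inferred from the
 * logged URL only): the URI normalizer percent-encodes ';'.
 */
public final class EntityThenUriSketch {
    private EntityThenUriSketch() {}

    /** Hypothetical normalizer: percent-encodes ';' and leaves everything else alone. */
    public static String normalizeUri(String uri) {
        return uri.replace(";", "%3b");
    }

    /** Decodes the "&amp;" entity once (other entities omitted for brevity). */
    public static String decodeAmp(String attrText) {
        return attrText.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String attr = "/gadgets/concat?container=default&amp;type=js";
        // Broken order: URI normalization runs on the raw attribute text.
        System.out.println(normalizeUri(attr));            // /gadgets/concat?container=default&amp%3btype=js
        // Fixed order: decode HTML entities first, then normalize the URI.
        System.out.println(normalizeUri(decodeAmp(attr))); // /gadgets/concat?container=default&type=js
    }
}
```

The broken output has the same shape as the "&amp%3b" sequences in the logged concat URL.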
> >>
> >> This finally causes Caja to log an error saying "Unable to cajole
> >> gadget" during RewriteHtmlStage.
> >>
> >> This looks like one of those differences between Neko and Caja in
> >> parsing HTML documents, but I'm not entirely sure which one is at
> >> fault.
> >>
> >> Example gadget:
> >> <?xml version="1.0" encoding="UTF-8"?>
> >> <Module>
> >>  <ModulePrefs title="">
> >>    <Require feature="content-rewrite"><Param name="include-urls">.*</Param></Require>
> >>    <Require feature="caja"/>
> >>  </ModulePrefs>
> >>  <Content type="html">
> >>   <![CDATA[<script type="text/javascript" src="http://chiarg.com/test.js"></script>]]>
> >>  </Content>
> >> </Module>
> >>
> >> Error:
> >> INFO: Unable to cajole gadget:
> >> com.google.caja.opensocial.GadgetRewriteException: Gadget has compile errors
> >> Checkpoint: LegacyNamespaceFixupStage at T+0.146571 seconds
> >> Checkpoint: ResolveUriStage at T+0.158071 seconds
> >> Checkpoint: RewriteHtmlStage at T+0.161643 seconds
> >>
> >> http://localhost:8080/gadgets/concat?container=default&amp%3bgadget=http%3A%2F%2Fchiarg.com%2Fopensocial%2Fcaja.xml&amp%3bdebug=0&amp%3bnocache=1&amp%3btype=js&amp%3b1=http%3A%2F%2Fchiarg.com%2Ftest.js:1+1 - 2: Unexpected token <
> >>
> >
>
