+fargo, zhoresh

Hi Chirag

I had similar doubts some time back, when I thought that the proxied /
concatenated URLs emitted by Shindig should not have & in them.
However, I read somewhere <http://htmlhelp.com/tools/validator/problems.html>
that when emitting HTML content, & should be escaped to &amp;.
Basically, the entity acts as a marker for the browser so that it treats it
correctly as a literal "&".
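
To make the escape-once / unescape-once idea concrete, here is a minimal
Java sketch. The class and method names are made up for illustration and
are not Shindig or Caja APIs, and it only handles the four entities shown:

```java
public final class AmpersandEscaping {

    /** Escapes a raw URL so it can be written inside an HTML attribute value. */
    public static String escapeForHtmlAttribute(String rawUrl) {
        // '&' must be handled first so the entities added below are not re-escaped.
        return rawUrl.replace("&", "&amp;")
                     .replace("\"", "&quot;")
                     .replace("<", "&lt;")
                     .replace(">", "&gt;");
    }

    /** Decodes the same four entities exactly once, roughly what an HTML parser does. */
    public static String unescapeOnce(String attributeValue) {
        // "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;" and not "<".
        return attributeValue.replace("&lt;", "<")
                             .replace("&gt;", ">")
                             .replace("&quot;", "\"")
                             .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String raw = "http://localhost:8080/gadgets/concat?container=default&debug=0";
        String inHtml = escapeForHtmlAttribute(raw);
        System.out.println(inHtml);               // ...container=default&amp;debug=0
        System.out.println(unescapeOnce(inHtml)); // ...container=default&debug=0
    }
}
```

So the "&amp;" only exists while the URL is sitting inside the HTML; whoever
reads the attribute is expected to get the plain "&" back.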

I think what browsers (like Chrome) end up doing when they read a URL from
the HTML is unescape it once. You might have seen this behavior in Chrome:
when you load an escaped URL, it unescapes it once (FF does not change the
URL location, but it might be doing the same thing).

However, the problem you describe seems to be a Caja parsing problem. RFC
3986 describes the syntax of a URI, but the transport mechanism here is
HTML. And HTML says that whenever you want the user agent to interpret the
symbol "&", you need to return "&amp;".
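
As a rough illustration of why the order matters, here is a small Java
sketch. percentEncodeSemicolons is a hypothetical stand-in for whatever URI
normalization Caja applies (I have not checked its actual code), and
decodeAmpEntity only handles "&amp;":

```java
public final class EntityThenUriOrder {

    /** Hypothetical stand-in for a URI normalization step that percent-encodes ';'. */
    public static String percentEncodeSemicolons(String url) {
        return url.replace(";", "%3b");
    }

    /** Decodes the "&amp;" entity, as an HTML parser would when reading an attribute. */
    public static String decodeAmpEntity(String attributeValue) {
        return attributeValue.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String attr = "/gadgets/concat?container=default&amp;debug=0";

        // Entity left in place before URI encoding: the ';' gets encoded.
        System.out.println(percentEncodeSemicolons(attr));
        // prints /gadgets/concat?container=default&amp%3bdebug=0

        // Entity decoded first: nothing is left for the encoder to mangle.
        System.out.println(percentEncodeSemicolons(decodeAmpEntity(attr)));
        // prints /gadgets/concat?container=default&debug=0
    }
}
```

The first output has the same "&amp%3b" shape as the URL in your error log,
which is why I suspect the entity is not being decoded before the URI step.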

Also, I am not sure that this is not a bug in Shindig with Neko as well :)
In Shindig, DOM parsing and serialization is avoided by maintaining
contentString, contentBytes and document. This bug might very well appear in
Shindig as well.

John and Ziv can provide more info.

On Wed, Aug 4, 2010 at 11:22 PM, Chirag Shah <[email protected]> wrote:

> Hey,
>
> For a gadget that includes the content-rewrite feature, ConcatVisitor
> will eventually concat the source urls and escape the "&" symbol with
> "&amp;".
> Since Apache Shindig uses the NekoSimplifiedHtmlParser, these entities
> are preserved.
>
> Now over in CajaContentRewriter, Shindig passes the dom over to
> DefaultGadgetRewriter#rewriteContent and Caja reparses this dom.
>
> DefaultGadgetRewriter#compileGadget line 211 from caja-r4135-src.jar
>    compiler.addInput(AncestorChain.instance(new Dom(content)), baseUri);
>
> When Caja reparses this dom, the source URL in our script node gets
> URL encoded (RFC 3986), and the "&amp;" gets rewritten to "&amp%3b"
>
> This finally causes Caja to log an error saying "Unable to cajole
> gadget" during RewriteHtmlStage.
>
> This looks like one of those differences between Neko and Caja in
> parsing HTML documents, but I'm not entirely sure which one is at
> fault.
>
> Example gadget:
> <?xml version="1.0" encoding="UTF-8"?>
> <Module>
>  <ModulePrefs title="">
>    <Require feature="content-rewrite"><Param
> name="include-urls">.*</Param></Require>
>    <Require feature="caja"/>
>  </ModulePrefs>
>  <Content type="html">
>   <![CDATA[<script type="text/javascript"
> src="http://chiarg.com/test.js"></script>]]>
>  </Content>
> </Module>
>
> Error:
> INFO: Unable to cajole gadget:
> com.google.caja.opensocial.GadgetRewriteException: Gadget has compile
> errors
> Checkpoint: LegacyNamespaceFixupStage at T+0.146571 seconds
> Checkpoint: ResolveUriStage at T+0.158071 seconds
> Checkpoint: RewriteHtmlStage at T+0.161643 seconds
>
> http://localhost:8080/gadgets/concat?container=default&amp%3bgadget=http%3A%2F%2Fchiarg.com%2Fopensocial%2Fcaja.xml&amp%3bdebug=0&amp%3bnocache=1&amp%3btype=js&amp%3b1=http%3A%2F%2Fchiarg.com%2Ftest.js:1+1
> - 2: Unexpected token <
>
