I've submitted a patch to Caja which allows HTML entities in URIs
during ResolveUriStage.

http://codereview.appspot.com/1945041/show
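
For context, here is a minimal Java sketch of the failure mode described in
the thread below. It is an illustration only, not the code in the patch: the
class and helper names (AmpEntityDemo, decodeAmpOnce,
percentEncodeSemicolons) are made up, and the plain string replacements are
an assumed stand-in for whatever Neko and Caja actually do internally.

```java
// Hypothetical demo class; none of these names exist in Shindig or Caja.
class AmpEntityDemo {

    // Decode the "&amp;" entity exactly one level, which is what an HTML
    // consumer is expected to do with an attribute value.
    static String decodeAmpOnce(String attributeValue) {
        return attributeValue.replace("&amp;", "&");
    }

    // Assumed model of the bug: the still-escaped attribute value is
    // treated as a raw URI and ';' is percent-encoded.
    static String percentEncodeSemicolons(String uri) {
        return uri.replace(";", "%3b");
    }

    public static void main(String[] args) {
        // Attribute value as serialized into HTML, with "&" escaped.
        String attr = "http://localhost:8080/gadgets/concat"
                + "?container=default&amp;debug=0&amp;type=js";

        // Encoding without decoding first gives the broken form from the log:
        // ...concat?container=default&amp%3bdebug=0&amp%3btype=js
        System.out.println(percentEncodeSemicolons(attr));

        // Decoding the entity first gives the intended URL:
        // ...concat?container=default&debug=0&type=js
        System.out.println(decodeAmpOnce(attr));
    }
}
```

The point of the sketch is only the ordering: the entity has to be decoded
before the value is handled as a URI.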

Thanks,
Chirag

On Thu, Aug 5, 2010 at 9:13 AM, Gagandeep singh <[email protected]> wrote:
> +fargo, zhoresh
>
> Hi Chirag
>
> I had similar doubts some time back, when I thought that proxied /
> concatenated URLs emitted by Shindig should not have &amp; in them.
> However, somewhere <http://htmlhelp.com/tools/validator/problems.html> I
> read that when emitting HTML content, & should be escaped to &amp;.
> Basically, this acts as a marker for the browser so that it treats it
> correctly as "&".
>
> I think what browsers (like Chrome) end up doing when they read a URL from
> the HTML is unescape it once. You might have seen this behavior in
> Chrome when you load an escaped URL and it unescapes it once (FF does not
> change the URL location, but it might be doing the same thing).
>
> However, the problem you describe seems to be a Caja parsing problem. RFC
> 3986 describes the syntax of a URI, but the transport mechanism here is
> HTML. And HTML says that whenever you want the user agent to interpret the
> symbol "&", you need to return "&amp;".
>
> Also, I can't rule out that this is a bug in Shindig with Neko :)
> In Shindig, DOM parsing and serialization is avoided by maintaining
> contentString, contentBytes and document. This bug might very well appear in
> Shindig as well.
>
> John and Ziv can provide more info.
>
> On Wed, Aug 4, 2010 at 11:22 PM, Chirag Shah <[email protected]> wrote:
>
>> Hey,
>>
>> For a gadget that includes the content-rewrite feature, ConcatVisitor
>> will eventually concat the source urls and escape the "&" symbol with
>> "&amp;".
>> Since Apache Shindig uses the NekoSimplifiedHtmlParser, these entities
>> are preserved.
>>
>> Now over in CajaContentRewriter, Shindig passes the dom over to
>> DefaultGadgetRewriter#rewriteContent and Caja reparses this dom.
>>
>> DefaultGadgetRewriter#compileGadget line 211 from caja-r4135-src.jar
>>    compiler.addInput(AncestorChain.instance(new Dom(content)), baseUri);
>>
>> When Caja reparses this DOM, the source URL in our script node gets
>> URL encoded (RFC 3986), and the "&amp;" gets rewritten to "&amp%3b".
>>
>> This finally causes Caja to log an error saying "Unable to cajole
>> gadget" during RewriteHtmlStage.
>>
>> This looks like one of those differences between Neko and Caja in
>> parsing HTML documents, but I'm not entirely sure which one is at
>> fault.
>>
>> Example gadget:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <Module>
>>  <ModulePrefs title="">
>>    <Require feature="content-rewrite"><Param
>> name="include-urls">.*</Param></Require>
>>    <Require feature="caja"/>
>>  </ModulePrefs>
>>  <Content type="html">
>>   <![CDATA[<script type="text/javascript"
>> src="http://chiarg.com/test.js"></script>]]>
>>  </Content>
>> </Module>
>>
>> Error:
>> INFO: Unable to cajole gadget:
>> com.google.caja.opensocial.GadgetRewriteException: Gadget has compile
>> errors
>> Checkpoint: LegacyNamespaceFixupStage at T+0.146571 seconds
>> Checkpoint: ResolveUriStage at T+0.158071 seconds
>> Checkpoint: RewriteHtmlStage at T+0.161643 seconds
>>
>> http://localhost:8080/gadgets/concat?container=default&amp%3bgadget=http%3A%2F%2Fchiarg.com%2Fopensocial%2Fcaja.xml&amp%3bdebug=0&amp%3bnocache=1&amp%3btype=js&amp%3b1=http%3A%2F%2Fchiarg.com%2Ftest.js:1+1
>> - 2: Unexpected token <
>>
>
