Re: [Generateds-users] improper CDATA handling.

Dave Kuhlman Sun, 08 Feb 2015 19:39:07 -0800

On Fri, Feb 06, 2015 at 01:05:42AM +0000, George David wrote:
> Hi Dave,
> 
> I created a xsd that has an element called script. The intent is to allow
> users to send us javascript that is encoded with CDATA tags.
> 
> In the attached files you can see that I set the script variable as follows:
> cdataObj = Cdata()
> 
> script='''<![CDATA[
>     var x, text;
> 
>     // Get the value of the input field with id="numb"
>     x = document.getElementById("numb  one").value;
> 
>     // If x is Not a Number or less than one or greater than 10
>     if (isNaN(x) || x < 1 || x > 10) {
>         text = "Input not valid";
>     } else {
>         text = "Input OK";
>     }
>     document.getElementById("demo").innerHTML = text;
> ]]>'''
> cdataObj.set_script(script)
> 
> I exported it:
> 
> cdataObj.export(sys.stdout, 0, name_='cdata')
> 
> And got the following:
> 
> <cdata:cdata xmlns:cdata="urn:cdata">
>     <cdata:script>&lt;![CDATA[
>     var x, text;
> 
>     // Get the value of the input field with id="numb"
>     x = document.getElementById("numb  one").value;
> 
>     // If x is Not a Number or less than one or greater than 10
>     if (isNaN(x) || x &lt; 1 || x &gt; 10) {
>         text = "Input not valid";
>     } else {
>         text = "Input OK";
>     }
>     document.getElementById("demo").innerHTML = text;
> ]]&gt;</cdata:script>
> </cdata:cdata>
> 
> Note that the CDATA wrappers have been encoded <![CDATA[ has been changed
> to &lt;![CDATA[ and ]]> has been changed to ]]&gt;


George,

Good to hear from you again.

One solution to the above is to use a more intelligent replacement.
The attached patch uses the re module and two regular expressions to
replace (escape) "<" and ">" without replacing "<![CDATA[" and "]]>".
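
As a rough illustration of that idea (a sketch only, not the attached
patch), two lookaround patterns can leave the markers alone while
escaping every other angle bracket.  The name escape_angle_brackets is
hypothetical, and this approach still escapes any "<" and ">" that sit
between the two markers.

```python
import re

# Match "<" unless it is the start of a "<![CDATA[" marker.
LT_PATTERN = re.compile(r'<(?!!\[CDATA\[)')
# Match ">" unless it is the end of a "]]>" marker.
GT_PATTERN = re.compile(r'(?<!\]\])>')


def escape_angle_brackets(text):
    """Escape "<" and ">" but keep the CDATA markers intact.

    Hypothetical helper for illustration; not part of generateDS.py.
    """
    text = LT_PATTERN.sub('&lt;', text)
    text = GT_PATTERN.sub('&gt;', text)
    return text


sample = '<![CDATA[if (x < 1 || x > 10) {}]]>'
print(escape_angle_brackets(sample))
```

The negative lookahead and lookbehind only inspect the neighbouring
characters, so the markers themselves are never consumed or rewritten.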

> 
> Also notice that the < and > signs in the JavaScript have also been
> encoded. I believe there should be code to check for the CDATA tags and not
> xml encode it if they exist. I'll try to track this down in the code but I
> wanted to make sure this wasn't done on purpose.
> 
> There is another problem with CDATA. If I create an xml string with CDATA
> and parse it like this:

Re: the missing CDATA wrappers:

The problem is that when the generated code uses lxml to parse an
XML instance doc, lxml strips away the "<![CDATA[" and "]]>".  I
don't believe that we can even tell that they were there in the
first place.  The attached script (cdata_demo.py) attempts to
demonstrate this.

So, once that XML instance doc has been parsed, the generated code
has no record of which text came from a CDATA section.
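
The effect is easy to reproduce.  The sketch below is not the attached
cdata_demo.py; it uses the standard library's xml.etree.ElementTree
(an assumption made only so that it runs without third-party packages)
and a made-up two-element document.  The text of a parsed element
carries no trace of the CDATA markers, and re-serializing it produces
escaped text instead.

```python
import xml.etree.ElementTree as ET

# Made-up sample document with one CDATA section.
XML = '<root><script><![CDATA[if (x < 1) { y(); }]]></script></root>'

root = ET.fromstring(XML)
script_text = root.find('script').text

# The parser hands back plain character data; the markers are gone.
print(repr(script_text))

# Serializing again escapes "<" rather than restoring the CDATA section.
serialized = ET.tostring(root, encoding='unicode')
print(serialized)
```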

Wait ... I did one more Web search ...

It's even the case that lxml has a special provision for this issue.
I found this: http://lxml.de/api.html#cdata

(It's incredible what kind of hidden information you can find with a
Web search engine.  You should try one sometime.  But, seriously, ...)

However, when you use ``element.text`` to capture the text data, the
CDATA tags are still missing, even though when you use
``etree.tostring(some_element)`` they are there.
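
A minimal sketch of that behavior, assuming lxml is installed and using
a made-up sample document: a parser created with ``strip_cdata=False``
keeps the CDATA section in the tree, so it shows up again on
serialization, while ``element.text`` still returns only the character
data.  The helper name parse_keeping_cdata is hypothetical, and the
import is guarded only so the sketch does nothing where lxml is
missing.

```python
try:
    from lxml import etree  # third-party package
except ImportError:
    etree = None  # lets the sketch load without lxml

# Made-up sample document with one CDATA section.
XML = b'<root><script><![CDATA[if (x < 1) { y(); }]]></script></root>'


def parse_keeping_cdata(xml_bytes):
    """Parse with a parser that does not strip CDATA sections."""
    parser = etree.XMLParser(strip_cdata=False)
    return etree.fromstring(xml_bytes, parser)


if etree is not None:
    script = parse_keeping_cdata(XML)[0]
    # .text holds the character data only, without the markers.
    print(repr(script.text))
    # Serialization still writes the CDATA section out.
    print(etree.tostring(script))

    # Going the other way, etree.CDATA() marks new text as CDATA.
    new_el = etree.Element('script')
    new_el.text = etree.CDATA('x < 1')
    print(etree.tostring(new_el))
```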

I haven't figured out how to deal with this, yet.  I'll think a bit
more on it.

If you can think of a work-around for this, please let me know.

On an unrelated subject -- generateDS.py does not handle multiple
namespaces in the same XML schema, in particular when ``<xs:import
...>`` is used.  I've had several reports about this.  If I recall
correctly, you contributed the code that implements
--one-file-per-xsd.  I'm wondering if that might be helpful in some
of these situations.  If you have any comments or suggestions about
this, I'd be interested in hearing them.

And, have you had any experience with lxml.objectify?
(http://lxml.de/objectify.html)  I'm wondering whether it might
solve some of these problems (in particular the namespaces and CDATA
issues) better than generateDS.py does.  Maybe we can learn
something from it.

More later.

Dave

> 
> xml='''
> <cdata:cdata xmlns:cdata="urn:cdata">
>     <cdata:script><![CDATA[
>     var x, text;
> 
>     // Get the value of the input field with id="numb"
>     x = document.getElementById("numb  one").value;
> 
>     // If x is Not a Number or less than one or greater than 10
>     if (isNaN(x) || x &lt; 1 || x &gt; 10) {
>         text = "Input not valid";
>     } else {
>         text = "Input OK";
>     }
>     document.getElementById("demo").innerHTML = text;
> ]]></cdata:script>
> </cdata:cdata>
> '''
> cdata.parseString(xml)
> 
> It incorrectly strips out the CDATA tags:
> 
> parseString spits out xml with the CDATA tags removed:
> 
> <?xml version="1.0" ?>
> <cdata:cdata xmlns:cdata="urn:cdata">
>     <cdata:script>
>     var x, text;
> 
>     // Get the value of the input field with id="numb"
>     x = document.getElementById("numb  one").value;
> 
>     // If x is Not a Number or less than one or greater than 10
>     if (isNaN(x) || x &amp;lt; 1 || x &amp;gt; 10) {
>         text = "Input not valid";
>     } else {
>         text = "Input OK";
>     }
>     document.getElementById("demo").innerHTML = text;
> </cdata:script>
> </cdata:cdata>
> 
> 
> And on printing the script specifically, I also don't have CDATA tags
> anymore and the < and > are xml encoded.
> 
> print cdataObj.get_script()
> 
>     var x, text;
> 
>     // Get the value of the input field with id="numb"
>     x = document.getElementById("numb  one").value;
> 
>     // If x is Not a Number or less than one or greater than 10
>     if (isNaN(x) || x &lt; 1 || x &gt; 10) {
>         text = "Input not valid";
>     } else {
>         text = "Input OK";
>     }
>     document.getElementById("demo").innerHTML = text;
> 
> 
> I'll see if I can track this down also. If you could give me a hint of
> where to look that would be helpful.
> 
> Thanks,
> George


-- 

Dave Kuhlman
http://www.davekuhlman.org
