Re: [Generateds-users] generateDS, CDATA and export

Adrian Cook Wed, 06 May 2015 19:40:12 -0700

Hi Dave and list,

Sorry for the delay in getting back to you. I've just integrated the new
change for CDATA and it works perfectly. Thanks so much for working through
this. It's working a treat so far on the examples. I'll be pushing to
staging to test with a wider variety of input XML files, but I can't see that
being an issue now.


As you point out, I do now have issues with the xs:any, but you've given me
links on dealing with that.

Thanks again, super!

Adrian



On Wed, May 6, 2015 at 2:24 PM, Dave Kuhlman <[email protected]>
wrote:

> On Mon, Apr 27, 2015 at 01:10:03PM +1000, Adrian Cook wrote:
> > Hi Dave, attached is a test case that exhibits the missing CDATA.
> >
> > Cheers
> >
>
> Adrian,
>
> Thanks for those test cases.  That was very helpful.
>
> I believe I have a fix.  It seems to do what we want it to do with
> vast3_draft.xsd and sample.xml.
>
> I've attached a patch, which I believe you can apply against that
> last fix I sent to you.  But, it might be easier to get the complete file
> (generateDS.py) at Bitbucket:
>
>     https://bitbucket.org/dkuhlman/generateds
>
> There is still one, unrelated issue with this schema and the example
> XML instance doc (sample.xml) -- sample.xml contains the following
> (after a bit of pretty-printing with xmllint):
>
>     <Extensions>
>         <Extension type="LR-Pricing">
>             <Price model="CPM" currency="USD"
>                    source="spotxchange"><![CDATA[1]]></Price>
>         </Extension>
>         <Extension type="SpotX-Count">
>             <total_available><![CDATA[1]]></total_available>
>         </Extension>
>     </Extensions>
>
> And, that element type is defined by this in vast3_draft.xsd:
>
>     <xs:complexType name ="Extensions_type">
>       <xs:sequence>
>         <xs:element name="Extension" minOccurs="0" maxOccurs="unbounded">
>           <xs:annotation>
>             <xs:documentation>Any valid XML may be included in the
>               Extensions node</xs:documentation>
>           </xs:annotation>
>           <xs:complexType>
>             <xs:sequence>
>               <xs:any minOccurs="0" maxOccurs="unbounded"
>                       processContents="lax" namespace="##any" />
>             </xs:sequence>
>             <xs:anyAttribute namespace="##any" processContents="lax" />
>           </xs:complexType>
>         </xs:element>
>       </xs:sequence>
>     </xs:complexType>
>
> But, generateDS.py cannot handle the xs:any.  It needs to know what
> type of element it is so that it can build it using the Python class
> that was generated from that type definition.  The following section
> of the documentation might give you some help with handling that:
>
>   http://www.davekuhlman.org/generateDS.html#support-for-xs-any
>
> Let me know whether this patch works for you and whether you find
> additional problems.
>
> Dave
>
> >
> >
> > On Mon, Apr 27, 2015 at 1:10 PM, Adrian Cook <[email protected]> wrote:
> >
> > > Hi Dave,
> > >
> > > Thanks so much for the update.
> > >
> > > I've built a new parser using the supplied patch.
> > >
> > > For the simple case it's working fine; for more complex XSDs it seems to
> > > be hit and miss as to whether the CDATA is held over from parse to export.
> > >
> > > I've put together a zip of a test case and emailed it to you separately.
> > >
> > > There are simple scripts to create the parser and run it. The output of
> > > the run is the result of an export, which you can then compare to the input.
> > >
> > > I've not been able to identify exactly what the issue is but I'll keep
> > > looking at it.
> > >
> > > Cheers
> > >
> > > Adrian
> > >
> > >
> > >
> > > On Sun, Apr 26, 2015 at 1:07 PM, Dave Kuhlman <[email protected]> wrote:
> > >
> > >> On Fri, Apr 24, 2015 at 04:32:17PM +1000, Adrian Cook wrote:
> > >> > Hi Dave Kuhlman and the list,
> > >> >
> > >> > Thanks so much for generateDS, I've been using a generateDS parser
> > >> > successfully on millions of XML files.
> > >> >
> > >> > I have a question regarding CDATA.
> > >> >
> > > >> > I'm developing in an environment where I am processing thousands of XML
> > > >> > files per minute and I've been using generateDS to create the parser for
> > > >> > processing. These files all contain a lot of CDATA and are from third
> > > >> > parties.
> > > >> >
> > > >> > It's been fine up to now because I am only using generateDS to parse the
> > > >> > XML and make decisions based on that. I now have a requirement to mutate
> > > >> > the data loaded using the parser and export new XML.
> > > >> >
> > > >> > What I am finding is that the CDATA start and end markup is lost from the
> > > >> > exported text.
> > > >> >
> > > >> > I've pasted an example at the bottom. This is all pretty much vanilla use
> > > >> > of generateDS, parse and export using some of the unit test XSD and XML
> > > >> > files.
> > > >> >
> > > >> > I've read through the list archive and noted a correspondence where this is
> > > >> > mentioned.
> > > >> >
> > > >> > Are there any plans or approaches to address this in generateDS? I can see
> > > >> > some comments in the list archive where this issue is mentioned and that
> > > >> > also indicate CDATA is a poor decision and should be avoided. Unfortunately,
> > > >> > I cannot change the dependence of my system on CDATA.
> > >>
> > >> Adrian,
> > >>
> > >> Good to hear from you.  I'm glad that generateDS has been useful.
> > >>
> > >> Short story first --
> > >>
> > > >> I've patched generateDS.py so that it supports this.  Specifically, if you
> > > >> run generateDS.py with the new command line option
> > > >> "--preserve-cdata-tags", then the generated code will preserve the
> > > >> CDATA tags, that is, the resulting string values will contain
> > > >> "<![CDATA[" and "]]>".  And, if you do *not* use the
> > > >> "--preserve-cdata-tags" command line option, then the behavior is
> > > >> unchanged.
> > >>
> > >> I've attached a patched version of generateDS.py in a separate
> > >> email.  Please let me know if this does what you expect and need.
> > >>
> > > >> I'm going on vacation for 3 days next week.  I've got a chance to go
> > > >> car camping on the northern California, USA coast near Ft. Bragg.
> > > >> But, I'll look into this some more when I return.
> > >>
> > >> In the meantime, thanks for reporting this.
> > >>
> > >> And now, the long story -- You can ignore the following unless you
> > >> want to learn more (maybe) about CDATA and my thinking while trying
> > >> to work my way through this.
> > >>
> > >> So, let's try to be (pedantically) specific about what the problem
> > >> is:
> > >>
> > >> 1. generateDS handles CDATA on import/parsing (actually lxml does
> > >>    this for us).  Good.
> > >>
> > >> 2. generateDS handles text on output/export even when there are
> > >>    special characters in CDATA sections by escaping those special
> > >>    characters as XML entities (e.g. "&lt;").  Good.
> > >>
> > >> 3. generateDS does *not* preserve CDATA sections on output/export.
> > >>    Bad for some applications.
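
The three points above can be reproduced in a few lines. This is a minimal sketch, not generateDS code: it assumes the standard library's xml.etree.ElementTree as a stand-in for lxml, and it uses the first script element from the sample data at the bottom of this message as input.

```python
import xml.etree.ElementTree as ET

# First <script> element from the sample data quoted in this thread.
SOURCE = "<script><![CDATA[ccc < ddd & eee]]></script>"

# 1. Parsing: the CDATA markers are consumed and only the character
#    data is kept on the element.
element = ET.fromstring(SOURCE)
parsed_text = element.text

# 2. Export: special characters in the text are escaped as XML entities.
# 3. The CDATA section itself is not restored on output.
exported = ET.tostring(element, encoding="unicode")

print(parsed_text)
print(exported)
```

The exported string has the same shape as the "After export" sample at the bottom of this message: the content is escaped correctly, but the CDATA markers are gone.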
> > >>
> > > >> There are difficulties with handling item 3, above.  Lxml normally
> > > >> throws away the CDATA tags when it parses a document.  I thought
> > > >> there was no way around this.  However, while thinking about your
> > > >> question this morning, I decided to do one more Web search.
> > > >> Actually, George David, another list member who has done some work
> > > >> on this, had earlier pointed me at this ability, but I did not read
> > > >> carefully enough.  Anyway, I found that there is a way to preserve
> > > >> those CDATA tags by creating a special parser:
> > >>
> > >>     from lxml import etree
> > >>
> > >>     def test():
> > >>         p = etree.XMLParser(strip_cdata=False)
> > >>         d1 = etree.parse('test01.xml', parser=p)
> > >>         r1 = d1.getroot()
> > > >>         print(etree.tostring(r1))
> > >>
> > >>     test()
> > >>
> > >> That would seem to suggest that all we have to do in the generated
> > >> code is to create a special "strip_cdata=False" parser, which would
> > >> be a simple 1 or 2 line change.  But ...
> > >>
> > >> There is still a problem.  The only way to get the text with the
> > >> CDATA tags included is to serialize the element, and when you do so,
> > >> you get the surrounding XML tags as well.  For example, with the
> > >> sample data that you include below, when you do this:
> > >>
> > >>     etree.tostring(element)
> > >>
> > >> we'd get something like:
> > >>
> > >>     <script><![CDATA[ccc < ddd & eee]]></script>
> > >>
> > >> So, in order to capture the text *with* CDATA tags, we'd have to do
> > >> something like the following:
> > >>
> > >>     value1 = etree.tostring(element).strip()
> > >>     mo = re.search(r'^<.+?>(.*)<.+>$', value1)
> > >>     value2 = mo.group(1)
> > >>
> > >> Ick.  Maybe even: Yuck.
> > >>
> > >> OK.  I'm over-reacting.  And, it can be made prettier by
> > >> pre-compiling the regular expression.
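
A pre-compiled version of that extraction might look like the sketch below. It is only an illustration of the idea and not necessarily what the patched generateDS.py emits; the names INNER_XML and inner_xml are hypothetical. It assumes the input is the serialized form of a single element, and it adds re.DOTALL (not in the snippet above) so that content spanning several lines still matches.

```python
import re

# Hypothetical name. Matches the opening tag, captures everything up to
# the last closing tag. DOTALL lets the captured content span lines.
INNER_XML = re.compile(r'^<.+?>(.*)<.+>$', re.DOTALL)


def inner_xml(serialized):
    """Return the content between the outermost tags, or None if no match."""
    # etree.tostring() returns bytes on Python 3, so accept both types.
    if isinstance(serialized, bytes):
        serialized = serialized.decode("utf-8")
    match = INNER_XML.search(serialized.strip())
    return match.group(1) if match else None


value = inner_xml(b"<script><![CDATA[ccc < ddd & eee]]></script>")
print(value)
```

In the setting described above, the argument would be the result of etree.tostring(element) for an element read by a parser created with strip_cdata=False. An element with no closing tag (for example a self-closing one) does not match, so the helper returns None instead of raising.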
> > >>
> > > >> I'd rather not make that general change, since this feature is
> > > >> seldom needed, I believe.  Most users will not want to deal with the
> > > >> "<![CDATA[" and "]]>".
> > >>
> > >> We could add (yet) another command line option to turn on this special
> > >> behavior.
> > >>
> > >> OK.  I gave it a shot.  Added the command line option
> > >> ("--preserve-cdata-tags").  Seems to work, but definitely needs more
> > >> testing.  I've attached a patched version of generateDS.py (in a
> > >> separate email so as not to shove a large email into the list).
> > >>
> > >> Memo to Dave -- From now on, do less whining, and write more code.
> > >> Although, it is a good idea to think these things through, first.
> > >>
> > >> Let me know if you really do need this behavior.  Also, let me know
> > >> if I've really implemented the behavior that you need.  Then, I'll
> > >> work on it a bit more, do more testing, create a unit test, etc.
> > >>
> > >> And, you can find more information about handling CDATA sections
> > >> here:
> > >>
> > >> - http://lxml.de/api.html#cdata
> > >>
> > >> - http://lxml.de/parsing.html#parser-options
> > >>
> > >> - http://lxml.de/FAQ.html#parsing-and-serialisation
> > >>
> > >> -
> > >>
> https://mailman-mail5.webfaction.com/pipermail/lxml/2015-February/007409.html
> > >>
> > >>   This comment at the end of the above email thread:
> > >>
> > >>      "I wouldn't bother. CDATA[] is more of a convenience
> > >>       work-around when you are manually editing XML. In generated
> > >>       XML, it's not very useful."
> > >>
> > > >>   should be a caution to us not to get too enthusiastic about using
> > > >>   CDATA sections, although it sounds like in your case, it's needed.
> > >>
> > > >> Sorry for being so wordy.  I needed to get myself to think this
> > > >> through.
> > >>
> > >> Dave
> > >>
> > >> >
> > >> > Thanks in advance for any pointers.
> > >> >
> > >> > Adrian Cook
> > >> >
> > >> > Source XML:
> > >> >
> > >> > <cdataListType>
> > >> >     <cdatalist>
> > >> >         <script><![CDATA[ccc < ddd & eee]]></script>
> > >> >     </cdatalist>
> > >> >     <cdatalist>
> > > >> >         <script>aaa &lt; bbb <![CDATA[ccc < ddd]]> eee &lt; &amp; fff&lt;<![CDATA[ggg < & hhh]]>&amp; iii &lt; jjj</script>
> > >> >     </cdatalist>
> > >> > </cdataListType>
> > >> >
> > >> > After export:
> > >> >
> > >> > <cdataListType>
> > >> >     <cdatalist>
> > >> >         <script>ccc &lt; ddd &amp; eee</script>
> > >> >     </cdatalist>
> > >> >     <cdatalist>
> > > >> >         <script>aaa &lt; bbb ccc &lt; ddd eee &lt; &amp; fff&lt;ggg &lt; &amp; hhh&amp; iii &lt; jjj</script>
> > >> >     </cdatalist>
> > >> > </cdataListType>
> > >>
> > >> --
> > >>
> > >> Dave Kuhlman
> > >> http://www.davekuhlman.org
> > >>
> > >>
> > >
> --
>
> Dave Kuhlman
> http://www.davekuhlman.org
>


