[Generateds-users] Problem with ampersand in string simple content of complex type

Ardan Patwardhan Fri, 13 Nov 2015 09:40:25 -0800

Dear Dave and all

Thanks for providing such a useful package. I have been using it for over a
year on a number of projects! However, I recently stumbled upon the following
problem. In a schema that I am using, I have a complex type with simple content
described as follows:
<xs:complexType name="sciSpeciesType">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="ncbiTaxId" type="xs:integer"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>


A user had filled in an XML file with the following data:
<sciSpeciesStrain>Armache &amp; Anger (AA) PBMCs</sciSpeciesStrain>

I used generateDS to generate a class to read this and write it back out, and
the output was as follows:
<sciSpeciesStrain>Armache & Anger (AA) PBMCs</sciSpeciesStrain>

A bare '&' is illegal in XML character data, so this caused downstream issues
when the file was re-read.
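
A minimal sketch of the failure, assuming Python's standard
xml.etree.ElementTree parser as a stand-in for whatever re-reads the file
downstream (the helper name `parses` is made up for illustration):

```python
import xml.etree.ElementTree as ET

# The element as the user wrote it, and as it came back out of export().
escaped = "<sciSpeciesStrain>Armache &amp; Anger (AA) PBMCs</sciSpeciesStrain>"
unescaped = "<sciSpeciesStrain>Armache & Anger (AA) PBMCs</sciSpeciesStrain>"


def parses(xml_text):
    """Return True if xml_text is well-formed enough to parse."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(escaped))    # the input form is accepted
print(parses(unescaped))  # the written-out form is rejected
```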

I tested a simple type (string or token) containing '&amp;' and it was written
out as '&amp;', as I expected.

Looking through the generateDS code I found that when a simple string or token 
is being written out, the value is escaped using the quote_xml function, but 
this was not the case for simple content of a complex type.
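
To illustrate what escaping on output buys here, below is a small sketch. It
does not use generateDS's own quote_xml; it assumes the standard library's
xml.sax.saxutils.escape as a stand-in, and `escape_text` is a hypothetical
helper name:

```python
from xml.sax.saxutils import escape


def escape_text(value):
    # Hypothetical stand-in for an escaping helper such as quote_xml:
    # replaces '&', '<' and '>' with their XML entity references.
    return escape(value)


raw = "Armache & Anger (AA) PBMCs"
element = "<sciSpeciesStrain>%s</sciSpeciesStrain>" % escape_text(raw)
print(element)
```

With the text escaped before it is written, the ampersand goes back out as
'&amp;' and the element can be re-read.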

So I modified line 2725 in generateDS.py (2.17a0):
 wrt("            outfile.write(str(self.valueOf_).encode("
                    "ExternalEncoding))\n")

To >>

 wrt("            outfile.write((quote_xml(self.valueOf_) "
                    "if type(self.valueOf_) is str "
                    "else str(self.valueOf_)).encode("
                    "ExternalEncoding))\n")

In

def generateExportFn(wrt, prefix, element, namespace, nameSpacesDef):
    childCount = countChildren(element, 0)
    name = element.getName()
    base = element.getBase()
    
    wrt("    def export(self, outfile, level, namespace_='%s', "
        "name_='%s', namespacedef_='%s', pretty_print=True):\n" %
        (namespace, name, nameSpacesDef))
    wrt('        if pretty_print:\n')
    wrt("            eol_ = '\\n'\n")
    wrt('        else:\n')
    wrt("            eol_ = ''\n")
    # We need to be able to export the original tag name.
    wrt("        if self.original_tagname_ is not None:\n")
    wrt("            name_ = self.original_tagname_\n")
    wrt('        showIndent(outfile, level, pretty_print)\n')
    wrt("        outfile.write('<%s%s%s' % (namespace_, name_, "
        "namespacedef_ and ' ' + namespacedef_ or '', ))\n")
    wrt("        already_processed = set()\n")
    wrt("        self.exportAttributes(outfile, level, "
        "already_processed, namespace_, name_='%s')\n" %
        (name, ))
    # fix_abstract
    if base and base in ElementDict:
        base_element = ElementDict[base]
        # fix_derived
        if base_element.isAbstract():
            pass
    if childCount == 0 and element.isMixed():
        wrt("        outfile.write('>')\n")
        wrt("        self.exportChildren(outfile, level + 1, "
            "namespace_, name_, pretty_print=pretty_print)\n")
        wrt("        outfile.write('</%s%s>%s' % (namespace_, name_, eol_))\n")
    else:
        wrt("        if self.hasContent_():\n")
        # Added to keep value on the same line as the tag no children.
        if element.getSimpleContent():
            wrt("            outfile.write('>')\n")
            if not element.isMixed():
>>                wrt("            outfile.write((quote_xml(self.valueOf_) "
>>                    "if type(self.valueOf_) is str "
>>                    "else str(self.valueOf_)).encode("
                    "ExternalEncoding))\n")
        else:
….
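
The effect of the conditional in the modified line can be sketched in
isolation as follows. This is only an illustration: it assumes Python 3, uses
xml.sax.saxutils.escape in place of quote_xml, leaves out the encode step, and
`render_value` is a hypothetical name that does not exist in generateDS:

```python
from xml.sax.saxutils import escape


def render_value(value):
    # Same shape as the expression in the modified line: escape only when
    # the value is exactly a str, otherwise fall back to str(value).
    # escape() is a stand-in for quote_xml here.
    return escape(value) if type(value) is str else str(value)


print(render_value("Armache & Anger (AA) PBMCs"))  # string content is escaped
print(render_value(9606))                          # non-string content is converted as before
```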


This is a quick hack and I am sure there are better ways of doing this. It 
solved my problem but I would appreciate your feedback.

Many thanks and best wishes


