On Fri, Dec 23, 2016 at 04:00:56PM -0800, Clint Pitzak wrote:

Clint,

I believe I see what the problem is.  That restriction contains a
vertical bar.  generateDS.py generates code that turns that
restriction value into a regular expression, and then uses the
Python re (regular expression) module to test for a match.  But, a
vertical bar means something special in a regular expression.
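To see why the vertical bar causes trouble, note that `|` has the lowest precedence of any regex operator. The sketch below is Python 3 and assumes the generated check amounts to `re.search` with `^` and `$` wrapped around the schema's pattern value, as in the ipython session further down in this message. It shows that the anchored pattern is read as two independently anchored branches:

```python
import re

# Clint's pattern value with "^" and "$" wrapped around it, as in the
# ipython session further down in this message.
anchored = re.compile(r'^Electronic Materials|ELECTRONIC MATERIALS$')

# How the regex engine actually groups it: "|" binds loosest, so "^"
# applies only to the first branch and "$" only to the second.
explicit = re.compile(r'(?:^Electronic Materials)|(?:ELECTRONIC MATERIALS$)')

samples = [
    'Electronic Materials',
    'Electronic Materialss',   # extra trailing text: first branch still matches
    'Electronic Material',     # matches neither branch
    'xELECTRONIC MATERIALS',   # extra leading text: second branch still matches
]

for s in samples:
    # Both patterns agree on every sample.
    print(repr(s), anchored.search(s) is not None, explicit.search(s) is not None)
```

So "Electronic Materialss" passes simply because it starts with "Electronic Materials".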

I'll have to give this a little thought.  Why did I use the re
module if we just want to test for equality?  That seems like
overkill when a simple test for equality using the "==" operator
would accomplish what we really want.  Or, was there some reason for
using the re module that I do not remember now?

Strict equality is the test that we want, right?

No.  Wait.  There must be *some* reason why I implemented this with
regular expression matching.  I looked at the XML Schema
documentation again.  Read this from
https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#CreatDt:

        The purchase order schema contains another, more elaborate, example of a simple
        type definition. A new simple type called SKU is derived (by restriction) from
        the simple type string. Furthermore, we constrain the values of SKU using a
        facet called pattern in conjunction with the regular expression
        "\d{3}-[A-Z]{2}" that is read "three digits followed by a hyphen followed by
        two upper-case ASCII letters":

        Example
        Defining the Simple Type "SKU"
        <xsd:simpleType name="SKU">
          <xsd:restriction base="xsd:string">
                <xsd:pattern value="\d{3}-[A-Z]{2}"/>
          </xsd:restriction>
        </xsd:simpleType>

That suggests that the pattern really is a regular expression.
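As a quick check of that reading, here is the primer's SKU pattern applied with Python's `re.fullmatch`, which succeeds only when the whole string matches. This is a sketch: Python's regex dialect is close to, but not identical to, the XML Schema dialect, and `is_valid_sku` is a hypothetical helper name.

```python
import re

# The SKU pattern from the XML Schema primer: three digits, a hyphen,
# then two upper-case ASCII letters.
SKU_PATTERN = re.compile(r'\d{3}-[A-Z]{2}')


def is_valid_sku(value):
    """Hypothetical helper: True if the entire value matches the SKU pattern."""
    # fullmatch() requires the match to cover the whole string, so no
    # explicit ^ and $ anchors are needed.
    return SKU_PATTERN.fullmatch(value) is not None


print(is_valid_sku('926-AA'))   # True: three digits, hyphen, two capitals
print(is_valid_sku('926-aa'))   # False: lower-case letters are rejected
print(is_valid_sku('926-AAB'))  # False: trailing extra character is rejected
```

A plain "==" test could not express a constraint like this, which is presumably why the re module is involved at all.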

Also see
https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#rf-pattern,
where it says:

        4.3.4.4 pattern Validation Rules

        Validation Rule: pattern valid
        A literal in a ·lexical space· is facet-valid with respect to ·pattern· if:
        1 the literal is among the set of character sequences denoted by
          the ·regular expression· specified in {value}.
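Read literally, "among the set of character sequences denoted by" the regular expression means the whole literal has to match, not just some substring of it. A minimal Python 3 sketch of that rule applies `re.fullmatch` directly to the schema's pattern value. `pattern_valid` is a hypothetical name, and this is not the code generateDS.py currently emits:

```python
import re


def pattern_valid(literal, pattern):
    """Hypothetical check: is `literal` in the set of strings the pattern denotes?"""
    # fullmatch() must consume the entire literal, and it backtracks through
    # every "|" branch to find one that does, so no extra anchors or grouping
    # are needed around the schema's pattern value.
    return re.fullmatch(pattern, literal) is not None


SCHEMA_PATTERN = 'Electronic Materials|ELECTRONIC MATERIALS'

for value in ['Electronic Materials', 'ELECTRONIC MATERIALS',
              'Electronic Materialss', 'Electronic Materialsz',
              'Electronic Material']:
    print(repr(value), pattern_valid(value, SCHEMA_PATTERN))
```

Under this reading, only the two exact spellings are accepted.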

I tried testing with the pattern that generateDS.py generates, and it confirms
your report.  For example (in the IPython interactive Python shell):

        In [20]: print re.search('^Electronic Materials|ELECTRONIC MATERIALS$', 'Electronic Materialsz')
        <_sre.SRE_Match object at 0x7f95a87fd4a8>
        In [21]: print re.search('^Electronic Materials|ELECTRONIC MATERIALS$', 'Electronic Materials')
        <_sre.SRE_Match object at 0x7f95a87fd4a8>
        In [22]: print re.search('^Electronic Materials|ELECTRONIC MATERIALS$', 'Electronic Material')
        None
        In [23]: print re.search('^Electronic Materials|ELECTRONIC MATERIALS$', 'Electronic Materials|ELECTRONIC MATERIALS')
        <_sre.SRE_Match object at 0x7f95a87fd4a8>
        In [24]:
        In [24]: print re.search('^Electronic Materials|ELECTRONIC MATERIALS$', 'Electronic Materials|ELECTRONIC MATERIALSx')
        <_sre.SRE_Match object at 0x7f95a87fd4a8>
        In [25]: print re.search('^Electronic Materials|ELECTRONIC MATERIALS$', 'Electronic Materials|ELECTRONIC MATERIAL')
        <_sre.SRE_Match object at 0x7f95a87fd4a8>
        In [26]:

When re.search returns a match object (_sre.SRE_Match), that means
that it successfully matched.
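The match object also records what was matched. A short Python 3 sketch (same pattern as in the session; the loop and variable names are illustrative only) prints the matched substring and its span. For "Electronic Materialsz", only the first 20 characters are consumed and the trailing "z" is left outside the match:

```python
import re

pattern = r'^Electronic Materials|ELECTRONIC MATERIALS$'

for text in ['Electronic Materialsz', 'Electronic Materials|ELECTRONIC MATERIAL']:
    m = re.search(pattern, text)
    if m is not None:
        # group() is the matched substring; span() is its (start, end) slice.
        print(repr(text), '->', repr(m.group()), m.span())
    else:
        print(repr(text), '-> no match')
```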

So, if we take the XML Schema documentation seriously, perhaps that
pattern in the schema should actually be the following (note the
added backslash escape):

        <xsd:simpleType>
                <xsd:restriction base="xsd:string">

                        <!-- ADVANCED ELECTRONICS  -->
                        <xsd:pattern value="Electronic Materials\|ELECTRONIC MATERIALS" />

Then we'd get these results:

        In [26]:
        In [26]: print re.search('^Electronic Materials\|ELECTRONIC MATERIALS$', 'Electronic Materials|ELECTRONIC MATERIALS')
        <_sre.SRE_Match object at 0x7f95a87fd4a8>
        In [27]: print re.search('^Electronic Materials\|ELECTRONIC MATERIALS$', 'Electronic Materials|ELECTRONIC MATERIALSx')
        None
        In [28]: print re.search('^Electronic Materials\|ELECTRONIC MATERIALS$', 'Electronic Materials|ELECTRONIC MATERIAL')
        None
        In [29]: print re.search('^Electronic Materials\|ELECTRONIC MATERIALS$', 'Electronic Materials')
        None
        In [30]: print re.search('^Electronic Materials\|ELECTRONIC MATERIALS$', 'Electronic Materialss')
        None

What do you think?  Could the schema actually be wrong?  Not likely,
but ...

Dave


> Hi Dave,
> 
> I wanted to first say: amazing Python module. Thank you very much.
> 
> I found a bug in the validation that was produced. Specifically the xsd I'm
> using has the restriction:
> 
> <xsd:pattern value="Electronic Materials|ELECTRONIC MATERIALS"/>
> 
> and generateDS allows "Electronic Materialss"
> 
> To reproduce the error you can use generateDS to generate the api from:
> http://www.dtic.mil/dtic/pdf/ird_xml_data_submission.xml
> 
> And then for COISubArea provide the text "Electronic Materialss" which is
> invalid according to the xsd from
> http://www.dtic.mil/dtic/pdf/ird_xml_data_submission.xml
> 
> However, generateDS doesn't warn that it's invalid in the validator.
> 
> If you change it to "Electronic Materialsz" it will also not warn.
> 
> If you change it to "Electronic Material" it will warn.
> 
> Thanks again,
> 
>   Clint

-- 

Dave Kuhlman
http://www.davekuhlman.org

_______________________________________________
generateds-users mailing list
generateds-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/generateds-users
