Dave,

Thank you for the response and excuse me for the late answer.

1) Regarding the utils/collect_schema_locations.py and utils/batch_generate.py.

When I ran the commands you've provided on the xsd file:

$ ./collect_schema_locations.py -f energistics/prodml/v2.0/xsd_schemas/DasAcquisition.xsd directives04.json
$ mkdir OnePer3
$ ./batch_generate.py --config=gds02.config directives04.json

I got an error that probably explains what went wrong with the --one-file-per-xsd argument in my case. There is a circular import in the schema files (gmd.xsd imports gts.xsd, which imports gml.xsd, which in turn imports gmd.xsd again, and so on). So utils/batch_generate.py says it cannot find a file whose very long path comes from that looped reference, like '../../../../gml/3.2.1/../../iso/19139/20070417/gmd/../../../../iso/19139/20070417/gts/../../../../gml/3.2.1/../../iso/19139/20070417/gmd ... etc'.

So here are my comments:

* I noticed that utils/collect_schema_locations.py only collects includes, not imports.
* We somehow need to deal with nested imports/includes. utils/collect_schema_locations.py only collects imports from the input file, not from the schemas that it imports/includes. A possible solution would be to scan the files recursively and put all required schemas in the directives.json file.
* Then, circular imports/includes should be supported. I guess that may be complicated. For collecting/loading schema files, a solution would probably be to keep a set of already discovered schemas. But I'm not sure how complicated it would be to generate the classes, since we probably need to generate them in the correct order. How complicated do you think that would be?
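
The recursive scan with a set of already discovered schemas could look roughly like the sketch below. It is only an illustration and not the code of utils/collect_schema_locations.py; the function name collect_schemas is made up for this example. Normalizing each path before it is recorded keeps a circular import from growing into the long '../../..' path shown above.

```python
import os
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def collect_schemas(path, seen=None):
    """Return the set of normalized schema paths reachable from `path`
    through xs:include and xs:import, visiting each file only once."""
    if seen is None:
        seen = set()
    # normpath collapses 'a/b/../../c', so a loop cannot lengthen the path.
    path = os.path.normpath(os.path.abspath(path))
    if path in seen:
        return seen  # already discovered: this is what breaks the cycle
    seen.add(path)
    root = ET.parse(path).getroot()
    for child in root:
        if child.tag in (XS + "include", XS + "import"):
            location = child.get("schemaLocation")
            # Skip imports without a location and remote (URL) locations.
            if location and "://" not in location:
                collect_schemas(
                    os.path.join(os.path.dirname(path), location), seen)
    return seen
```

The resulting set could then be written out as entries of the directives file, one per schema.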

I can try to help you with this. Just tell me if you have any comments regarding what I've written.

**Update**:

Actually, I've tried running generateDS.py on the xsd again, and it gives the same error (file not found because of the circular import). I'm not sure why it worked before and does not work now on the same file. I've tried both the newest version of generateDS and the old one I used before; both lead to the same error.

OK, after more experimenting I figured out that this only happens when I use an absolute path to the xsd file. Do you have any idea why an absolute path may be a problem? Now I'm not sure whether my comments are still applicable. As I understand it now, circular imports/includes are supported in generateDS.py itself, but somehow not in the case of --one-file-per-xsd. I still see a reason to search for xsd files recursively and generate one file for each of them.

2) Namespace definition behavior.

I see that your new --no-namespace-defs argument works fine. I currently don't see a use case for manually choosing namespace definitions with a dictionary, but that could be useful. The reason to have only a top-level namespace definition is to make the xml files smaller and more readable, by using namespace definitions only where needed. Though yes, in some cases it would probably be necessary to have a definition somewhere other than the top, maybe when the same namespace prefix is used for different namespaces in different parts of the xml...

3) Other issues I've mentioned before.

Do you have a plan for which issue to look at next? Maybe I can try to investigate another one in the meantime.

4) One more thing I want to mention -- in the generated code, positional arguments are used for export, __init__ and other functions. When subclassing, it is more convenient when keyword arguments are used, since we can get the value of a particular element by its name. I think it would make sense to change positional arguments to keyword arguments, at least in the autogenerated code. Though I'm not sure that it would help if a user passes positional arguments.
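
To illustrate point 4, here is a small sketch. PersonType is a hypothetical stand-in for a generated class, not actual generateDS output; it only shows why keyword arguments are convenient in a subclass.

```python
class PersonType(object):
    """Hypothetical stand-in for a generated class."""

    def __init__(self, name=None, age=None):
        self.name = name
        self.age = age


class PersonTypeSub(PersonType):
    def __init__(self, **kwargs):
        # With keywords, the subclass can read one element by its name
        # without knowing its position in the superclass signature.
        self.original_name = kwargs.get("name")
        super(PersonTypeSub, self).__init__(**kwargs)


person = PersonTypeSub(name="Ann", age=30)
```

If the caller passes the values positionally, kwargs stays empty and the lookup by name finds nothing, which is the limitation mentioned above.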

Best regards,
Eugene

On 27.05.2017 00:20, Dave Kuhlman wrote:
Eugene,

I apologize for taking so long.  And, I do not have fixes for all
the issues that you report.

But, I think I've made some progress.

A few notes are below.

One more issue I've found is that in documentation it is written that
default parameter for export is --export="write literal" but in reality it
is --export="write".
I've updated the doc.  Thanks for reporting it.

2) When I try to use --one-file-per-xsd argument, I get the following error:
*** maxLoops exceeded.  Something is wrong with --one-file-per-xsd.
I believe that the problem occurs when we try to generate modules
from an incomplete schema.  Possibly, it's because when gDS attempts
to generate a class from element type A which extends element type
B, and the definition of element type B is in a part of the schema
that is not included (with xs:include or xs:import), then it cannot
generate the class for A without first generating its super-class,
which is the class for B, which is missing.
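
A rough sketch of that ordering problem, under the assumption that classes are emitted in repeated passes and a class is postponed until its base has been emitted. The function order_classes and its max_loops parameter are made up for this illustration and are not the generateDS.py implementation.

```python
def order_classes(bases, max_loops=10):
    """`bases` maps each type name to its base type name (or None).
    Return the names ordered so that every base precedes its extensions."""
    done = []
    pending = list(bases)
    for _ in range(max_loops):
        postponed = []
        for name in pending:
            base = bases[name]
            if base is None or base in done:
                done.append(name)        # base is available: emit now
            else:
                postponed.append(name)   # base not emitted yet: retry later
        pending = postponed
        if not pending:
            return done
    # A base that is never defined keeps its extensions postponed forever.
    raise RuntimeError("maxLoops exceeded; unresolved: %s" % sorted(pending))
```

With {"A": "B", "B": None} the result puts B before A. With {"A": "B"} alone, where B is defined in a schema that was not pulled in, no pass makes progress and the loop limit is reached.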

As an alternative strategy, I'm working on a replacement for the
--one-file-per-xsd capability.  That option seems too inflexible to
me.  So, what I've done is to implement two scripts to replace that
capability:

1. utils/collect_schema_locations.py -- Scans an XML Schema and
    collects the top level xs:include and xs:import references.  It
    writes them out in a (JSON) file that can be used by
    utils/batch_generate.py.

2. utils/batch_generate.py -- Reads the output file produced by
    collect_schema_locations.py.  For each reference in that file, it
    runs generateDS.py to produce a Python module.

So, I believe that the function and intent of these two scripts is
pretty much the same as the capability provided by
--one-file-per-xsd, *but* these scripts are small and relatively
easy to understand and their function is not hard-wired into
generateDS.py.  And, therefore, I'm hoping that they will be more
usable and will give us more flexibility.  When they do *not* do
what we want, we will be more easily able to modify them.

I've now got these two scripts working.  But I need to do more work on
them.  In particular I need to write some documentation.  And, I need
to make them easier to use.  Right now they are a bit hard to
work with even for me, and I'm the one who implemented them.

I've attached these two scripts.  If you decide to try them, I'd
welcome your comments.

Here is how you might run them:

$ ./collect_schema_locations.py -f energistics/prodml/v2.0/xsd_schemas/DasAcquisition.xsd directives04.json
$ mkdir OnePer3
$ ./batch_generate.py --config=gds02.config directives04.json

And, if gds02.config contains the following:

     [generateds]
     verbose = true
     command = ./generateDS.py
     flags = -f --member-specs=dict
     in-path = energistics/prodml/v2.0/xsd_schemas
     out-path = OnePer3

You would end up with the following files in subdirectory OnePer3/:

     OnePer3/DtsInstrumentBox.py
     OnePer3/FiberOpticalPath.py
     OnePer3/ProdmlCommon.py
     OnePer3/SubProdmlCommon.py

Here is what the directives file that produced the above modules
looks like:

     {
         "directives": [
             {
                 "schema": "DtsInstrumentBox.xsd",
                 "outfile": "DtsInstrumentBox.py",
                 "outsubfile": "",
                 "flags": ""
             },
             {
                 "schema": "ProdmlCommon.xsd",
                 "outfile": "ProdmlCommon.py",
                 "outsubfile": "SubProdmlCommon.py",
                 "flags": ""
             },
             {
                 "schema": "FiberOpticalPath.xsd",
                 "outfile": "FiberOpticalPath.py",
                 "outsubfile": "",
                 "flags": ""
             }
         ]
     }

Note that I manually added the line:

     "outsubfile": "SubProdmlCommon.py",

OK.  I admit.  That process seems a bit complex.  I'll work on that.

Note that if while generating one of the modules using the above
procedure, there is a missing and needed element type definition
(for example, element type A extends element type B and the
definition of element type B is missing), then we'll still get the
error that you reported.  This procedure only lets us narrow and
control the generation of these multiple modules, for example by
editing the directives file that is input to batch_generate.py.
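
For readers who want to see how the directives file and the config file fit together, here is a minimal sketch that turns them into one command line per directive. It is an assumption about how such a script could work, not the attached batch_generate.py: build_commands is a made-up name, and the real script may pass additional options. The -o and -s options are the generateDS.py options for the output module and the subclass module.

```python
import json
import os
import shlex


def build_commands(directives_text, command, flags, in_path, out_path):
    """Return one argument list per directive; nothing is executed here."""
    commands = []
    for directive in json.loads(directives_text)["directives"]:
        # Flags from the config come first, then the per-directive flags.
        args = [command] + shlex.split(flags) + shlex.split(directive["flags"])
        args += ["-o", os.path.join(out_path, directive["outfile"])]
        if directive["outsubfile"]:
            # Only request a subclass module when one is named.
            args += ["-s", os.path.join(out_path, directive["outsubfile"])]
        args.append(os.path.join(in_path, directive["schema"]))
        commands.append(args)
    return commands
```

Each returned list could be handed to subprocess.call(). Removing or editing an entry in the directives file then changes which modules are generated.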

3) Namespace definition behavior -- by default, generateDS puts namespace
definition in every export method of generated classes. That is, every
element in an exported xml that has children would have namespace
definition.  But what if I only want namespace definition in top-level
element?  For example, I want this:
This one is my next task.

I'm thinking perhaps if we had an additional command line option
--no-namespace-defs.  If you use that option, we never export the
namespace definitions.  So, then at the top level you would add the
namespacedef_="xmlns:abc=xxx" and it would not be passed down to
child elements.  I'll see if I can come up with an example for you
to review.

But, before we pursue that approach (a --no-namespace-defs command
line flag), we should ask what our (the user's actually) needs and
goals are?  I'm thinking that perhaps the user needs a more fine
grained control over which elements are generated with which
namespace definitions (xmlns:xx="yyy") and when.  Consider the
following range of possible controls:

1. Enable the user to specify namespace definitions to be generated on
    the export of *all* elements.  This is the current capability.
    gDS attempts to automatically detect the needed namespace
    definition and the --namespacedef command line option.

2. Enable the user to request that no namespace definitions are
    generated on the export of any elements.  This might be done with
    a new --no-namespace-defs command line option.

3. Enable the user to specify the namespace definitions to be
    generated on each element type.  This might be done by enabling
    the user to provide a (JSON? XML?) table/dictionary that maps
    element complexType names to namespace definitions (strings of
    the form 'xmlns:xx="yyy" xmlns:zz="www" ...').

Perhaps we need both #2 and #3.  No. 2 is quick and easy.  No. 3
will take me a little longer, but should not be too complex or
difficult, even for me.

So, I'll do some more exploration and will report back later.  If
you have comments or suggestions, I'll welcome them.

[later ...]

OK, here is what I've done:

1. There is now a new command line option for generateDS.py.  When
    --no-namespace-defs is used, the default value for the
    namespacedef_ parameter for each `export` method will be "".
    This means that namespace prefix definitions will be generated
    only for the top level (outer most) element and only when
    explicitly passed in to the call to ``export()``.  Also note that
    the `parse()` function generated near the bottom of each module
    may already do this.

2. Implemented the capability to use a manually edited dictionary
    that enables you to specify the namespace prefix definitions to
    be exported with specific element types.  OK, I realize that the
    same element type can occur at different levels and that you
    might want the namespace prefix definitions on upper ones but not
    lower (enclosed) ones.  Still, this capability gives you more
    control than you have now.
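
The effect of these two changes can be pictured with a toy model. Node below is a hypothetical stand-in and not generated code; NAMESPACE_DEFS is a made-up name for the manually edited dictionary, and the precedence between an explicit argument and the dictionary is an assumption made only for this sketch.

```python
# Hypothetical table: element type name -> namespace prefix definitions.
NAMESPACE_DEFS = {"inner": 'xmlns:abc="http://example.com/abc"'}


class Node(object):
    """Toy stand-in for a generated class; not generateDS output."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def export(self, namespacedef_=""):
        # Assumed precedence: explicit argument first, then the table.
        nsdef = namespacedef_ or NAMESPACE_DEFS.get(self.name, "")
        attrs = " " + nsdef if nsdef else ""
        # Children are exported with the default "", so the definition
        # passed at the top level is not repeated further down.
        body = "".join(child.export() for child in self.children)
        return "<abc:%s%s>%s</abc:%s>" % (self.name, attrs, body, self.name)


tree = Node("outer", [Node("mid", [Node("leaf")])])
xml = tree.export(namespacedef_='xmlns:abc="http://example.com/abc"')
```

Here only the outermost element carries the xmlns:abc definition, while an element type listed in the table would carry its own definition wherever it occurs.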

Attached are:

- collect_schema_locations.py -- Collect xs:include and xs:import
   references for batch generation.

- batch_generate.py -- Batch generation of modules.

- directives06.json -- Sample directives file for batch generation
   of modules.

- gds02.config -- Sample configuration file for use with --config
   option to batch_generate.py.

- generatedsnamespaces.py -- Sample module containing a dictionary
   that specifies namespace prefix definitions to be attached to
   specific element types during export.

Yet to be done:

- Add some documentation for the collect_schema_locations.py and
   batch_generate.py scripts.

- Add documentation for the added namespace prefix definition
   command line option and the prefix mapping dictionary module.

- Additional testing -- In particular, I suspect that
   batch_generate.py does not do error reporting in a reasonable way.

Any comments or guidance that you might want to give is welcome.

Dave

On Wed, May 03, 2017 at 02:05:31PM +0300, Eugene Petkevich wrote:
Hello Dave,

Thank you for the quick answer.

Regarding (2), here are the xsd files:
https://www.dropbox.com/s/x5kljbv3gjsem1h/energistics.zip?dl=0 , and the
file that didn't work is 'prodml/v2.0/xsd_schemas/DasAcquisition.xsd' in the
archive.

One more issue I've found is that in documentation it is written that
default parameter for export is --export="write literal" but in reality it
is --export="write".

Regards,
Eugene

On 02.05.2017 01:45, Dave Kuhlman wrote:
Eugene,

Hello.  I'm glad generateDS.py has been helpful.  Thanks for letting
me know.

Here are a few comments:

1. The file gends_user_methods.py is in the source distribution.
     You can find that here:
     https://dkuhl...@bitbucket.org/dkuhlman/generateds

     The documentation is wrong on that.  I'll fix it.

2. With respect to --one-file-per-xsd -- In the test directory
     (generateds/tests/ again in the source distribution) there is a
     test that uses that option.  Perhaps you can look at that for
     clues.  The files of interest are:

         generateds/tests/oneper00.xsd
         generateds/tests/oneper02.xsd
         generateds/tests/oneper01.xsd
         generateds/tests/oneper03.xsd

     The unit test when run, generates output modules in subdirectory
     tests/OnePer.

     The command used to run that test is in tests/test.py in method
     test_022_one_per.  Here is that command:

          def test_022_one_per(self):
              cmdTempl = (
                  'python generateDS.py --no-dates --no-versions '
                  '--silence --member-specs=list -f '
                  '--one-file-per-xsd --output-directory="tests/OnePer" '
                  '--module-suffix="One" '
                  '--super=%s2_sup '
                  'tests/%s00.xsd'
              )
              t_ = 'oneper'
              cmd = cmdTempl % (t_, t_, )
              o
              o
              o

     More specifically, about the maxLoops message, that error means
     that you have an element definition that extends another element
     definition, but generateDS.py thinks it should not generate the
     class for the extension because it has not yet generated the
     class for the base/parent.  I've had to work on this once before.
     But, I don't know why that error is happening in your case.

     Do you have a schema that produces this error and that you could
     send me?  If you do, I'll take a look.

3. With respect to the namespace definition behavior and the
     repeated namespace definitions -- I'll take a look to see how
     this can be done.

6. About parsing from a file-like object -- Actually, if I
     understand you correctly, this already works.  You can pass a
     file object that is open for reading to the generated parse
     functions.  The parameter name is misleading, I suppose.  But,
     lxml.etree.parse does accept either a string file name or a file
     object.
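
A quick way to see the file-object behavior described in item 6. This sketch uses the standard library's xml.etree.ElementTree.parse as a stand-in, since it likewise accepts either a file name or a file object; calling the parse function of a generated module with an open file would follow the same pattern.

```python
import io
import xml.etree.ElementTree as ET

# An in-memory file object takes the place of a file name.
data = io.BytesIO(b"<people><person>Ann</person></people>")

# The parser reads from the object just as it would from a path.
root = ET.parse(data).getroot()
print(root.tag)
```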

More tomorrow when I have a bit more time.

Thanks for the detailed report.

Dave


On Mon, May 01, 2017 at 11:18:00AM +0300, Eugene Petkevich wrote:
Hello,

Thank you for the GenerateDS library.  I find it very useful.  I have a
couple of things to ask:
[snip]



_______________________________________________
generateds-users mailing list
generateds-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/generateds-users
