Re: [Generateds-users] generateDS.py infinite loop

George David Fri, 06 Mar 2015 20:39:55 -0800

Hi guys,

It's funny how when you look at your own code after some time, it's much
easier to critique.


For instance, why did I do this:

    while 1:
        if len(PostponedExtensions) <= 0:
            break

instead of this:

    while len(PostponedExtensions) > 0:

To be honest, I really don't remember this specific code segment, but I
think perhaps another solution would be better. I worry about capping the
number of loops because we could potentially prematurely end the
processing. I believe the expectation was that as we generate more classes,
more names would be added to AlreadyGenerated and/or to SimpleTypeDict. In
order to guard against the infinite loop, we need to detect that
PostponedExtensions is in a state we already encountered. For example:

Assume this is the starting state:
PostponedExtensions: [ A, B, C, D ]

After loop 1, D was not found and inserted at the beginning
PostponedExtensions: [ D, A, B, C ]

After loop 2, C was found and processed
PostponedExtensions: [ D, A, B ]

After loop 3, B was not found and added at the beginning:
PostponedExtensions: [ B, D, A ]

After loop 4, A was found and processed:
PostponedExtensions: [ B, D ]

After loop 5, D was not found and added at the beginning
PostponedExtensions: [ D, B ]

After loop 6, B was not found and added at the beginning
PostponedExtensions: [ B, D ]

Now we see that PostponedExtensions after loop 6 is in the same state as it
was after loop 4 and as a result, we are in an infinite loop. At this point
we should break out of the loop.
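The walk-through above can be replayed on toy data. This is only a sketch: `replay` and `resolvable` are names made up for the illustration, plain strings stand in for the schema elements, and "found" is modelled as membership in a fixed set rather than a real lookup in AlreadyGenerated or SimpleTypeDict.

```python
def replay(postponed, resolvable, max_steps=6):
    """Replay the pop/insert loop on toy data and record each state.

    `postponed` is a list of names; `resolvable` is the set of names whose
    parent is treated as already generated (a stand-in for the real lookup).
    """
    postponed = list(postponed)  # work on a copy
    states = []
    for _ in range(max_steps):
        if not postponed:
            break
        name = postponed.pop()
        if name not in resolvable:
            # Parent not found: put the item back at the front.
            postponed.insert(0, name)
        # Otherwise the item counts as processed and is simply dropped.
        states.append(list(postponed))
    return states


states = replay(["A", "B", "C", "D"], resolvable={"A", "C"})
for number, state in enumerate(states, 1):
    print("After loop %d: %s" % (number, state))
```

The state recorded after loop 6 equals the one recorded after loop 4, which is the repetition the detection needs to catch.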

To detect the state, we can create a checksum of the state of
PostponedExtensions and keep it in a set. Once we detect a duplicate
checksum, we break out of the loop.

Here is my proposal:

     import hashlib

     #
     # Generate the elements that were postponed because we had not
     #   yet generated their base class.
     checksums = set()

     def isNewState():
         # Fingerprint the current order of PostponedExtensions.  Use
         #   object ids so that this works whether or not the entries
         #   are strings.
         state = ','.join(str(id(e)) for e in PostponedExtensions)
         checksum = hashlib.sha1(state.encode('utf-8')).hexdigest()
         if checksum in checksums:
             return False
         checksums.add(checksum)
         return True

     while len(PostponedExtensions) > 0:
         # Seeing the same state twice means that no further progress
         #   is possible, so stop rather than loop forever.
         if not isNewState():
             break

         element = PostponedExtensions.pop()
         parentName, parent = getParentName(element)
         if parentName:
             if (parentName in AlreadyGenerated or
                     parentName in SimpleTypeDict):
                 generateClasses(wrt, prefix, element, 1)
             else:
                 PostponedExtensions.insert(0, element)

Perhaps a warning would be nice.
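One way the warning could look, as a minimal standalone sketch rather than a patch to generateDS.py: `drain_postponed`, `is_ready` and `generate` are hypothetical names standing in for the loop, the AlreadyGenerated/SimpleTypeDict lookup and generateClasses, and the message text is invented.

```python
import sys


def drain_postponed(postponed, is_ready, generate):
    """Process postponed items until the list is empty or a state repeats.

    Returns the items that could not be processed (empty on success).
    """
    seen_states = set()
    while postponed:
        # Identify a state by the identity and order of the items in it.
        state = tuple(id(item) for item in postponed)
        if state in seen_states:
            sys.stderr.write(
                "*** Warning: giving up on %d postponed item(s): %s\n"
                % (len(postponed), ", ".join(str(i) for i in postponed)))
            break
        seen_states.add(state)
        item = postponed.pop()
        if is_ready(item):
            generate(item)
        else:
            postponed.insert(0, item)
    return postponed


generated = []
leftover = drain_postponed(
    ["A", "B", "C", "D"],
    is_ready=lambda name: name in ("A", "C"),  # toy stand-in for the lookup
    generate=generated.append)
```

With the toy data, C and A are generated and the warning names B and D. Returning the leftover items lets the caller decide whether an unresolved base class should be fatal.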

The --one-file-per-xsd option assumes that the schema supplied is the root
schema. In my case I don't have a root schema, so what I do is run
generateds for each XSD that I have. Unfortunately this results in
a lot of schemas being regenerated over and over.

However, your statement that you should use a root xsd that imports all the
xsds was just the suggestion I needed. I changed my bash script from
running generateds on each xsd to creating a root xsd and running
generateds only on that root xsd. I went from taking over a minute to
generate the python classes to 10 seconds! Thanks. Such an obvious solution,
yet it never came to me.
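For anyone wanting to do the same, here is a rough sketch of building such a root xsd in Python instead of bash. It is not my actual script: `build_root_schema` is a made-up name, and it assumes that every schema declares its own targetNamespace and that the root file is written into the same directory as the schemas it imports.

```python
import os
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

XS_NS = "http://www.w3.org/2001/XMLSchema"


def build_root_schema(xsd_paths):
    """Return the text of a root schema holding one xs:import per file."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<xs:schema xmlns:xs="%s">' % XS_NS]
    for path in xsd_paths:
        namespace = ET.parse(path).getroot().get("targetNamespace")
        if not namespace:
            # Assumption of this sketch: each schema has a namespace.
            raise ValueError("%s has no targetNamespace" % path)
        lines.append("  <xs:import namespace=%s schemaLocation=%s/>" % (
            quoteattr(namespace), quoteattr(os.path.basename(path))))
    lines.append("</xs:schema>")
    return "\n".join(lines) + "\n"
```

The result would be saved as, say, root.xsd next to the other schemas and passed to generateDS.py together with --one-file-per-xsd and --output-dir.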

George

On Fri, Mar 6, 2015 at 6:12 PM Dave Kuhlman <dkuhl...@davekuhlman.org>
wrote:

> On Fri, Mar 06, 2015 at 04:23:36PM +0000, Michael L. Vezie wrote:
> > (Sorry for sending a message this way -- I couldn't find an issue
> > tracker on bitbucket and sf won't let me add a bug)
> >
> > I think I've found a bug in generateDS.py. If I run it as:
> >
> > generateDS.py --one-file-per-xsd --output-dir=py manifest.xsd
> >
> > it works fine. But a different schema file, manifest-ack, just hangs
> > forever.
> >
> > I think I know where and maybe why.
> > Starting at line 6094 in the current version:
> >
> >
> >
> >     #
> >     # Generate the elements that were postponed because we had not
> >     #   yet generated their base class.
> >     while 1:
> >         if len(PostponedExtensions) <= 0:
> >             break
> >         element = PostponedExtensions.pop()
> >         parentName, parent = getParentName(element)
> >         if parentName:
> >             if (parentName in AlreadyGenerated or
> >                     parentName in SimpleTypeDict):
> >                 generateClasses(wrt, prefix, element, 1)
> >             else:
> >                 PostponedExtensions.insert(0, element)
> >
> >
> > This loop is (in some cases) an infinite loop. For some reason,
> > parentName is not in AlreadyGenerated so element is popped out of
> > PostponedExtensions and inserted back in it forever. If I add it to
> > a different list, then copy that list back to PostponedExtensions
> > after the loop has finished, it seems to work fine.
> >
> > It happens when I run it with --one-file-per-xsd on certain schemas,
> > but not others.
> >
> > If I change the loop to:
> >
> >     #
> >     # Generate the elements that were postponed because we had not
> >     #   yet generated their base class.
> >     nPostponedExtensions=[]
> >     while 1:
> >         if len(PostponedExtensions) <= 0:
> >             break
> >         element = PostponedExtensions.pop()
> >         parentName, parent = getParentName(element)
> >         if parentName:
> >             if (parentName in AlreadyGenerated or
> >                     parentName in SimpleTypeDict):
> >                 generateClasses(wrt, prefix, element, 1)
> >             else:
> >                 nPostponedExtensions.insert(0, element)
> >     for e in nPostponedExtensions:
> >         PostponedExtensions.append(e)
> >
> >
> > (inserting the element in a different list, then copying that list
> > back to the first after the loop has finished), it seems to work
> > fine.
> >
> > Attached are the two schemas, along with a common one they both include.
>
> Michael,
>
> Thanks for catching this error and for alerting me about it.  Your
> fix was very helpful, because it focused me on the area where the
> problem is and what is causing it.
>
> I've made a fix that is perhaps simpler and certainly dumber than
> yours.  The reason that I'd rather use this simpler fix is that the
> code that implements the --one-file-per-xsd feature was added
> by someone else.  Because of that I'm not too clear on what his
> intentions were and I don't want to take a chance on making a change
> that will affect some corner case that he wants to be able to
> handle.
>
> So, I've made a change that exits that loop after a maximum of 10
> iterations, so that it no longer goes into an infinite loop.  I
> admit that it's a kludge to handle what I believe is an abnormal
> situation.
>
> I've attached a patch file.  There are several changes.  The change
> we are concerned with is the one that has "maxLoops" and "loops" in
> it.
>
> This patch will protect us from that infinite loop.  But, an
> additional issue is whether you should be using this one-per feature
> for your schema at all.
>
> My belief is that when you use --one-file-per-xsd, you should give
> it a root XML schema file that has almost nothing in it except
> xs:import statements.  That root schema file is sort of like a table
> of contents that tells generateDS.py, when run with --one-file-per-xsd,
> what files to generate and what schema files to use in order to
> generate each one.  I've also attached a Zip file
> (revised_schema.zip) containing schemas that roughly approximate
> your schemas with test02.xsd serving as the root schema.  Maybe that
> will give you hints about what I *believe* is the way to use this
> feature.
>
> So, I suppose my advice is to not use the --one-file-per-xsd feature
> unless you need it and, if you do need it, then you will want to
> design your schema to fit the way it works.
>
> Hope this helps.  Thanks again for helping me on this and also for
> your patience.
>
> Dave
>
> --
>
> Dave Kuhlman
> http://www.davekuhlman.org
> _______________________________________________
> generateds-users mailing list
> generateds-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/generateds-users
>
