Rohan,
I agree with your comment that the user should work with Unicode
strings (not bytes) and should pass strings to the generated library
functions. Working with Unicode rather than bytes is more in the
spirit of Python 3, if I understand it correctly.
There is a patch attached.
If that does *not* do what you want or expect, please read the
following. It may suggest what guidance you should give me.
The attached patch implements alternative 2, below.
Note: the function parseString calls parsexml_; both are in
the generated library module.
Some alternatives:
1. Modify parseString so that it converts Unicode to bytes,
stores that in an io.BytesIO (a file-like object), and passes that
to parsexml_. Good, except ...
It seems wasteful to convert from unicode to bytes, especially
for large XML documents and especially since it is likely that
the user decoded from bytes before calling parseString.
2. Create a new implementation of parsexml_ called parsexmlstring_.
This new function will call lxml.etree.fromstring (instead of
lxml.etree.parse, as parsexml_ does). Since
lxml.etree.fromstring parses a string (under both Python 2 and
3), we're good, except ...
The string cannot start with an XML declaration containing an
encoding (for example, '<?xml version="1.0" encoding="UTF-8"?>'),
because lxml.etree.fromstring raises this error:
ValueError: Unicode strings with encoding declaration are not
supported. Please use bytes input or XML fragments without
declaration.
This means that we have to tell the user of parseString to
strip off the XML declaration that carries the encoding.
I could add a doc string to this new version of parseString to
explain this.
I suppose that this makes sense. If you have a string, it is
likely that you did *not* start with a file containing an XML
declaration containing an encoding. If you did, you'd be using
one of the other parse functions in the generated library module.
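To make the two alternatives concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree, so it runs without lxml installed. The names strip_xml_declaration, parse_via_bytes, and parse_via_fromstring are hypothetical helpers for illustration only; they are not part of generateDS or of the attached patch.

```python
import io
import re
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# Matches a leading XML declaration such as
# <?xml version="1.0" encoding="UTF-8"?> plus surrounding whitespace.
_XML_DECL = re.compile(r'^\s*<\?xml\s[^>]*\?>\s*')


def strip_xml_declaration(text):
    """Return text without a leading XML declaration (hypothetical helper)."""
    return _XML_DECL.sub('', text, count=1)


def parse_via_bytes(text, encoding='utf-8'):
    """Alternative 1: re-encode the str and parse from a file-like object."""
    return ET.parse(io.BytesIO(text.encode(encoding))).getroot()


def parse_via_fromstring(text):
    """Alternative 2: parse the str directly, declaration removed first."""
    return ET.fromstring(strip_xml_declaration(text))


xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<root><child>caf\u00e9</child></root>'
)
root1 = parse_via_bytes(xml_text)
root2 = parse_via_fromstring(xml_text)
```

With lxml, the call in alternative 2 would be lxml.etree.fromstring, which raises the ValueError quoted above when a str still carries an encoding declaration; that is why the sketch strips the declaration before parsing.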
Dave
On Tue, Dec 12, 2017 at 01:55:47PM +0100, Rohan Dsa wrote:
> Hi Dave,
> The call was with parameters "generateDS.py -f -o output.py input.xsd"
> Apologies, I'm right now moving apartments (painting and schlepping boxes
> around), hence the delayed response. My Python experience is around 2
> years old or so, so I'm a bit insecure. But here goes.
> Since I'm Python 3 centric, I will pose the question as a Python 3
> programmer.
> Is there a requirement that the generateds library works with strings with
> a specific encoding? I thought in the python 3 world, it could just use
> python 3 unicode strings throughout.
> Problems usually come up when reading from outside media like UTF-8
> encoded files, but I think in such cases the user should convert this
> UTF-8 encoded string into a Python 3 Unicode string and then pass it to
> the generateDS library.
> With parsing from strings, my gut says it's highly inefficient to keep
> encoding XML strings before sending them to the library unless there is
> some fundamental concept/requirement for it I am unaware of. If it's a
> limitation due to some implementation detail, then it's understandable.
> With parsing from files, I can imagine it may make sense to pass an
> encoding hint to the library, with a default encoding of UTF-8, similar
> to open(filepath, encoding)
> Rohan
> On Fri, Dec 8, 2017 at 9:13 PM, Dave Kuhlman <dkuhl...@davekuhlman.org>
> wrote:
>
> Rohan,
>
> Ah, so it was *not* a module generated with -s; it was one generated
> with -o. Still, what I fixed *was* a bug, and it needed to be fixed
> so that modules generated with -s would run under both Python 2 and
> Python3. Therefore, I'm still thankful that you prodded me into
> finding that.
>
> With respect to your problem, if I understand it correctly, you can
> fix it yourself by doing the following:
>
>     >>> encoded_xml_string = xml_string.encode("utf-8")
>     >>> gds_module.parseString(encoded_xml_string)
>
> Or, since "utf-8" is the default encoding, merely:
>
>     >>> encoded_xml_string = xml_string.encode()
>     >>> gds_module.parseString(encoded_xml_string)
>
> If I'm right about this, the question is whether you should do that
> encoding or whether the generated code (the parseString function)
> should do it for you. One argument in favor of putting the burden
> on the caller (instead of parseString) is that we do not know in
> advance what the encoding of the original string is, although it is
> very likely (maybe even very, very likely) that it is UTF-8.
>
> I can change the parseString generated function so that it does this:
>
>     doc = parsexml_(IOBuffer(inString.encode()), parser)
>
> instead of this:
>
>     doc = parsexml_(IOBuffer(inString), parser)
>
> That's a trivial change to make, but it removes some flexibility
> that the user might need. See this:
>
>     https://docs.python.org/3/library/codecs.html#standard-encodings
>
> Or, is that stretching things too far? Perhaps no one would ever
> need to use any encoding other than UTF-8.
>
> What do you think?
> Dave
>
> On Fri, Dec 08, 2017 at 03:23:30PM +0100, Rohan Dsa wrote:
> > Thank you Dave,
> >
> > Unfortunately the patch (and the latest generateds version) didn't help me
> > either. I get the following error. presumably the reason may be the
> > difference between strings in python 2 and 3? AFAIR, unicode strings in
> > python 3 as opposed to byte strings in python 2?
> >
> >   File "c:\evobase2005\main\evopro\dc\dc\hci\operetta_data.py", line 15788, in parseString
> >     doc = parsexml_(IOBuffer(inString), parser)
> > TypeError: a bytes-like object is required, not 'str'
> >
> >
> > I didn't apply the -s argument either.
> >
> > let me know if i can do anything else to help.
> >
> > Thank you for the library. it generated 16000 LOC and saved me hours of
> > typing!
> >
> > Rohan
> >
> >
> >
> >
> > On Fri, Dec 8, 2017 at 12:22 AM, Dave Kuhlman <dkuhl...@davekuhlman.org>
> > wrote:
> >
> > > Rohan,
> > >
> > > That looks like a bug. After a quick search, I found that this
> > > error occurred in the parseString function in the generated subclass
> > > modules (that is, the modules generated with the -s command line
> > > option). Is that where you found it?
> > >
> > > I've attached a patch. If the patch does not apply correctly, you
> > > can find the patched version at any of the following:
> > >
> > > - https://bitbucket.org/dkuhlman/generateds
> > >
> > > - https://pypi.python.org/pypi/generateDS
> > >
> > > - https://sourceforge.net/projects/generateds/
> > >
> > > Thank you for reporting this. I appreciate your help.
> > >
> > > Dave
> > >
> > > On Thu, Dec 07, 2017 at 01:51:28PM +0100, Rohan Dsa wrote:
> > > > Dave,
> > > >
> > > > I need to keep changing a line in my auto generated code.
> > > >
> > > > from StringIO import StringIO
> > > >
> > > > to
> > > >
> > > > from io import StringIO
> > > >
> > > > The code runs only under Python3. is this normal or am i missing something?
> > > > I would have posted this somewhere, just didn't find any bugtracker online.
> > > >
> > > > Rohan
> > >
> > > --
> > >
> > > Dave Kuhlman
> > > http://www.davekuhlman.org
> > >
>
> --
>
> Dave Kuhlman
> http://www.davekuhlman.org
--
Dave Kuhlman
http://www.davekuhlman.org
diff -r a239ed939551 generateDS.py
--- a/generateDS.py Mon Dec 11 13:59:07 2017 -0800
+++ b/generateDS.py Tue Dec 12 14:30:31 2017 -0800
@@ -5051,6 +5051,18 @@
#xmldisable# doc = etree_.parse(infile, parser=parser, **kwargs)
#xmldisable# return doc
+#xmldisable#def parsexmlstring_(instring, parser=None, **kwargs):
+#xmldisable# if parser is None:
+#xmldisable# # Use the lxml ElementTree compatible parser so that, e.g.,
+#xmldisable# # we ignore comments.
+#xmldisable# try:
+#xmldisable# parser = etree_.ETCompatXMLParser()
+#xmldisable# except AttributeError:
+#xmldisable# # fallback to xml.etree
+#xmldisable# parser = etree_.XMLParser()
+#xmldisable# element = etree_.fromstring(instring, parser=parser, **kwargs)
+#xmldisable# return element
+
#
# Namespace prefix definition table (and other attributes, too)
#
@@ -5879,12 +5891,15 @@
def parseString(inString, silence=False):
- if sys.version_info.major == 2:
- from StringIO import StringIO as IOBuffer
- else:
- from io import BytesIO as IOBuffer
-%(preserve_cdata_tags)s doc = parsexml_(IOBuffer(inString), parser)
- rootNode = doc.getroot()
+ '''Parse a string, create the object tree, and export it.
+
+ Arguments:
+ - inString -- A string. This XML fragment should not start
+ with an XML declaration containing an encoding.
+ - silence -- A boolean. If False, export the object.
+ Returns -- The root object in the tree.
+ '''
+%(preserve_cdata_tags)s rootNode = parsexmlstring_(inString, parser)
rootTag, rootClass = get_root_tag(rootNode)
if rootClass is None:
rootTag = '%(rootElement)s'
@@ -5892,7 +5907,6 @@
rootObj = rootClass.factory()
rootObj.build(rootNode)
# Enable Python to collect the space used by the DOM.
- doc = None
#silence# if not silence:
#silence# sys.stdout.write('<?xml version="1.0" ?>\\n')
#silence# rootObj.export(
_______________________________________________
generateds-users mailing list
generateds-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/generateds-users