Hi Lex

On 31/12/12 22:31, Lex Trotman wrote:
> Hi Stuart,
>
> Having an end of year cleanout :)

Yep! Thanks for these comments, you've obviously been a lot further along
this path than me. My problem with the Python 3 thing is that I can't think
of a single reason I (emphasis on "I") would want to port it, other than
"gee, that would be nice", so I'm not planning to roll my sleeves up any
time soon on this one.

Cheers,
Stuart

>
> Couple of comments below for anyone who wants to try it.
>
> On 31 December 2012 13:33, Stuart Rackham wrote:
>>
>> On 17/10/12 04:32, Jens Getreu wrote:
>>> To follow up:
>>>
>>> https://groups.google.com/d/topic/asciidoc/0p8l1qD8-40/discussion
>>>
>>> Here is my question concerning Python 3:
>>> When will AsciiDoc be compatible with Python 3? Any plans?
>>> What is the workload for an experienced programmer?
>>
>> I thought I had posted my python3-port notes previously but can't find
>> the post, so here they are (I don't have any time to do it at the moment
>> and imagine it could take some time):
>>
>> NOTE: These notes are rough and have not been checked or verified.
>>
>> = AsciiDoc Python 3 port
>>
>> Read the 'String Changes in 3.0' in 'Learning Python 4th Ed.' first.
>>
>> I haven't got beyond the planning stage, but here are the proposed
>> conventions going forward:
>>
>> . UTF-8 is the default encoding (no change here).
>
> As was found with one regex bug, it is intended to be but isn't; Python 3
> should be a good deal better in this respect.
>
>> . All configuration (.conf) files to be UTF-8 encoded (afaik all
>> current .conf files are UTF-8).
>> . The AsciiDoc 'encoding' attribute sets the encoding of source
>> files and output files (no change here).
>
> Distinction between source and output? If source is cp1251, the output
> for HTML should still be UTF-8, IIUC.
>
>> . The setting of the 'encoding' attribute in AsciiDoc source documents
>> is prohibited (you have to set it on the command-line or from
>> configuration files).
>
> That's error prone: how do I remember that file xyz is cp1251 and file
> zyx is UTF-8?
> It should be in the file, similar to the encoding= in
> HTML.
>
>>
>> In theory at least, the last rule (to avoid a Catch-22) would
>> introduce a backward incompatibility because currently the User Guide
>> states ``The 'encoding' attribute can be set using an AttributeEntry
>> inside the document header''. But this is broken anyway in that it
>> only applies to character sets that are backward compatible with ASCII
>> e.g. ISO-8859-1 (latin-1).
>
> So long as it's compatible up to and including the :encoding: cp1251
> line, it should be ok, and that encoding should be on the first line or
> two.
>
>>
>> Software should only work with Unicode strings internally, converting
>> to a particular encoding on output.
>>
>
> That's the only way with Python 3: IIUC all strings are Unicode code
> points, and you have to explicitly use "bytes" objects for other
> behaviour.
>
>> Port to Python 3 via 2.6; this is how Django are doing it:
>>
>> ``deprecate older 2.x releases until our minimum requirement is Python
>> 2.6, then to take advantage of the compatibility features in 2.6 to
>> carry out the actual porting and achieve Python 3 support''
>
> I have found that this is an admirable target but isn't always
> achievable; maybe my Python programs are somewhat pathological
> anyway :)
>
>>
>> The idea will be to have a Python 2.6 version that can be
>> automatically converted to Python 3 using `2to3` with a '2to3' AAP
>> rule.
>>
>>     2to3 -w -f idioms -f all a2x3.py
>>     2to3 -w -f idioms -f all -x next asciidoc3.py
>>
>> Use `sys.version_info >= (3, 0)` to test for Python 3.
>>
>> Need to replace all open() calls with:
>>
>>     import codecs
>>
>>     def file_open(filename, mode='r', encoding=None):
>>         # 'document' is assumed to be asciidoc.py's global Document object.
>>         if not encoding:
>>             encoding = document.attributes.get('encoding', 'UTF-8')
>>         return codecs.open(filename, mode, encoding, errors='strict')
>>
>> . All AsciiDoc distribution text files are UTF-8 encoded.
>> . The 'encoding' attribute sets the encoding of input and output files
>> (defaults to UTF-8).
>> . The use of the 'encoding' attribute in the document header is prohibited
>> ??? unless the encoding of the header is compatible with UTF-8 e.g.
>> ISO-8859-1 (latin-1)
>>
>> What exactly is the encoding of text from stdin on Linux and Windows?
>> See:
>>
>> - stdout encoding is set by the OS environment and is NOT
>> sys.getdefaultencoding(); you can read it with sys.stdout.encoding
>> but it can only be set externally (see
>> https://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/).
>> Thankfully on Linux this is normally UTF-8.
>> Things aren't so simple with Windows
>> (http://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7).
>
> Yeah, it's all kinda broken when using things via pipes :(
>
>>
>> Closing the points of entry:
>> . Reader to have 'encoding' attribute so includes get the right
>> encoding.
>
> Assuming they are the same, maybe make that required!
>
>
> Cheers
> Lex
>
> [...]
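The following is a minimal Python 3 sketch of the scheme discussed in this thread. It reads source files as bytes and lets an optional encoding declaration near the top of the file (Lex's suggestion) choose the decoder. Text is handled as str internally, and output is written as UTF-8 regardless of the source encoding. Several details are hypothetical assumptions for illustration and are not existing AsciiDoc functions: the declaration syntax (an ASCII `:encoding: <name>` line), the two-line limit, and the helper names detect_encoding, read_source and write_output. Like the current AttributeEntry approach, this only works when the header is ASCII-compatible up to the declaration, so a UTF-16 file would not be detected.

```python
import codecs
import re

# Hypothetical declaration syntax: an AsciiDoc-style attribute entry such as
# ":encoding: cp1251", which must appear within the first HEADER_LINES lines.
ENCODING_ENTRY = re.compile(rb"^:encoding:\s*([A-Za-z0-9_.-]+)\s*$")
HEADER_LINES = 2


def detect_encoding(raw, default="utf-8"):
    """Return the codec name declared in the header of raw (bytes), else default."""
    if raw.startswith(codecs.BOM_UTF8):
        # A UTF-8 byte order mark wins; "utf-8-sig" strips it when decoding.
        return "utf-8-sig"
    # Only works if the header is ASCII-compatible (not UTF-16, for example).
    for line in raw.splitlines()[:HEADER_LINES]:
        match = ENCODING_ENTRY.match(line)
        if match:
            # codecs.lookup() raises LookupError for unknown encodings and
            # returns the canonical codec name, e.g. "UTF-8" -> "utf-8".
            return codecs.lookup(match.group(1).decode("ascii")).name
    return default


def read_source(path, default="utf-8"):
    """Read a source file as bytes and decode it to str with its declared encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(detect_encoding(raw, default), errors="strict")


def write_output(path, text, encoding="utf-8"):
    """Write str output, encoding it (UTF-8 by default) independently of the source."""
    with open(path, "w", encoding=encoding, errors="strict", newline="") as f:
        f.write(text)
```

Reading the raw bytes first means the program does not have to guess a decoder before it has seen the declaration. Once decoded, the str is what the rest of the program works with, in line with the "Unicode strings internally, encode on output" rule above.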
