Hi Stuart, Having an end of year cleanout :)
Couple of comments below for anyone who wants to try it.

On 31 December 2012 13:33, Stuart Rackham <[email protected]> wrote:
>
>
> On 17/10/12 04:32, Jens Getreu wrote:
>> To follow up:
>>
>> https://groups.google.com/d/topic/asciidoc/0p8l1qD8-40/discussion
>>
>> here my question concerning python 3:
>> When will AsciiDoc be compatible with Python 3? Any plans?
>> What is the workload for an experienced programmer?
>
> I thought I had posted my python3-port notes previously but can't find
> the post, here they are (I don't have any time to do it at the moment
> and imagine it could take some time):
>
> NOTE: These notes are rough and have not been checked or verified.
>
> = AsciiDoc Python 3 port
>
> Read the 'String Changes in 3.0' in 'Learning Python 4th Ed.' first.
>
> I haven't got beyond the planning stage, but here are the proposed
> conventions going forward:
>
> . UTF-8 is the default encoding (no change here).

As was found with one regex bug, it is intended to be but isn't.
Python 3 should be a good deal better in this respect.

> . All configuration (.conf) files to be UTF-8 encoded (afaik all
> current .conf files are UTF-8).
> . The AsciiDoc 'encoding' attribute sets the encoding of source
> files and output files (no change here).

Distinction between source and output? If the source is cp1251, the
output for HTML should still be UTF-8, IIUC.

> . The setting of the 'encoding' attribute in AsciiDoc source documents
> is prohibited (you have to set it on the command-line or from
> configuration files).

That's error prone: how do I remember that file xyz is cp1251 and file
zyx is UTF-8? It should be in the file, similar to the charset=
declaration in HTML.

>
> In theory at least, the last rule (to avoid a Catch-22) would
> introduce a backward incompatibility because currently the User Guide
> states ``The 'encoding' attribute can be set using an AttributeEntry
> inside the document header''.
> But this is broken anyway in that it
> only applies to character sets that are backward compatible with ASCII
> e.g. ISO-8859-1 (latin-1).

So long as it's compatible (with ASCII) up to and including the
":encoding: cp1251" line then it should be OK, and that line should be
within the first line or two.

>
> Software should only work with Unicode strings internally, converting
> to a particular encoding on output.
>

That's the only way with Python 3. IIUC all strings are Unicode code
points, and you have to explicitly use "bytes" objects for other
behaviour.

> Port to Python 3 via 2.6, this is how Django are doing it:
>
> ``deprecate older 2.x releases until our minimum requirement is Python
> 2.6, then to take advantage of the compatibility features in 2.6 to
> carry out the actual porting and achieve Python 3 support''

I have found that this is an admirable target but not always
achievable, though maybe my Python programs are somewhat pathological
anyway :)

>
> The idea will be to have a Python 2.6 version that can be
> automatically converted to Python 3 using `2to3` with a '2to3' AAP
> rule.
>
>   2to3 -w -f idioms -f all a2x3.py
>   2to3 -w -f idioms -f all -x next asciidoc3.py
>
> Use `sys.version_info >= (3, 0)` to test for Python 3.
>
> Need to replace all open() calls with:
>
>   def file_open(filename, mode='r', encoding=None):
>       if not encoding:
>           encoding = document.attributes.get('encoding', 'UTF-8')
>       return codecs.open(filename, mode, encoding, errors='strict')
>
> . All AsciiDoc distribution text files are UTF-8 encoded.
> . The 'encoding' attribute sets the encoding of input and output files
> (defaults to UTF-8).
> . The use of the 'encoding' attribute in the document header is prohibited
> ??? unless the encoding of the header is compatible with UTF-8 e.g.
> ISO-8859-1 (latin-1)
>
> What exactly is the encoding of text from stdin on Linux and Windows?
> See:
>
> - stdout encoding is set by the OS environment and is NOT
> sys.getdefaultencoding(), you can read it with sys.stdout.encoding
> but it can only be set externally (see
>
> https://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/)
> Thankfully on Linux this is normally UTF-8.
> Things aren't so simple with Windows
>
> (http://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7).
>

Yeah, it's all kinda broken when using things via pipes :(

>
> Closing the points of entry:
> . Reader to have 'encoding' attribute so includes get the right
> encoding.

Assuming they are the same, maybe make that required!

Cheers
Lex
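
P.S. For anyone attempting the port, here is a rough sketch of how the
file_open() helper from Stuart's notes might look if written to run
unchanged on Python 2.6+ and Python 3. It assumes io.open() in place of
codecs.open(), and the `attributes` parameter is a hypothetical stand-in
for document.attributes so that the snippet runs on its own. Neither is
taken from the AsciiDoc source.

```python
import io

DEFAULT_ENCODING = 'UTF-8'


def file_open(filename, mode='r', encoding=None, attributes=None):
    """Open a text file with an explicit encoding.

    `attributes` is a hypothetical stand-in for document.attributes
    from the notes above; here it is just a plain dict (or None).
    """
    if not encoding:
        encoding = (attributes or {}).get('encoding', DEFAULT_ENCODING)
    # io.open() is available from Python 2.6 and is the same function
    # as the builtin open() on Python 3, so no 2to3 fix-up is needed.
    return io.open(filename, mode, encoding=encoding, errors='strict')
```

With errors='strict' a file that does not match the declared encoding
raises UnicodeDecodeError on read rather than silently producing
garbage, which seems the safer default while porting.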

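On keeping the encoding in the file itself: a minimal sketch of how an
":encoding:" entry in the first line or two could be found before the
file is decoded. It only works if the file is ASCII-compatible up to and
including that line, as discussed above. The name sniff_encoding() and
the two-line limit are my own assumptions, not anything AsciiDoc does
today.

```python
import re

# Matches an attribute entry such as ":encoding: cp1251" on its own line.
ENCODING_RE = re.compile(br'^:encoding:[ \t]*([-\w.]+)[ \t]*$', re.MULTILINE)


def sniff_encoding(raw, default='UTF-8', max_lines=2):
    """Return the encoding named in the first `max_lines` lines of `raw`.

    `raw` is the undecoded file content (bytes). Falls back to `default`
    if no ":encoding:" entry is found in those lines.
    """
    head = b'\n'.join(raw.splitlines()[:max_lines])
    match = ENCODING_RE.search(head)
    if match:
        # The encoding name itself is plain ASCII by construction.
        return match.group(1).decode('ascii')
    return default


raw = u'= Title\n:encoding: cp1251\n\u0416\n'.encode('cp1251')
text = raw.decode(sniff_encoding(raw))
```

The file is read as bytes first and decoded only once the encoding is
known, so everything after that point can work with Unicode strings.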