Hi Stuart,

Having an end of year cleanout :)

A couple of comments below for anyone who wants to try the port.

On 31 December 2012 13:33, Stuart Rackham <[email protected]> wrote:
>
>
> On 17/10/12 04:32, Jens Getreu wrote:
>> To follow up:
>>
>> https://groups.google.com/d/topic/asciidoc/0p8l1qD8-40/discussion
>>
>> here my question concerning python 3:
>> When will AsciiDoc be compatible with Python 3? Any plans?
>> What is the workload for an experienced programmer?
>
> I thought I had posted my python3-port notes previously but can't find
> the post, here they are (I don't have any time to do it at the moment
> and imagine it could take some time):
>
> NOTE: These notes are rough and have not been checked or verified.
>
> = AsciiDoc Python 3 port
>
> Read the 'String Changes in 3.0' in 'Learning Python 4th Ed.' first.
>
> I haven't got beyond the planning stage, but here are the proposed
> conventions going forward:
>
> . UTF-8 is the default encoding (no change here).

As one regex bug showed, UTF-8 is intended to be the default but in
practice isn't. Python 3 should be a good deal better in this respect.

> . All configuration (.conf) files to be UTF-8 encoded (afaik all
>   current .conf files are UTF-8).
> . The AsciiDoc 'encoding' attribute sets the encoding of source
>   files and output files (no change here).

Shouldn't there be a distinction between source and output encoding?
If the source is cp1251, the HTML output should still be UTF-8, IIUC.
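
A minimal sketch of keeping the two apart, assuming two separate
settings (`src_encoding` and `out_encoding` are names made up for
illustration, they are not AsciiDoc attributes):

```python
def transcode(data, src_encoding, out_encoding="utf-8"):
    """Decode bytes using the source encoding, re-encode for output."""
    text = data.decode(src_encoding)   # bytes -> Unicode text
    return text.encode(out_encoding)   # Unicode text -> output bytes

# A cp1251 source still ends up as UTF-8 on the output side.
source_bytes = "Привет".encode("cp1251")
html_bytes = transcode(source_bytes, "cp1251")
assert html_bytes.decode("utf-8") == "Привет"
```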

> . The setting of the 'encoding' attribute in AsciiDoc source documents
>   is prohibited (you have to set it on the command-line or from
>   configuration files).

That's error-prone: how do I remember that file xyz is cp1251 and file
zyx is UTF-8? The encoding should be declared in the file, similar to
the charset declaration in HTML.

>
> In theory at least, the last rule (to avoid a Catch-22) would
> introduce a backward incompatibility because currently the User Guide
> states ``The 'encoding' attribute can be set using an AttributeEntry
> inside the document header''. But this is broken anyway in that it
> only applies to character sets that are backward compatible with ASCII
> e.g.  ISO-8859-1 (latin-1).

So long as the file is ASCII-compatible up to and including the
`:encoding: cp1251` line it should be OK, and that entry should be on
the first line or two.
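
Roughly what I have in mind, as a sketch only: scan the first few raw
lines for the attribute entry before decoding anything. The helper
name `sniff_encoding` and the `max_lines` limit are my assumptions,
nothing like this exists in AsciiDoc today, and it can only work for
encodings that are ASCII-compatible in the header.

```python
import re

# Matches an AttributeEntry such as ":encoding: cp1251" on its own line.
_ENCODING_RE = re.compile(
    br"^:encoding:[ \t]*([A-Za-z0-9._-]+)[ \t]*$", re.MULTILINE
)

def sniff_encoding(raw, default="utf-8", max_lines=5):
    """Return the encoding named in the first lines of raw bytes, else default."""
    head = b"\n".join(raw.splitlines()[:max_lines])
    match = _ENCODING_RE.search(head)
    if match:
        return match.group(1).decode("ascii")
    return default

raw = b"= Title\n:encoding: cp1251\n\n" + "Привет".encode("cp1251")
text = raw.decode(sniff_encoding(raw))   # decode only once the encoding is known
assert text.endswith("Привет")
```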

>
> Software should only work with Unicode strings internally, converting
> to a particular encoding on output.
>

That's the only way with Python 3. IIUC all strings are sequences of
Unicode code points, and you have to explicitly use "bytes" objects
for any other behaviour.
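
A small Python 3 illustration of the split (just a sketch,
`concat_fails` is a throwaway helper for the demonstration):

```python
raw = "caf\u00e9".encode("utf-8")   # bytes, as read from a binary file
text = raw.decode("utf-8")          # str, a sequence of code points

assert isinstance(raw, bytes) and isinstance(text, str)
assert len(raw) == 5                # U+00E9 takes two bytes in UTF-8
assert len(text) == 4               # but it is a single code point

def concat_fails(a, b):
    """Return True if Python refuses to concatenate a and b."""
    try:
        a + b
    except TypeError:
        return True
    return False

# Python 3 will not mix str and bytes implicitly.
assert concat_fails(text, raw)
```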

> Port to Python 3 via 2.6, this is how Django are doing it:
>
> ``deprecate older 2.x releases until our minimum requirement is Python
> 2.6, then to take advantage of the compatibility features in 2.6 to
> carry out the actual porting and achieve Python 3 support''

I have found this to be an admirable target that isn't always
achievable, but maybe my Python programs are somewhat pathological
anyway :)

>
> The idea will be to have a Python 2.6 version that can be
> automatically converted to Python 3 using `2to3` with a '2to3' AAP
> rule.
>
>   2to3 -w -f idioms -f all a2x3.py
>   2to3 -w -f idioms -f all -x next asciidoc3.py
>
> Use `sys.version_info >= (3, 0)` to test for Python 3.
>
> Need to replace all open() calls with:
>
>   import codecs
>
>   def file_open(filename, mode='r', encoding=None):
>       # Fall back to the document's 'encoding' attribute, then UTF-8.
>       if not encoding:
>           encoding = document.attributes.get('encoding', 'UTF-8')
>       return codecs.open(filename, mode, encoding, errors='strict')
>
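
An alternative that might be worth considering: `io.open` exists on
Python 2.6+ and is the built-in `open()` on Python 3, so the same
wrapper could run unchanged on both. A minimal sketch, assuming the
fallback is passed in as a parameter (`default_encoding` is my
invention, the version above looks it up in `document.attributes`):

```python
import io

def file_open(filename, mode="r", encoding=None, default_encoding="utf-8"):
    """Open text files with an explicit encoding; leave binary modes alone."""
    if "b" in mode:
        # Binary files have no encoding; passing one would raise ValueError.
        return io.open(filename, mode)
    return io.open(filename, mode,
                   encoding=encoding or default_encoding,
                   errors="strict")
```

With `errors="strict"` a wrongly declared encoding fails loudly with a
UnicodeDecodeError instead of silently producing garbage.
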
> . All AsciiDoc distribution text files are UTF-8 encoded.
> . The 'encoding' attribute sets the encoding of input and output files
>   (defaults to UTF-8).
> . The use of the 'encoding' attribute in the document header is prohibited
>   ???  unless the encoding of the header is compatible with ASCII e.g.
>   ISO-8859-1 (latin-1)
>
> What exactly is the encoding of text from stdin on Linux and Windows?
> See:
>
> - stdout encoding is set by the OS environment and is NOT
>   sys.getdefaultencoding(), you can read it with sys.stdout.encoding
>   but it can only be set externally (see
>   https://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/).
>   Thankfully on Linux this is normally UTF-8.
>   Things aren't so simple with Windows
>   (http://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7).
>

Yeah, it's all kinda broken when using things via pipes :(
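
One way around it on Python 3, as a sketch rather than a
recommendation: ignore whatever the environment picked and wrap the
underlying binary stream yourself. `text_writer` is a hypothetical
helper name.

```python
import io

def text_writer(binary_stream, encoding="utf-8"):
    """Wrap a binary stream so text is always written in a known encoding."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors="strict", newline="\n")

# Demonstrated on an in-memory stream; a script would pass
# sys.stdout.buffer instead of the BytesIO object.
buf = io.BytesIO()
out = text_writer(buf)
out.write("Привет\n")
out.flush()
assert buf.getvalue() == "Привет\n".encode("utf-8")
```

Setting the PYTHONIOENCODING environment variable is the external
alternative, and on Python 3.7+ `sys.stdout.reconfigure(encoding=...)`
does much the same from inside the program.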

>
> Closing the points of entry:
> . Reader to have 'encoding' attribute so includes get the right
>   encoding.

Assuming the includes use the same encoding as the main document,
maybe make that required!


Cheers
Lex

[...]
