[Duplicity-team] [Merge] lp:~aaron-whitehouse/duplicity/08-unadorned-strings into lp:duplicity

Aaron Whitehouse Fri, 08 Jun 2018 14:28:22 -0700

The proposal to merge lp:~aaron-whitehouse/duplicity/08-unadorned-strings into 
lp:duplicity has been updated.


Description changed to:

As set out in the Python 3 blueprint: 
https://blueprints.launchpad.net/duplicity/+spec/python3
one of the most time-consuming parts of supporting both Python 2 and 3, and one 
of the hardest to automate, is handling string literals. This is because a 
simple string (e.g. a = "Hello") is treated as bytes (e.g. encoded ASCII) in 
Python 2 but as Unicode text in Python 3. As we are trying to support both 
Python 2 and Python 3 for at least a transition period, we may end up with odd 
behaviour wherever we have an unadorned string.
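The difference can be seen with a short illustration that runs on either interpreter. The names are placeholders chosen for this example, not duplicity code:

```python
from __future__ import print_function  # same print() behaviour on 2 and 3

import sys

# An unadorned literal changes type between interpreters:
# bytes (type "str") on Python 2, Unicode text (also "str") on Python 3.
unadorned = "Hello"
text = u"Hello"   # Unicode text on both Python 2 and Python 3
data = b"Hello"   # bytes on both Python 2 and Python 3

if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)

print(type(unadorned).__name__, type(text).__name__, type(data).__name__)
```

On Python 3 this prints `str str bytes`. On Python 2.7 it prints `str unicode str`. Only the unadorned literal changes meaning.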

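The versions of Python 2 and 3 we are targeting mean that we can "adorn" 
strings with a prefix letter to indicate what type of string each one is (u for 
Unicode, b for bytes and r for raw strings, commonly used for regexes). b"" 
literals are accepted from Python 2.6 onwards, and u"" literals are accepted 
again from Python 3.3 onwards.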
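The sketch below shows what each prefix means in practice. The variable names are illustrative only:

```python
import re

text = u"caf\u00e9"        # u: Unicode text; \u00e9 is the single character "é"
data = b"\x00\xff"         # b: a sequence of raw byte values
pattern = r"(\d+)\.(\d+)"  # r: backslashes are kept, so re sees \d and \.
raw_bytes = br"\d"         # prefixes combine; "br" works on Python 2.7 and 3

match = re.match(pattern, u"2.7")
```

One caveat when combining prefixes: Python 3 rejects `ur"..."`, because PEP 414 deliberately left it out. A raw Unicode literal therefore cannot be written with a single combined prefix that both versions accept.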

An important preliminary step towards Python 2/3 support is therefore to add 
these adornments to each and every string literal in the code base.

To ensure that we can find the existing unadorned strings and do not 
accidentally introduce new ones, this merge request adds a function to our 
test_code that automatically checks all .py files for unadorned strings and 
fails if any are found.
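One way such a check could detect unadorned literals is with the standard library `tokenize` module. This is a hypothetical sketch and not the function this merge request adds to test_code. The name `find_unadorned_strings` is assumed for illustration, and the code targets Python 3:

```python
import io
import tokenize


def find_unadorned_strings(source):
    """Return (line, column, literal) for every string literal in `source`
    that has no u, b, r (or f) prefix.

    Hypothetical helper for illustration only; Python 3.
    """
    found = []
    readline = io.StringIO(source).readline
    for tok_type, tok_string, start, _end, _line in tokenize.generate_tokens(readline):
        if tok_type != tokenize.STRING:
            continue
        # Letters before the opening quote form the prefix, e.g. "u", "b", "br".
        body = tok_string.lstrip("bBrRuUfF")
        prefix = tok_string[:len(tok_string) - len(body)]
        if not prefix:
            found.append((start[0], start[1], tok_string))
    return found
```

For example, `find_unadorned_strings("a = 'Hello'\nb = u'Hi'\n")` returns `[(1, 4, "'Hello'")]`. Working on tokens rather than regexes means quotes inside comments, or inside other strings, are not miscounted. Docstrings are reported too, since they are ordinary string literals.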

Adorning all of these strings will be substantial work, so it is not all done 
in this merge request. Instead, the check follows the approach we use for many 
of our other code style checks: it currently contains a very long list of 
excluded files (which are not checked), and we can remove these exceptions as 
we adorn the strings in each file.
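The exclusion-list approach can be sketched as a file selector that skips listed paths. The function name `files_to_check` and the idea of passing exclusions as relative paths are assumptions for illustration, not the test_code implementation:

```python
import os


def files_to_check(root, excluded):
    """Return the .py files under `root` that are not in `excluded`.

    `excluded` holds paths relative to `root`. Removing an entry brings
    that file back under the unadorned-string check. Hypothetical helper.
    """
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            full_path = os.path.join(dirpath, filename)
            if os.path.relpath(full_path, root) in excluded:
                continue  # not yet converted, so skip it for now
            selected.append(full_path)
    return selected
```

A test would then run the detector over each returned file and fail on any hits. As files are converted, their entries are deleted from the (hypothetical) exclusion set, so the check tightens without any change to the test logic.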

To help people find and correct all of the unadorned strings in a particular 
file, the new script testing/find_unadorned_strings.py can be executed directly 
with a Python file as an argument:
./find_unadorned_strings.py python_file.py
and it will output a nicely formatted list of all unadorned strings in the file 
that need to be corrected.

As the codebase is currently Python 2 only, marking strings as bytes (b" ") 
essentially preserves current behaviour. However, it is highly desirable to 
convert as many of them as possible to Unicode strings (u" "). These will be 
much easier to work with as we transition to Python 3, and they will improve 
non-ASCII support. This will likely require changes to other parts of the code 
that interact with each string. The broadly recommended approach for text is 
to decode bytes to Unicode at the boundaries where data comes in (e.g. when 
reading from files), encode back to bytes where it goes out (e.g. when writing 
to files), and use Unicode throughout internally. Many built-ins and libraries 
support Unicode natively, so in many cases very little of the code needs to 
change.
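A minimal sketch of "decode at the boundaries" is shown below. The function `strip_trailing_whitespace` is a made-up example task, and UTF-8 is an assumed file encoding. `io.open` is available on both Python 2.7 and 3. It decodes on read and encodes on write, so everything in between is Unicode:

```python
import io


def strip_trailing_whitespace(in_path, out_path, encoding="utf-8"):
    """Copy a text file, removing trailing whitespace from each line.

    Hypothetical example: bytes are decoded once on read and encoded once
    on write; all processing in between uses Unicode strings.
    """
    # Boundary in: decode the file's bytes to Unicode text.
    with io.open(in_path, "r", encoding=encoding) as infile:
        lines = infile.read().splitlines()
    # Internal processing: ordinary Unicode string operations.
    cleaned = [line.rstrip() for line in lines]
    # Boundary out: encode Unicode back to bytes ("\n" written as-is).
    with io.open(out_path, "w", encoding=encoding, newline=u"\n") as outfile:
        for line in cleaned:
            outfile.write(line + u"\n")
    return cleaned
```

Because the text is decoded, `u"café"` has length 4 inside the function, even though its UTF-8 encoding on disk is 5 bytes. Slicing and other character operations therefore behave as expected for non-ASCII text.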

Duplicity already has many helper variables and functions that let you use 
Unicode wherever possible. For paths, for example, you can use Path.uname 
instead of Path.name.
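The idea behind a pair like Path.name/Path.uname can be illustrated with a stand-in class. This is not duplicity's Path implementation. `ExamplePath`, the UTF-8 filename assumption and the "replace" error handling are all hypothetical choices for this sketch:

```python
class ExamplePath(object):
    """Hypothetical stand-in: a raw bytes name plus a decoded Unicode view.

    NOT duplicity's Path class. It only mirrors the idea that `name` keeps
    the bytes from the filesystem while `uname` offers Unicode text.
    """

    def __init__(self, name, encoding="utf-8"):
        self.name = name  # raw bytes, as returned by the OS
        self._encoding = encoding

    @property
    def uname(self):
        # Assumption: filenames are UTF-8. Undecodable bytes become U+FFFD
        # instead of raising, so display code cannot crash on odd names.
        return self.name.decode(self._encoding, "replace")
```

Code that formats messages or compares names works with `uname`. Code that touches the filesystem can keep using the raw bytes in `name`, so no information is lost.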

For more details, see:
https://code.launchpad.net/~aaron-whitehouse/duplicity/08-unadorned-strings/+merge/347721
-- 
Your team duplicity-team is requested to review the proposed merge of 
lp:~aaron-whitehouse/duplicity/08-unadorned-strings into lp:duplicity.
