Hello,

This is my third progress report on the project "PyPI to Debian Repository Converter", mentored by Piotr Ożarowski.

Work:
-----

Over the past two weeks I have worked mainly on designing the detailed structure of the database and writing the ORM that corresponds to it. Based on the information stored in the database, I have adjusted most of the overall algorithm of my program. The main reason for this work was the implementation of a few new command line options. Thinking the whole program through more deeply also let me make some improvements to the plugins’ API.

The status of my project described in previous reports included an initial implementation of functions that: download packages from PyPI, change their names to comply with Debian policy, copy and extract archives, and finally convert and build Debian packages. The next task that I undertook was to store in a database all useful information gathered during the tool’s runtime.

Database:
----------

Apart from settling on a reasonable schema design, an additional problem for me was implementing the ORM with the SQLAlchemy[1] library, because I had never used it before. However, the benefits of using it that my mentor presented to me (e.g. a complex object model, easier switching between different types of databases, simpler queries, more transparent code) convinced me to take some time to familiarize myself with the documentation and basic use of this library, and eventually I applied it successfully.

After consideration, many attempts and discussions, the tool’s database schema contains (at the moment) four tables: two main ones (“processed”, “skipped”) and two auxiliary ones (“commands”, “names”).

Table “processed” contains information, generated while the program runs, about packages that have been passed to the convert plugins:

* package name (renamed to comply with Debian Policy)
* package version (adjusted so that dpkg --compare-versions sorts versions correctly)
* process type (for example: ‘convert’, ‘source package build’, ‘bin package build’)
* name of the plugin executing the process
* return code of the process (to easily identify packages with the same problem)
* stdout (I want to make the logs public later)
* stderr (might be interesting especially for package developers)
* start time of the process
* end time of the process
* session id (identifies the batch the logs come from; it is the start time of the tool)

Table “skipped” contains information about packages which, for various reasons, could not be converted (i.e. convert plugins were not used). Of course I keep the package name and version, plus the reason for rejecting the package and the session id.

Table “commands” contains information about the session in which the tool was run, that is, the options selected on the command line:

* requested Python version (only packages that support the given version are converted)
* tarballs path (to a directory with archives downloaded from PyPI)
* skip-existing (a boolean indicating whether packages already available in Debian should be skipped instead of converted)
* distro (to change the default distribution in generated packages)
* package (to convert only selected packages, or a specific version if given)
* download-only (to download all tarballs from PyPI without any other actions)
* force-update (to clear all data and build the repository from scratch)

All this data is assigned to the appropriate session id (which is used in the other tables).

Table “names” maps the original package name and original version to the package name and version obtained after renaming to comply with Debian policy.

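As an illustration only, the table layout described above could be expressed roughly as follows. The sketch uses Python's built-in sqlite3 module with plain SQL instead of the SQLAlchemy ORM that the tool actually uses, so that it runs without extra dependencies. All column names, column types and the helper functions (create_db, failures_by_return_code) are assumptions made for this example, not the tool's real schema.

```python
import sqlite3

# Hypothetical schema: the four tables come from the report,
# but every column name and type below is an assumption.
SCHEMA = """
CREATE TABLE commands (
    session_id     TEXT PRIMARY KEY,  -- start time of the tool
    python_version TEXT,
    tarballs_path  TEXT,
    skip_existing  INTEGER,           -- boolean stored as 0/1
    distro         TEXT,
    package        TEXT,
    download_only  INTEGER,
    force_update   INTEGER
);
CREATE TABLE processed (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,       -- Debian-compliant name
    version      TEXT NOT NULL,       -- Debian-sortable version
    process_type TEXT NOT NULL,       -- e.g. 'convert'
    plugin       TEXT,
    return_code  INTEGER,
    stdout       TEXT,
    stderr       TEXT,
    start_time   TEXT,
    end_time     TEXT,
    session_id   TEXT REFERENCES commands(session_id)
);
CREATE TABLE skipped (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    version    TEXT,
    reason     TEXT,
    session_id TEXT REFERENCES commands(session_id)
);
CREATE TABLE names (
    original_name    TEXT NOT NULL,
    original_version TEXT NOT NULL,
    debian_name      TEXT NOT NULL,
    debian_version   TEXT NOT NULL,
    PRIMARY KEY (original_name, original_version)
);
"""


def create_db(path=":memory:"):
    """Open a database and create the four tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def failures_by_return_code(conn, session_id):
    """Group packages with a non-zero return code by that code,
    to spot packages that share the same problem."""
    rows = conn.execute(
        "SELECT return_code, name FROM processed "
        "WHERE session_id = ? AND return_code != 0 "
        "ORDER BY return_code, name",
        (session_id,),
    )
    groups = {}
    for code, name in rows:
        groups.setdefault(code, []).append(name)
    return groups
```

Keying every row on the session id is what makes it possible to pull all results of a single run, and grouping by return code shows which packages failed in the same way.
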
Once I had implemented writing package information to the database and reading it back, I gained a guarantee of knowing the status of each package. This ultimately replaced the statistics mechanism introduced earlier, which was important to me because from the beginning I have tried not to lose any package along the way. It also simplified resuming an interrupted run of the program.

Support options
----------------

The next step was an attempt to implement support for the command line options described above. Unfortunately, at this point I discovered that the set of generators I had written previously required a considerable rewrite to support those options effectively. First of all, I had to create two paths of action with different starting points: the first when the network connection is active and packages have to be downloaded from PyPI, and the second when tarballs have been downloaded in advance or even come from another source (private tarballs, not available on PyPI). I determined the next steps in both cases and the actions common to both paths, and then started the quite arduous work of moving functions to the right places.

The work has been beneficial, because in the process I arrived at a clearer structure of the program (which I hope will result in fewer bugs) and a clearer layout of files in the repository. This also really helps in implementing support for options such as --path, --package, --download-only and --force-update.

The --skip-existing option will skip packages already present in the official Debian repository. It will use the system command “apt-cache search -n python” to generate a list of available packages (for Python 2 and Python 3 respectively) and on that basis ignore packages previously submitted for conversion.

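A minimal sketch of how the --skip-existing check could work is shown below. It assumes that “apt-cache search -n python” prints one "name - description" line per package, and that only names starting with "python-" or "python3-" are of interest; the prefix filter and all function names are assumptions for this example, not the tool's actual code.

```python
import subprocess


def list_debian_python_packages():
    """Run the command named in the report and return its raw output.
    Only works on a system where apt-cache is available."""
    return subprocess.check_output(
        ["apt-cache", "search", "-n", "python"],
        universal_newlines=True,
    )


def parse_package_names(output):
    """Extract package names from 'name - description' lines,
    keeping only python-* and python3-* packages (an assumption)."""
    names = set()
    for line in output.splitlines():
        name, sep, _description = line.partition(" - ")
        if sep and name.startswith(("python-", "python3-")):
            names.add(name.strip())
    return names


def split_existing(candidates, existing):
    """Split candidate Debian names into (to_convert, to_skip)."""
    to_convert = [c for c in candidates if c not in existing]
    to_skip = [c for c in candidates if c in existing]
    return to_convert, to_skip
```

Keeping the parsing separate from the apt-cache call means the filtering logic can be tested on systems without apt, for example by passing a captured output string to parse_package_names.
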
Plugins
--------

In the course of this work I managed to slightly improve the plugins’ API described in the previous report[2] by adding to the plugin base class an “is_usable” method, which checks whether the plugin is installed and can be used. By default, the method checks for the availability of the commands listed in the “required_commands” attribute.

Summary
---------

I am glad that my tool finally has such an important element as a database. I am also sure that familiarising myself with such a powerful library as SQLAlchemy will pay off in the future. Adding support for the new options is also an important step, because it is better to have such a big upheaval in the program behind me as soon as possible.

Plans
------

Upcoming plans include continued testing of the solutions described above. But the most important task, which has not been completed so far, is to improve the many components of my tool that for now have only basic functionality, especially the practical implementation of all plugin methods. I will focus on this in the near term. I also plan to start working on the Debian source/binary repository which will contain the converted packages.

------
[1] http://www.sqlalchemy.org/
[2] http://lists.debian.org/debian-python/2012/06/msg00039.html

