Hola folks,

While roasting my laptop under Andalucia's sun, I had time for more deep thinking about the future handling of D-I i18n/l10n (globalisation == g17n), as we have now reached a major milestone with the release of RC1, which hopefully you got out while I was away...

D-I globalisation has grown a lot in the last two years and has become somewhat complicated for translators to handle. Moreover, over the same period, the number of supported languages has grown: the early translators were most often long-time Debian contributors, while some of the recent translators are not. This is likely to continue if I achieve my goal of recruiting more and more translators here and there: the more languages we support, the fewer translators with deep Debian knowledge we will have.

So, we need to simplify the g17n infrastructure as much as possible, from the translators' point of view.

After a lot of thinking, I have identified two major things to change in our infrastructure:

1) the translations of D-I packages are split over dozens of files, one per package
2) the weaknesses of the current "3 stages" system for following translation progress

1) Too many files
-----------------

What we call, in translators' jargon, the "first stage" consists of all packages maintained by the D-I team, that is, the packages in the D-I SVN repository.

Translators currently work on these files one by one and commit them individually when they change. This requires them either to keep a full copy of the D-I source tree or to grab files from the statistics pages. It also complicates the work of teams using their own CVS tree, such as the Arabeyes translators, the Norwegian translators from Skolelinux, or the Russian translators.

This also forces translators to translate the same sentence repeatedly across several files, which most often ends in inconsistent translations from package to package.

Finally, this eats a lot of time when I have to commit translations myself, as a help to translators who cannot commit files themselves: most often their files are named after the package they belong to, and putting all these files back in the appropriate place is very tedious.

In conclusion, the use of a single PO file per language would be a great improvement for all translators.

A few tools are already available for working this way: I had committed two prospective scripts in scripts/l10n-utilities before I discovered Petter Reinholdtsen's gettext-helper script, which more or less does the needed job and is used by the Norwegian team.

So, I have drafted the following plan for the transition to a single PO file per language for all D-I package translations:

0) write a script for merging all existing PO files into one single file per language in packages/po. This script has already been written by Petter: gettext-helper
1) write a script which collects all template strings in the packages and creates a general template file in packages/po, merges it with the existing single translation files in packages/po, and spreads the translations from these files back out to the debian/po directories of all packages
2) set up this script to run periodically under my account on people.debian.org
3) switch the French translations to this new scheme
4) test...test...test
5) progressively switch other languages to this new scheme

The new script will be named l10n-sync and has been (or will soon be, depending on how this mail goes out) committed to scripts/l10n-utilities. Its logic is described in one of the new documentation files I have also committed in installer/doc/i18n (the file is "technical.txt").

At the end of the migration, translators will only need to work on the files in packages/po and can simply forget about all other files.

Developers will no longer need to care about debconf-updatepo and other such stuff when they change or add strings in templates. The l10n-sync script will handle all the magic of updating the PO files in debian/po as well as syncing them with the translators' work from packages/po. Another script will be available to developers so that they can manually sync the PO files for one single package, usually before releasing the package.
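
As a rough illustration of the merge-and-spread idea behind steps 0 and 1 above, here is a minimal Python sketch. It is not the actual l10n-sync or gettext-helper code (the real logic is described in technical.txt). It assumes the PO files have already been parsed into plain msgid -> msgstr dictionaries; the function names merge_catalogs and spread_catalog, the majority-vote rule for conflicting translations, and the sample French strings are all assumptions made for this illustration.

```python
from collections import Counter


def merge_catalogs(per_package):
    """Merge per-package catalogs (msgid -> msgstr) into one master catalog.

    Returns (master, conflicts). When packages disagree on a translation,
    the most frequent one is kept (an assumed policy, for illustration only)
    and all variants are reported in `conflicts` for the translator to review.
    """
    candidates = {}
    # Sort package names so the result does not depend on dict ordering.
    for package in sorted(per_package):
        for msgid, msgstr in per_package[package].items():
            variants = candidates.setdefault(msgid, [])
            if msgstr:  # an empty msgstr means "untranslated"
                variants.append(msgstr)

    master, conflicts = {}, {}
    for msgid, variants in candidates.items():
        if not variants:
            master[msgid] = ""
            continue
        counts = Counter(variants)
        master[msgid] = counts.most_common(1)[0][0]
        if len(counts) > 1:
            conflicts[msgid] = sorted(counts)
    return master, conflicts


def spread_catalog(master, templates):
    """Rebuild each package's catalog from the master catalog.

    `templates` maps a package name to the list of msgids currently found
    in its templates; strings missing from the master stay untranslated.
    """
    return {
        package: {msgid: master.get(msgid, "") for msgid in msgids}
        for package, msgids in templates.items()
    }


if __name__ == "__main__":
    # Hypothetical sample data; real catalogs would come from debian/po/*.po.
    packages = {
        "netcfg": {"Continue": "Continuer", "Go Back": "Retour"},
        "partman": {"Continue": "Continuer", "Go Back": "Revenir"},
        "choose-mirror": {"Continue": "Continuer", "Go Back": "Retour"},
    }
    master, conflicts = merge_catalogs(packages)
    print(master)
    print(conflicts)
    print(spread_catalog(master, {"partman": ["Continue", "Go Back", "Finish"]}))
```

The point of the sketch is that a sentence is translated once in the master file and then copied to every package that uses it, so the per-package inconsistencies described in part 1 cannot reappear after a sync.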

The logic for handling the content of debian/changelog files will not be changed: scripts/l10n-changes/output-l10n-changes will still be usable. It will be called from a more general script, designed for package maintainers who wish to update their package's translations immediately.

I plan to install the l10n-sync script while I'm still on holidays, from Aug 16th to Aug 22nd, with a lot of time to follow its work closely while I switch the French translations to this new scheme.

During the week of Aug 23-Aug 30, I'll try to get a few more languages handled by the l10n-sync script: most probably the languages handled by well-skilled translators who will be able to cope with possible messes..:-)

At the same time, I will work with Dennis Stampfer on adapting the translation statistics infrastructure to this new working method (see the second part of this mail).

Finally, during September, all languages will be switched to the new system. If this system appears to work well, the switch may happen earlier, depending on the next release process.

2) Weaknesses of the "3 stage" system
-------------------------------------

Over the last few months, we invented the "three stage" system for following translation statistics. This is due to the fact that a fully translated installation process requires more than translating the D-I packages themselves. Several "regular" Debian packages are involved in the installation process, and most of them need translation of the screen(s) they may show to users. Some of these packages are maintained by the D-I team or by regular D-I contributors (base-config, tasksel, popcon...), some others aren't.

So, with Dennis Stampfer (and Denis Barbier previously), we grouped the translation statistics together in the statistics pages as follows:

- 1st stage: all things shown during the first step of the installation, before the reboot
- 2nd stage: all things shown or possibly shown after the reboot, involving some user input
- 3rd stage: all things shown or possibly shown to users during the installation, not involving user input

This introduced some progression into the translators' work, as translators obviously needed to complete the 1st stage before the 2nd stage, and then the 3rd stage.

However, this scheme currently has several weaknesses:

1) The name "stage" is badly chosen. There aren't 3 stages during the installation process. The name is inherited from the times when we talked only about 2 stages, with stage2 including only base-config and tasksel.
2) For several reasons, we have put in the 2nd stage things which actually pertain to the 1st stage: this is the case for the iso-codes translations (country names), which are shown by countrychooser but are currently counted in the "2nd stage". As a consequence, some languages for which we claim to have 100% translation may still show English in countrychooser's screens.
3) We currently do not take the status of "2nd stage" translations into account when publishing our translation statistics. Some languages for which we claim 100% translation do not have translations for the base-config or shadow screens, for instance. This may confuse our users, who expect their language but will get English in some of the second stage steps. We have already had reports about this.
4) The second stage currently includes very different things: some packages for which translations are nearly mandatory, such as base-config, shadow debconf or tasksel, and some things which have no real consequence on what is shown to users (pppconfig, which is very rarely used, the translations of the shadow programs, which are not used at all...).
5) Translators sometimes have little indication of which package, or which part of a package, should be translated first: for instance, several of them have spent hours translating the shadow programs while tasksel or iso-codes remained untranslated.
6) Statistics are currently computed from the CVS or SVN repositories of most packages. This is good for giving translators a clear idea of the work they still have to do. However, it may give a false idea of the real translation status if some committed translations have not reached the archive yet.

As a consequence, the real translation status for each language is sometimes difficult to assess, most often because of the mix between 1st and 2nd "stage" translations.

In conclusion, we need to rearrange the way we currently build our statistics so that they better reflect the real translation status which will be seen by our users. We also need to be able to say how many translations are complete for the whole installation process, in addition to the statistics we currently publish for the "core" Debian Installer.

For this, my plan is the following:

1) Rename "stages" to "levels"
2-5) Reorganise things between levels so that they better reflect the progressive translation process
6) Except for the first level, give two statistics: the status of the translated/committed material as well as the status of the translation in the Debian archive

I have chosen to define 7 levels of translation status. First of all, this is a number with some high symbolic meaning.
I have probably been influenced by my holidays in a place where the three monotheistic religions peacefully coexisted for hundreds of years...:-)

These 7 levels will be the following:

Level 1: all core D-I packages
  Total                                  :          1180 strings

Level 2: all non-core D-I material involved in user interaction screens during a *default priority* installation of a Debian base system with default choices:
  - base-config (programs and debconf)   : 7+112   =  119 strings
  - shadow (debconf)                     :             25 strings
  - tasksel (programs, debconf, tasks)   : 2+5+102 =  109 strings
  - iso-codes (iso_3166)                 :            404 strings
  - console-data (debconf)               :             89 strings
  - exim4 (debconf)                      :             63 strings
  - popularity-contest (debconf)         :              7 strings
  Total                                  :            816 strings

Level 3: all non-core D-I material involved in user interaction screens during any type of installation of a Debian base system. This will include rarely used packages and packages which may display their screens under certain circumstances:
  - discover1 (debconf)                  :             11 strings
  - aptitude (programs)                  :            723 strings
  - pppconfig (programs)                 :            135 strings
  - console-common (debconf)             :             26 strings
  - dictionaries-common (debconf)        :             28 strings
  - pcmcia-cs (debconf)                  :             30 strings
  Total                                  :            953 strings

Level 4: all packages which may display messages on the screen during any type of installation of a Debian base system:
  - discover1 (program)                  :             83 strings
  - dpkg (program)                       :           1006 strings
  - apt (program)                        :            459 strings
  - shadow (program)                     :            464 strings
  Total                                  :           2012 strings

Level 5: all Debian base system packages (debconf+programs)

Level 6: all Debian packages of priority Standard (debconf+programs)

Level 7: all other Debian packages (debconf+programs)

Obviously, levels 5 to 7 are currently very fuzzy... while level 7 is completely unreachable (let's see it as a kind of translators' Grail...)

So, all this means splitting the translation statistics into four real levels, for a total of nearly 5000 strings.
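
The per-level totals above, and the kind of "whole installation" figure they make possible, can be checked with a few lines of Python. This is only a sketch: the string counts are the ones listed above, while the LEVELS layout and the helper names level_totals and completion are assumptions made for this illustration and do not describe the statistics scripts themselves.

```python
# String counts per level, as listed above (levels 5 to 7 are not yet defined).
LEVELS = {
    1: {"core D-I packages": 1180},
    2: {
        "base-config (programs and debconf)": 7 + 112,
        "shadow (debconf)": 25,
        "tasksel (programs, debconf, tasks)": 2 + 5 + 102,
        "iso-codes (iso_3166)": 404,
        "console-data (debconf)": 89,
        "exim4 (debconf)": 63,
        "popularity-contest (debconf)": 7,
    },
    3: {
        "discover1 (debconf)": 11,
        "aptitude (programs)": 723,
        "pppconfig (programs)": 135,
        "console-common (debconf)": 26,
        "dictionaries-common (debconf)": 28,
        "pcmcia-cs (debconf)": 30,
    },
    4: {
        "discover1 (program)": 83,
        "dpkg (program)": 1006,
        "apt (program)": 459,
        "shadow (program)": 464,
    },
}


def level_totals(levels):
    """Return the total number of strings for each level."""
    return {level: sum(parts.values()) for level, parts in levels.items()}


def completion(translated, levels, up_to_level):
    """Percentage of strings translated over levels 1..up_to_level.

    `translated` maps a level number to the number of strings a language
    has translated in that level (hypothetical input for this sketch).
    """
    totals = level_totals(levels)
    wanted = [level for level in sorted(totals) if level <= up_to_level]
    total = sum(totals[level] for level in wanted)
    # Never count more translated strings than a level actually contains.
    done = sum(min(translated.get(level, 0), totals[level]) for level in wanted)
    return 100.0 * done / total


if __name__ == "__main__":
    print(level_totals(LEVELS))
    print(sum(level_totals(LEVELS).values()))
```

With these numbers, the four levels add up to 1180 + 816 + 953 + 2012 = 4961 strings. A language that is complete at level 1 but only half done at level 2 would score 100% on the "core" statistics yet well below that on the combined levels 1 and 2, which is exactly the gap the current statistics hide.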
This also means that after the split, we will be able to publish the statistics for the first two levels, and these will give a real idea of the status of the translations for the whole installation process. We may even imagine publishing the statistics for all 4 levels, though this may be a bit confusing.

This will also give clearer credit to the translators and translation teams who currently have a really complete Debian installation.

During the next few weeks, I will work together with Dennis Stampfer on building a new translation statistics web site with the new scheme. This will be a bit tricky, as some packages such as shadow have part of their i18n material in one level and part in another. The double statistics may also be a bit tricky.

These changes will be made while the old system continues working. They will be made in parallel with the changes to the translation system for the core D-I packages, and they will be tested on a few languages first.

New documentation
-----------------

I have already committed new documentation for the translation process, which takes this new scheme into account.

The new document is in installer/doc/i18n. It is an XML document named "i18n.xml". A very small build script is provided for compiling it to an HTML document. I'm just learning about DocBook and XSL stuff, and I will probably soon provide a better build script which builds a text file as well.

The new document includes parts for translators as well as parts for maintainers. All D-I contributors, and more particularly translators, are invited to read it carefully. It is a very long and detailed document, but I have tried to put as much information as possible there.

The document is based on the old "translation.txt" file, references to which should now be gradually removed from the D-I web site(s) and documentation.