I sent these ideas to Jim Gettys, who suggested that I send them to the development and localization mailing lists.
------
Summary:

* Write and edit primary documentation according to an explicit set of writing conventions designed to minimize ambiguity and complexity in order to facilitate translation.
* Treat this English documentation as source code which is meant to be translated/compiled into user languages.
* Use or create collaboration tools to make translation, distribution, and maintenance of docs more efficient.

------
Assumptions:

Some of those doing translation will not be professional translators fully bilingual in English and the target language. They might be any of the following:

* a village teacher who speaks the target language as her first language (L1) and English as a weak second language (L2);
* a missionary who speaks English as L1 or L2 (in the case of a French missionary in Africa, for example) and the target language as a weak L3;
* a professional translator who speaks a non-English L1, reads and writes the target language as L2, and knows English as just a subject that he or she studied in school and uses for travel;
* a native L1 speaker of the target language who has immigrated to a foreign country in which English is spoken as a primary or secondary language.

Many of the translators are not going to be career translators, so rather than having the translator accommodate the source text, the source text should accommodate the translator.

Documentation translation is particularly difficult because of how documentation is usually created. Often docs are written grudgingly at the end of the project, and docs are rarely written to a uniform format or set of conventions. There is little reflection on what kind of docs are needed, and docs are usually not edited before they are sent off for translation and publishing.

The conventional approach to translation is that, when a novel or academic article is translated, it is the burden of the translator to accommodate the original. If the original is unclear, this lack of clarity is carried into the target text because the target text must be a mirror of the original. I know this from direct experience, having been the translator for many doc jobs from Japanese companies. The originals are often incomprehensible because of ambiguity and inconsistency, as in the following examples:

* different sections of the docs are written by different people using different terminology for the same processes and entities;
* unconfident writers are too brief, assuming background info and context to which the translator does not have access;
* more confident writers use too many idioms and colorful expressions, rambling on and on in extended and poorly organized complex sentences;
* section divisions and overall organization are inconsistent, forcing the translator to restructure the original before beginning the translation;
* ambiguities inherent in the language itself (like the absence of gendered pronouns and explicit sentence subjects in Japanese) also complicate the translation, forcing the translator to contact the writer of the original, thus slowing the process and degrading translator motivation and confidence.

Ambiguity is the biggest obstacle to translation. If it is a rush job (and it always is), and especially if the translation is being handled by a middleman like a publisher or web design firm (and these days it almost always is), the translator usually retreats to literal translation in the face of ambiguity. Either there is no way to contact the author (middlemen don't want the translator to know how much the client is being billed for translation) or there is no time to wait for the reply. When the text is unclear, the translator has no choice but to translate the ambiguity itself.

In the case of OLPC documentation, ambiguity should be avoided at all costs. Anything that interferes with teachers and students using the notebooks should be avoided, and bad docs would certainly be frustrating and demotivating for the educators and pupils. In order to have translations that are as clear as possible, we must have source-docs that are as clear as possible.

------
Reconception of documentation/translation as parallel to computer programming:

The OLPC team uses English as a common working language, but the users will be using translations, so the English documentation can be seen not as a product in and of itself but as the source for all translations. The English-language "source-docs" should be written to a set of conventions meant to reduce ambiguity and ensure consistency, even when doing so necessitates violating conventional English writing style.

The set of documentation standards I am proposing is similar to the set of coding conventions a programmer follows. The "source-docs" (though written in English) should be seen as source code which is then compiled (or translated) into the many languages needed to support the users.

Likewise, the source-docs should include explicit comments and extra-textual blocks to clarify ambiguity introduced by the writing style or inherent in the language itself, much in the same way that a good programmer includes comments in source code to compensate for the lack of explanatory devices in the code itself. Looping through a multidimensional array doesn't tell you WHY you need to do so or how it plays into the next code block, just as being told that the subject of a sentence is "Suzuki-san" does not tell you if Suzuki is a "she" or a "he". Most techs have had the experience of having to maintain a code base which did not include sufficient comments: while "read the friendly code" or "use the source" might be good ways to learn to program, this kind of detective work is not an efficient use of time and effort.

------
Doc writing conventions:

Some linguistic research has been done on "simplified English" as a subset of English to use for low-level learners, and I think that it might be a good place to look for ways to simplify the source-docs. But just thinking intuitively, I have cooked up the following suggestions in order to generate discussion:

* Pronouns.
  o Use the first-person singular pronoun "I" to represent the author of the docs.
  o Use the second-person singular pronoun "you" to represent the reader of the docs.
  o Use the first-person plural pronoun "we" to represent the OLPC project.
  o Example: "We have designed a screen that switches to black-and-white to conserve energy. I will explain how to switch your screen to black-and-white. First, you press the X button on your keyboard...."
  Because we want the docs to be easily translated and easily understood, the tone should be personal, using "I" for the voice of the writer. This will be easier for amateur translators to translate and easier for younger readers to understand. This will also help the writer avoid the passive construction, which is very difficult for some non-native English speakers to understand.
* Lists.
  o Use tables to explain parallel relationships, comparisons, the composition of an entity, and categorical relationships.
  o Use numbered lists to explain the stages of a process, the steps in a sequence, or anything that has an inherent spatial or temporal order or expresses precedence. Do not use numbered lists if the numbers do not relate to some inherent property of the items. A grocery list should not be numbered, unless the order in which the items are purchased is important.
  o Use bulleted lists for lists that do not have inherent order or precedence. The grocery list would be bulleted.
* All comma sequences should have a comma before the last conjunction, for example, "I like to read books, eat shrimp, and run marathons," rather than, "I like to read books, eat shrimp and run marathons."
  It is fashionable right now to leave out the last comma, but doing so puts the onus of comprehension on the reader. While this is a nit-picky detail, OLPC source-docs should do as much of the work as possible so that translation and comprehension are as easy as possible.
* Use parentheses to include supplemental information like the gender of human agents, steps in a sequence, the target of a pronoun, etc. when there is any ambiguity.
* Many languages, including Japanese, represent non-native names in a native writing system. In Japanese, foreign names are written in a phonetic script called katakana, and my name is pronounced Kuupaa Maikeru. The result is a loss of data: the orthography of my name (the spelling in English) is lost to any Japanese-to-English translator, as is the proper pronunciation. I suggest that all source-docs have personal names written in the alphabet, followed by the pronunciation written in IPA (International Phonetic Alphabet) in parentheses. Translators should then be told to always put the original orthography in parentheses after the name that they are using, so that my name would be "<katakana>Kuupaa Maikeru</katakana> (<alpha>Micheal Cooper</alpha>)" in a Japanese translation.
* Insert a table that acts as a glossary of terms and their definitions at the beginning of each text. These would be the key nouns and verbs used in the text, terms that need to have clear meanings and consistent translations. The translators would be required to keep cumulative lists of these key terms in OO Calc or a similar tool so that, if the translator changes or a group of translators is doing the job, the key terms can be kept consistent. If we know ahead of time that there will be translator teams, this could be covered by a webapp or by Google spreadsheets.
* Idioms and culture-specific metaphors and references should be avoided or used sparingly.
  Of course, terminology that originated in cultural metaphor, like "kill a process" and "reboot the server", would be treated as key terms and added to the glossary to be translated consistently, but more creative and expressive language ("you can type like a banshee", "students will be on it like white on rice", "resulting in a Mickey Mouse, vanilla solution to the problem") should be curtailed.
* Use words, mathematical symbols, and visuals to reinforce and enhance purely verbal explanations with conceptual representations of information (I am thinking Edward Tufte here). For example (a poor example, but here goes): "I will show you how to teach your students to create multimedia presentations. <in box> Sound + Pictures = Multimedia </in box>." I think you get the idea, though.
* The source-docs should be organized so that each section and each paragraph is identified by a number, and the translators should be required to maintain this organization so that paragraph 61 in the Yoruba translation is paragraph 61 in the source-docs. By doing so, it will be easier to modify the translations when changes are made to the source-docs. This would imply some kind of web-based app to store and manage the docs.

I am looking at the way we translate in my organization and thinking about what would be a good online tool to coordinate translations. There are many proprietary tools with vast hordes of features and complications which cost 1-2 thousand dollars per user, but they are not suitable for OLPC. I think OLPC docs-trans would do well with a lighter, simpler application. If the list doesn't mind, I would like to post the resulting thoughts at a later date so that there can be an exchange of ideas.

I apologize for the length, and I hope these ideas can be of help.

Micheal Cooper, Japan

_______________________________________________
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel