Nguyen Vu Hung wrote:
> The users have 2 choices:
> 1. Convert a TCVN3-encoded MS .doc file into a UTF-8 encoded .odt
> 2. Convert an .odt with TCVN3 encoding into a UTF-8 encoded .odt.

No. On my side, they have only one choice: convert an ODF document.
This tool doesn't care about MS formats (because I don't want it to).

> You miss the first case.

I didn't miss it: I do *not* want to manage it! ;-)
It's a political choice: I will definitely help Vietnam move to OOo and
Unicode, but I will *not* help it keep using Microsoft formats.

> Just try to unzip the .odt file, check if styles.xml and content.xml are
> enough.

That's exactly what I'm doing in this script, raising a "NotODFError"
in case of failure.

> Great! Can python auto detect those encodings?

I'm not sure what kind of auto-detection you are talking about...
Auto-detection is not always doable, depending on the kind of difference
between the encodings, and it becomes truly hard if the encodings use the
same coding rules (like TCVN-5712-1 and ISO-8859-1, which both use raw
8-bit bytes to code 256 characters): you then have to guess the encoding
using pattern recognition (e.g. words from a dictionary).

But I think we don't need that here... Rules are simple: ".VnTimes" uses
TCVN-5712 encoding (declaring it wrongly as CP1252), "VNITimes" uses VNI
encoding and "Times New Roman" uses Unicode encoding. So it's quite easy
to recognize the "encoding" here.

> One point: When you convert TCVN3 to UTF-8 ( in your _fontsInfo ), please
> remember that the fonts like .VnTimeH are uppercase.

I did take care of it! Check the properties associated with it; this one
makes the text go uppercase:
    'fo:text-transform': 'uppercase'

-- 
Jean Christophe ANDRÉ
Regional technical manager, Bureau Asie-Pacifique (BAP)
http://asie-pacifique.auf.org/
Agence universitaire de la Francophonie (AuF), http://www.auf.org/
Postal address: AUF, 21 Lê Thánh Tông, T.T. Hoàn Kiếm, Hà Nội, Việt Nam
Tel.: +84 4 9331108  Fax: +84 4 8247383  Mobile: +84 91 3248747
Personal note: please avoid sending me PowerPoint or Word files, see
http://www.gnu.org/philosophy/no-word-attachments.fr.html
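
For illustration only, the two checks discussed in this message (the ODF
sanity check and the font-name rule) could be sketched in Python roughly as
follows. This is not the actual script: the function names `check_odf`,
`guess_encoding` and `text_properties`, the prefix matching, and the "name
ends in H" test are assumptions made for the example. Only the
"NotODFError" name, the three font/encoding pairs and the
'fo:text-transform' property come from the message itself.

```python
import zipfile


class NotODFError(Exception):
    """Raised when a file does not look like an ODF document."""


# Archive members the check insists on (the two files named in the thread).
REQUIRED_MEMBERS = ("content.xml", "styles.xml")

# Lower-cased font-name prefix -> encoding, following the rule in the thread.
# Matching on a prefix is an assumption made for this sketch.
FONT_PREFIX_ENCODINGS = (
    (".vn", "TCVN-5712"),  # ".VnTimes", ".VnTimeH", ...
    ("vni", "VNI"),        # "VNITimes", ...
)


def check_odf(source):
    """Return the set of member names, or raise NotODFError.

    `source` is a path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile as exc:
        raise NotODFError("not a zip archive") from exc
    missing = [name for name in REQUIRED_MEMBERS if name not in names]
    if missing:
        raise NotODFError("missing member(s): " + ", ".join(missing))
    return names


def guess_encoding(font_name):
    """Map a font name to the encoding its text is assumed to use."""
    lowered = font_name.strip().lower()
    for prefix, encoding in FONT_PREFIX_ENCODINGS:
        if lowered.startswith(prefix):
            return encoding
    return "Unicode"  # e.g. "Times New Roman"


def text_properties(font_name):
    """Extra text properties to set once the text has been converted.

    Assumption: capital-only legacy fonts are the ".Vn" fonts whose name
    ends in "H", as with ".VnTimeH".
    """
    name = font_name.strip()
    if guess_encoding(name) == "TCVN-5712" and name.endswith("H"):
        return {"fo:text-transform": "uppercase"}
    return {}
```

With these definitions, ".VnTimeH" is treated as TCVN-5712 text that also
gets the uppercase property, while "Times New Roman" is left as Unicode
with no extra properties. A file that is not a zip archive, or that lacks
content.xml or styles.xml, is rejected with NotODFError.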
