Nguyen Vu Hung wrote:
> The users have 2 choices:
> 1. Convert a TCVN3-encoded MS .doc file into a UTF-8 encoded .odt
> 2. Convert an .odt with TCVN3 encoding into a UTF-8 encoded .odt.
>   
No. On my side, they have only one choice: convert an ODF document.

This tool doesn't handle MS formats (because I don't want it to).

> You miss the first case.
I didn't miss it: I do *not* want to handle it! ;-)

It's a political choice: I will definitely help Vietnam move to OOo
(and to Unicode along the way), but I will *not* help anyone keep using
Microsoft formats.

> Just try to unzip the .odt file, check if styles.xml and content.xml are 
> enough.
>   
That's exactly what I'm doing in this script, raising a "NotODFError"
exception in case of failure.
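
A minimal sketch of that kind of check, not the script's actual code: the
"NotODFError" name comes from the script, but check_odf() and
REQUIRED_MEMBERS are hypothetical names, and it is written for Python 3.

```python
import zipfile


class NotODFError(Exception):
    """Raised when a file does not look like an ODF package."""


# Assumption: these two members are enough to treat the archive as ODF.
REQUIRED_MEMBERS = ("content.xml", "styles.xml")


def check_odf(path):
    """Return the archive member names, or raise NotODFError."""
    try:
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
    except (zipfile.BadZipFile, OSError) as exc:
        # Not a zip archive at all, or the file cannot be read.
        raise NotODFError(f"{path}: not a readable zip archive") from exc
    missing = [name for name in REQUIRED_MEMBERS if name not in names]
    if missing:
        raise NotODFError(f"{path}: missing {', '.join(missing)}")
    return names
```

A caller would wrap check_odf("document.odt") in a try/except NotODFError
block and skip the file when the exception is raised.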

> Great! Can python auto detect those encodings?
>   
I'm not sure what kind of auto-detection you are talking about...

Auto-detection is not always doable, depending on the kind of difference
between the encodings, and it may become truly hard if the encodings use
the same coding rules (like TCVN-5712-1 and ISO-8859-1, which both use raw
8-bit bytes to code 256 characters) => you'll have to guess the encoding
using pattern recognition (e.g. words from a dictionary).

But I don't think we need that here... The rules are simple: ".VnTimes" uses
TCVN-5712 encoding (declaring it wrongly as CP1252), "VNITimes" uses VNI
encoding and "Times New Roman" uses Unicode encoding. So it's quite easy
to recognize the "encoding" here.
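
As a rough illustration of those rules (a sketch only; FONT_ENCODINGS and
guess_encoding() are hypothetical names, and the prefix fallback for other
".Vn..." and "VNI..." fonts is my assumption, not a rule stated above):

```python
# Font names and encodings exactly as listed in the rules above.
FONT_ENCODINGS = {
    ".VnTimes": "TCVN-5712",
    "VNITimes": "VNI",
    "Times New Roman": "Unicode",
}


def guess_encoding(font_name):
    """Return the encoding implied by a font name, or None if unknown."""
    if font_name in FONT_ENCODINGS:
        return FONT_ENCODINGS[font_name]
    # Assumption: font families share a name prefix (e.g. ".VnTimeH").
    if font_name.startswith(".Vn"):
        return "TCVN-5712"
    if font_name.startswith("VNI"):
        return "VNI"
    return None
```

Returning None for an unknown font leaves the decision to the caller, which
can then leave that text untouched instead of converting it wrongly.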

> One point: When you convert TCVN3 to UTF-8 ( in your _fontsInfo ), please 
> remember that the fonts like .VnTimeH are uppercase.
I did take care of it! Check the properties associated with that font; this
one makes the text uppercase: 'fo:text-transform': 'uppercase'
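
To show the idea, here is a small sketch; it is not the real _fontsInfo
structure. UPPERCASE_FONTS, converted_text_properties() and the default
target font are hypothetical, while 'style:font-name' and
'fo:text-transform' are standard ODF text properties.

```python
# Assumption: the set of legacy fonts whose glyphs are all uppercase.
UPPERCASE_FONTS = {".VnTimeH"}


def converted_text_properties(font_name, target_font="Times New Roman"):
    """Build the text properties to apply once the text is in Unicode."""
    properties = {"style:font-name": target_font}
    if font_name in UPPERCASE_FONTS:
        # The legacy font only *displayed* capitals, so the converted text
        # needs an explicit transform to keep looking the same.
        properties["fo:text-transform"] = "uppercase"
    return properties
```

With this, converted_text_properties(".VnTimeH") carries the uppercase
transform, while converted_text_properties(".VnTimes") only switches the font.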

-- 
Jean Christophe "????" ANDRÉ - Regional Technical Manager
Asia-Pacific Office (BAP) - http://asie-pacifique.auf.org/
Agence universitaire de la Francophonie (AuF) - http://www.auf.org/
Postal address: AUF, 21 Lê Thánh Tông, T.T. Hoàn Kiếm, Hà Nội, Việt Nam
Tel.: +84 4 9331108   Fax: +84 4 8247383   Mobile: +84 91 3248747
Personal note: please avoid sending me PowerPoint or Word files,
see http://www.gnu.org/philosophy/no-word-attachments.fr.html

