2008/3/28, Jean Christophe André <jean-christophe.andre at auf.org>:
> Nguyen Vu Hung wrote:
>
> > In your scripts, you assume that the input is an .odt file whose format is
> > zip?
>  >
>
> I don't assume it: it is! :-)
>
The users have 2 choices:

1. Convert a TCVN3-encoded MS .doc file into a UTF-8 encoded .odt.
2. Convert an .odt with TCVN3 encoding into a UTF-8 encoded .odt.

And your goal is:

<quote>
Goal: convert ODF documents from old Vietnamese fonts to Unicode
- locate Vietnamese fonts encoded texts and recode them to Unicode
- specific to Vietnamese fonts: real encoding not the one declared
- replace each old Vietnamese font by some Unicode equivalent
</quote>

Your goal misses the first case.


>  I should (I will) read the ODF official specification to check if it
>  *has* to be or only *may* be Zip. I have to read it to check I did the
>  correct thing anyway... Since this script has been developed on a
>  trial/error basis (but still with a good knowledge of XML)...
Just try to unzip the .odt file and check whether styles.xml and
content.xml are enough.
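
Something like the following would do that check from Python instead of
by hand. It is only a rough sketch: inspect_odt is a name I made up, and
it only tells you whether the file is a Zip archive and whether the two
members are present, not whether they are sufficient for the conversion.

```python
import zipfile


def inspect_odt(path):
    """Return None if path is not a Zip archive; otherwise report
    whether content.xml and styles.xml are present in it."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as odt:
        members = set(odt.namelist())
    # Only these two members are checked; an .odt normally holds more.
    return {name: name in members for name in ("content.xml", "styles.xml")}


# Usage (the file name is a placeholder):
#   print(inspect_odt("document.odt"))
```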


> I finally found how to get rid of external processing, since Python has it
>  all already. Well, almost all... Check the tcvn5712_1.py I had to create
>  (read the first lines for installation instructions) to make Python aware
>  of this encoding. Look for "encode" in the source code...
Great! Can Python auto-detect those encodings?

One point: when you convert TCVN3 to UTF-8 (in your _fontsInfo),
please remember that fonts like .VnTimeH are uppercase.
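
A rough sketch of what I mean, applied after the text has already been
recoded to Unicode. The function names are mine, not from your script,
and the rule "a .Vn font whose name ends in H is a capitals-only font" is
my assumption; a per-font flag in _fontsInfo may be the safer way.

```python
def is_uppercase_font(font_name):
    # Assumption: ".Vn..." fonts ending in "H" (e.g. .VnTimeH) are
    # capitals-only variants of the matching font without the "H".
    return font_name.startswith(".Vn") and font_name.endswith("H")


def apply_font_case(text, font_name):
    """Uppercase already-decoded Unicode text if it was set in a
    capitals-only font; otherwise return it unchanged."""
    return text.upper() if is_uppercase_font(font_name) else text


print(apply_font_case("tiếng việt", ".VnTimeH"))
print(apply_font_case("tiếng việt", ".VnTime"))
```

Python's str.upper() handles precomposed Vietnamese letters, so no extra
mapping table should be needed for this step.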

-- 
Best Regards,
Nguyen Hung Vu ( Nguyễn Vũ Hưng )
vuhung16plus{[email protected]
An inquisitive look at Harajuku
http://www.flickr.com/photos/vuhung/sets/72157600109218238/
