2008/3/28, Jean Christophe André <jean-christophe.andre at auf.org>:
> Nguyen Vu Hung wrote:
>
> > In your scripts, you assume that the input is an odt file whose format is
> > zip?
> >
>
> I don't assume it: it is! :-)
>
The users have two choices:
1. Convert a TCVN3-encoded MS .doc file into a UTF-8 encoded .odt.
2. Convert an .odt with TCVN3 encoding into a UTF-8 encoded .odt.
And your goal is:
<quote>
Goal: convert ODF documents from old Vietnamese fonts to Unicode
- locate Vietnamese fonts encoded texts and recode them to Unicode
- specific to Vietnamese fonts: real encoding not the one declared
- replace each old Vietnamese font by some Unicode equivalent
</quote>
Your goal misses the first case (the MS .doc input).
> I should (I will) read the ODF official specification to check if it
> *has* to be or only *may* be Zip. I have to read it to check I did the
> correct thing anyway... Since this script has been developed on a
> trial/error basis (but still with a good knowledge of XML)...
Just try to unzip the .odt file and check whether styles.xml and
content.xml are enough.
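
A minimal sketch of that check in Python, using only the standard zipfile
module. The helper names inspect_odt and has_text_parts are made up for
this mail and are not taken from your script; the sample package in the
demo is built in memory just so the snippet runs on its own.

```python
import io
import zipfile


def inspect_odt(source):
    """Return (is_zip, member_names) for a path or a binary file object."""
    if not zipfile.is_zipfile(source):
        return False, []
    with zipfile.ZipFile(source) as package:
        return True, package.namelist()


def has_text_parts(members):
    """True if the package holds both content.xml and styles.xml."""
    return {"content.xml", "styles.xml"} <= set(members)


if __name__ == "__main__":
    # Build a tiny stand-in package in memory; a real test would pass
    # the path of an actual .odt file instead.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as package:
        package.writestr("content.xml", "<office:document-content/>")
        package.writestr("styles.xml", "<office:document-styles/>")
    sample.seek(0)

    is_zip, members = inspect_odt(sample)
    print("zip container:", is_zip)
    print("members:", members)
    print("content.xml and styles.xml present:", has_text_parts(members))
```

Whether those two members are the only ones that carry font names is
still something to confirm by looking at a real document.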
> Finally I find how to get rid of external processing since Python has it
> all already. Well, almost all... Check the tcvn5712_1.py I had to create
> (read first lines for installation instructions) to let Python know
> this encoding. Look for "encode" in the source code...
Great! Can Python auto-detect those encodings?
One point: when you convert TCVN3 to UTF-8 (in your _fontsInfo),
please remember that fonts like .VnTimeH are uppercase fonts, so the
converted text has to be uppercased as well.
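
A rough sketch of that extra step, assuming the text has already been
decoded to Unicode and that an uppercase face can be recognised by a
".Vn" prefix plus a trailing "H" (as in .VnTimeH). Both function names
and the naming rule are assumptions for illustration; a real table such
as your _fontsInfo would probably list the uppercase fonts explicitly.

```python
def is_uppercase_font(font_name):
    """Guess whether a TCVN3 font is an all-capitals face.

    Assumption: such faces are named like the regular face with a
    trailing "H" (for example .VnTime and .VnTimeH).
    """
    return font_name.startswith(".Vn") and font_name.endswith("H")


def finish_recoding(text, font_name):
    """Uppercase already-decoded Unicode text if the font is all-capitals."""
    return text.upper() if is_uppercase_font(font_name) else text


if __name__ == "__main__":
    print(finish_recoding("Việt Nam", ".VnTime"))
    print(finish_recoding("Việt Nam", ".VnTimeH"))
```

Without this step, a heading typed in .VnTimeH would come out in
lowercase once the font is replaced by a Unicode one.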
--
Best Regards,
Nguyen Hung Vu ( Nguyễn Vũ Hưng )
vuhung16plus{[email protected]
An inquisitive look at Harajuku
http://www.flickr.com/photos/vuhung/sets/72157600109218238/