While working on the http://FreeTamilEbooks.com project, we receive many books as non-Unicode text.

We convert the text using the open-tamil library:
https://github.com/arcturusannamalai/open-tamil
https://github.com/arulalant/txt2unicode

It is a Python script that runs in the terminal, and it can convert only plain text. Authors give us their works as MS Word documents with a lot of formatting such as bold, italic, tables, and headings. We convert them to plain text and then convert that text to Unicode. Reapplying all the original formatting afterwards is tedious work.

I am looking for ideas on how to convert the encoding of rich text without losing any formatting. Please share your thoughts.

Thanks.

--
Regards,
T.Shrinivasan

My Life with GNU/Linux : http://goinggnu.wordpress.com
Free E-Magazine on Free Open Source Software in Tamil : http://kaniyam.com
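
One possible direction for the question above, offered as a rough sketch and not a tested workflow: if the Word files are saved as .docx (not the older binary .doc), each file is a zip archive whose main body sits in word/document.xml, with the visible text inside <w:t> elements and the formatting in the markup around them. Converting only the text inside those elements should leave bold, italic, tables, and headings in place. The function names below are hypothetical, and `convert` is a placeholder for whichever open-tamil / txt2unicode function matches the source encoding. The sketch also assumes document.xml is UTF-8 and touches only the main body, not headers, footers, or footnotes.

```python
import html
import re
import zipfile

# Matches <w:t> or <w:t xml:space="preserve">, but not <w:tab/>, <w:tbl>
# or a self-closing <w:t/>.
W_T = re.compile(r'(<w:t(?:\s[^>]*?)?(?<!/)>)(.*?)(</w:t>)', re.DOTALL)


def convert_document_xml(xml, convert):
    """Apply convert() to the text of every <w:t> element, leaving markup as-is."""
    def repl(match):
        text = html.unescape(match.group(2))           # undo &amp; &lt; &gt;
        converted = html.escape(convert(text), quote=False)
        return match.group(1) + converted + match.group(3)
    return W_T.sub(repl, xml)


def convert_docx(src, dst, convert):
    """Copy a .docx archive, converting only the text in word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")             # assumption: UTF-8 part
                data = convert_document_xml(xml, convert).encode("utf-8")
            zout.writestr(item, data)


if __name__ == "__main__":
    # str.upper stands in for the real legacy-to-Unicode converter.
    sample = '<w:r><w:rPr><w:b/></w:rPr><w:t>abc</w:t></w:r>'
    print(convert_document_xml(sample, str.upper))
```

Two caveats apply. Word often splits text into several runs wherever the formatting changes, so a legacy character sequence that straddles two runs may be converted incorrectly and would need manual checking. The runs will also still name the old legacy font, so the font would have to be switched to a Unicode Tamil font afterwards, either in Word or by editing the font attributes.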
