We read in the Word document and store it in our internal format: paragraphs with properties and, within each paragraph, a sequence of character formatting runs and text strings. When we scan the document, WordML and DOCX files are already Unicode. An RTF file specifies a codepage, which we use to convert the text to Unicode (and then we throw away the codepage info).
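
A minimal sketch of the RTF case described above, assuming the codepage has already been mapped to a Java charset name such as "windows-1251" (Russian) or "windows-1250" (Polish). The class and method names are hypothetical and are not our actual code; it only illustrates why the codepage is needed at read time and can be dropped once the text is Unicode.

```java
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
class CodepageDecode {

    // Decode raw RTF text bytes into a Unicode String using the
    // codepage declared in the file. After this call the codepage
    // is no longer needed, because the String is Unicode.
    static String decode(byte[] raw, String charsetName) {
        // Assumes the JVM ships this charset; Charset.forName throws
        // UnsupportedCharsetException if it does not.
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xB3 };
        // The same byte maps to different characters per codepage.
        System.out.println(decode(raw, "windows-1250")); // Polish codepage
        System.out.println(decode(raw, "windows-1251")); // Cyrillic codepage
    }
}
```

Going the other way (Unicode text with no codepage, as in DOCX) is the case that has no equivalent one-line answer, which is the reason for the question.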
As DOCX is the future, we have to handle the case where we start with Unicode and are given no codepage.

Thanks - dave

-----Original Message-----
From: 1T3XT info [mailto:[email protected]]
Sent: Monday, December 22, 2008 12:59 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] Code page or unicode

David Thielen wrote:
> We build our documents from a Word document. So the customer has already
> selected the fonts in Word - and we must use those fonts. So I don't
> think that approach will work. And if it's Russian & Polish using
> Verdana, don't we then have to get 2 different Verdana fonts, one for
> each code page?

If you leave the path of using Unicode, you have to take codepages into
account, yes. It's not an obvious question. Are you working with an
intermediate format? Do you have the encodings in Word?
--
This answer is provided by 1T3XT BVBA
http://www.1t3xt.com/ - http://www.1t3xt.info

------------------------------------------------------------------------------
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://www.1t3xt.com/docs/book.php
