RE: character encoding and charsets

Justin Warren Wed, 09 May 2007 06:05:01 -0700

Ok. After some digging, I've found that cChFtnEdn has something to do
with footers.


But I did find that what I'm looking for is getChs() and setChs(), which
get and set the default extended character set id for the text stream,
and getChsTables(), which returns the default extended charset id for
internal data.

They correspond to field_11 and field_12 respectively. Is there any
documentation on what these fields are, or their mappings?

thanks
 

-----Original Message-----
From: Justin Warren 
Sent: Thursday, May 03, 2007 11:56 AM
To: POI Users List
Subject: character encoding and charsets

Hi guys..

 

I have an interesting problem. I am using POI to extract text from a
Word document (usually Word 2000/2003), but the document is written in
Chinese. So naturally, when I write the extracted text to a plain-text
file, I get what looks like random ASCII characters instead of Chinese.
I want to be able to convert the text to UTF-8. Is there any way to
determine the charset so I can decode it?

 

In Eclipse, I am calling WordExtractor.getParagraphs(), and if I set a
breakpoint, I can see the Chinese characters. Also, I noticed that there
is a property in HWPFDocument called field_27_cChFtnEdn. Is that
possibly what I should be looking at?
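
Since the Chinese characters show up correctly in the debugger, the
extracted Java strings may already be decoded properly (Java strings are
Unicode internally), and the garbage could be coming from the writer
falling back to the platform default encoding. A minimal sketch of writing
with an explicit UTF-8 encoding, assuming Java 7 or later; the class name
Utf8TextWriter and the hard-coded sample paragraphs are hypothetical
stand-ins for whatever the extractor returns:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8TextWriter {

    // Writes each paragraph on its own line, encoding explicitly as UTF-8
    // instead of relying on the platform default charset.
    public static void write(Path target, String[] paragraphs) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String paragraph : paragraphs) {
                out.write(paragraph);
                out.newLine();
            }
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    public static String readBack(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; in practice these would be the
        // paragraph strings returned by the extractor.
        String[] paragraphs = { "\u4e2d\u6587", "plain ASCII" };
        Path target = Files.createTempFile("extracted", ".txt");
        write(target, paragraphs);
        System.out.println(readBack(target));
    }
}
```

If the output file still looks wrong, it is worth checking that the
viewer or editor is opening it as UTF-8 as well.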

 

Thanks



