tika-user  

Cannot filter out doc containing Chinese chars

Li Leon
Thu, 03 Dec 2009 19:05:25 -0800

Hi all,

I'm using the following command to extract text from the attached Word
document, which is in Chinese. The command runs without errors, but the
output is gibberish. Any ideas?

    type "chinese char.doc" | java -jar "tika-app-0.4.jar" -x
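
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the extracted text may be correct, but the bytes written to the console are decoded with a different character set than the one used to encode them. The sketch below uses only the Java standard library (no Tika calls) to show how that kind of mismatch produces gibberish. The class name `EncodingDemo`, the helper `roundTrip`, and the choice of UTF-8 and ISO-8859-1 as the two character sets are all illustrative assumptions, not details taken from tika-app.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode text with one charset, then decode the resulting bytes with another.
    // If the two charsets differ, the decoded text may no longer match the input.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the source
        // file compiles the same way regardless of its own encoding.
        String original = "\u4e2d\u6587";

        // Assumption: the writer emits UTF-8 and the reader also expects UTF-8.
        String ok = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Assumption: the writer emits UTF-8 but the reader uses a single-byte
        // charset (ISO-8859-1 stands in for a console code page here).
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(ok));      // prints "true"
        System.out.println(original.equals(garbled)); // prints "false"
        System.out.println(garbled.length());         // prints "6"
    }
}
```

If this is what is happening, redirecting the output to a file and opening it in an editor that can read UTF-8 would be one way to check whether the extraction itself is fine.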



Thanks,

Attachment: Chinese Char.doc
Description: MS-Word document