Li Leon
Sun, 06 Dec 2009 18:24:36 -0800
With

$ java -jar tika-app-0.5.jar --text "Chinese Char.doc"

I ended up with "??".
In my case,

$ java -jar tika-app-0.5.jar -eunicode --text "Chinese Char.doc"

produced the correct result, "在么".

All of the above happened in a Windows environment during debugging. I viewed the output in the Visual Studio "Watch window" tool, which supports displaying UTF-8 text.

I just wonder why this happened.

Thanks,

2009/12/4 Jukka Zitting <jukka.zitt...@gmail.com>
> Hi,
>
> 2009/12/4 Li Leon <leon800...@gmail.com>:
> > Out of interest, how did you get the output? Programmatically or command
> > line, if command line what command did you use.
>
> $ java -jar tika-app-0.5.jar --text "Chinese Char.doc"
>
> BR,
>
> Jukka Zitting
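
One possible explanation for the "??", offered as an assumption and not something confirmed in this thread: if the output encoding used without -e cannot represent Chinese characters, Java substitutes "?" for each unmappable character. The sketch below shows only that substitution. The class name EncodingDemo and the method roundTrip are hypothetical names chosen for illustration, and US-ASCII is a stand-in for whatever default encoding the Windows machine actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it is not part of Tika.
class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The two characters from the test document, written as Unicode
        // escapes so the source file's own encoding does not matter.
        String text = "\u5728\u4e48";

        // Assumption: US-ASCII stands in for a default encoding that
        // cannot represent Chinese characters.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII).equals("??"));

        // A Unicode encoding preserves both characters.
        System.out.println(roundTrip(text, StandardCharsets.UTF_16).equals(text));

        // Shows which encoding this JVM uses when none is specified.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Running this on the affected machine and checking the last line of output would show whether the default charset is one that lacks Chinese characters.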