tika-user  

Re: Can not filter out doc containing Chinese chars

Alex Ott
Thu, 03 Dec 2009 23:49:55 -0800

Re

The current snapshot of Tika (0.6) processes this file correctly, returning
在么 as text.

Li Leon at "Fri, 4 Dec 2009 11:04:58 +0800" wrote:
 LL> Hi all,
 LL>  
 LL>  
 LL> I'm using the following command to filter out the attached doc which is in
 LL> Chinese. The doc was filtered fine but only with gibberish output.
 LL> Any ideas?
 LL>  
 LL> type "chinese char.doc" | java -jar "tika-app-0.4.jar" -x




-- 
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/           http://xtalk.msk.su/~ott/
http://alexott-ru.blogspot.com/