This one is better: https://issues.apache.org/jira/browse/TIKA-546

On Sat, Oct 26, 2013 at 10:05 PM, Oleg Tikhonov <[email protected]> wrote:

> Hi Animesh,
> my wild guess is that the N-gram profile for Chinese wasn't trained very
> well. Try recreating the Chinese language profile.
>
> Have a look here:
>
> http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
>
> Hope it helps.
>
>
> On Sat, Oct 26, 2013 at 8:48 PM, Chris Mattmann <[email protected]> wrote:
>
>> Hi Animesh,
>>
>> Please detail your issue here on [email protected] and I'm sure
>> someone can help.
>>
>> Cheers,
>> Chris
>>
>>
>> -----Original Message-----
>> From: Animesh Kumar <[email protected]>
>> Date: Wednesday, October 23, 2013 9:15 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Fwd: Having Problem in Word Count and Language Detaction
>>
>> >Sir/Mam,
>> >I am developing a web-based application that uses Apache Tika to get the
>> >language and word count of an uploaded file. It works fine for English,
>> >Japanese, Hindi, etc., but gives a wrong word count for Chinese. I am using
>> >tika-app-1.4.jar.
>> >There is also another problem with word counting for file formats other
>> >than doc and docx.
>> >
>> >--
>> >With Thanks & Regards
>> >Animesh Kumar
>> >+918927992397
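
One possible reason for the wrong Chinese word count (an assumption, the thread does not confirm how the count is computed) is that the extracted text is split on whitespace. Chinese is normally written without spaces between words, so a whole sentence comes back as a single token. The sketch below uses only the JDK, not any Tika API, and the class and method names are made up for illustration. It compares a whitespace split with a count of Han characters, which is a common rough substitute for a word count in Chinese text.

```java
public class CjkWordCountSketch {

    // Naive word count: number of whitespace-separated tokens.
    public static int whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // Rough alternative for Chinese: count code points in the Han script.
    // This counts characters, not dictionary words.
    public static int hanCharacters(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        String english = "I like apples";
        String chinese = "我喜欢苹果"; // "I like apples", written without spaces

        System.out.println(whitespaceTokens(english)); // 3
        System.out.println(whitespaceTokens(chinese)); // 1
        System.out.println(hanCharacters(chinese));    // 5
    }
}
```

If the count really is whitespace-based, a proper fix would need a Chinese word segmenter; counting Han characters is only a stopgap.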
