Hi Animesh,
my wild guess is that the N-gram profile for Chinese wasn't trained very
well. Try recreating the Chinese language profile.

Have a look here:
http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
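
To make "N-gram profile" concrete, here is a minimal sketch in plain Java.
It is not Tika's code or API: the class NgramProfileSketch and its methods
buildProfile and similarity are hypothetical names made up for
illustration, and cosine similarity is just one simple way to compare two
profiles. The idea is only that a profile is a table of character n-gram
counts taken from sample text, and detection compares the profile of a
document with the stored profile of each language.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NOT Tika's implementation or API.
public class NgramProfileSketch {

    // Count every run of n consecutive chars in the sample text.
    // Works on Java chars, so characters outside the Basic Multilingual
    // Plane (surrogate pairs) would be split; fine for a sketch.
    static Map<String, Integer> buildProfile(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity between two profiles: values near 1.0 mean very
    // similar n-gram distributions, 0.0 means no n-gram in common.
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // an empty profile matches nothing
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Hypothetical training sample and document, chosen for illustration.
        Map<String, Integer> trained = buildProfile("中文文本的样例", 2);
        Map<String, Integer> document = buildProfile("中文文本", 2);
        System.out.println("similarity = " + similarity(trained, document));
    }
}
```

A profile built from too little or unrepresentative sample text will score
real documents poorly, which is why rebuilding it from a better corpus is
worth a try.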

Hope it helps.


On Sat, Oct 26, 2013 at 8:48 PM, Chris Mattmann <[email protected]> wrote:

> Hi Animesh,
>
> Please detail your issue here on [email protected] and I'm sure
> someone can help.
>
> Cheers,
> Chris
>
>
> -----Original Message-----
> From: Animesh Kumar <[email protected]>
> Date: Wednesday, October 23, 2013 9:15 PM
> To: "[email protected]" <[email protected]>
> Subject: Fwd: Having Problem in Word Count and Language Detaction
>
> >
> >
> >Sir/Madam,
> >I am developing web-based software that uses Apache Tika to get the
> >language and word count of an uploaded file. It works fine for English,
> >Japanese, Hindi, etc., but gives the wrong word count for Chinese. I am
> >using tika-app-1.4.jar.
> >There is also another problem with word counting for file formats other
> >than doc and docx.
> >
> >
> >--
> >With Thanks & Regards
> >Animesh Kumar
> >+918927992397
> >
> >
>
>
>
