Re: Having Problem in Word Count and Language Detection

Oleg Tikhonov Sat, 26 Oct 2013 12:11:37 -0700

This one is better:
https://issues.apache.org/jira/browse/TIKA-546




On Sat, Oct 26, 2013 at 10:05 PM, Oleg Tikhonov <[email protected]> wrote:

> Hi Animesh,
> my wild guess is that the N-gram profile for Chinese wasn't trained very
> well. Try recreating the Chinese language profile.
>
> Have a look here:
>
> http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/section6.html
>
> Hope it helps.
>
>
> On Sat, Oct 26, 2013 at 8:48 PM, Chris Mattmann <[email protected]> wrote:
>
>> Hi Animesh,
>>
>> Please detail your issue here on [email protected] and I'm sure
>> someone can help.
>>
>> Cheers,
>> Chris
>>
>>
>> -----Original Message-----
>> From: Animesh Kumar <[email protected]>
>> Date: Wednesday, October 23, 2013 9:15 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Fwd: Having Problem in Word Count and Language Detection
>>
>> >
>> >
>> >Sir/Madam,
>> >I am developing web-based software that uses Apache Tika to get the
>> >language and word count of an uploaded file. It works fine for English,
>> >Japanese, Hindi, etc., but gives the wrong word count for Chinese. I am
>> >using tika-app-1.4.jar.
>> >There is also another problem with word counting for file formats other
>> >than doc and docx.
>> >
>> >
>> >--
>> >With Thanks & Regards
>> >Animesh Kumar
>> >+918927992397
>> >
>> >
>>
>>
>>
>
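
A note on the Chinese word count reported above: the thread does not show how the words are counted, so the following is only a guess at one possible cause. Chinese text is normally written without spaces between words, so a counter that splits on whitespace will report far too few words. The sketch below is plain Java and does not use any Tika API; the class name ScriptAwareWordCount and the method count are hypothetical. It counts each Han ideograph as one unit, which is a rough approximation (real Chinese words are often two or more characters long), and counts each run of other letters or digits as one word.

```java
public class ScriptAwareWordCount {

    // Hypothetical helper, not part of Tika.
    // Each Han ideograph counts as one unit; each run of other letters or
    // digits counts as one word. Combining marks (as used in Hindi, for
    // example) stay inside the current run instead of ending it.
    static int count(String text) {
        int count = 0;
        boolean inWord = false; // true while inside a run of non-Han letters/digits
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);

            int type = Character.getType(cp);
            boolean isMark = type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK
                    || type == Character.ENCLOSING_MARK;

            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                count++;         // one ideograph, one unit (an approximation)
                inWord = false;  // a Han character ends any open run
            } else if (isMark) {
                // Combining mark: leave the current state unchanged.
            } else if (Character.isLetterOrDigit(cp)) {
                if (!inWord) {
                    count++;     // first character of a new run
                    inWord = true;
                }
            } else {
                inWord = false;  // whitespace or punctuation ends the run
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(count("Apache Tika counts words"));  // prints 4
        System.out.println(count("\u6211\u559c\u6b22 Tika"));   // prints 4 (3 ideographs + "Tika")
    }
}
```

If proper Chinese word boundaries are needed rather than a per-character count, a dictionary-based segmenter would have to replace the Han branch; that is outside the scope of this sketch.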
