Hi Mike,

So the problem cases are Arabic (detected as fa), plus Korean and Chinese (both detected as lt), right?

I’d suggest filing a Tika issue so we can at least track it, though the
problem is likely in the language-detector project that Tika uses for
detection.

I’m leaving on a trip this evening but will be back next week, so I’ll try to
look at it then.
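
In the meantime, here is a minimal sketch (plain Java, no Tika dependency) of
how the test could compare expected and detected codes, given that the bundled
profiles use "el" for Greek and "zh-CN"/"zh-TW" rather than "zh". The names
LanguageCodes, normalize and matches are hypothetical, not part of Tika or
language-detector, and the "gr" alias is only an assumption about what the
script used.

```java
import java.util.Locale;

/** Hypothetical helper: compares a test's expected language code with a detected one. */
final class LanguageCodes {

    private LanguageCodes() {}

    /**
     * Lower-cases the code, drops any region suffix ("zh-CN" becomes "zh"),
     * and maps the assumed non-ISO alias "gr" to the ISO 639-1 code "el".
     */
    static String normalize(String code) {
        String base = code.trim().toLowerCase(Locale.ROOT);
        int sep = base.indexOf('-');
        if (sep > 0) {
            base = base.substring(0, sep);
        }
        return "gr".equals(base) ? "el" : base;
    }

    /** True when both codes refer to the same base language after normalization. */
    static boolean matches(String expected, String detected) {
        return normalize(expected).equals(normalize(detected));
    }

    public static void main(String[] args) {
        // Expected/detected pairs copied from the results in this thread.
        String[][] results = {
            {"ar", "fa"}, {"de", "de"}, {"en", "en"}, {"es", "es"},
            {"fr", "fr"}, {"gr", "el"}, {"it", "it"}, {"ko", "lt"},
            {"nl", "nl"}, {"ru", "ru"}, {"zh", "lt"}
        };
        for (String[] r : results) {
            String status = matches(r[0], r[1]) ? "ok" : "MISMATCH";
            System.out.printf("%-3s %-3s %s%n", r[0], r[1], status);
        }
    }
}
```

With the results from your run, this would flag only ar/fa, ko/lt and zh/lt as
mismatches, so the gr/el row is a naming difference rather than a detection
failure.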

Regards,

— Ken


> On Jan 17, 2019, at 1:48 PM, Mike Thomsen <[email protected]> wrote:
> 
> Ken,
> 
> Here's a Gist version of it:
> 
> https://gist.github.com/MikeThomsen/84abb89aab903a8b21d64af532cc369b
> 
> Thanks,
> 
> Mike
> 
> On Thu, Jan 17, 2019 at 2:25 PM Ken Krugler <[email protected]>
> wrote:
> 
>> Hi Mike,
>> 
>> I don’t see the script - did it get stripped?
>> 
>> Below is a list of the language profiles that I believe are bundled with
>> the language-detector jar that’s pulled in by Tika.
>> 
>> I don’t see “gr” - note that Greek is “el”.
>> 
>> And there’s “zh-CN” and “zh-TW” vs. just “zh”, but otherwise I’d expect
>> detection to work for your test cases.
>> 
>> — Ken
>> 
>> af
>> an
>> ar
>> ast
>> be
>> bg
>> bn
>> br
>> ca
>> cs
>> cy
>> da
>> de
>> el
>> en
>> es
>> et
>> eu
>> fa
>> fi
>> fr
>> ga
>> gl
>> gu
>> he
>> hi
>> hr
>> ht
>> hu
>> id
>> is
>> it
>> ja
>> km
>> kn
>> ko
>> lt
>> lv
>> mk
>> ml
>> mr
>> ms
>> mt
>> ne
>> nl
>> no
>> oc
>> pa
>> pl
>> pt
>> ro
>> ru
>> sk
>> sl
>> so
>> sq
>> sr
>> sv
>> sw
>> ta
>> te
>> th
>> tl
>> tr
>> uk
>> ur
>> vi
>> yi
>> zh-CN
>> zh-TW
>> 
>> 
>>> On Jan 17, 2019, at 9:39 AM, Mike Thomsen <[email protected]>
>>> wrote:
>>> 
>>> I wrote a Groovy script (attached) to test a bunch of languages against
>>> the LanguageDetector class, and these were the results:
>>> 
>>> ar    fa
>>> de    de
>>> en    en
>>> es    es
>>> fr    fr
>>> gr    el
>>> it    it
>>> ko    lt
>>> nl    nl
>>> ru    ru
>>> zh    lt
>>> 
>>> Is there something that needs to be done to enable the detection of
>>> Asian languages or should I file this as a bug report?
>>> 
>>> Thanks,
>>> 
>>> Mike
>> 
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> Custom big data solutions & training
>> Flink, Solr, Hadoop, Cascading & Cassandra
>> 
>> 

