Ken,

Here's a Gist version of it:

https://gist.github.com/MikeThomsen/84abb89aab903a8b21d64af532cc369b

Thanks,

Mike

On Thu, Jan 17, 2019 at 2:25 PM Ken Krugler wrote:

> Hi Mike,
>
> I don’t see the script - did it get stripped?
>
> Below is a list of the language profiles that I believe are bundled with
> the language-detector jar that’s pulled in by Tika.
>
> I don’t see “gr” - note that Greek is “el”.
>
> And there’s “zh-CN” and “zh-TW” vs. just “zh”, but otherwise I’d expect
> detection to work for your test cases.
>
> -- Ken
>
> af
> an
> ar
> ast
> be
> bg
> bn
> br
> ca
> cs
> cy
> da
> de
> el
> en
> es
> et
> eu
> fa
> fi
> fr
> ga
> gl
> gu
> he
> hi
> hr
> ht
> hu
> id
> is
> it
> ja
> km
> kn
> ko
> lt
> lv
> mk
> ml
> mr
> ms
> mt
> ne
> nl
> no
> oc
> pa
> pl
> pt
> ro
> ru
> sk
> sl
> so
> sq
> sr
> sv
> sw
> ta
> te
> th
> tl
> tr
> uk
> ur
> vi
> yi
> zh-CN
> zh-TW
>
> On Jan 17, 2019, at 9:39 AM, Mike Thomsen wrote:
> >
> > I wrote a Groovy script (attached) to test a bunch of languages against
> > the LanguageDetector class, and these were the results:
> >
> > ar fa
> > de de
> > en en
> > es es
> > fr fr
> > gr el
> > it it
> > ko lt
> > nl nl
> > ru ru
> > zh lt
> >
> > Is there something that needs to be done to enable the detection of
> > Asian languages or should I file this as a bug report?
> >
> > Thanks,
> >
> > Mike
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
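To make Ken's point about code names concrete, here is a minimal sketch that checks each of Mike's expected/detected pairs against the bundled profile list from this thread. It is not Mike's Groovy script (that lives in the Gist) and it does not call Tika at all; it only compares the codes. It assumes that a "gr" test should count as a match for "el" and that "zh" should match either "zh-CN" or "zh-TW"; the `ALIASES` table and `classify` function are hypothetical names for illustration.

```python
# Language profiles Ken listed as bundled with the language-detector jar
# pulled in by Tika (copied from the thread).
PROFILES = set("""
af an ar ast be bg bn br ca cs cy da de el en es et eu fa fi fr ga gl gu he
hi hr ht hu id is it ja km kn ko lt lv mk ml mr ms mt ne nl no oc pa pl pt
ro ru sk sl so sq sr sv sw ta te th tl tr uk ur vi yi zh-CN zh-TW
""".split())

# Mike's results: (code he expected, code LanguageDetector returned).
RESULTS = [
    ("ar", "fa"), ("de", "de"), ("en", "en"), ("es", "es"), ("fr", "fr"),
    ("gr", "el"), ("it", "it"), ("ko", "lt"), ("nl", "nl"), ("ru", "ru"),
    ("zh", "lt"),
]

# Hypothetical alias table (an assumption based on Ken's notes): maps a test
# code to the profile code(s) that should count as a correct detection.
ALIASES = {"gr": {"el"}, "zh": {"zh-CN", "zh-TW"}}


def classify(expected, detected):
    """Label one result 'ok', 'naming' (only the code differed) or 'miss'."""
    if detected == expected:
        return "ok"
    if detected in ALIASES.get(expected, {expected}):
        return "naming"
    return "miss"


# Test codes that have no profile under their literal name.
UNSUPPORTED = [e for e, _ in RESULTS if e not in PROFILES]

if __name__ == "__main__":
    for expected, detected in RESULTS:
        print(f"{expected:>3} -> {detected:<3} {classify(expected, detected)}")
    print("no profile under this exact code:", UNSUPPORTED)
```

Run as a script, only "gr" and "zh" lack a profile under their literal code, and "gr" comes out as a naming difference rather than a miss. The remaining misses are "ar", "ko" and "zh", and since "ar", "ko", "zh-CN" and "zh-TW" all appear in the bundled list, those three are not explained by a missing profile.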
