Thanks, Matt. Could you describe that additional step in a bit more detail when you have time? I'm not sure what you used for segmentation; perhaps we could use either Lucene's CJK [1] or Kuromoji [2] analyzers.
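
To make the segmentation question concrete, here is a minimal sketch of one consistent scheme: overlapping character bigrams over runs of Han characters, which is roughly the idea behind the bigrams that Lucene's CJKAnalyzer forms. It is plain Python rather than the Lucene API, it is not what was used for any existing language pack, and the helper name `bigram_segment` and the restriction to the basic CJK Unified Ideographs block are assumptions made only for illustration.

```python
import re

# Hypothetical helper for illustration; not part of Joshua or Lucene.
# Assumption: only the basic CJK Unified Ideographs block counts as "Han".
_HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def bigram_segment(text):
    """Split on whitespace, then turn each Han run into overlapping bigrams."""
    tokens = []
    for chunk in text.split():
        pos = 0
        for match in _HAN_RUN.finditer(chunk):
            # Keep any non-Han text before the run as a single token.
            if match.start() > pos:
                tokens.append(chunk[pos:match.start()])
            run = match.group()
            if len(run) == 1:
                # A lone Han character is emitted as-is.
                tokens.append(run)
            else:
                # Overlapping bigrams: characters (0,1), (1,2), (2,3), ...
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
            pos = match.end()
        # Keep any non-Han text after the last run.
        if pos < len(chunk):
            tokens.append(chunk[pos:])
    return tokens
```

Because the scheme is deterministic, the same function could be applied both when preparing training data and when preprocessing input at decoding time, which is the consistency requirement mentioned above. Whether bigrams are a good unit for translation, as opposed to retrieval, is a separate question.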

Regards,
Tommaso

[1] : https://lucene.apache.org/core/7_0_0/analyzers-common/org/apache/lucene/analysis/cjk/CJKAnalyzer.html
[2] : https://lucene.apache.org/core/7_0_0/analyzers-kuromoji/

On Mon, 19 Feb 2018 at 12:12, Matt Post <p...@cs.jhu.edu> wrote:

> I don’t think I ever built these. There is an additional step of properly
> and consistently segmenting Chinese which complicates things and creates an
> external dependency.
>
> matt (from my phone)
>
> > On 19 Feb 2018 at 10:46, Tommaso Teofili <tommaso.teof...@gmail.com>
> > wrote:
> >
> > Hi all,
> >
> > I am not sure if I am missing something, but I somewhat recalled that
> > language packs for Chinese (but also Japanese / Korean) existed at [1],
> > however I can't find any.
> > Reading through the comments it seems at least that was the plan.
> > If that is a leftout from the recent LP migration we could try to fix it
> > otherwise it'd be nice to build and provide such CJK LPs.
> > Can anyone help clarify ?
> >
> > Regards,
> > Tommaso
> >
> > [1] : https://cwiki.apache.org/confluence/display/JOSHUA/Language+Packs