I previously worked at a Japanese e-commerce (EC) company that had over 200,000 user dictionary entries. I am not sure whether they still use such a large user dictionary, but the tokenizer and char/token filters alone cannot handle some writing variations. So this is an important feature for Japanese text handling.
Best,
Kazuaki

On May 18, 2024 at 21:39:23, Bruno Roustant <bruno.roust...@gmail.com> wrote:

> Hi,
>
> While looking at the various usages of Map with Integer keys, I found
> ja.dict.UserDictionary with its lookup() method where there is a *TODO:
> can we avoid this treemap/toIndexArray?*
>
> I could propose something, but I would like to know how much it is used,
> and if it is worth improving it.
>
> Thanks
>
> Bruno