I previously worked at a Japanese e-commerce company that maintained a
user dictionary with over 200,000 entries. I am not sure whether they
still use such a dictionary, but the tokenizer and char/token filters
alone cannot handle several writing variations. So this is an important
feature for Japanese text handling.

Best,
Kazuaki


On May 18, 2024 at 21:39:23, Bruno Roustant <bruno.roust...@gmail.com>
wrote:

> Hi,
>
> While looking at the various usages of Map with Integer keys, I found
> ja.dict.UserDictionary with its lookup() method where there is a *TODO:
> can we avoid this treemap/toIndexArray?*
>
> I could propose something, but I would like to know how much it is used,
> and if it is worth improving it.
>
> Thanks
>
> Bruno
>
