GitHub user mjpost commented on the issue:
https://github.com/apache/incubator-joshua/pull/48
Holy smokes, thanks for tracking this down. So if I understand correctly,
this only occurs under the following circumstances:
- decoding with multiple KenLM language models
- built with different vocabularies (the usual case)
- a hash collision occurs and returns a state containing a word ID that is
invalid for the calling KenLM model
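To make the failure mode in the list above concrete, here is a minimal Python sketch. It assumes, hypothetically, that states are interned in one table shared by all LMs and keyed only by a hash of the state. `LMState`, `SharedStateCache`, `tiny_hash`, and the vocabulary sizes are illustrative inventions, not Joshua's or KenLM's actual API, and the hash is deliberately weak so a collision is easy to trigger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LMState:
    lm_name: str      # which LM produced this state (illustration only)
    word_ids: tuple   # context word IDs in that LM's own vocabulary


def tiny_hash(word_ids, bits=4):
    # Deliberately weak, hypothetical hash so collisions are easy to produce.
    return sum(word_ids) % (1 << bits)


class SharedStateCache:
    """Hypothetical cache shared across ALL LMs, keyed only by the state hash."""

    def __init__(self):
        self._table = {}

    def intern(self, state):
        key = tiny_hash(state.word_ids)
        # On a collision the previously stored state is returned, even if it
        # came from a different LM with a different vocabulary.
        return self._table.setdefault(key, state)


vocab_size = {"lm_a": 100, "lm_b": 10}  # lm_b has a much smaller vocabulary

cache = SharedStateCache()
cache.intern(LMState("lm_a", (40, 50)))      # hash: 90 % 16 == 10
got = cache.intern(LMState("lm_b", (4, 6)))  # hash: 10 % 16 == 10, collision
ids_valid_for_b = all(i < vocab_size["lm_b"] for i in got.word_ids)
print(got.lm_name, ids_valid_for_b)  # lm_a False
```

The caller asked for an `lm_b` state but received one whose word IDs (40, 50) are out of range for `lm_b`'s vocabulary, which matches the "invalid ID in the calling KenLM" symptom.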
Do you have any idea how often hash collisions actually occur?
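One back-of-the-envelope way to approach that question is the birthday bound. Assuming the state hash behaves like a uniform `b`-bit hash (the actual hash function, its width, and the number of states per sentence in Joshua are not stated here, so all inputs are assumptions), the probability of at least one collision among `n` states is roughly `1 - exp(-n(n-1) / 2^(b+1))`. The helper name `collision_probability` is hypothetical.

```python
import math


def collision_probability(n_states, hash_bits):
    """Birthday-bound approximation of P(at least one collision) among
    n_states keys hashed uniformly into 2**hash_bits buckets."""
    buckets = 2.0 ** hash_bits
    x = n_states * (n_states - 1) / (2.0 * buckets)
    # -expm1(-x) == 1 - exp(-x), computed accurately when x is tiny
    return -math.expm1(-x)


p64 = collision_probability(10**6, 64)  # a million states, 64-bit hash
p32 = collision_probability(10**6, 32)  # a million states, 32-bit hash
print(p64, p32)
```

Under these assumptions a 64-bit hash makes a collision among a million states very unlikely (on the order of 1e-8), while a 32-bit hash makes one nearly certain, so the answer depends heavily on the hash width actually in use.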
I wonder if turning off sharing of KenLM states across LMs would also have
worked, with little to no effect on performance.
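The alternative of not sharing states across LMs can be sketched by adding the LM identity to the cache key. As before, `LMState`, `PerLMStateCache`, and `tiny_hash` are hypothetical names for illustration, not Joshua's real classes, and the weak hash is deliberate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LMState:
    lm_name: str      # which LM produced this state (illustration only)
    word_ids: tuple   # context word IDs in that LM's own vocabulary


def tiny_hash(word_ids, bits=4):
    # Same deliberately weak, hypothetical hash as a collision trigger.
    return sum(word_ids) % (1 << bits)


class PerLMStateCache:
    """Hypothetical cache keyed by (LM, state hash): no sharing across LMs."""

    def __init__(self):
        self._table = {}

    def intern(self, state):
        key = (state.lm_name, tiny_hash(state.word_ids))
        return self._table.setdefault(key, state)


cache = PerLMStateCache()
a = cache.intern(LMState("lm_a", (40, 50)))  # key ("lm_a", 10)
b = cache.intern(LMState("lm_b", (4, 6)))    # key ("lm_b", 10), no clash
print(a.lm_name, b.lm_name)  # lm_a lm_b
```

This removes the cross-vocabulary case only: a collision within a single LM would still return the wrong state, though its IDs would at least be valid for that LM. The cost is that identical hashes may be stored once per LM; whether the performance effect is really negligible would need measuring.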