Hi Daniel,
The data structures are keyed on the word being predicted, which makes
predicting every possible continuation inefficient. A forward trie is
much better suited to these sorts of queries; I was designing for
random query speed.
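To illustrate the difference, here is a minimal forward-trie sketch in plain Python. This is an invented toy, not KenLM's actual data structure: by keying on the history rather than on the predicted word, "list all continuations of this context" becomes a single walk to one node.

```python
# Toy forward trie keyed on the n-gram history (illustrative only).
# Each history prefix maps to a node whose continuations dict holds
# every possible next word, so prediction needs no vocabulary scan.

class ForwardTrie:
    def __init__(self):
        self.children = {}       # history word -> child ForwardTrie
        self.continuations = {}  # next word -> log probability

    def insert(self, history, word, logprob):
        node = self
        for h in history:
            node = node.children.setdefault(h, ForwardTrie())
        node.continuations[word] = logprob

    def predict(self, history):
        # Walk down the history; the node reached holds all continuations.
        node = self
        for h in history:
            node = node.children.get(h)
            if node is None:
                return {}
        return node.continuations

trie = ForwardTrie()
trie.insert(("the",), "cat", -1.2)
trie.insert(("the",), "dog", -1.5)
scores = trie.predict(("the",))
best = max(scores, key=scores.get)  # "cat"
```

With a reverse-keyed structure the same query needs one lookup per vocabulary word, which is exactly the 400k-score loop described below.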
You can eliminate backoff computation by using a State object. In
Python this is exposed via model.BeginSentence(state), then carrying
the state through BaseScore. See the implementation of full_scores for
an example. That said, you would be better off with a C++ loop that
sends an array back to Python, since much of your time is being spent
in glue code rather than in queries.
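The state-carrying pattern looks like the following. To keep this self-contained, ToyModel here is an invented stand-in that only mirrors the BeginSentence/BaseScore method shape of the KenLM Python interface; its probabilities and State contents are made up for illustration.

```python
# Toy stand-in mirroring the BeginSentence/BaseScore calling pattern
# of the kenlm Python interface. The model and its numbers are
# invented; only the state-carrying pattern is the point.

class State:
    def __init__(self):
        self.history = ()  # a real LM state also caches backoff weights

class ToyModel:
    # Invented bigram log10 probabilities.
    BIGRAMS = {("<s>", "the"): -0.5, ("the", "cat"): -1.0, ("the", "dog"): -1.3}

    def BeginSentence(self, state):
        state.history = ("<s>",)

    def BaseScore(self, in_state, word, out_state):
        # Scores one word given the carried context; nothing about the
        # history prefix is recomputed per candidate.
        lp = self.BIGRAMS.get((in_state.history[-1], word), -5.0)
        out_state.history = (word,)
        return lp

model = ToyModel()
state, out = State(), State()
model.BeginSentence(state)

# Score every candidate continuation from the same carried state,
# instead of re-scoring the full sentence prefix per candidate.
scores = {w: model.BaseScore(state, w, out) for w in ["the", "cat", "dog"]}
best = max(scores, key=scores.get)  # "the"
```

The real interface works the same way: reuse the output state of one BaseScore call as the input state of the next, so the history is scored once rather than once per vocabulary word.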
Kenneth
On 11/06/2017 11:26 AM, Daniel Torregrosa wrote:
> Hi all.
>
> I implemented a naïve system to predict the most likely next word for a
> given sentence, using the KenLM Python interface. The algorithm is
> simple: append each word in the vocabulary to the last n-1 words of the
> sentence (n being the order of the language model), then score the
> results using kenlm.Model.score(). The 1-grams of the language model
> serve as the vocabulary.
>
> But the prediction takes around 1 second. Is there any way to speed up
> this process? I have thought of a couple of approaches:
>
> * The code could be further optimized by using fragment scores to
> cache some operations, but it seems that this feature is not
> implemented in the Python interface
> (https://github.com/kpu/kenlm/issues/78).
> * The vocabulary can be pruned. Currently, it has around 400k words,
> but I cannot find a meaningful way of pruning the model that I can
> also justify.
>
> Thanks a lot.
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>