Hi Daniel,

        The data structures are keyed on the word being predicted, which
makes enumerating every possible continuation inefficient.  A forward
trie is much better suited to these sorts of queries.  I was designing
for random query speed.
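To illustrate the idea (this is a toy sketch, not KenLM's actual data
structure; all names here are made up): a forward trie keys children on
the *next* word, so the continuations of a context can be read off one
node instead of querying every vocabulary word.

```python
class ForwardTrieNode:
    """One context node; children are keyed on the next word."""
    def __init__(self):
        self.children = {}
        self.log_prob = 0.0  # log10 probability of the n-gram ending here

class ForwardTrie:
    def __init__(self):
        self.root = ForwardTrieNode()

    def add_ngram(self, words, log_prob):
        node = self.root
        for w in words:
            node = node.children.setdefault(w, ForwardTrieNode())
        node.log_prob = log_prob

    def continuations(self, context):
        """All words observed after `context`, best first."""
        node = self.root
        for w in context:
            if w not in node.children:
                return []
            node = node.children[w]
        return sorted(node.children.items(),
                      key=lambda kv: kv[1].log_prob, reverse=True)
```

With this layout, predicting the next word is one walk down the trie
plus a sort of that node's children, rather than ~400k independent
queries against a structure keyed on the predicted word.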

        You can eliminate backoff computation by using a State object.  In
Python this is exposed with model.BeginSentence(state), then carrying
the state through BaseScore.  See the implementation of full_scores for
an example.  Though you'd be better off with a C++ loop that sends an
array back to Python, since much of the time you are spending is glue
rather than queries.
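A state-carrying loop might look like the sketch below.  The helper
name best_next_word is made up for illustration; the model is assumed
to expose BeginSentenceWrite(state) and BaseScore(in_state, word,
out_state) as in the kenlm Python wrapper, and make_state constructs
fresh state objects (e.g. kenlm.State).

```python
def best_next_word(model, make_state, context_words, vocab):
    """Advance the model state through the context once, then score
    each candidate word from that cached state, so the backoff work
    for the context is not repeated per candidate."""
    state = make_state()
    model.BeginSentenceWrite(state)
    # Walk the state through the context a single time.
    for w in context_words:
        out_state = make_state()
        model.BaseScore(state, w, out_state)
        state = out_state
    # Each candidate now costs exactly one query from the cached state.
    scratch = make_state()
    return max(vocab, key=lambda w: model.BaseScore(state, w, scratch))
```

With the real wrapper you would call it along the lines of
best_next_word(model, kenlm.State, context, vocab), where model is a
loaded kenlm.Model.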

Kenneth
        

On 11/06/2017 11:26 AM, Daniel Torregrosa wrote:
> Hi all.
> 
> I implemented a naïve system to predict the most likely next word for a
> given sentence, using the KenLM Python interface. The algorithm is
> simple: append each word in the vocabulary to the last n-1 words of the
> sentence (where n is the order of the language model), then score the
> results with kenlm.Model.score(). The 1-grams of the language model
> serve as the vocabulary.
> 
> But each prediction takes around 1 second. Is there any way to speed up
> this process? I have thought of a couple of approaches:
> 
>   * The code can be further optimized by using the fragment score to
>     cache some operations, but it seems that the feature is not
>     implemented in the python interface
>     (https://github.com/kpu/kenlm/issues/78).
>   * The vocabulary can be pruned. Currently, it has around 400k words,
>     but I cannot find a meaningful way of pruning the model that I can
>     also justify.
> 
> Thanks a lot.
> 
> 
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
