Re: Joshua Model Input Format(s) and LM Loading

lewis john mcgibbney Wed, 26 Oct 2016 00:36:09 -0700

I hear ye loud and clear, Matt :) Thank you for the response.

On Wed, Oct 26, 2016 at 12:30 AM, <[email protected]> wrote:


>
> From: Matt Post <[email protected]>
> To: [email protected]
> Cc:
> Date: Tue, 25 Oct 2016 08:49:19 -0400
> Subject: Re: Joshua Model Input Format(s) and LM Loading
> Hi Lewis,
>
> Joshua supports two language model representation packages: KenLM [0] and
> BerkeleyLM [1]. Both were developed at about the same time, and both
> represented huge efficiency gains over what had previously been the
> standard approach (SRILM). Ken Heafield (who has contributed a lot to
> Joshua) went on to make many other improvements to language model
> representation, decoder integration, and the actual construction of
> language models and their efficient interpolation. His goal for a while
> was to make SRILM completely unnecessary, and I think he succeeded.
>
> BerkeleyLM was more of a one-off project. It is slower than KenLM and
> hasn't been touched in years. If you want to understand the details, your
> efforts are probably best spent on the KenLM papers. But it's also worth
> noting that Ken is a crack C++ programmer who has spent years hacking away
> on these problems, so your chances of finding any further efficiencies
> there are probably quite limited unless you have a lot of background in
> the area. Even if you did, I would recommend you not spend your time that
> way: I basically consider the LM representation problem to have been
> solved by KenLM. That's not to say there aren't improvements to be had on
> the Joshua / JNI bridge, but even there, there are probably better things
> to do.
>
> matt
>
> [0] KenLM: Faster and Smaller Language Model Queries
> http://www.kheafield.com/professional/avenue/kenlm.pdf
>
> [1] Faster and Smaller N-Gram Language Models
> http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf
>
>
