I hear ye loud and clear, Matt :) Thank you for the response.

On Wed, Oct 26, 2016 at 12:30 AM, <[email protected]> wrote:
> From: Matt Post <[email protected]>
> To: [email protected]
> Cc:
> Date: Tue, 25 Oct 2016 08:49:19 -0400
> Subject: Re: Joshua Model Input Format(s) and LM Loading
>
> Hi Lewis,
>
> Joshua supports two language model representation packages: KenLM [0] and
> BerkeleyLM [1]. These were both developed at about the same time, and
> represented huge gains in doing this task efficiently, over what had
> previously been the standard approach (SRILM). Ken Heafield (who has
> contributed a lot to Joshua) went on to contribute a lot of other
> improvements to language model representation, decoder integration, and
> also the actual construction of language models and their efficient
> interpolation. His goal for a while was to make SRILM completely
> unnecessary, and I think he succeeded.
>
> BerkeleyLM was more of a one-off project. It is slower than KenLM and
> hasn't been touched in years. If you want to understand how these
> packages work, your efforts are probably best spent looking into the
> KenLM papers. But it's also worth noting that Ken is a crack C++
> programmer who has spent years hacking away on these problems, and your
> chances of finding any further efficiencies there are probably quite
> limited unless you have a lot of background in the area. But even if you
> did, I would recommend you not spend your time that way; I basically
> consider the LM representation problem to have been solved by KenLM.
> That's not to say that there are no improvements to be had on the
> Joshua / JNI bridge, but even there, there are probably better things
> to do.
>
> matt
>
> [0] KenLM: Faster and Smaller Language Model Queries
> http://www.kheafield.com/professional/avenue/kenlm.pdf
>
> [1] Faster and Smaller N-Gram Language Models
> http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf
