For SRILM, pass the -unk flag to ngram-count; RandLM does this by default, if I recall correctly.
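To check whether an existing model already contains the token, here is a minimal sketch. It assumes a plain-text ARPA file, not a binary KenLM or RandLM file. `inspect_arpa` is a hypothetical helper and is not part of SRILM, IRSTLM or KenLM. It reads the unigram section, reports whether `<unk>` is listed, and compares the `ngram 1=` count in the `\data\` header with the unigram entries actually present. With SRILM, the flag above goes on the training command, e.g. `ngram-count -text corpus.txt -order 3 -unk -lm model.arpa` (the file names are placeholders).

```python
import sys
from typing import Iterable, NamedTuple


class UnigramInfo(NamedTuple):
    declared: int  # value of "ngram 1=N" in the \data\ header (-1 if absent)
    actual: int    # number of entry lines found under \1-grams:
    has_unk: bool  # True if "<unk>" is one of those entries


def inspect_arpa(lines: Iterable[str]) -> UnigramInfo:
    """Scan ARPA text line by line; only the header and unigram section matter."""
    declared, actual, has_unk = -1, 0, False
    section = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            section = line
            continue
        if section == "\\data\\" and line.startswith("ngram 1="):
            declared = int(line.split("=", 1)[1])
        elif section == "\\1-grams:":
            # Unigram entry: log10 prob, word, optional backoff weight
            fields = line.split()
            if len(fields) >= 2:
                actual += 1
                has_unk = has_unk or fields[1] == "<unk>"
    return UnigramInfo(declared, actual, has_unk)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        info = inspect_arpa(f)
    print(f"declared unigrams: {info.declared}")
    print(f"listed unigrams:   {info.actual}")
    print(f"<unk> present:     {info.has_unk}")
    if info.declared != info.actual:
        print("warning: header count differs from listed unigrams")
```

Run it as `python inspect_arpa.py model.arpa` (the script name is assumed). It only reports what it finds and never modifies the model, so any fix still happens at training time.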
Miles

On 16 August 2011 06:28, Tom Hoar <[email protected]> wrote:

> Ken,
>
> Does the online Moses documentation refer to how to ensure the language
> model has <unk> in the vocabulary? I've never seen it.
>
> What's the best way to ensure a LM has the <unk> token in the vocabulary?
> Is it as simple as appending one line consisting of one <unk> token to the
> language model corpus? Or is there a command-line switch for ngram-count,
> build-lm.sh, or buildlm? Or should we just edit the raw text language model
> and add it to the vocabulary manually?
>
> Thanks,
> Tom
>
>
> On Mon, 15 Aug 2011 22:12:36 +0100, Kenneth Heafield <[email protected]>
> wrote:
>
> Ok I have reproduced the problem. It only happens when the ARPA file is
> missing and is probably an off-by-one on vocabulary size. I'll have a fix
> soon.
>
> Kenneth
>
> On 08/15/11 19:20, Kenneth Heafield wrote:
>
> Hi,
>
> Back from vacation and sorry but I'm having trouble reproducing this
> locally.
>
> - Latest Moses (revision 4143); I haven't made any changes that should
> impact language modeling since 4096.
> - svn status says the relevant source code is unmodified.
> - Tried an SRI model, including rebuilding with build_binary that ships
> with Moses.
> - Ran threaded and not threaded.
>
> Can you send me your very small SRILM model? Does it have ?
>
> Kenneth
>
> On 08/04/11 11:42, Kenneth Heafield wrote:
>
> Sorry I am slow to respond. This is my first thing to look at, but I am
> traveling a lot through the 14th.
>
> Alex Fraser <[email protected]> wrote:
>>
>> Hi Kenneth --
>>
>> Latest revision, 4096. Single threaded also crashes.
>>
>> Cheers, Alex
>>
>>
>> On Fri, Jul 29, 2011 at 6:00 PM, Kenneth Heafield <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > There was a problem with this; thought it was fixed but maybe it
>> > came back. Which revision are you running? Does it still happen if you
>> > run single-threaded?
>> >
>> > Kenneth
>> >
>> > On 07/29/11 09:39, Alex Fraser wrote:
>> >> Hi Folks,
>> >>
>> >> Tom Hoar previously mentioned that he had a problem with KenLMs built
>> >> from SRILM crashing Moses.
>> >>
>> >> Fabienne Cap and I also have had a problem with this. It seems to be
>> >> restricted to using the trie option with build-binary.
>> >>
>> >> Ken, if you have any problems reproducing this, please let me know. I
>> >> can send you a very small SRILM trained language model that crashes
>> >> moses when converted to binary with the trie option, but works fine as
>> >> a probing binary and using the original ARPA. (BTW, this is running
>> >> the decoder multi-threaded and the crash comes at some point during
>> >> decoding the first sentence, not during loading files)
>> >>
>> >> Cheers, Alex
>> >>
>> >> ------------------------------
>> >> Moses-support mailing list
>> >> [email protected]
>> >> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
