[Moses-support] KenLM distributed with Moses

2010-10-18 Thread Kenneth Heafield
Hi Moses, Introducing kenlm in Moses trunk. You no longer need to download a separate language model to use Moses; it's distributed with Moses and compiled in by default on UNIX. This is threadsafe language model inference code that returns the same probabilities as SRI (up to floating p

Re: [Moses-support] KenLM distributed with Moses

2010-10-22 Thread Kenneth Heafield
do we need to > train with another tool, like SRILM or convert IRSTLM to full ARPA format? > > Thanks again, > Tom > > > > On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield > wrote: >> Hi Moses, >> >> Introducing kenlm in Moses trunk. You no

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Kenneth Heafield
owing an instance of 'lm::FormatLoadException' >> what(): Expected blank line after 3-grams at byte 22348989 in file >> arpa.en.lm >> Aborted >> >> What am I missing? >> >> Thanks, >> Tom >> >> >> On Fri,

Re: [Moses-support] KenLM distributed with Moses

2010-10-26 Thread Kenneth Heafield
n-grams) and > the error disappeared. > > It's pretty fast now. I look forward to testing the optimized code. > > Tom > > > > On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield > wrote: >> I've fixed this in revision 3657 and tested that it wo

Re: [Moses-support] KenLM distributed with Moses

2010-10-27 Thread Kenneth Heafield
Revision 3671 introduces an updated version of kenlm. Queries are faster now (no more string vocab lookups, state is kept so backoffs cost less). The binary format has changed as a result; please rebuild your binary files. Timing is forthcoming. Kenneth On 10/18/10 20:31, Kenneth Heafield

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-28 Thread Kenneth Heafield
Hi Felipe, Please run $recent_moses_build/kenlm/query langmodel.lm Hello all, > > My question is about SRILM and IRSTLM, it is not directly related to > Moses, but I did not know where to ask. > > I am scoring individual sentences with a 5-gram language model and I get > different sco

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Kenneth Heafield
sys 0.00047 > rss 316656 kB > Total time including destruction: > user 18.0001 > sys 0.00051 > rss 1312 kB > > It seems that it is adding the end-of-sentence token, but not that of > the begin of sentence. > > Score (-55.599) is different from SRILM (

Re: [Moses-support] KenLM distributed with Moses

2010-10-29 Thread Kenneth Heafield
50 minutes > BLEU Score: 0.2514 > > > > > On Wed, 27 Oct 2010 14:15:39 -0400, Kenneth Heafield > wrote: >> Revision 3671 introduces an updated version of kenlm. Queries are >> faster now (no more string vocab lookups, state is kep

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Kenneth Heafield
That documentation was specific to kenlm's query tool. kenlm does the same thing as SRI with respect to sentence boundary tokens. As to what that is, I'm deferring to Edinburgh. Kenneth On 10/29/10 10:28, John Burger wrote: > Kenneth Heafield wrote: > >> kenlm's q

Re: [Moses-support] Different scores with SRILM and IRSTLM

2010-10-29 Thread Kenneth Heafield
esigned to score internal <s> and </s> tokens so you'll get weird results if they're duplicated. . . Kenneth On 10/29/10 10:37, Kenneth Heafield wrote: > That documentation was specific to kenlm's query tool. kenlm does the > same thing as SRI with respect to sentence boundary tok

[Moses-support] Language model filter

2010-10-30 Thread Kenneth Heafield
Dear Moses, Can I interest you in an ARPA language model filter? http://kheafield.com/code/mt/filter.html . It enforces phrase and sentence-level constraints, not just vocabulary. You might have to modify your perl scripts. Kenneth ___ Moses-s

Re: [Moses-support] KenLM distributed with Moses

2010-11-02 Thread Kenneth Heafield
On 19 October 2010 01:31, Kenneth Heafield wrote: > > Hi Moses

Re: [Moses-support] Compiling moses with SRILM, checking for trigram_init in -loom... no

2010-11-19 Thread Kenneth Heafield
Try KenLM. Run ./configure (no argument), change your moses.ini so the first digit is 8, and get the same results with less time, memory, and compilation headache. If you still want to use SRI with moses: Is your machine actually 64-bit but SRI annoyingly decided to compile 32-bit? If so, modif

Re: [Moses-support] Error : Cannot find -lkenlm

2010-11-23 Thread Kenneth Heafield
Interesting. Do you have the file /home/deeps/mosesdecoder/kenlm/libkenlm.a? Does this happen after: make clean ./regenerate-makefiles.sh ./configure --with-irstlm=/usr/local/lib make Try using single-threaded make so we can tell if this is a parallelization issue. What Linux distribution are

Re: [Moses-support] list of special characters

2010-11-24 Thread Kenneth Heafield
<s>, </s>, and <unk> (but your tokenizer might split these anyway). On 11/24/10 11:16, Philipp Koehn wrote: > Hi, > > this would probably good to spell out in the documentation. > > The short answer is: > > * if you use the default setup, only the bar '|' is a special character > > * if you use XML input

Re: [Moses-support] Compilation error with moses-chart

2010-11-26 Thread Kenneth Heafield
You're missing PhraseDictionaryMyImpl::~PhraseDictionaryMyImpl() {} in your cc file. On 11/26/10 12:58, Fabienne Braune wrote: > Hi, > > I have implemented a new type of phrase dictionnary in order to write my > own GetChartRuleCollection(...) method. I get the error-message > "/.../mosesdecoder/

Re: [Moses-support] error compiling moses (little endian error ppc64)

2010-11-27 Thread Kenneth Heafield
Ooh a big-endian user. Guess I'll have to write those routines. For now you can comment out the offending #error but don't use kenlm's trie implementation (the default probing hash table is fine). It looks like the switchable endianness on PPC is a choice made by the operating system and you're
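
The endianness issue above comes down to whether multi-byte integers in a binary file are written in a fixed byte order or in whatever order the host CPU uses. The following is a minimal sketch of the fixed-order approach using Python's standard `struct` module. It is not kenlm's code; the function names are made up for illustration.

```python
import struct
import sys


def is_little_endian() -> bool:
    """Report whether the host stores the least significant byte first."""
    return sys.byteorder == "little"


def to_little_endian_u32(value: int) -> bytes:
    """Serialize an unsigned 32-bit integer in little-endian order on any host."""
    return struct.pack("<I", value)


def from_little_endian_u32(raw: bytes) -> int:
    """Read back an unsigned 32-bit integer that was written little-endian."""
    return struct.unpack("<I", raw)[0]
```

A file written this way reads back identically on big-endian and little-endian machines, at the cost of a byte swap on the big-endian side.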

Re: [Moses-support] error compiling moses (little endian error ppc64)

2010-11-27 Thread Kenneth Heafield
reads, Linux does little-endian on Itanium, and running Moses on my MIPS-based wireless router doesn't sound like a good idea. A shame we threw out the PA-RISC machines. Kenneth On 11/27/10 10:15, Kenneth Heafield wrote: > Ooh a big-endian user. Guess I'll have to write those routines.

[Moses-support] KenLM benchmarks and data structure options

2010-12-07 Thread Kenneth Heafield
Dear Moses, Of SRI and IRST, the fastest is SRI's default. KenLM's trie implementation uses 16% less CPU. The smallest [without quantization] is IRST with lazy loading. KenLM's trie implementation uses 42% less memory. Simultaneously. Full benchmarks at http://kheafield.com/code/kenlm

Re: [Moses-support] moses -threads X vs. make -j X

2011-01-03 Thread Kenneth Heafield
make -j sets the number of processes to compile Moses. It impacts the speed with which Moses compiles. It has no impact on the binary produced and, therefore, no impact on training or decoding time. There is no maximum but, as some files depend on others, only so many files can be compiled simul

Re: [Moses-support] question

2011-01-05 Thread Kenneth Heafield
kenlm doesn't build ARPA files; you will need SRILM/IRSTLM to build one. So for example, 1. Compile Moses. I put this before the install SRI step to emphasize that Moses does not need to be linked to SRI. 2. Install SRILM 3. Run SRILM's ngram program to generate an ARPA file 4. Pass --lm 0:5:foo

Re: [Moses-support] KenLM build_binary exception

2011-01-06 Thread Kenneth Heafield
Hey IRST, why are you generating positive log probabilities? I'll have to fix the error message to print the number 4 instead of ASCII value 4. On 01/06/11 04:13, supp...@precisiontranslationtools.com wrote: > I've been using IRSTLM's build-lm.sh to build an LM. Then converted from > iARPA to ARP

Re: [Moses-support] multiple LMs on 64-bit

2011-01-06 Thread Kenneth Heafield
That code is inside SRILM. You might get an answer by posting to srilm-u...@speech.sri.com . Or use kenlm. . . On 01/06/11 08:02, John Morgan wrote: > Hello, > > I'm trying to build systems with multiple LMs as features in the ems. > I have 7 subcorpora, c1,c2,...c7. > I use [LM:c1], [LM:c2]

Re: [Moses-support] error with --with-kenlm

2011-01-11 Thread Kenneth Heafield
I've just checked in revision 3796 which fixes this problem, including the OnDiskWrapper issue for bonus kicks. Tested with: ./configure, ./configure --without-kenlm, ./configure --enable-shared, and ./configure --enable-shared --without-kenlm . I would have just added -lkenlm to LIBS in configur

[Moses-support] kenlm updated in 3847

2011-01-25 Thread Kenneth Heafield
I've checked in an updated kenlm as revision 3847. This involves a binary format change, so you'll need to rebuild from your ARPA files, sorry. - There's an important correctness fix. Some models contain n-grams like "foo bar baz quux" without their n-grams e.g. "bar baz quux" and "baz quux" bec

[Moses-support] Fixed cannot find KEN-LM in lm/model.hh

2011-01-28 Thread Kenneth Heafield
If you created a clean checkout with revision [3849,3859) then you might have gotten an error from ./configure about not finding KEN-LM in lm/model.hh . I've fixed this in revision 3859. Existing checkouts updated to these revisions were fine. Sorry, Kenneth "can we please stop using autotools?
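
The notation [3849,3859) above is a half-open interval: revision 3849 is affected and revision 3859, which carries the fix, is not. A small sketch of that check, with a hypothetical helper name:

```python
def in_half_open(revision: int, first_bad: int, first_fixed: int) -> bool:
    """True when revision lies in [first_bad, first_fixed)."""
    return first_bad <= revision < first_fixed
```

For example, `in_half_open(3858, 3849, 3859)` holds while `in_half_open(3859, 3849, 3859)` does not.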

Re: [Moses-support] problem loading LM

2011-02-06 Thread Kenneth Heafield
The first error you report (body != 0) means malloc returned 0. That's an out of memory condition (or a bug in SRI asking for 0 memory). Are you compiling 32-bit or running with any other hard limit on RAM? Don't know what your second error is. Try kenlm. It uses less memory and has more i

Re: [Moses-support] problem loading LM

2011-02-07 Thread Kenneth Heafield
*** >> [/home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.kenlm] >> Segmentation fault >> make: *** Deleting file >> `/home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.kenlm' >> >> Thanks

Re: [Moses-support] Using KENLM

2011-02-07 Thread Kenneth Heafield
Hi, What revision of Moses are you using? Does this still happen after you run svn up and recompile Moses? Kenneth On 02/07/11 10:53, Kārlis Goba wrote: > Hi, > > My preferred way to build large LMs has been IRSTLM as it can handle large > corpora nicely by splitting the task. The pro

Re: [Moses-support] Using KENLM

2011-02-07 Thread Kenneth Heafield
Just to get the word out more, trie is broken before 3847 for common pruning strategies as announced in "[Moses-support] kenlm updated in 3847". Admittedly the subject could have yelled more, but it's also easy to miss posts. On 02/07/11 11:11, Kārlis Goba wrote: > Thanks, Kenneth, > > This was

Re: [Moses-support] Unable to build KenLM: bus error

2011-02-09 Thread Kenneth Heafield
What architecture are you on? 64-bit x86? I'm assuming you compiled 64-bit. Could you send me either the ARPA or a tarball of the temporary building directory snapshotted by hitting ctrl+c while the second progress bar is running? I've sent you off-list instructions on how to transfer a file t

Re: [Moses-support] Segmentation fault in KenLM

2011-02-10 Thread Kenneth Heafield
Weird. It's already checked that contextFactor is non-empty. This could be a bad or NULL Word * object or factor set incorrectly. Are you using factors? What are your LM lines from moses.ini? On 02/10/11 04:39, Christian Rishøj Jensen wrote: > > I am seeing a segmentation fault in KenLM this

Re: [Moses-support] Unable to build KenLM: bus error

2011-02-10 Thread Kenneth Heafield
Please update to revision 3877 or above. I've checked in a fix that's probably it. Sorry, Kenneth On 02/10/11 01:07, Kenneth Heafield wrote: > What architecture are you on? 64-bit x86? I'm assuming you compiled > 64-bit. > > Could you send me either the ARPA or

Re: [Moses-support] Segmentation fault in KenLM

2011-02-10 Thread Kenneth Heafield
Does this work if you substitute IRST or SRI? I'm using essentially the same calls they are to get vocab IDs here. On 02/10/11 04:39, Christian Rishøj Jensen wrote: > > I am seeing a segmentation fault in KenLM this morning: > > reading bin ttable > size of OFF_T 8 > binary phrasefile loaded, d

Re: [Moses-support] Unable to build KenLM: bus error

2011-02-11 Thread Kenneth Heafield
odels, sorry. Kenneth On 02/10/11 20:54, Kenneth Heafield wrote: > Please update to revision 3877 or above. I've checked in fix that's > probably it. > > Sorry, > > Kenneth > > On 02/10/11 01:07, Kenneth Heafield wrote: >> What architecture are you on?

[Moses-support] Fwd: Build LM using IRSTLM

2011-02-13 Thread Kenneth Heafield
I don't really know how to use EMS, so hopefully the mailing list can answer this question. Original Message Subject:Build LM using IRSTLM Date: Sun, 13 Feb 2011 22:05:13 +0330 From: amin farajian To: mo...@kheafield.com Hello Dear Heafield, I'm trying to bui

Re: [Moses-support] Segmentation fault in KenLM

2011-02-14 Thread Kenneth Heafield
, unsigned int*) const: >> Assertion `(*contextFactor[count-1])[factorType] != __null' failed. >> >> I am not quite sure what is causing this. >> Could it be related to the use of binarized phrase tables? >> >> >> >> On Feb 10, 2011, at 4:00 PM, Ken

[Moses-support] Exception Printing

2011-02-23 Thread Kenneth Heafield
Hiya Moses, There are a fair number of exceptions thrown that are not intended to be caught e.g. Sentence.cpp: 107: if (!ProcessAndStripXMLTags(line, xmlOptionsList, m_reorderingConstraint, xmlWalls )) { const string msg("Unable to parse XML in line: " + line); TRACE

Re: [Moses-support] Exception Printing

2011-02-23 Thread Kenneth Heafield
owever, they note that "new projects" could benefit. Also, I'm responsible for getting pointer container on the list of approved Boost libraries. > > > cheers > Barry > > > On Wednesday 23 Feb 2011 20:30:27 Kenneth Heafield wrote: >> Hiya Moses, &

Re: [Moses-support] Exception Printing

2011-02-23 Thread Kenneth Heafield
On 02/23/11 17:02, Barry Haddow wrote: > >> There's a question of location: for my purposes this should be linked >> into kenlm/build_binary, kenlm/query, moses-cmd/src/moses, etc. I see >> the mert implementation and lmserver also throw exceptions, so it should >> probably be linked in there as

Re: [Moses-support] boost check fails, boost installed in home directory

2011-03-12 Thread Kenneth Heafield
I thought Ondrej Bojar had changed the regression tests to use KenLM but perhaps this was only a partial change. On 03/12/11 09:01, Alexander Fraser wrote: > 2) The regression tests fail with no external LMs because of some > problem. This is also not true, the regression tests require you to > co

Re: [Moses-support] Trying to do fancy things with LMs; need some advice.

2011-03-18 Thread Kenneth Heafield
I think you'd be better off implementing your own StatefulFeatureFunction, bypassing LanguageModel.{h,cpp} which mostly handles n-grams crossing phrase boundaries, and calling the LanguageModelImplementation as the backend. You'll probably want larger beams too. Kenneth On 03/18/11 13:38, Dennis

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-19 Thread Kenneth Heafield
I believe the right answer to this is adding an OOV count feature to Moses. In fact, I've gone through and made all the language models return a struct indicating if the word just scored was OOV. However, this needs to make it into the phrases and ultimately the features. Also, there's the fun of

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-19 Thread Kenneth Heafield
ack to a very low floor. > So it may be that Alex's desired feature is just a bug, which can > be reproduced with kenlm by not training with "-unk", hence > also falling back to the floor probability (if that is what kenlm > is doing). > > -phi > > On Sat,

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-19 Thread Kenneth Heafield
With a closed vocabulary LM, SRILM returns -inf on OOV and moses floors this to LOWEST_SCORE which is -100.0. If you want identical behavior from KenLM, run: kenlm/build_binary -u -100.0 foo.arpa foo.binary . Unless you passed -vocab to SRILM (and most people don't), <unk> never appears except as a unigram.
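
The flooring described above can be sketched in a few lines. This only illustrates the arithmetic; it is not Moses source code, and `floor_score` is a hypothetical name. The value -100.0 is the one quoted in the message.

```python
LOWEST_SCORE = -100.0  # floor value quoted in the message above


def floor_score(log_prob: float) -> float:
    """Clamp a log probability so it never drops below LOWEST_SCORE."""
    return max(log_prob, LOWEST_SCORE)
```

With this clamp an OOV score of -inf becomes -100.0, while ordinary scores such as -2.5 pass through unchanged.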

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-19 Thread Kenneth Heafield
atever; this is what I thought the error message > was referring to. Yes, that is what is causing the problem. > > Cheers, Alex > > > On Sat, Mar 19, 2011 at 6:25 PM, Kenneth Heafield wrote: >> The original behavior was to refuse to load any model without <unk>. >>

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-19 Thread Kenneth Heafield
> get a further improvement. > > Cheers, Alex > > > On Sat, Mar 19, 2011 at 7:18 PM, Kenneth Heafield wrote: >> With a closed vocabulary LM, SRILM returns -inf on OOV and moses floors >> this to LOWEST_SCORE which is -100.0. If you want identical behavior >>

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-21 Thread Kenneth Heafield
op-unknown). > All translations will have them. > > Otherwise, all words in the translation model should be known. > > So, what is the choice here? > > -phi > > On Sat, Mar 19, 2011 at 7:19 PM, Kenneth Heafield wrote: >> I believe -vocab takes a file containing the

Re: [Moses-support] producing the minimal number of LM-OOVs

2011-03-21 Thread Kenneth Heafield
many cases. I believe this behavior is better than the situation with SRI where no backoff penalty is charged, and therefore you may encounter different results when using KenLM on any language model without <unk>. Kenneth On 03/21/11 09:56, Kenneth Heafield wrote: > So, assuming the parallel data is

Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists

2011-03-24 Thread Kenneth Heafield
Many distributions randomize shared library addresses each time you run an executable in order to make buffer overflow attacks harder. There's plenty of things that will make addresses returned by malloc/mmap vary without threading. Kenneth On 03/24/11 10:15, Lane Schwartz wrote: > On Thu, Mar 2

Re: [Moses-support] Segmentation Fault during decoding

2011-03-25 Thread Kenneth Heafield
I haven't tested kenlm on Cygwin, but it could work. Can you run tests? 1) Install Boost. Cygwin's package manager should provide it. 2) Run kenlm tests. wget http://kheafield.com/code/kenlm.tar.gz tar xzf kenlm.tar.gz cd kenlm ./test.sh On 03/25/11 06:44, Sudip Datta wrote: > I've used gcc i

Re: [Moses-support] Moses dies with "segmentation fault" on first sentence (IRSTLM)

2011-03-30 Thread Kenneth Heafield
I've had this happen too when running benchmarks. The latest IRSTLM is actually 5.60.01: http://hlt.fbk.eu/en/irstlm and appears to resolve your issue. The sourceforge page is out of date. #include On 03/30/11 10:10, Arda Tezcan wrote: > Hi Everyone, > After working with SRILM for a while, I j

Re: [Moses-support] KenLM build_binary exception

2011-04-15 Thread Kenneth Heafield
+0100, Nicola Bertoldi >> wrote: >>> Indeed this should not happen >>> >>> Tom, could you please upload the following data in our ftp area? >>> >>> - textual training data (if possible) >>> - LM in iARPA format >>> - LM in binary f

Re: [Moses-support] Versions of Moses

2011-04-30 Thread Kenneth Heafield
Barry is correct. Also kenlm doesn't care what the third field is. I just read it from the ARPA file. Using a model with lower order than it was trained for is incorrect under most smoothing methods. On 04/30/11 16:39, Barry Haddow wrote: > Hi Alexandre > > The format of the language model spe

Re: [Moses-support] Moses Arabic tokenizer

2011-05-14 Thread Kenneth Heafield
Hi, There's http://statmt.org/wmt09/scripts.tgz but these are only for select European languages. The post you refer to suggests MADA+TOKAN for Arabic: http://www1.ccls.columbia.edu/~cadim/MADA.html . Kenneth On 05/14/11 11:03, ahmed sabry rizk wrote: > Hi, > I am trying to toke

Re: [Moses-support] KenLM build_binary exception

2011-05-17 Thread Kenneth Heafield
> Tom > > > >> -Original Message- >> *From*: Kenneth Heafield >> *To*: moses-support@mit.edu >> *Subject*: Re: [Moses-support] KenLM build_binary exception

Re: [Moses-support] configure fails to recognize option --with-boost-thread

2011-05-19 Thread Kenneth Heafield
Hmmm. . . looks like it's crashing on lm/lm_exception.cc and lm/config.cc which are mine. But the compiler should throw you an error instead of taking infinite memory. See if I can reproduce. On 05/19/11 22:58, supp...@precisiontranslationtools.com wrote: > I'm updating to the newest moses trun

Re: [Moses-support] configure fails to recognize option --with-boost-thread

2011-05-19 Thread Kenneth Heafield
ebuild. > > @Others: regarding configure's WARNING: unrecognized options: > --with-boost-thread, is this option still required? > > Tom > > > > On Thu, 19 May 2011 23:05:53 -0400, Kenneth Heafield > wrote: >> Hmmm. . . looks like it's crashing on lm

Re: [Moses-support] configure fails to recognize option --with-boost-thread

2011-05-19 Thread Kenneth Heafield
e because ltmain.sh won't be in the repository. On 05/19/11 23:47, Tom Hoar wrote: > I'm glad you can replicate the problem. Easier to fix that way. > > On Thu, 19 May 2011 23:42:55 -0400, Kenneth Heafield > wrote: >> Apparently this is a libtool issue, not one wit

Re: [Moses-support] Query moses

2011-05-23 Thread Kenneth Heafield
head config.log Not aware of e.g. a runtime query. On 05/23/11 12:48, Barry Haddow wrote: > On Monday 23 May 2011 17:39, Tom Hoar wrote: >> Is there a way to query the moses binary to report what configure >> options were used? i.e. such as which --with-[xxxlm]= > > > No. > > Do you want to kn

Re: [Moses-support] KenLM support for high order values

2011-05-23 Thread Kenneth Heafield
Edit kenlm/lm/max_order.hh and recompile. The reason is to minimize the size of the State object held by each hypothesis while avoiding dynamic memory allocation. On 05/23/11 15:39, Tom Hoar wrote: > I use KenLM's build_binary for language models. There are no problems > order values up to 6 gram

Re: [Moses-support] KenLM support for high order values

2011-05-24 Thread Kenneth Heafield
irements/allocation for the State object? I.e. if I > compile with kMaxOrder = 12, and use Kenlm for a model with order = 6, > is more memory required/allocated and if so, how much? Or, does the > additional allocation only occur when the model has a higher order? > > Tom > >

Re: [Moses-support] Running application using Moses

2011-06-03 Thread Kenneth Heafield
Moses outputs translations to stdout and advisory messages to stderr. This is the correct behavior. I think you're referring to Java's rudimentary process IO handling. http://stackoverflow.com/questions/60302/starting-a-process-with-inherited-stdin-stdout-stderr-in-java-6 On 06/03/11 05:50, nakul
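
The stdout/stderr split described above can be handled by reading the two streams separately. The sketch below uses Python's standard `subprocess` module; the child command is a stand-in written for this example, and a real caller would name the moses binary and its arguments instead.

```python
import subprocess
import sys


def run_and_split(cmd):
    """Run cmd and return (stdout, stderr) as two separate strings."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return done.stdout, done.stderr


# Stand-in child process that prints one line to each stream.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys; print('translation'); print('advisory', file=sys.stderr)",
]
```

Calling `run_and_split(DEMO_CMD)` returns the translation text and the advisory text in separate strings, so the caller can keep one and log or discard the other.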

Re: [Moses-support] Running Giza++ on subsets of data

2011-06-15 Thread Kenneth Heafield
Try using MGIZA: http://geek.kyloo.net/software/doku.php/mgiza:overview On 06/15/11 04:51, Prasanth K wrote: > Hello All, > > I am conducting a series of experiments to build translation systems > using Moses in which the corpus of the current experiment is a subset of > the corpora used in the p

[Moses-support] KenLM citation

2011-06-23 Thread Kenneth Heafield
Hi, KenLM was accepted to WMT 2011 as a research paper :-). That means my July 1 camera-ready deadline is the same as many WMT participants, some of whom have asked me how to cite. To resolve this race condition, here's a BibTeX: @InProceedings{kenlm, author = {Kenneth Hea

Re: [Moses-support] Segmentation fault

2011-06-26 Thread Kenneth Heafield
I don't change the binary file format without updating the version number so old versions won't load. The recent versions shouldn't impact that. Sounds like a case for gdb. On 06/26/11 08:22, Hieu Hoang wrote: > i believe there's been changes to the binary phrase table (to the > support word ali

[Moses-support] Quantization

2011-06-26 Thread Kenneth Heafield
kenlm now supports quantization. To use it, svn up then run build_binary with -q: kenlm/build_binary -q 8 trie foo.arpa foo.out for 8 bits. You can choose from 2 to 25 bits, inclusive. Currently, probability and backoff are quantized separately (in this case using 8 bits each). By default, -q
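
To give a feel for what q-bit quantization means, here is a generic sketch: floats are replaced by an index into a table of at most 2^q representative values. The equal-population binning used here is an assumption chosen for illustration and is not confirmed to be what kenlm does internally; only the 2 to 25 bit range comes from the message above. The function names are hypothetical.

```python
def build_codebook(values, bits):
    """Split sorted values into up to 2**bits equal-population bins.

    Each bin is represented by the mean of the values that fall in it.
    """
    if not 2 <= bits <= 25:
        raise ValueError("bits must be between 2 and 25 inclusive")
    ordered = sorted(values)
    bins = min(2 ** bits, len(ordered))
    centres = []
    for i in range(bins):
        lo = i * len(ordered) // bins
        hi = (i + 1) * len(ordered) // bins
        chunk = ordered[lo:hi]
        centres.append(sum(chunk) / len(chunk))
    return centres


def quantize(value, centres):
    """Return the index of the centre nearest to value."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - value))
```

Storing the index takes q bits per value instead of a full float, which is where the memory saving comes from; the price is that each value is rounded to its bin's centre.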

[Moses-support] Don't use [4037,4040) with trie

2011-06-27 Thread Kenneth Heafield
Folks, Don't use revisions [4037,4040) with the trie model. I accidentally changed the file format and you'll get segfaults on existing binary files. Also, the binary files it builds are corrupt. This doesn't solve Tom Hoar's problem because his segfault came before revision 4037. Kenne

Re: [Moses-support] Don't use [4037,4040) with trie

2011-06-27 Thread Kenneth Heafield
ns to the file format compatible with rev's > 4036 and before? > > Tom > > > > On Mon, 27 Jun 2011 17:29:58 -0400, Kenneth Heafield > wrote: >> Folks, >> >> Don't use revisions [4037,4040) with the trie model. I accidentally >> chan

Re: [Moses-support] multithreaded moses, memory and on-disk issues

2011-06-29 Thread Kenneth Heafield
Since we're playing optimize Moses memory usage, what's your language model? On 06/29/11 14:32, Dennis Mehay wrote: > Hi Phil, > > Thanks for the tips. I already tried reducing the max span for the > re-ordering grammar (to 35, which is ~5 words more than the average span > of the training sente

Re: [Moses-support] Wrap XML scripts

2011-07-12 Thread Kenneth Heafield
http://kheafield.com/code/scoring.tar.gz On 07/12/11 11:56, Lane Schwartz wrote: > Does anyone have a good script for taking plain-text versions of > source, reference, and hypothesis files and wrapping them in XML for > use by metric tools like TERp and the NIST scripts that require XML? > > I'm

Re: [Moses-support] Using Moses language models

2011-07-13 Thread Kenneth Heafield
ough the Moses abstraction layers to >>> retrieve a pointer to a lm::Model from kenlm, but the >>> Moses::LanguageModelKen header is not part of the public headers of >>> Moses ; that's why I tried to use only Moses interface. >>> >>> (I did I did not m

Re: [Moses-support] Using Moses language models

2011-07-13 Thread Kenneth Heafield
> running the decoder, I wanted to use the already loaded LM. > > > > > > I first tried to dig my way through the Moses abstraction layers to > > > retrieve a pointer to a lm::Model from kenlm, but the > > > Moses::LanguageModelKen header is not

Re: [Moses-support] Using Moses language models

2011-07-13 Thread Kenneth Heafield
On 07/13/11 15:53, Philipp Koehn wrote: > Hi, > > But you're asking for a third piece of information. If you query for > "foo bar baz" and I can tell you that it will never extend to "* foo bar > baz" for any word * (due to pruning or filtering), then you need only > remember "foo

[Moses-support] More LM compression

2011-07-13 Thread Kenneth Heafield
Hi Moses, If trie uses too much memory, svn up to revision >= 4074 then pass "-a #bits" to build_binary. It will minimize memory usage subject to the maximum number of bits you specify (so e.g. pass bits 40 to minimize memory usage). Compressing in this manner is lossless, but takes addi

Re: [Moses-support] A guide to running Moses on Windows 7

2011-07-16 Thread Kenneth Heafield
I use the following: errno, strerror_r, open, close, mmap, munmap, ftruncate, fstat (for file size), lseek, read, and write Apparently the Windows equivalent to mmap is CreateFileMapping. If there's a windows user out there who wants native calls and is willing to help #ifdef, contact me. I pro
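
The open, fstat, mmap, close sequence listed above can be sketched with Python's standard `os` and `mmap` modules, which wrap the POSIX calls on UNIX and the file-mapping API on Windows. This is an illustration of the call pattern only, not kenlm's loader; `read_mapped` is a hypothetical name.

```python
import mmap
import os


def read_mapped(path):
    """Map a file read-only and return its contents as bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size  # fstat gives the file size
        if size == 0:
            return b""  # an empty file cannot be mapped
        with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as mapped:
            return mapped[:]
    finally:
        os.close(fd)
```

Copying the mapping into bytes defeats the point of mmap for a large model; a real loader would keep the mapping open and read from it in place.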

Re: [Moses-support] A guide to running Moses on Windows 7

2011-07-16 Thread Kenneth Heafield
> If any of the IRSTLM/KenLM/$foo-LM –using folks on here have > instructions or experience with compiling their particular tool under > Cygwin, lemme know, and I’ll either include it or point to it. I > guarantee dozens of extra downloads! Sure, here's how you compile KenLM and link it into Moses

Re: [Moses-support] Using Moses language models

2011-07-21 Thread Kenneth Heafield
ications in every other wrapper ?" >> >> How do you, Moses developers, feel about this ? >> Is it acceptable / outrageously stupid if I set the value to -1 in the other >> wrappers, >> maybe with a TODO, and properly document it in the super class ? >>

Re: [Moses-support] Using Moses language models

2011-07-22 Thread Kenneth Heafield
tenance cost for us (me and the peop... Well, you know). > > ----- Original message ----- >> From: "Hieu Hoang" >> To: "Kenneth Heafield" >> Cc: moses-support@mit.edu >> Sent: Friday 22 July 2011 04:50:14 >> Subject: Re: [Moses-support] Using Moses langu

Re: [Moses-support] Using KenLM instead of IRSTLM in existing moses.ini error

2011-07-27 Thread Kenneth Heafield
Hi, Which ASCII character sequence represents newline in your file? Try converting to UNIX newlines. Also can you send me the output of zcat /home/moses/languagemodels/model.es.lm.gz |head -n 10 |gzip >send.gz (I'm asking you to rezip so that your mail client doesn't convert the enter
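
The newline question above can be answered by looking at the raw bytes of the file. A minimal sketch follows, with hypothetical function names; it works on bytes already read from the file (for a .gz file, after decompression).

```python
def newline_style(data: bytes) -> str:
    """Classify the newline convention found in data."""
    if b"\r\n" in data:
        return "windows"
    if b"\r" in data:
        return "old-mac"
    if b"\n" in data:
        return "unix"
    return "none"


def to_unix_newlines(data: bytes) -> bytes:
    """Rewrite CRLF and lone CR line endings as LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Replacing CRLF before lone CR matters: doing it the other way round would turn each CRLF into two line breaks.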

Re: [Moses-support] Using KenLM instead of IRSTLM in existing moses.ini error

2011-07-28 Thread Kenneth Heafield
FYI we resolved the problem off-list. KenLM does not load IRST's iARPA format. You must first run IRST's compile-lm to generate an ARPA. I might add an error message specific to this case. On 07/27/11 09:27, Lee Ball (Applied Language) wrote: > Hi guys, > > I just tried using KenLM out of inte

Re: [Moses-support] KenLM build-binary trie problems (SRILM ARPA file)

2011-07-29 Thread Kenneth Heafield
Hi, There was a problem with this; thought it was fixed but maybe it came back. Which revision are you running? Does it still happen if you run single-threaded? Kenneth On 07/29/11 09:39, Alex Fraser wrote: > Hi Folks, > > Tom Hoar previously mentioned that he had a problem with KenLM

Re: [Moses-support] KenLM build-binary trie problems (SRILM ARPA file)

2011-08-04 Thread Kenneth Heafield
Sorry I am slow to respond. This is my first thing to look at, but I am traveling a lot through the 14th. Alex Fraser wrote: Hi Kenneth -- Latest revision, 4096. Single threaded also crashes. Cheers, Alex On Fri, Jul 29, 2011 at 6:00 PM, Kenneth Heafield wrote: > Hi, > >

Re: [Moses-support] KenLM build-binary trie problems (SRILM ARPA file)

2011-08-15 Thread Kenneth Heafield
rebuilding with build_binary that ships with Moses. - Ran threaded and not threaded. Can you send me your very small SRILM model? Does it have <unk>? Kenneth On 08/04/11 11:42, Kenneth Heafield wrote: > Sorry I am slow to respond. This is my first thing to look at, but I > am traveling a lot

Re: [Moses-support] KenLM build-binary trie problems (SRILM ARPA file)

2011-08-15 Thread Kenneth Heafield
Ok I have reproduced the problem. It only happens when the ARPA file is missing <unk> and is probably an off-by-one on vocabulary size. I'll have a fix soon. Kenneth On 08/15/11 19:20, Kenneth Heafield wrote: > Hi, > > Back from vacation and sorry but I'm having trouble

Re: [Moses-support] KenLM build-binary trie problems (SRILM ARPA file)

2011-08-16 Thread Kenneth Heafield
sed on the counts given in the ARPA file. When <unk> is missing from the ARPA file, I now pad the vocabulary to the size it expects for the corrected count. Sorry it took so long! Kenneth On 08/15/11 22:12, Kenneth Heafield wrote: > Ok I have reproduced the problem. It only happens when the A

Re: [Moses-support] LM error in moses

2011-08-18 Thread Kenneth Heafield
Do you have in your input or phrase table target side? On 08/18/11 15:04, Sriram venkatapathy wrote: > > Hello, > > For a particular translation experiment, I get the following error in > Moses decoder, and then the decoder aborts. > > moses: LanguageModel.cpp:115: void > Moses::LanguageModel::C

Re: [Moses-support] Using Moses language models

2011-08-24 Thread Kenneth Heafield
be finally merged into the trunk ? > (not the useless changes to PhraseDictionaryTree) > > Thanks, (And sorry for my low reactivity, I hope you remember me!) > > Marc > > - Original Message - >> From: "Hieu Hoang" >> To: "Marc LEGENDRE" &g

Re: [Moses-support] Using Moses language models

2011-08-24 Thread Kenneth Heafield
Valgrind ; but hey, don't we all strive for perfection > ? :-) > > I don't need this, I guess I should have removed it from my branch if I > wanted to merge. > It's done. > > - Original Message - >> From: "Kenneth Heafield" >> To:

Re: [Moses-support] Using Moses language models

2011-08-24 Thread Kenneth Heafield
You're in trunk as of 4160. On 08/24/11 11:33, Marc LEGENDRE wrote: > Absolutely no problem about the name thing, thank you for asking. > > Marc > > - Original Message ----- >> From: "Kenneth Heafield" >> To: moses-support@mit.edu >> Sent: Wed

Re: [Moses-support] R: Re: Minimum requirements, punctuation and other general questions

2011-08-26 Thread Kenneth Heafield
Or just run kenlm/build_binary lm.arpa and it will spit out a memory usage estimate (covering the LM only). On 08/26/11 09:24, Hieu Hoang wrote: > barry's right. > > Binarize the phrase table and the LM with irstlm or kenlm. Then just > look at the file sizes & add a few 100mb and that's your me

Re: [Moses-support] build 5 gram with SRILM and moses

2011-09-07 Thread Kenneth Heafield
Hi, Edit your moses.ini and find [lmodel-file]. Change the first number to 8. [lmodel-file] 8 0 5 /path/to/model.arpa Or you can try to link against SRI, use more memory, and take longer. . . Kenneth On 09/06/11 19:43, Cyrine NASRI wrote: > Hi , thank you for your reply > I buit a 5gr
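
For anyone who wants to script that edit, here is a rough Python sketch. The message only says to change the first number to 8; reading the four fields as implementation, factor, order and path is an assumption here, and LMEntry, parse_lm_line and use_kenlm are names invented for this example rather than anything in Moses.

```python
from typing import NamedTuple


class LMEntry(NamedTuple):
    """One line under [lmodel-file]; the field names are assumed, not official."""
    implementation: int  # first number; the message above sets it to 8
    factor: int
    order: int
    path: str


def parse_lm_line(line):
    """Split an lmodel-file line into its four whitespace-separated fields."""
    impl, factor, order, path = line.split(None, 3)
    return LMEntry(int(impl), int(factor), int(order), path.strip())


def use_kenlm(line):
    """Return the same line with the first number changed to 8."""
    entry = parse_lm_line(line)
    return f"8 {entry.factor} {entry.order} {entry.path}"


if __name__ == "__main__":
    print(use_kenlm("0 0 5 /path/to/model.arpa"))
```

Only the first field changes; the rest of the line, including the path, is passed through untouched.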

Re: [Moses-support] running multi-threaded moses_chart in EMS

2011-09-14 Thread Kenneth Heafield
So what exactly is the issue? Progress can be monitored with stdout. If stderr is queued, then you won't get sub-sentential progress anyway. I'd rather stderr tell me what it's doing so if/when there's a segfault, I have a place to start. Kenneth On 09/14/11 13:32, Phil Williams wrote: > Yes

Re: [Moses-support] Compact in-memory phrase-table representation

2011-09-20 Thread Kenneth Heafield
I took a look at the existing FactorCollection code and it made me cry, so I rewrote it for revision 4242 including a better locking strategy. On 09/20/11 12:10, Marcin Junczys-Dowmunt wrote: > Hi Barry, > very high lock contention. Deadlock is the wrong word. With 48 threads > 'top' shows me ro

[Moses-support] Left language model state in 4247

2011-09-21 Thread Kenneth Heafield
Dear Moses, Trunk revision 4247 incorporates KenLM changes from MT Marathon (team: Hieu Hoang, Tetsuo Kiso, Marcello Federico, and myself) to minimize left language model state for chart decoding. This resulted in a binary file format change. Previously, if you used e.g. a 5-gram langua

Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses

2011-09-22 Thread Kenneth Heafield
My fault. Sorry. Fixed. On 09/22/11 09:41, Hieu Hoang wrote: > hiya > > There's currently a compile error in trunk when multi-threading is > enabled. However, I think the root cause of the problem is that > there's currently too many compile flags so developers can't test the > different combin

Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses

2011-09-22 Thread Kenneth Heafield
-threads 1 ? On 09/22/11 10:06, Tom Hoar wrote: > > Re: the survey. I suggest if multi-threading is always enabled, there > should be a command-line option that allows users to disable > multi-threading for debugging. > > Tom > > > > On Thu, 22 Sep 2011 09

Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses

2011-09-22 Thread Kenneth Heafield
ally want something like > > --threads 0 > > which should bypass everything and truly run in single threaded mode > > Miles > > On 22 September 2011 10:26, Kenneth Heafield wrote: >> -threads 1 ? >> >> On 09/22/11 10:06, Tom Hoar wrote: >> >> Re: th

Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses

2011-09-22 Thread Kenneth Heafield
But I don't see a use case for it. I can run gdb just fine on a multithreaded program that happens to be running one thread. And the stderr output will be in order. On 09/22/11 11:21, Miles Osborne wrote: > should someone want to debug with no threading, then there would need > to be a mess of

Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses

2011-09-22 Thread Kenneth Heafield
>>>> Hi >>>> >>>> Here's my thoughts: >>>> >>>> - there should be single and multi-thread compile paths so single-thread >>>> users don't pay the lock penalty. Maybe a -threads 0 works, but then you >>>> have to check
