Runs OK for me. Try git pull on Moses if your code is a few months old. There might be an incompatibility between the lmplz wrapper script and lmplz itself.
My command:

# trainlm-lmplz.perl -order 5 -lmplz ~/workspace/github/mosesdecoder/bin/lmplz -T . -S 1G -text lm/europarl.lowercased.1 -lm lm/europarl.lmplz
...
Chain sizes: 1:171084 2:2345088 3:6649660 4:10482672 5:13132616
=== 5/5 Writing ARPA model ===
RSSMax:219410432 kB user:2.62476 sys:2.46472 CPU:5.08947 real:0

On 26 November 2013 14:57, Prasanth K <[email protected]> wrote:

> Ok. I have managed to re-create this error (no reason why it shouldn't
> come back, I knew exactly what I told moses to do). So, the exact command
> run to create the language model from the logs is as follows:
>
> scripts/generic/trainlm-lmplz.perl -lmplz bin/lmplz -order 5 -T europarl.en-sv/phrase-based-dup/tmp -S 10G -text europarl.en-sv/phrase-based-dup/lm/europarl.lowercased.1 -lm europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
>
> Of course, all paths in the above command given were absolute paths (I
> just removed them for readability.) When this is run, my log file from EMS
> gives the following in LM_europarl_train.id.STDERR
>
> EXECUTING bin/lmplz --order 5 -T europarl.en-sv/phrase-based-dup/tmp -S 10G < europarl.en-sv/phrase-based-dup/lm/europarl.lowercased.1 > europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
> === 1/5 Counting and sorting n-grams ===
> Reading stdin
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ****************************************************************************************************
> Function not implemented
>
> This does not get the language model step to crash, instead creates an
> empty language model (0 lines).
> The below is the log file for LM_europarl_binarize.id.STDERR
>
> Reading europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> End of file Byte: 0 File: europarl.en-sv/phrase-based-dup/lm/europarl.lm.1
> ERROR
>
> Clearly, something is wrong with my installation of kenlm (the decoding
> with kenlm works just fine ..I have confirmed that now), which makes the
> estimation go funny. The question is where I start to fix this?
>
> Thanks.
>
> - Regards,
> Prasanth
>
> On Tue, Nov 26, 2013 at 1:56 PM, Hieu Hoang <[email protected]> wrote:
>
>> ok, i can't reproduce your error
>>    Function not implemented
>> you should find out exactly how lmplz is being run, it may be that you
>> have a slightly older version that doesn't know all the arguments you've
>> given it.
>>
>> On 26/11/2013 06:47, Prasanth K wrote:
>>
>> Hello Hieu,
>>
>> My first attempt was to specify the absolute amount of memory (10G) but
>> that gave an error saying function not implemented. Later, when I tried
>> specifying the relative size (80%), I got a similar parse error to what you
>> have given above. Strange that it should
>>
>> @Kenneth, thanks for the code to estimate physical memory. I am going
>> to give it a shot and let you know how it goes.
>>
>> - Regards,
>> Prasanth
>>
>> On Mon, Nov 25, 2013 at 9:20 PM, Hieu Hoang <[email protected]> wrote:
>>
>>> Prasanth - what is the exact lmplz command that was run by the EMS?
>>>
>>> This works
>>>    .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa lm/europarl.lmplz -T /tmp -S 1G
>>> This doesn't
>>>    .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa lm/europarl.lmplz -T /tmp -S 80%
>>> it gives the error
>>>    util/usage.cc:220 in uint64_t util::<anonymous namespace>::ParseNum(const std::string &) [Num = double] threw SizeParseError because `!mem'.
>>>    Failed to parse 80% into a memory size because % was specified but the
>>>    physical memory size could not be determined.
>>>
>>> However, it worked even with the source code from 4 days ago.
>>>
>>> On 25/11/2013 19:07, Kenneth Heafield wrote:
>>> > Hi,
>>> >
>>> > I've taken a shot in the dark based on physmem.c to support physical
>>> > memory estimation on BSD and OS X. Please clone
>>> >
>>> > github.com/kpu/kenlm
>>> >
>>> > and compile with
>>> >
>>> > ./bjam
>>> >
>>> > If that fails, please let Hieu and I know (maybe Hieu can help since he
>>> > has OS X). If it doesn't fail, run
>>> >
>>> > bin/lmplz
>>> >
>>> > with no argument. The help message will include a line e.g.
>>> >
>>> > "This machine has 135224176640 bytes of memory."
>>> >
>>> > or
>>> >
>>> > "Unable to determine the amount of memory on this machine."
>>> >
>>> > If it works, then I'll push to Moses. Trying to not break Moses master
>>> > for OS X.
>>> >
>>> > Kenneth
>>> >
>>> > On 11/24/13 22:40, Prasanth K wrote:
>>> >> Hi Kenneth,
>>> >>
>>> >> Thanks for the clarification w.r.t. calculating the memory size. But I
>>> >> am running these on a Mac (10.9 Mavericks). Do you think I should still
>>> >> port the lmplz code to Mac for the estimation of probabilities?
>>> >>
>>> >> One thing though, I did change the default clang compiler that comes
>>> >> with this new Mac to a gcc-4.8 (not sure that changes anything in this
>>> >> context).
>>> >>
>>> >> - Prasanth
>>> >>
>>> >> On Fri, Nov 22, 2013 at 6:50 PM, Kenneth Heafield <[email protected]> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> What OS are you on? Cygwin?
>>> >> Apparently every OS reports memory size in a different way:
>>> >>
>>> >> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/physmem.c;h=2629936146e3042f927523322f18aca76996cd7f;hb=HEAD
>>> >>
>>> >> The good news is that the above code is LGPLv2:
>>> >>
>>> >> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=modules/physmem;h=9644522e0493a85a9fb4ae7c4449741c2c1500ea;hb=HEAD
>>> >>
>>> >> But currently I'm just using this short function that will fail on some
>>> >> platforms:
>>> >>
>>> >> uint64_t GuessPhysicalMemory() {
>>> >> #if defined(_WIN32) || defined(_WIN64)
>>> >>   return 0;
>>> >> #elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
>>> >>   long pages = sysconf(_SC_PHYS_PAGES);
>>> >>   if (pages == -1) return 0;
>>> >>   long page_size = sysconf(_SC_PAGESIZE);
>>> >>   if (page_size == -1) return 0;
>>> >>   return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
>>> >> #else
>>> >>   return 0;
>>> >> #endif
>>> >> }
>>> >>
>>> >> If it fails, I just don't let users specify memory as a percentage. So
>>> >> one thing to fix is putting physmem.{h,c} in util then changing
>>> >> calls to GuessPhysicalMemory. But I'm also not a fan of the way the GNU
>>> >> code gives up and makes up a number at the end.
>>> >>
>>> >> The second porting issue is that lmplz makes parallel use of pread,
>>> >> pwrite, and write. Windows is unsafe in this regard (POSIX requires
>>> >> that pread/pwrite not change the file pointer; Windows has no way to
>>> >> implement that atomically). To fix this, we'll always specify the file
>>> >> offset in cases that happen concurrently. Extend util/stream/io.* with
>>> >> a PWrite class based on PWriteOrThrow then change FileBuffer to use
>>> >> PWrite. Then I guess one should rename PReadOrThrow/PWriteOrThrow to
>>> >> something that indicates they're not-quite-POSIX on windows.
>>> >> Also, the macros in these functions should detect cygwin, bypassing
>>> >> cygwin's "Function not implemented" and calling Windows APIs directly
>>> >> (they're already there for _WIN32).
>>> >>
>>> >> I don't have a windows box so I can say what should be changed at a high
>>> >> level, but need an actual user to ensure it compiles and runs correctly.
>>> >>
>>> >> Kenneth
>>> >>
>>> >> On 11/22/13 06:49, Prasanth K wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I am trying to use KenLM for building a language model on the Europarl
>>> >> > corpus. Following the instructions in
>>> >> > (http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc19),
>>> >> > I added the few lines for getting KenLM to estimate the LM probabilities
>>> >> > (order/n=5) to my config file to the EMS. The language model dies down
>>> >> > during training saying that the "Function not implemented" at counting
>>> >> > and sorting n-grams stage (the first stage itself). Does this mean there
>>> >> > is something wrong with my installation? Or is just insufficient memory?
>>> >> >
>>> >> > Incidentally, when I started giving the amount of memory in terms of %
>>> >> > (80%) there was an error "Failed to parse .. into memory size because
>>> >> > physical memory size could not be determined". I am also curious why
>>> >> > this happens?
>>> >> >
>>> >> > Kenneth, can you shed some light on this? Thanks.
>>> >> >
>>> >> > - Regards,
>>> >> > Prasanth
>>> >> >
>>> >> > --
>>> >> > "Theories have four stages of acceptance. i) this is worthless nonsense;
>>> >> > ii) this is an interesting, but perverse, point of view, iii) this is
>>> >> > true, but quite unimportant; iv) I always said so."
>>> >> >
>>> >> > --- J.B.S. Haldane
>>> >> >
>>> >> > _______________________________________________
>>> >> > Moses-support mailing list
>>> >> > [email protected]
>>> >> > http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
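
For anyone else hitting the "-S 80%" parse error on a Mac: below is a minimal sketch of how the quoted GuessPhysicalMemory could be extended for OS X. It is not the code that went into KenLM. GuessPhysicalMemorySketch is a made-up name, and the hw.memsize sysctl branch is my assumption about what a port might use. It keeps the convention from the quoted function of returning 0 when the size cannot be determined, and it leaves out the Windows branch.

```cpp
#include <cstddef>
#include <stdint.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical sketch, not KenLM's implementation.
// Returns physical memory in bytes, or 0 if it cannot be determined.
uint64_t GuessPhysicalMemorySketch() {
#if defined(__APPLE__)
  // Assumption: on OS X the hw.memsize sysctl reports RAM as a 64-bit integer.
  uint64_t mem = 0;
  size_t len = sizeof(mem);
  if (sysctlbyname("hw.memsize", &mem, &len, NULL, 0) != 0) return 0;
  return mem;
#elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  // Same approach as the function quoted in this thread.
  long pages = sysconf(_SC_PHYS_PAGES);
  if (pages == -1) return 0;
  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size == -1) return 0;
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
#else
  return 0;
#endif
}
```

A caller that gets 0 back would have to reject percentage arguments, which matches the behaviour Kenneth describes in the quoted message.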
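
On the second porting issue from the quoted message (always passing an explicit file offset so that concurrent writers never depend on the shared file pointer), here is a minimal POSIX-only sketch. PWriteAll is a hypothetical helper, not the PWriteOrThrow in util; it only illustrates looping over short writes while advancing the offset by hand.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <stdint.h>
#include <string>
#include <unistd.h>

// Hypothetical helper, not KenLM's PWriteOrThrow.
// Writes exactly `size` bytes from `data` at absolute offset `off`.
// pwrite takes the offset as an argument, so the file pointer of `fd` is untouched.
void PWriteAll(int fd, const void *data, std::size_t size, uint64_t off) {
  const char *p = static_cast<const char *>(data);
  while (size) {
    ssize_t ret = pwrite(fd, p, size, static_cast<off_t>(off));
    if (ret < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal, retry
      throw std::runtime_error(std::string("pwrite failed: ") + std::strerror(errno));
    }
    if (ret == 0) throw std::runtime_error("pwrite wrote 0 bytes");
    // Short write: advance the buffer and the offset, then try again.
    p += ret;
    size -= static_cast<std::size_t>(ret);
    off += static_cast<uint64_t>(ret);
  }
}
```

Because every call names its own offset, two threads can write disjoint regions of the same descriptor without coordinating a seek. On Windows the same interface would need a different body, as the quoted message points out.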
