Re: [Moses-support] Estimating probabilities with KenLM

Hieu Hoang Mon, 25 Nov 2013 12:23:29 -0800

Prasanth - what is the exact lmplz command that was run by the EMS?


This works:
     .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa lm/europarl.lmplz -T /tmp -S 1G
This doesn't:
     .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa lm/europarl.lmplz -T /tmp -S 80%
It gives the error:
   util/usage.cc:220 in uint64_t util::<anonymous namespace>::ParseNum(const std::string &) [Num = double] threw SizeParseError because `!mem'.
   Failed to parse 80% into a memory size because % was specified but the physical memory size could not be determined.

However, it worked even with the source code from 4 days ago.
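
For anyone following along, the failure above comes from resolving a percentage against an unknown total. The sketch below is not KenLM's ParseNum; it is a minimal illustration with a hypothetical ParseMemorySize function, assuming binary multipliers (1G = 2^30 bytes) and assuming physical_bytes is passed as 0 when the RAM size could not be detected.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (not KenLM's ParseNum): turns "1G" or "80%" into bytes.
// physical_bytes is the detected RAM size, or 0 when detection failed.
uint64_t ParseMemorySize(const std::string &text, uint64_t physical_bytes) {
  if (text.empty()) throw std::invalid_argument("empty size");
  char *end = nullptr;
  const double value = std::strtod(text.c_str(), &end);
  assert(end != nullptr);  // strtod always sets end when given a non-null pointer
  if (end == text.c_str() || value < 0.0)
    throw std::invalid_argument("no usable number in " + text);
  const std::string suffix(end);
  if (suffix == "%") {
    // A percentage is meaningless without a known total.
    if (physical_bytes == 0)
      throw std::runtime_error("% given but physical memory size is unknown");
    return static_cast<uint64_t>(value / 100.0 *
                                 static_cast<double>(physical_bytes));
  }
  // Assumption for this sketch: suffixes are binary multiples of one byte.
  uint64_t unit = 1;
  if (suffix == "K") unit = 1ULL << 10;
  else if (suffix == "M") unit = 1ULL << 20;
  else if (suffix == "G") unit = 1ULL << 30;
  else if (suffix == "T") unit = 1ULL << 40;
  else if (!suffix.empty())
    throw std::invalid_argument("unknown suffix " + suffix);
  return static_cast<uint64_t>(value * static_cast<double>(unit));
}
```

In this sketch, ParseMemorySize("1G", 0) succeeds whether or not RAM detection worked, while ParseMemorySize("80%", 0) throws. That mirrors the difference seen between -S 1G and -S 80%.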


On 25/11/2013 19:07, Kenneth Heafield wrote:
> Hi,
>
>       I've taken a shot in the dark based on physmem.c to support physical
> memory estimation on BSD and OS X.  Please clone
>
> github.com/kpu/kenlm
>
> and compile with
>
> ./bjam
>
> If that fails, please let Hieu and me know (maybe Hieu can help since he
> has OS X).  If it doesn't fail, run
>
> bin/lmplz
>
> with no arguments.  The help message will include a line such as
>
> "This machine has 135224176640 bytes of memory."
>
> or
>
> "Unable to determine the amount of memory on this machine."
>
> If it works, then I'll push to Moses.  Trying to not break Moses master
> for OS X.
>
> Kenneth
>
> On 11/24/13 22:40, Prasanth K wrote:
>> Hi Kenneth,
>>
>> Thanks for the clarification w.r.t. calculating the memory size. But I
>> am running these on a Mac (10.9 Mavericks). Do you think I should still
>> port the lmplz code to Mac for the estimation of probabilities?
>>
>> One thing though, I did change the default clang compiler that comes
>> with this new Mac to a gcc-4.8 (not sure that changes anything in this
>> context).
>>
>> - Prasanth
>>
>>
>>
>>
>> On Fri, Nov 22, 2013 at 6:50 PM, Kenneth Heafield <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>      Hi,
>>
>>      What OS are you on?  Cygwin?  Apparently every OS reports memory size
>>      in a different way:
>>
>>      
>> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/physmem.c;h=2629936146e3042f927523322f18aca76996cd7f;hb=HEAD
>>
>>      The good news is that the above code is LGPLv2:
>>
>>      
>> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=modules/physmem;h=9644522e0493a85a9fb4ae7c4449741c2c1500ea;hb=HEAD
>>
>>      But currently I'm just using this short function that will fail on some
>>      platforms:
>>
>>      uint64_t GuessPhysicalMemory() {
>>      #if defined(_WIN32) || defined(_WIN64)
>>        return 0;
>>      #elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
>>        long pages = sysconf(_SC_PHYS_PAGES);
>>        if (pages == -1) return 0;
>>        long page_size = sysconf(_SC_PAGESIZE);
>>        if (page_size == -1) return 0;
>>        return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
>>      #else
>>        return 0;
>>      #endif
>>      }
>>
>>      If it fails, I just don't let users specify memory as a percentage.  So
>>      one thing to fix is putting physmem.{h,c} in util then changing
>>      calls to GuessPhysicalMemory.  But I'm also not a fan of the way the GNU
>>      code gives up and makes up a number at the end.
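
One way to cover OS X without importing all of physmem.c might look like the sketch below. It is an illustration, not the change that went into KenLM: GuessPhysicalMemorySketch and BytesFromPages are hypothetical names, only the Apple sysctl(CTL_HW, HW_MEMSIZE) query is added, and other BSDs are left out. Like the function quoted above, it returns 0 for "unknown" rather than making up a number.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif
#if !defined(_WIN32) && !defined(_WIN64)
#include <unistd.h>
#endif

// Combine sysconf-style results.  A negative value (sysconf returns -1 on
// error) maps to 0, which callers treat as "unknown".
uint64_t BytesFromPages(long pages, long page_size) {
  if (pages < 0 || page_size < 0) return 0;
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}

// Hypothetical variant of GuessPhysicalMemory with an OS X branch.
uint64_t GuessPhysicalMemorySketch() {
#if defined(_WIN32) || defined(_WIN64)
  return 0;  // Windows is not handled in this sketch.
#elif defined(__APPLE__) && defined(HW_MEMSIZE)
  // OS X reports total RAM in bytes as a 64-bit value via sysctl.
  int mib[2] = {CTL_HW, HW_MEMSIZE};
  uint64_t bytes = 0;
  size_t len = sizeof(bytes);
  if (sysctl(mib, 2, &bytes, &len, NULL, 0) == 0 && len == sizeof(bytes))
    return bytes;
  return 0;
#elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  return BytesFromPages(sysconf(_SC_PHYS_PAGES), sysconf(_SC_PAGESIZE));
#else
  return 0;  // Unknown platform: report "unknown" instead of guessing.
#endif
}
```

A caller would then reject percentage sizes whenever the function returns 0, which is the behaviour described above.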
>>
>>      The second porting issue is that lmplz makes parallel use of pread,
>>      pwrite, and write.  Windows is unsafe in this regard (POSIX requires
>>      that pread/pwrite not change the file pointer; Windows has no way to
>>      implement that atomically).  To fix this, we'll always specify the file
>>      offset in cases that happen concurrently.  Extend util/stream/io.* with
>>      a PWrite class based on PWriteOrThrow then change FileBuffer to use
>>      PWrite.  Then I guess one should rename PReadOrThrow/PWriteOrThrow to
>>      something that indicates they're not-quite-POSIX on Windows.  Also, the
>>      macros in these functions should detect cygwin, bypassing cygwin's
>>      "Function not implemented" and calling Windows APIs directly (they're
>>      already there for _WIN32).
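
The "always specify the file offset" idea can be sketched roughly as follows. This is a POSIX-only illustration under stated assumptions, not KenLM's PWrite or PWriteOrThrow: OffsetWriter and RoundTripDemo are hypothetical names, and the sketch says nothing about how the equivalent Windows calls behave. The point is only that the writer tracks its own offset and never consults or moves the descriptor's shared file pointer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <stdint.h>
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Hypothetical writer: every write names its offset explicitly.
class OffsetWriter {
 public:
  explicit OffsetWriter(int fd, uint64_t start = 0)
      : fd_(fd), offset_(start) {}

  // Write all of data at the tracked offset, retrying on short writes.
  void Append(const void *data, size_t size) {
    assert(data != NULL || size == 0);
    const char *from = static_cast<const char *>(data);
    while (size) {
      ssize_t wrote = pwrite(fd_, from, size, static_cast<off_t>(offset_));
      if (wrote < 0 && errno == EINTR) continue;  // interrupted, try again
      if (wrote <= 0) throw std::runtime_error("pwrite failed");
      from += wrote;
      size -= static_cast<size_t>(wrote);
      offset_ += static_cast<uint64_t>(wrote);
    }
  }

  uint64_t Offset() const { return offset_; }

 private:
  int fd_;
  uint64_t offset_;
};

// Usage demo: write two chunks to a temporary file, then read them back.
std::string RoundTripDemo() {
  char path[] = "/tmp/offset_writer_XXXXXX";
  int fd = mkstemp(path);
  if (fd == -1) throw std::runtime_error("mkstemp failed");
  unlink(path);  // the file disappears once fd is closed
  OffsetWriter writer(fd);
  writer.Append("hello ", 6);
  writer.Append("world", 5);
  char buf[11];
  ssize_t got = pread(fd, buf, sizeof(buf), 0);
  close(fd);
  if (got != static_cast<ssize_t>(sizeof(buf)))
    throw std::runtime_error("short read");
  return std::string(buf, sizeof(buf));
}
```

Because each call passes its own offset, two such writers on the same descriptor do not depend on each other's position, provided they target disjoint regions of the file.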
>>
>>      I don't have a Windows box so I can say what should be changed at a high
>>      level, but need an actual user to ensure it compiles and runs correctly.
>>
>>      Kenneth
>>
>>      On 11/22/13 06:49, Prasanth K wrote:
>>      > Hi,
>>      >
>>      > I am trying to use KenLM for building a language model on the Europarl
>>      > corpus. Following the instructions at
>>      > http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc19,
>>      > I added the few lines for getting KenLM to estimate the LM probabilities
>>      > (order/n=5) to my EMS config file. Language model training dies with
>>      > the error "Function not implemented" at the counting and sorting
>>      > n-grams stage (the very first stage). Does this mean there is
>>      > something wrong with my installation? Or is it just insufficient memory?
>>      >
>>      > Incidentally, when I started giving the amount of memory as a percentage
>>      > (80%), there was an error: "Failed to parse .. into memory size because
>>      > physical memory size could not be determined". I am also curious why
>>      > this happens.
>>      >
>>      > Kenneth, can you shed some light on this? Thanks.
>>      >
>>      > - Regards,
>>      > Prasanth
>>      >
>>      >
>>      >
>>      > --
>>      > "Theories have four stages of acceptance. i) this is worthless nonsense;
>>      > ii) this is an interesting, but perverse, point of view, iii) this is
>>      > true, but quite unimportant; iv) I always said so."
>>      >
>>      >   --- J.B.S. Haldane
>>      >
>>      >
>>      > _______________________________________________
>>      > Moses-support mailing list
>>      > [email protected] <mailto:[email protected]>
>>      > http://mailman.mit.edu/mailman/listinfo/moses-support
>>      >
>>

