Re: [Moses-support] Estimating probabilities with KenLM

Prasanth K Mon, 25 Nov 2013 22:51:48 -0800

Hello Hieu,

My first attempt was to specify the absolute amount of memory (10G) but
that gave an error saying function not implemented. Later, when I tried
specifying the relative size (80%), I got a similar parse error to what you
have given above. Strange that it should


@Kenneth, thanks for the code to estimate physical memory. I am going to
give it a shot and let you know how it goes.

- Regards,
Prasanth


On Mon, Nov 25, 2013 at 9:20 PM, Hieu Hoang <[email protected]> wrote:

> Prasanth - what is the exact lmplz command that was run by the EMS?
>
>
> This works:
>      .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa lm/europarl.lmplz -T /tmp -S 1G
> This doesn't:
>     .../lmplz --order 5 --text lm/europarl.lowercased.1 --arpa lm/europarl.lmplz -T /tmp -S 80%
> It gives the error:
>    util/usage.cc:220 in uint64_t util::<anonymous namespace>::ParseNum(const std::string &) [Num = double] threw SizeParseError because `!mem'.
> Failed to parse 80% into a memory size because % was specified but the physical memory size could not be determined.
>
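The error above comes down to this: an absolute size such as 1G can be parsed on its own, while a percentage needs a known physical memory size to scale against. A minimal sketch of that logic follows. It is not KenLM's actual parser: the name `ParseSizeSketch`, the accepted suffixes, and the treatment of a bare number as bytes are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (not KenLM's ParseNum): turn "1G" or "80%" into bytes.
// physical_bytes is the detected RAM size, or 0 if it could not be determined.
std::uint64_t ParseSizeSketch(const std::string &arg, std::uint64_t physical_bytes) {
  char *end = nullptr;
  const double value = std::strtod(arg.c_str(), &end);
  if (end == arg.c_str() || value < 0.0)
    throw std::invalid_argument("No non-negative number in " + arg);
  const std::string suffix(end);
  if (suffix == "%") {
    // A percentage is only meaningful relative to a known memory size.
    if (physical_bytes == 0)
      throw std::runtime_error("Failed to parse " + arg +
          ": % was specified but the physical memory size is unknown");
    return static_cast<std::uint64_t>(value * static_cast<double>(physical_bytes) / 100.0);
  }
  double multiplier;
  if (suffix.empty() || suffix == "b") multiplier = 1.0;  // assumption: bare number means bytes
  else if (suffix == "K") multiplier = 1024.0;
  else if (suffix == "M") multiplier = 1024.0 * 1024.0;
  else if (suffix == "G") multiplier = 1024.0 * 1024.0 * 1024.0;
  else throw std::invalid_argument("Unknown size suffix in " + arg);
  return static_cast<std::uint64_t>(value * multiplier);
}
```

With this sketch, `ParseSizeSketch("1G", 0)` succeeds even when the memory size is unknown, while `ParseSizeSketch("80%", 0)` throws, which mirrors the behaviour reported above.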
> However, it worked even with the source code from 4 days ago.
>
>
> On 25/11/2013 19:07, Kenneth Heafield wrote:
> > Hi,
> >
> >       I've taken a shot in the dark based on physmem.c to support physical
> > memory estimation on BSD and OS X.  Please clone
> >
> > github.com/kpu/kenlm
> >
> > and compile with
> >
> > ./bjam
> >
> > If that fails, please let Hieu and me know (maybe Hieu can help since he
> > has OS X).  If it doesn't fail, run
> >
> > bin/lmplz
> >
> > with no arguments.  The help message will include a line such as
> >
> > "This machine has 135224176640 bytes of memory."
> >
> > or
> >
> > "Unable to determine the amount of memory on this machine."
> >
> > If it works, then I'll push to Moses.  Trying to not break Moses master
> > for OS X.
> >
> > Kenneth
> >
> > On 11/24/13 22:40, Prasanth K wrote:
> >> Hi Kenneth,
> >>
> >> Thanks for the clarification w.r.t. calculating the memory size. But I
> >> am running these on a Mac (10.9 Mavericks). Do you think I should still
> >> port the lmplz code to Mac for the estimation of probabilities?
> >>
> >> One thing though, I did change the default clang compiler that comes
> >> with this new Mac to a gcc-4.8 (not sure that changes anything in this
> >> context).
> >>
> >> - Prasanth
> >>
> >>
> >>
> >>
> >> On Fri, Nov 22, 2013 at 6:50 PM, Kenneth Heafield <[email protected]
> >> <mailto:[email protected]>> wrote:
> >>
> >>      Hi,
> >>
> >>              What OS are you on?  Cygwin?  Apparently every OS reports
> >>      memory size in a different way:
> >>
> >>
> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/physmem.c;h=2629936146e3042f927523322f18aca76996cd7f;hb=HEAD
> >>
> >>      The good news is that the above code is LGPLv2:
> >>
> >>
> http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=modules/physmem;h=9644522e0493a85a9fb4ae7c4449741c2c1500ea;hb=HEAD
> >>
> >>      But currently I'm just using this short function that will fail on some
> >>      platforms:
> >>
> >>      uint64_t GuessPhysicalMemory() {
> >>      #if defined(_WIN32) || defined(_WIN64)
> >>        return 0;
> >>      #elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
> >>        long pages = sysconf(_SC_PHYS_PAGES);
> >>        if (pages == -1) return 0;
> >>        long page_size = sysconf(_SC_PAGESIZE);
> >>        if (page_size == -1) return 0;
> >>        return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
> >>      #else
> >>        return 0;
> >>      #endif
> >>      }
> >>
> >>      If it fails, I just don't let users specify memory as a percentage.  So
> >>      one thing to fix is putting physmem.{h,c} in util then changing
> >>      calls to GuessPhysicalMemory.  But I'm also not a fan of the way the GNU
> >>      code gives up and makes up a number at the end.
> >>
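As an illustration of the kind of extension discussed here, the sketch below adds an OS X branch in front of the sysconf path. It is an assumption about how this could be done, not the code that was committed to KenLM: the function name `GuessPhysicalMemorySketch` is hypothetical, only the `sysctl` query for `HW_MEMSIZE` on Apple platforms is covered, and other BSDs or Windows would need their own branches. Like the original, it returns 0 instead of making up a number when detection fails.

```cpp
#include <cstddef>
#include <cstdint>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical variant of GuessPhysicalMemory for POSIX-like systems.
// Returns the physical memory size in bytes, or 0 if it cannot be determined.
std::uint64_t GuessPhysicalMemorySketch() {
#if defined(__APPLE__)
  // On OS X, hw.memsize reports physical memory as a 64-bit byte count.
  int mib[2] = {CTL_HW, HW_MEMSIZE};
  std::uint64_t bytes = 0;
  std::size_t len = sizeof(bytes);
  if (sysctl(mib, 2, &bytes, &len, nullptr, 0) == 0 && len == sizeof(bytes))
    return bytes;
  return 0;
#elif defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
  // Same approach as the function quoted above: pages times page size.
  const long pages = sysconf(_SC_PHYS_PAGES);
  if (pages == -1) return 0;
  const long page_size = sysconf(_SC_PAGESIZE);
  if (page_size == -1) return 0;
  return static_cast<std::uint64_t>(pages) * static_cast<std::uint64_t>(page_size);
#else
  return 0;  // Unknown platform: report failure rather than guess.
#endif
}
```

A caller would treat a return value of 0 as "unknown" and refuse percentage-based sizes in that case.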
> >>      The second porting issue is that lmplz makes parallel use of pread,
> >>      pwrite, and write.  Windows is unsafe in this regard (POSIX requires
> >>      that pread/pwrite not change the file pointer; Windows has no way to
> >>      implement that atomically).  To fix this, we'll always specify the file
> >>      offset in cases that happen concurrently.  Extend util/stream/io.* with
> >>      a PWrite class based on PWriteOrThrow then change FileBuffer to use
> >>      PWrite.  Then I guess one should rename PReadOrThrow/PWriteOrThrow to
> >>      something that indicates they're not-quite-POSIX on Windows.  Also, the
> >>      macros in these functions should detect Cygwin, bypassing Cygwin's
> >>      "Function not implemented" and calling Windows APIs directly (they're
> >>      already there for _WIN32).
> >>
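To make the "always specify the file offset" idea concrete, here is a minimal POSIX-only sketch of a positional write loop. It is not KenLM's PWriteOrThrow: the name `PWriteAll` and its error handling are assumptions for illustration. Because each call carries its own offset and never touches the shared file pointer, concurrent writers to the same descriptor do not interfere with one another on POSIX systems; the Windows port described above would need different calls.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: write all `size` bytes at an explicit file offset.
// Retries on partial writes and EINTR; throws on any other failure.
void PWriteAll(int fd, const void *data, std::size_t size, std::uint64_t offset) {
  const char *cur = static_cast<const char *>(data);
  while (size > 0) {
    const ssize_t ret = pwrite(fd, cur, size, static_cast<off_t>(offset));
    if (ret == -1) {
      if (errno == EINTR) continue;  // Interrupted by a signal: try again.
      throw std::runtime_error(std::string("pwrite failed: ") + std::strerror(errno));
    }
    if (ret == 0) throw std::runtime_error("pwrite wrote no bytes");
    // Advance past the bytes that were written and continue with the rest.
    cur += ret;
    size -= static_cast<std::size_t>(ret);
    offset += static_cast<std::uint64_t>(ret);
  }
}
```

Two writers could then call `PWriteAll(fd, a, a_size, 0)` and `PWriteAll(fd, b, b_size, a_size)` in any order and produce the same file contents.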
> >>      I don't have a Windows box so I can say what should be changed at a high
> >>      level, but need an actual user to ensure it compiles and runs correctly.
> >>
> >>      Kenneth
> >>
> >>      On 11/22/13 06:49, Prasanth K wrote:
> >>      > Hi,
> >>      >
> >>      > I am trying to use KenLM for building a language model on the Europarl
> >>      > corpus. Following the instructions in
> >>      > (http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc19),
> >>      > I added the few lines for getting KenLM to estimate the LM probabilities
> >>      > (order/n=5) to my config file for the EMS. Language model training dies
> >>      > with "Function not implemented" at the counting and sorting n-grams
> >>      > stage (the very first stage). Does this mean there is something wrong
> >>      > with my installation? Or is it just insufficient memory?
> >>      >
> >>      > Incidentally, when I started giving the amount of memory in terms of %
> >>      > (80%) there was an error "Failed to parse .. into memory size because
> >>      > physical memory size could not be determined". I am also curious why
> >>      > this happens.
> >>      >
> >>      > Kenneth, can you shed some light on this? Thanks.
> >>      >
> >>      > - Regards,
> >>      > Prasanth
> >>      >
> >>      >
> >>      >
> >>      > --
> >>      > "Theories have four stages of acceptance. i) this is worthless
> >>      nonsense;
> >>      > ii) this is an interesting, but perverse, point of view, iii)
> this is
> >>      > true, but quite unimportant; iv) I always said so."
> >>      >
> >>      >   --- J.B.S. Haldane
> >>      >
> >>      >
> >>      > _______________________________________________
> >>      > Moses-support mailing list
> >>      > [email protected] <mailto:[email protected]>
> >>      > http://mailman.mit.edu/mailman/listinfo/moses-support
> >>      >
> >>
> >>
> >>
> >>



