Since we're playing the "optimize Moses memory usage" game, what's your
language model?

On 06/29/11 14:32, Dennis Mehay wrote:
> Hi Phil,
> 
> Thanks for the tips.  I already tried reducing the max span for the
> re-ordering grammar (to 35, which is ~5 words more than the average span
> of the training sentences that it was extracted from).  This seemed to
> help. I reduced the # of threads to 2, which also helped.
> 
> I haven't reduced the cube-pruning-pop-limit below 7000, but might try
> that if needed for experiments with larger grammars. 
> 
> As for the last option, does that just reduce some overhead that has to
> do with the data structures built to accommodate the worst-case scenario
> that someone uses 4 translation factors?  I assume, then, that it'll be
> a constant savings times some non-constant number of things (number of
> cells in the chart, number of words in the sentence, or similar), right?
> 
> --D.N.
> 
> On Wed, Jun 29, 2011 at 9:43 AM, Phil Williams <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi Dennis,
> 
>     moses_chart maintains a lot of state information during rule lookup
>     and if large numbers of rules can be applied at each span then the
>     memory use can get pretty huge.  Your re-ordering grammar sounds like
>     a likely candidate for triggering this.  We're looking at ways to
>     reduce memory use and improvements should trickle into SVN over the
>     next few weeks.  You could also try:
> 
>      * reducing the number of threads: each thread translates a
>     separate sentence so has its own rule lookup state (plus hypothesis
>     stacks, etc)
> 
>      * upgrading to revision 4050 -- this should give at least a small
>     improvement (I got a 4GB saving in a string-to-tree experiment that
>     originally used 37GB (approx 20GB static model storage + 17GB active
>     decoding state))
> 
>      * reducing the decoder's -max-chart-span limit for the non-glue
>     grammars
> 
>      * reducing the decoder's -cube-pruning-pop-limit
> 
>      * [unsightly hack] changing this line in moses/src/TypeDef.h:
> 
>             const size_t MAX_NUM_FACTORS = 4;
>         to:
>             const size_t MAX_NUM_FACTORS = 1;
> 
>         and recompiling.
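Taken together, the decoder-side suggestions above amount to an invocation
along these lines (a sketch only: the flag names are the ones quoted in this
thread, the values are illustrative rather than recommendations, and
moses.ini is a placeholder path):

```shell
# Illustrative only -- combine the memory-saving options above:
moses_chart -f moses.ini \
    -threads 2 \
    -max-chart-span 35 \
    -cube-pruning-pop-limit 1000
```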
> 
>     Phil
> 
> 
>     On 28 Jun 2011, at 11:34 PM, Dennis Mehay <[email protected]
>     <mailto:[email protected]>> wrote:
> 
>>     Hi all,
>>
>>     I am MERTing using multithreaded moses_chart on a machine with
>>     quad-core processors (4 threads).  What I find amazing is that
>>     this MERT run is consuming 20G of RAM (yes, 20G!).  The main rule
>>     table is binarized (and it only had 1,926,150 entries in it to
>>     begin with -- those are non-Continental commas, i.e., ~2
>>     million).  So I thought maybe it's the second rule table, which
>>     consists entirely of syntactic re-ordering rules (only ~12
>>     thousand entries). 
>>
>>     First, should multithreaded moses_chart be using so much memory? 
>>     I gave it ttable limits of 50 (for the main rule table), 25 (for
>>     the purely syntactic table) and 1000 (for the glue grammar --
>>     don't know why, just seemed like it wouldn't matter much). 
>>
>>     Second, I tried to binarize (convert to an on-disk representation)
>>     the 12K-entry table, but CreateOnDiskPt isn't cooperating.
>>
>>     $ CreateOnDiskPt 1 1 2 25 1 reordering-table.gz
>>     binarized.reordering-table
>>     Starting : [0] seconds
>>     CreateOnDiskPt: PhraseNode.cpp:98: void
>>     OnDiskPt::PhraseNode::Save(OnDiskPt::OnDiskWrapper&, size_t,
>>     size_t): Assertion `!m_saved' failed.
>>     Aborted
>>
>>     What's going on here?  I told it that the index of p(e | f) is 1,
>>     because the first score is p(mother | e-children, f-children),
>>     which is as close to p(e | f) as we're going to get here.  There
>>     are 2 scores in the table, so it shouldn't be the third parameter.
>>
>>     Here's a sample of what the (home brewed) reordering-table.gz file
>>     looks like:
>>
>>     -----------------------------------------------------------
>>     ...
>>     [X][N_num_] [X] ||| [X][N_num_] [(S\NP)\((S\NP)/N_num_)] |||
>>     0.666666666667 1.0 ||| 0-0 ||| 3 2
>>     [X][((S/S)\(S/S))/(S\NP)] [X][(S\NP)/N] [X] |||
>>     [X][((S/S)\(S/S))/(S\NP)] [X][(S\NP)/N] [((S/S)\(S/S))/N] ||| 0.5
>>     0.950152353227 ||| 0-0 1-1 ||| 8 4
>>     [X][((S/S)\(S/S))/NP] [X][NP/N] [X] ||| [X][((S/S)\(S/S))/NP]
>>     [X][NP/N] [((S/S)\(S/S))/N] ||| 0.25 0.900064286132 ||| 0-0 1-1
>>     ||| 8 2
>>     [X][((S/S)\(S/S))/N] [X][N/N] [X] ||| [X][((S/S)\(S/S))/N]
>>     [X][N/N] [((S/S)\(S/S))/N] ||| 0.25 0.96050145793 ||| 0-0 1-1 ||| 8 2
>>     [X][S_b_\(S_b_/NP)] [X][DOT] [X] ||| [X][S_b_\(S_b_/NP)] [X][DOT]
>>     [S_b_\(S_b_/NP)] ||| 0.0845295055821 0.975293919709 ||| 0-0 1-1
>>     ||| 627 53
>>     [X][(S\S)/S] [X][(S/S)/(S\NP)] [X] ||| [X][(S\S)/S]
>>     [X][(S/S)/(S\NP)] [((S\S)/S)/(S\NP)] ||| 0.41935483871
>>     0.968638607969 ||| 0-0 1-1 ||| 31 13
>>     [X][((S\S)/S)/(S_b_\NP)] [X][(S_b_\NP)/(S\NP)] [X] |||
>>     [X][((S\S)/S)/(S_b_\NP)] [X][(S_b_\NP)/(S\NP)] [((S\S)/S)/(S\NP)]
>>     ||| 0.0322580645161 0.969105691057 ||| 0-0 1-1 ||| 31 1
>>     ...
>>     -----------------------------------------------------------
>>
>>     As you can see, it's some CCG categories ||| then p(mother |
>>     children), then a smoothed probability estimate (no need to get
>>     into that) ||| then the alignment between the non-terminals |||
>>     then denominator and numerator counts.
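The lines above split mechanically on "|||"; a minimal parsing sketch in
Python (the field labels are my own, inferred from the sample, not Moses
terminology):

```python
# Sketch of a parser for the rule-table lines shown above.
# Field layout inferred from the sample:
#   source ||| target ||| scores ||| alignment ||| counts

def parse_rule(line):
    """Split one rule-table line into its five |||-delimited fields."""
    source, target, scores, alignment, counts = (
        field.strip() for field in line.split("|||")
    )
    return {
        "source": source,
        "target": target,
        # Two scores per line: p(mother | children) and a smoothed estimate.
        "scores": [float(s) for s in scores.split()],
        # Alignment between the non-terminals, e.g. "0-0 1-1".
        "alignment": [tuple(map(int, a.split("-"))) for a in alignment.split()],
        # Denominator and numerator counts.
        "counts": [int(c) for c in counts.split()],
    }

# First line of the sample above:
rule = parse_rule(
    "[X][N_num_] [X] ||| [X][N_num_] [(S\\NP)\\((S\\NP)/N_num_)] "
    "||| 0.666666666667 1.0 ||| 0-0 ||| 3 2"
)
print(rule["scores"])  # [0.666666666667, 1.0]
print(rule["counts"])  # [3, 2]
```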
>>
>>     This should work, shouldn't it?
>>
>>     Best,
>>     D.N.
>>     _______________________________________________
>>     Moses-support mailing list
>>     [email protected] <mailto:[email protected]>
>>     http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> 