Hi Phil,

Thanks for the tips.  I already tried reducing the max span for the
re-ordering grammar (to 35, which is ~5 words more than the average span of
the training sentences that it was extracted from).  This seemed to help. I
reduced the number of threads to 2, which also helped.

I haven't reduced the cube-pruning-pop-limit below 7000, but might try that
if needed for experiments with larger grammars.

As for the last option, does that just reduce the overhead of the data
structures that are built to accommodate the worst case in which someone
uses 4 translation factors?  I assume, then, that it'll be a constant
saving times some non-constant number of things (number of cells in the
chart, number of words in the sentence, or similar), right?

--D.N.

On Wed, Jun 29, 2011 at 9:43 AM, Phil Williams <[email protected]> wrote:

> Hi Dennis,
>
> moses_chart maintains a lot of state information during rule lookup, and if
> large numbers of rules can be applied at each span, the memory use can get
> pretty huge.  Your re-ordering grammar does sound like a likely candidate
> for triggering this.  We're looking at ways to reduce memory use, and
> improvements should trickle into SVN over the next few weeks.  You could
> also try:
>
>  * reducing the number of threads: each thread translates a separate
> sentence, so it has its own rule lookup state (plus hypothesis stacks, etc.)
>
>  * upgrading to revision 4050 -- this should give at least a small
> improvement (I got a 4GB saving in a string-to-tree experiment that
> originally used 37GB (approx 20GB static model storage + 17GB active
> decoding state))
>
>  * reducing the decoder's -max-chart-span limit for the non-glue grammars
>
>  * reducing the decoder's -cube-pruning-pop-limit
>
>  * [unsightly hack] changing this line in moses/src/TypeDef.h:
>
>         const size_t MAX_NUM_FACTORS = 4;
>     to:
>         const size_t MAX_NUM_FACTORS = 1;
>
>     and recompiling.
>
> Phil
>
>
> On 28 Jun, 2011, at 11:34 PM, Dennis Mehay <[email protected]> wrote:
>
> Hi all,
>
> I am MERTing using multithreaded moses_chart on a machine with quad-core
> processors (4 threads).  What I find amazing is that this MERT run is
> consuming 20G of RAM (yes, 20G!).  The main rule table is binarized (and it
> only had 1,926,150 entries in it to begin with -- those are non-Continental
> commas, i.e., ~2 million).  So I thought maybe it's the second rule table,
> which consists entirely of syntactic re-ordering rules (only ~12 thousand
> entries).
>
> First, should multithreaded moses_chart be using so much memory?  I gave it
> ttable limits of 50 (for the main rule table), 25 (for the purely syntactic
> table) and 1000 (for the glue grammar -- don't know why, just seemed like it
> wouldn't matter much).
>
> Second, I tried to binarize (convert to an on-disk representation) the
> 12K-entry table, but CreateOnDiskPt isn't co-operating:
>
> $ CreateOnDiskPt 1 1 2 25 1 reordering-table.gz binarized.reordering-table
> Starting : [0] seconds
> CreateOnDiskPt: PhraseNode.cpp:98: void
> OnDiskPt::PhraseNode::Save(OnDiskPt::OnDiskWrapper&, size_t, size_t):
> Assertion `!m_saved' failed.
> Aborted
>
> What's going on here?  I told it that the index of p(e | f) is 1, because
> the first score is p(mother | e-children, f-children), which is as close to
> p(e | f) as we're going to get here.  There are 2 scores in the table, so it
> shouldn't be the third parameter.
>
> Here's a sample of what the (home brewed) reordering-table.gz file looks
> like:
>
> -----------------------------------------------------------
> ...
> [X][N_num_] [X] ||| [X][N_num_] [(S\NP)\((S\NP)/N_num_)] ||| 0.666666666667 1.0 ||| 0-0 ||| 3 2
> [X][((S/S)\(S/S))/(S\NP)] [X][(S\NP)/N] [X] ||| [X][((S/S)\(S/S))/(S\NP)] [X][(S\NP)/N] [((S/S)\(S/S))/N] ||| 0.5 0.950152353227 ||| 0-0 1-1 ||| 8 4
> [X][((S/S)\(S/S))/NP] [X][NP/N] [X] ||| [X][((S/S)\(S/S))/NP] [X][NP/N] [((S/S)\(S/S))/N] ||| 0.25 0.900064286132 ||| 0-0 1-1 ||| 8 2
> [X][((S/S)\(S/S))/N] [X][N/N] [X] ||| [X][((S/S)\(S/S))/N] [X][N/N] [((S/S)\(S/S))/N] ||| 0.25 0.96050145793 ||| 0-0 1-1 ||| 8 2
> [X][S_b_\(S_b_/NP)] [X][DOT] [X] ||| [X][S_b_\(S_b_/NP)] [X][DOT] [S_b_\(S_b_/NP)] ||| 0.0845295055821 0.975293919709 ||| 0-0 1-1 ||| 627 53
> [X][(S\S)/S] [X][(S/S)/(S\NP)] [X] ||| [X][(S\S)/S] [X][(S/S)/(S\NP)] [((S\S)/S)/(S\NP)] ||| 0.41935483871 0.968638607969 ||| 0-0 1-1 ||| 31 13
> [X][((S\S)/S)/(S_b_\NP)] [X][(S_b_\NP)/(S\NP)] [X] ||| [X][((S\S)/S)/(S_b_\NP)] [X][(S_b_\NP)/(S\NP)] [((S\S)/S)/(S\NP)] ||| 0.0322580645161 0.969105691057 ||| 0-0 1-1 ||| 31 1
> ...
> -----------------------------------------------------------
>
> As you can see, it's some CCG categories ||| then p(mother | children),
> then a smoothed probability estimate (no need to get into that) ||| then
> the alignment between the non-terminals ||| then the denominator and
> numerator counts.
>
> This should work, shouldn't it?
>
> Best,
> D.N.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>