Since we're playing "optimize Moses memory usage", what's your language model?
On 06/29/11 14:32, Dennis Mehay wrote:
> Hi Phil,
>
> Thanks for the tips. I already tried reducing the max span for the
> re-ordering grammar (to 35, which is ~5 words more than the average span
> of the training sentences that it was extracted from). This seemed to
> help. I reduced the # of threads to 2, which also helped.
>
> I haven't reduced the cube-pruning-pop-limit below 7000, but might try
> that if needed for experiments with larger grammars.
>
> As for the last option, does that just reduce some overhead that has to
> do with the data structures built to accommodate the worst-case scenario
> that someone uses 4 translation factors? I assume, then, that it'll be
> a constant savings times some non-constant number of things (number of
> cells in the chart, number of words in the sentence, or similar), right?
>
> --D.N.
>
> On Wed, Jun 29, 2011 at 9:43 AM, Phil Williams <[email protected]> wrote:
>
> Hi Dennis,
>
> moses_chart maintains a lot of state information during rule lookup
> and if large numbers of rules can be applied at each span then the
> memory use can get pretty huge. Your re-ordering grammar does sound
> a likely candidate for triggering this. We're looking at ways to
> reduce memory use and improvements should trickle into SVN over the
> next few weeks.
> You could also try:
>
> * reducing the number of threads: each thread translates a
>   separate sentence so has its own rule lookup state (plus hypothesis
>   stacks, etc.)
>
> * upgrading to revision 4050 -- this should give at least a small
>   improvement (I got a 4GB saving in a string-to-tree experiment that
>   originally used 37GB (approx. 20GB static model storage + 17GB active
>   decoding state))
>
> * reducing the decoder's -max-chart-span limit for the non-glue
>   grammars
>
> * reducing the decoder's -cube-pruning-pop-limit
>
> * [unsightly hack] changing this line in moses/src/TypeDef.h:
>
>     const size_t MAX_NUM_FACTORS = 4;
>   to:
>     const size_t MAX_NUM_FACTORS = 1;
>
>   and recompiling.
>
> Phil
>
>
> On 28 Jun, 2011, at 11:34 PM, Dennis Mehay <[email protected]> wrote:
>
>> Hi all,
>>
>> I am MERTing using multithreaded moses_chart on a machine with
>> quad-core processors (4 threads). What I find amazing is that
>> this MERT run is consuming 20G of RAM (yes, 20G!). The main rule
>> table is binarized (and it only had 1,926,150 entries in it to
>> begin with -- those are non-Continental commas, i.e., ~2
>> million). So I thought maybe it's the second rule table, which
>> consists entirely of syntactic re-ordering rules (only ~12
>> thousand entries).
>>
>> First, should multithreaded moses_chart be using so much memory?
>> I gave it ttable limits of 50 (for the main rule table), 25 (for
>> the purely syntactic table) and 1000 (for the glue grammar --
>> don't know why, just seemed like it wouldn't matter much).
>>
>> Second, I tried to binarize (convert to on-disk repr) the 12K
>> entry table, but CreateOnDiskPt isn't co-operating.
>>
>> $ CreateOnDiskPt 1 1 2 25 1 reordering-table.gz binarized.reordering-table
>> Starting : [0] seconds
>> CreateOnDiskPt: PhraseNode.cpp:98: void OnDiskPt::PhraseNode::Save(OnDiskPt::OnDiskWrapper&, size_t, size_t): Assertion `!m_saved' failed.
>> Aborted
>>
>> What's going on here?
>> I told it that the index of p(e | f) is 1,
>> because the first score is p(mother | e-children, f-children),
>> which is as close to p(e | f) as we're going to get here. There
>> are 2 scores in the table, so it shouldn't be the third parameter.
>>
>> Here's a sample of what the (home brewed) reordering-table.gz file
>> looks like:
>>
>> -----------------------------------------------------------
>> ...
>> [X][N_num_] [X] ||| [X][N_num_] [(S\NP)\((S\NP)/N_num_)] ||| 0.666666666667 1.0 ||| 0-0 ||| 3 2
>> [X][((S/S)\(S/S))/(S\NP)] [X][(S\NP)/N] [X] ||| [X][((S/S)\(S/S))/(S\NP)] [X][(S\NP)/N] [((S/S)\(S/S))/N] ||| 0.5 0.950152353227 ||| 0-0 1-1 ||| 8 4
>> [X][((S/S)\(S/S))/NP] [X][NP/N] [X] ||| [X][((S/S)\(S/S))/NP] [X][NP/N] [((S/S)\(S/S))/N] ||| 0.25 0.900064286132 ||| 0-0 1-1 ||| 8 2
>> [X][((S/S)\(S/S))/N] [X][N/N] [X] ||| [X][((S/S)\(S/S))/N] [X][N/N] [((S/S)\(S/S))/N] ||| 0.25 0.96050145793 ||| 0-0 1-1 ||| 8 2
>> [X][S_b_\(S_b_/NP)] [X][DOT] [X] ||| [X][S_b_\(S_b_/NP)] [X][DOT] [S_b_\(S_b_/NP)] ||| 0.0845295055821 0.975293919709 ||| 0-0 1-1 ||| 627 53
>> [X][(S\S)/S] [X][(S/S)/(S\NP)] [X] ||| [X][(S\S)/S] [X][(S/S)/(S\NP)] [((S\S)/S)/(S\NP)] ||| 0.41935483871 0.968638607969 ||| 0-0 1-1 ||| 31 13
>> [X][((S\S)/S)/(S_b_\NP)] [X][(S_b_\NP)/(S\NP)] [X] ||| [X][((S\S)/S)/(S_b_\NP)] [X][(S_b_\NP)/(S\NP)] [((S\S)/S)/(S\NP)] ||| 0.0322580645161 0.969105691057 ||| 0-0 1-1 ||| 31 1
>> ...
>> -----------------------------------------------------------
>>
>> As you can see, it's some CCG categories ||| then p(mother |
>> children), then a smoothed probability estimate (no need to get
>> into that) ||| then the alignment btw the non-terminals ||| then
>> denominator and numerator counts.
>>
>> This should work, shouldn't it?
>>
>> Best,
>> D.N.
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
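
For anyone sanity-checking a home-brewed table like the sample quoted above, here is a minimal Python sketch that splits one line into its fields. It assumes exactly five `|||`-separated fields in the order Dennis describes (source side, target side, two scores, non-terminal alignment, then denominator and numerator counts). The names `Rule`, `parse_rule` and `SAMPLE` are made up for illustration and are not part of Moses, and the sketch says nothing about what CreateOnDiskPt itself accepts.

```python
from dataclasses import dataclass
from typing import List, Tuple

# First rule from the sample quoted in the thread (raw string keeps the backslashes).
SAMPLE = r"[X][N_num_] [X] ||| [X][N_num_] [(S\NP)\((S\NP)/N_num_)] ||| 0.666666666667 1.0 ||| 0-0 ||| 3 2"


@dataclass
class Rule:
    """Hypothetical container for one line of the re-ordering table."""
    source: List[str]
    target: List[str]
    scores: List[float]
    alignment: List[Tuple[int, int]]
    denominator: int
    numerator: int


def parse_rule(line: str) -> Rule:
    """Split one table line on '|||' and convert each field.

    Assumes the five-field layout described in the thread; raises
    ValueError if the line has a different number of fields.
    """
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    source, target, scores, alignment, counts = fields
    pairs = [tuple(int(i) for i in pair.split("-")) for pair in alignment.split()]
    denominator, numerator = (int(c) for c in counts.split())
    return Rule(
        source=source.split(),
        target=target.split(),
        scores=[float(s) for s in scores.split()],
        alignment=pairs,
        denominator=denominator,
        numerator=numerator,
    )


if __name__ == "__main__":
    rule = parse_rule(SAMPLE)
    print(rule.target[-1], rule.scores, rule.alignment)
    # In the sample, the first score matches numerator / denominator.
    print(rule.numerator / rule.denominator)
```

Running every line of a table through a check like this is a cheap way to catch malformed lines (wrong field count, non-numeric scores) before handing the file to a binarizer.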
