actually there are two parts here: building large LMs and deploying them. i currently have a summer MSc project looking at using Hadoop and HBase to do this Google-style. this really does use a cluster of machines, for both parts. in either case, whether you build them on-disk with a single machine or use a cluster of machines, the same challenges await.
and one lesson i learnt a few years back was that if you want to guarantee an improved translation score, put serious effort into the LM. that is what it gets you.

Miles

2008/8/6 amittai axelrod <[EMAIL PROTECTED]>

> 2008/8/5 John D. Burger <[EMAIL PROTECTED]>:
> > I'm starting to think it's a lost cause to try to get one LM
> > implementation to act very much like the other. Thanks for the
> > insights, though!
>
> I also spent some time unsuccessfully trying to exactly match the
> SRILM toolkit's output. Aside from the various default settings, there
> is some pruning going on when using kndiscount.
> It's fairly easy to produce a LM that's within a few digits of
> precision, but it's hard to replicate perfectly. Of course, those
> pesky few last digits change the LM scores very much. You could just
> re-tune, but that's non-deterministic so things are still not directly
> comparable; kind of annoying.
>
> There is also the larger question of "What does it get you?" (aside
> from curiosity)... At the time, we were interested in building
> monolithic SRI-style LMs on huge corpora. In the end, general interest
> seems to have moved towards distributed LMs, mooting the original
> exercise.
> Um... Good luck!
>
> ~amittai
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
