ok, I've limited the maximum length of the input paths created with the arguments -max-phrase-length ??? by default, this is 20 which will still consume a fair bit of memory, but should be under 15GB.
In fact, you can set this to -max-phrase-length 7 as by default, the maximum EXTRACTED length is 7. code is here https://github.com/moses-smt/mosesdecoder/commit/8b9d4d1c7dac2f2a53e9a4d5949e10a4511aeb0c On 7 October 2013 11:54, Hieu Hoang <[email protected]> wrote: > @ondrej - Yes, Yulia's lattices look like confusion networks in disguised > so there will be a large number of paths through the lattice. > > the memory explosion is due to my code creating an object for every path. > It was mainly for the reason mention previously above, ie: > > I want to give each feature function the opprtunity to score with full > knowledge of the path. > > However, the old binary phrase-table doesn't require these objects to do > the lookups. Therefore, to enable Yulia and anyone else to decode large > lattices, my code will not run when > 1. decoding lattice/confusion networks, AND > 2. using the old binary phrase table. > > @Liang - thanks for the suggestions. I'm not sure how our lattice were > created. Lexi knows > > thanks for all who responded, was very useful. > > > > On 4 October 2013 22:20, Ondrej Bojar <[email protected]> wrote: > >> Hi, >> >> while you can always run rmepsilon from openfst or other toolkit, epsilon >> edges will be probably particularly useful if one would use different >> semirings for different components of the score vector. With generic >> toolkits, all the components of the score vector are processed in a single >> manner. Depending on whether Moses features do the "plus" of their >> respective scores on their own, each feature can use its own semiring. >> >> The probably (in some sense) maximal explosion in the number of paths is >> achieved when the lattice has the form of a confusion network (no >> epsilons). You get the full cartesian product of choices of the first >> token, the second token etc. >> >> Cheers, Ondrej. >> >> "Hieu Hoang" <[email protected]> wrote: >> >> >@nicola - i didn't see a reason either but some lattices from a speech >> >recognizer contains them so was just curious. I think chris has a point - >> >they may be easier to create. >> > >> >I think they may also more efficient to decode. In a non-deterministic >> >lattice, you might have the 2 edges with the same symbol coming out of 1 >> >node. Each would have to be decoded separately. >> > >> >However, its a pain to decode epsilons and there might be weird edge >> cases, >> >eg. consecutive, beginning and end epsilons, entirely epsiloms. >> > >> >@chris - cheers for the explanation. i might use victor's code and see >> how >> >it goes. >> > >> >Do you have an example (large) lattice that blows up memory that you can >> >share? >> > >> >Yes - i've changed the code to extract all possible paths. In fact, i >> >extract all paths from beginning to end of sentence, without limit. 2 >> >reasons for this >> > 1. I also divorced extracting the path creation from the phrase-table >> >lookup. In the general case there's multiple phrase-tables so it's >> >difficult to keep track of the tries. Also, the intertwinning of the >> binary >> >pt loookup with lattices made it difficult to read. >> > 2. I want to give each feature function the opprtunity to score with >> >full knowledge of the path. >> > >> >This may have to be altered if the memory explosion is too drastic >> > >> > >> > >> > >> >On 4 October 2013 17:49, Chris Dyer <[email protected]> wrote: >> > >> >> It's useful to have epsilons since it simplifies the creation of >> >> lattices in some cases. Yes, you can convert them to a deterministic >> >> equivalent, but that involves implementing FSA determinatization (or >> >> using a tool like https://pypi.python.org/pypi/pyfst), which may not >> >> be convenient. >> >> >> >> Btw, I've also noticed that memory usage with lattices/CNs explodes >> >> with non-binarized phrase tables (maybe also with binarized PTs?). >> >> This is independent of the size of the phrase table and only seems to >> >> be a function of the lattice structure. I'm not sure what's going on >> >> (the code has changed substantially since I last looked at it). But, >> >> you should always match paths in the lattice with paths in the phrase >> >> table trie- maybe moses is now trying to extract all possible paths in >> >> the lattice up to max-phrase-size or something? >> >> >> >> On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <[email protected]> >> wrote: >> >> > I don't see any reason why a lattice should contain an EPSILON edge. >> >> > >> >> > In a confusion network, EPSILON are needed to allow the translation >> of >> >> input of different lengths. >> >> > The sausage structure of the CN imposes the same amount of source >> words, >> >> > and the EPSILONs overcome this constraint. >> >> > >> >> > This is not the case for lattice, because you can have any number of >> >> edges/words in a complete source path. >> >> > >> >> > >> >> > cheers, >> >> > Nicola >> >> > >> >> > >> >> > >> >> > On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote: >> >> > >> >> > I'm just looking at the lattices decoding, as implemented in moses. >> >> > >> >> > for confusion networks, it's fair to have EPSILON words (that >> represent >> >> blank words). However, I don't see the point of them in lattices. >> >> > >> >> > Anyone have an opinion? How is it implemented in cdec & joshua? >> >> > >> >> > -- >> >> > Hieu Hoang >> >> > Research Associate >> >> > University of Edinburgh >> >> > http://www.hoang.co.uk/hieu >> >> > >> >> > _______________________________________________ >> >> > Moses-support mailing list >> >> > [email protected]<mailto:[email protected]> >> >> > http://mailman.mit.edu/mailman/listinfo/moses-support >> >> > >> >> > >> >> > _______________________________________________ >> >> > Moses-support mailing list >> >> > [email protected] >> >> > http://mailman.mit.edu/mailman/listinfo/moses-support >> >> >> >> -- >> >> You received this message because you are subscribed to the Google >> Groups >> >> "cdec users" group. >> >> To unsubscribe from this group and stop receiving emails from it, send >> an >> >> email to [email protected]. >> >> For more options, visit https://groups.google.com/groups/opt_out. >> >> >> > >> > >> > >> >-- >> >Hieu Hoang >> >Research Associate >> >University of Edinburgh >> >http://www.hoang.co.uk/hieu >> >_______________________________________________ >> >Moses-support mailing list >> >[email protected] >> >http://mailman.mit.edu/mailman/listinfo/moses-support >> >> -- >> Ondrej Bojar >> http://www.cuni.cz/~obo >> >> _______________________________________________ >> Moses-support mailing list >> [email protected] >> http://mailman.mit.edu/mailman/listinfo/moses-support >> > > > > -- > Hieu Hoang > Research Associate > University of Edinburgh > http://www.hoang.co.uk/hieu > > -- Hieu Hoang Research Associate University of Edinburgh http://www.hoang.co.uk/hieu
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
