@nicola - I didn't see a reason either, but some lattices from a speech recognizer contain them, so I was just curious. I think Chris has a point - they may be easier to create.
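To make the "easier to create" point concrete, here is a rough Python sketch. It is not Moses or cdec code: the dict-based lattice format, the `EPS` marker and the `paths` helper are all made up for illustration, and the lattice is assumed to be acyclic. An optional word costs just one extra epsilon edge, and leading or trailing epsilons simply disappear when each path is read off.

```python
EPS = "*EPS*"  # hypothetical epsilon marker, not necessarily the symbol Moses uses

# Hypothetical toy lattice: node -> list of (word, next_node).
# Node 0 is the start, node 3 is the end.
lattice = {
    0: [("a", 1), (EPS, 1)],  # "a" is optional: one extra epsilon edge
    1: [("b", 2)],
    2: [("c", 3), (EPS, 3)],  # "c" is optional as well
    3: [],
}

def paths(lattice, node, end):
    """Yield every word sequence from node to end, with epsilons dropped."""
    if node == end:
        yield []
        return
    for word, nxt in lattice[node]:
        for rest in paths(lattice, nxt, end):
            yield rest if word == EPS else [word] + rest

all_paths = sorted(paths(lattice, 0, 3))
for p in all_paths:
    print(" ".join(p))
```

Without epsilons, the same four word sequences would need extra edges that skip nodes (e.g. a "b" edge straight from node 0 to node 2), which is the kind of bookkeeping a determinization tool would otherwise do.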
I think they may also be more efficient to decode. In a non-deterministic lattice, you might have 2 edges with the same symbol coming out of 1 node, and each would have to be decoded separately. However, it's a pain to decode epsilons and there might be weird edge cases, e.g. consecutive epsilons, epsilons at the beginning and end, or nothing but epsilons.

@chris - cheers for the explanation. I might use Victor's code and see how it goes. Do you have an example (large) lattice that blows up memory that you can share?

Yes - I've changed the code to extract all possible paths. In fact, I extract all paths from beginning to end of sentence, without limit. 2 reasons for this:

1. I also divorced the path creation from the phrase-table lookup. In the general case there are multiple phrase-tables, so it's difficult to keep track of the tries. Also, the intertwining of the binary PT lookup with lattices made the code difficult to read.
2. I want to give each feature function the opportunity to score with full knowledge of the path.

This may have to be altered if the memory explosion is too drastic.

On 4 October 2013 17:49, Chris Dyer <[email protected]> wrote:

> It's useful to have epsilons since it simplifies the creation of
> lattices in some cases. Yes, you can convert them to a deterministic
> equivalent, but that involves implementing FSA determinization (or
> using a tool like https://pypi.python.org/pypi/pyfst), which may not
> be convenient.
>
> Btw, I've also noticed that memory usage with lattices/CNs explodes
> with non-binarized phrase tables (maybe also with binarized PTs?).
> This is independent of the size of the phrase table and only seems to
> be a function of the lattice structure. I'm not sure what's going on
> (the code has changed substantially since I last looked at it). But,
> you should always match paths in the lattice with paths in the phrase
> table trie - maybe moses is now trying to extract all possible paths in
> the lattice up to max-phrase-size or something?
>
> On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <[email protected]> wrote:
> > I don't see any reason why a lattice should contain an EPSILON edge.
> >
> > In a confusion network, EPSILONs are needed to allow the translation of
> > input of different lengths.
> > The sausage structure of the CN imposes the same amount of source words,
> > and the EPSILONs overcome this constraint.
> >
> > This is not the case for lattices, because you can have any number of
> > edges/words in a complete source path.
> >
> > cheers,
> > Nicola
> >
> > On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote:
> >
> > I'm just looking at the lattice decoding, as implemented in moses.
> >
> > For confusion networks, it's fair to have EPSILON words (that represent
> > blank words). However, I don't see the point of them in lattices.
> >
> > Anyone have an opinion? How is it implemented in cdec & joshua?
> >
> > --
> > Hieu Hoang
> > Research Associate
> > University of Edinburgh
> > http://www.hoang.co.uk/hieu
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> You received this message because you are subscribed to the Google Groups
> "cdec users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
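
On the memory explosion discussed above: a rough Python sketch of why extracting every begin-to-end path can blow up as a function of lattice structure alone, whatever the phrase-table size. This is not what Moses does internally, just a toy count. The sausage-shaped lattice, `make_sausage` and `count_paths` are made up for illustration, and the lattice is assumed to be acyclic. With k alternative edges at each of n positions there are only k*n edges but k^n full paths.

```python
from functools import lru_cache

def make_sausage(n_positions, k_alternatives):
    """Hypothetical toy lattice: node i -> list of (word, i + 1)."""
    lattice = {
        i: [(f"w{i}_{j}", i + 1) for j in range(k_alternatives)]
        for i in range(n_positions)
    }
    lattice[n_positions] = []  # end node has no outgoing edges
    return lattice

def count_paths(lattice, start, end):
    """Count start-to-end paths without materialising them."""
    @lru_cache(maxsize=None)
    def count(node):
        if node == end:
            return 1
        return sum(count(nxt) for _, nxt in lattice[node])
    return count(start)

lattice = make_sausage(n_positions=20, k_alternatives=3)
n_edges = sum(len(edges) for edges in lattice.values())
n_paths = count_paths(lattice, 0, 20)
print(n_edges, n_paths)
```

Counting is cheap because each node is visited once; storing every path explicitly is what grows exponentially, which may be why matching lattice paths against the phrase-table trie, or capping path length, keeps memory in check.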
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
