@nicola - i didn't see a reason either but some lattices from a speech
recognizer contains them so was just curious. I think chris has a point -
they may be easier to create.

I think they may also more efficient to decode. In a non-deterministic
lattice, you might have the 2 edges with the same symbol coming out of 1
node. Each would have to be decoded separately.

However, its a pain to decode epsilons and there might be weird edge cases,
eg. consecutive, beginning and end epsilons, entirely epsiloms.

@chris - cheers for the explanation. i might use victor's code and see how
it goes.

Do you have an example (large) lattice that blows up memory that you can
share?

Yes - i've changed the code to extract all possible paths. In fact, i
extract all paths from beginning to end of sentence, without limit. 2
reasons for this
   1. I also divorced extracting the path creation from the phrase-table
lookup. In the general case there's multiple phrase-tables so it's
difficult to keep track of the tries. Also, the intertwinning of the binary
pt loookup with lattices made it difficult to read.
   2. I want to give each feature function the opprtunity to score with
full knowledge of the path.

This may have to be altered if the memory explosion is too drastic




On 4 October 2013 17:49, Chris Dyer <[email protected]> wrote:

> It's useful to have epsilons since it simplifies the creation of
> lattices in some cases. Yes, you can convert them to a deterministic
> equivalent, but that involves implementing FSA determinatization (or
> using a tool like https://pypi.python.org/pypi/pyfst), which may not
> be convenient.
>
> Btw, I've also noticed that memory usage with lattices/CNs explodes
> with non-binarized phrase tables (maybe also with binarized PTs?).
> This is independent of the size of the phrase table and only seems to
> be a function of the lattice structure. I'm not sure what's going on
> (the code has changed substantially since I last looked at it). But,
> you should always match paths in the lattice with paths in the phrase
> table trie- maybe moses is now trying to extract all possible paths in
> the lattice up to max-phrase-size or something?
>
> On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <[email protected]> wrote:
> > I don't see any reason why a lattice should contain an EPSILON edge.
> >
> > In a confusion network, EPSILON are needed to allow the translation of
> input of different lengths.
> > The sausage structure of the CN imposes the same amount of source words,
> > and the EPSILONs overcome this constraint.
> >
> > This is not the case for lattice, because you can have any number of
> edges/words in a complete source path.
> >
> >
> > cheers,
> > Nicola
> >
> >
> >
> > On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote:
> >
> > I'm just looking at the lattices decoding, as implemented in moses.
> >
> > for confusion networks, it's fair to have EPSILON words (that represent
> blank words). However, I don't see the point of them in lattices.
> >
> > Anyone have an opinion? How is it implemented in cdec & joshua?
> >
> > --
> > Hieu Hoang
> > Research Associate
> > University of Edinburgh
> > http://www.hoang.co.uk/hieu
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]<mailto:[email protected]>
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> You received this message because you are subscribed to the Google Groups
> "cdec users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>



-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

Reply via email to