OK, I've limited the maximum length of the input paths that are created.
Set the limit with the argument
   -max-phrase-length ???
By default this is 20, which will still consume a fair bit of memory, but
should stay under 15GB.

In fact, you can set this to
   -max-phrase-length 7
since, by default, the maximum EXTRACTED phrase length is 7.

The code is here:

https://github.com/moses-smt/mosesdecoder/commit/8b9d4d1c7dac2f2a53e9a4d5949e10a4511aeb0c
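
To give a feel for why capping the path length helps, here is a rough sketch.
It is not the Moses code. It assumes the worst case discussed in this thread,
a lattice shaped like a confusion network, where column i offers widths[i]
alternative words. The function names and the widths list are made up for
illustration. It only counts paths, it says nothing about actual memory use.

```python
from math import prod


def count_full_paths(widths):
    """Number of complete paths through a confusion network.

    widths[i] is the (assumed) number of alternative words in column i,
    so the full paths are the cartesian product of all columns.
    """
    return prod(widths)


def count_subpaths(widths, max_len=None):
    """Number of contiguous sub-paths, i.e. candidate source phrases.

    Only sub-paths spanning at most max_len columns are counted.
    max_len=None means no limit (paths up to the whole sentence).
    """
    n = len(widths)
    limit = n if max_len is None else min(max_len, n)
    total = 0
    for start in range(n):
        paths = 1
        # Extend the span one column at a time, up to the length limit.
        for end in range(start, min(start + limit, n)):
            paths *= widths[end]
            total += paths
    return total


if __name__ == "__main__":
    widths = [3] * 10  # hypothetical: 10 columns, 3 alternatives each
    print("full paths:        ", count_full_paths(widths))
    print("sub-paths, no limit:", count_subpaths(widths))
    print("sub-paths, limit 7: ", count_subpaths(widths, max_len=7))
```

Without a limit the count is dominated by the longest spans, so it grows
exponentially with sentence length. With a fixed limit the count grows only
linearly with sentence length, although it is still exponential in the limit
itself.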




On 7 October 2013 11:54, Hieu Hoang <[email protected]> wrote:

> @ondrej - Yes, Yulia's lattices look like confusion networks in disguise,
> so there will be a large number of paths through the lattice.
>
> The memory explosion is due to my code creating an object for every path.
> This was mainly for the reason mentioned previously, i.e.:
>
>    I want to give each feature function the opportunity to score with full
> knowledge of the path.
>
> However, the old binary phrase-table doesn't require these objects to do
> the lookups. Therefore, to enable Yulia and anyone else to decode large
> lattices, my code will not run when
>    1. decoding lattice/confusion networks, AND
>    2. using the old binary phrase table.
>
> @Liang - thanks for the suggestions. I'm not sure how our lattices were
> created. Lexi knows.
>
> Thanks to all who responded, it was very useful.
>
>
>
> On 4 October 2013 22:20, Ondrej Bojar <[email protected]> wrote:
>
>> Hi,
>>
>> while you can always run rmepsilon from openfst or another toolkit, epsilon
>> edges will probably be particularly useful if one wants to use different
>> semirings for different components of the score vector. With generic
>> toolkits, all the components of the score vector are processed in the same
>> manner. Depending on whether Moses features do the "plus" of their
>> respective scores on their own, each feature can use its own semiring.
>>
>> The probably (in some sense) maximal explosion in the number of paths is
>> achieved when the lattice has the form of a confusion network (no
>> epsilons). You get the full cartesian product of choices of the first
>> token, the second token etc.
>>
>> Cheers, Ondrej.
>>
>> "Hieu Hoang" <[email protected]> wrote:
>>
>> >@nicola - I didn't see a reason either, but some lattices from a speech
>> >recognizer contain them, so I was just curious. I think Chris has a point -
>> >they may be easier to create.
>> >
>> >I think they may also be more efficient to decode. In a non-deterministic
>> >lattice, you might have 2 edges with the same symbol coming out of 1
>> >node. Each would have to be decoded separately.
>> >
>> >However, it's a pain to decode epsilons and there might be weird edge
>> >cases, e.g. consecutive epsilons, epsilons at the beginning and end, or
>> >entirely epsilons.
>> >
>> >@chris - cheers for the explanation. I might use Victor's code and see how
>> >it goes.
>> >
>> >Do you have an example (large) lattice that blows up memory that you can
>> >share?
>> >
>> >Yes - i've changed the code to extract all possible paths. In fact, i
>> >extract all paths from beginning to end of sentence, without limit. 2
>> >reasons for this
>> >   1. I also divorced the path creation from the phrase-table lookup.
>> >In the general case there are multiple phrase-tables, so it's difficult
>> >to keep track of the tries. Also, the intertwining of the binary pt
>> >lookup with lattices made the code difficult to read.
>> >   2. I want to give each feature function the opportunity to score with
>> >full knowledge of the path.
>> >
>> >This may have to be altered if the memory explosion is too drastic
>> >
>> >
>> >
>> >
>> >On 4 October 2013 17:49, Chris Dyer <[email protected]> wrote:
>> >
>> >> It's useful to have epsilons since it simplifies the creation of
>> >> lattices in some cases. Yes, you can convert them to a deterministic
>> >> equivalent, but that involves implementing FSA determinization (or
>> >> using a tool like https://pypi.python.org/pypi/pyfst), which may not
>> >> be convenient.
>> >>
>> >> Btw, I've also noticed that memory usage with lattices/CNs explodes
>> >> with non-binarized phrase tables (maybe also with binarized PTs?).
>> >> This is independent of the size of the phrase table and only seems to
>> >> be a function of the lattice structure. I'm not sure what's going on
>> >> (the code has changed substantially since I last looked at it). But,
>> >> you should always match paths in the lattice with paths in the phrase
>> >> table trie - maybe Moses is now trying to extract all possible paths in
>> >> the lattice up to max-phrase-size or something?
>> >>
>> >> On Fri, Oct 4, 2013 at 11:22 AM, Nicola Bertoldi <[email protected]> wrote:
>> >> > I don't see any reason why a lattice should contain an EPSILON edge.
>> >> >
>> >> > In a confusion network, EPSILONs are needed to allow the translation of
>> >> > input of different lengths.
>> >> > The sausage structure of the CN imposes the same amount of source words,
>> >> > and the EPSILONs overcome this constraint.
>> >> >
>> >> > This is not the case for lattices, because you can have any number of
>> >> > edges/words in a complete source path.
>> >> >
>> >> >
>> >> > cheers,
>> >> > Nicola
>> >> >
>> >> >
>> >> >
>> >> > On Oct 4, 2013, at 2:52 PM, Hieu Hoang wrote:
>> >> >
>> >> > I'm just looking at the lattice decoding, as implemented in moses.
>> >> >
>> >> > For confusion networks, it's fair to have EPSILON words (that represent
>> >> > blank words). However, I don't see the point of them in lattices.
>> >> >
>> >> > Does anyone have an opinion? How is it implemented in cdec & joshua?
>> >> >
>> >> > --
>> >> > Hieu Hoang
>> >> > Research Associate
>> >> > University of Edinburgh
>> >> > http://www.hoang.co.uk/hieu
>> >> >
>> >> > _______________________________________________
>> >> > Moses-support mailing list
>> >> > [email protected]
>> >> > http://mailman.mit.edu/mailman/listinfo/moses-support
>> >> >
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google Groups
>> >> "cdec users" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send an
>> >> email to [email protected].
>> >> For more options, visit https://groups.google.com/groups/opt_out.
>> >>
>> >
>> >
>> >
>> >--
>> >Hieu Hoang
>> >Research Associate
>> >University of Edinburgh
>> >http://www.hoang.co.uk/hieu
>>
>> --
>> Ondrej Bojar
>> http://www.cuni.cz/~obo
>>
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
