Re: [Moses-support] search graph to word lattice

Chris Dyer Thu, 04 Mar 2010 04:55:09 -0800

Moses transition costs can be converted to probabilities (i.e., you
can turn a search graph into a stochastic FSA), but they need to be
renormalized first. You can do this by computing the posterior probability
of each edge (using the forward-backward algorithm) and then
normalizing over the outgoing edges at each node.
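
As a rough illustration of that renormalization, here is a minimal Python
sketch. It assumes the search graph has already been reduced to an acyclic
list of (src, dst, log_score) edges with one start node and one final node;
the function names and the edge representation are my own and are not part
of Moses. Positive transition scores are harmless here, because only the
relative mass of the paths matters.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def topological_order(edges):
    """Kahn's algorithm; assumes the graph is acyclic."""
    succ, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for src, dst, _ in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

def stochastic_weights(edges, start, final):
    """Return one locally normalized probability per edge, in input order."""
    outgoing = defaultdict(list)
    for src, dst, score in edges:
        outgoing[src].append((dst, score))
    order = topological_order(edges)

    # Forward (alpha) and backward (beta) log scores for every node.
    alpha = {n: NEG_INF for n in order}
    beta = {n: NEG_INF for n in order}
    alpha[start], beta[final] = 0.0, 0.0
    for node in order:
        for dst, score in outgoing[node]:
            alpha[dst] = log_add(alpha[dst], alpha[node] + score)
    for node in reversed(order):
        for dst, score in outgoing[node]:
            beta[node] = log_add(beta[node], score + beta[dst])

    # Edge posterior: mass of all complete paths through the edge, over Z.
    log_z = alpha[final]  # assumes the final node is reachable from start
    posterior = [math.exp(alpha[s] + w + beta[d] - log_z) for s, d, w in edges]

    # Normalize the posteriors over the outgoing edges of each node.
    mass = defaultdict(float)
    for (src, _, _), p in zip(edges, posterior):
        mass[src] += p
    return [p / mass[src] if mass[src] > 0.0 else 0.0
            for (src, _, _), p in zip(edges, posterior)]
```

A real search graph has many final hypotheses; one way to fit this sketch
would be to link them all to a single artificial final node with score 0.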


One caveat: the way Moses is usually trained (with MERT) means that
the resulting transition probabilities might be scaled in odd ways
(i.e., the best edge might have 99.99% of the probability mass, or it
might be only a minuscule amount ahead of the next best), so you may need
to post-process them (for example, by rescaling the probabilities) to
make them useful.
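
One possible way to do such rescaling, sketched in Python: multiply the log
scores by a scale factor before exponentiating and normalizing. The function
name and the `scale` parameter are assumptions of mine, and the value of the
scale would have to be tuned on held-out data. The sketch shows the effect on
the outgoing edges of a single node; in a full pipeline the same factor would
be applied to every edge score before running forward-backward.

```python
import math

def scaled_distribution(log_scores, scale=1.0):
    """Normalize exp(scale * score) over a set of competing edges.

    scale < 1 flattens the distribution, scale > 1 sharpens it,
    and scale = 0 makes it uniform.
    """
    scaled = [scale * s for s in log_scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Example: the same three edge scores under two different scales.
sharp = scaled_distribution([-1.0, -3.0, -6.0], scale=1.0)
flat = scaled_distribution([-1.0, -3.0, -6.0], scale=0.1)
```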

-C

2010/3/4 Jörg Tiedemann <[email protected]>:
>
> One more time about the conversion from search graphs to word lattices:
> In the word lattice I would like to use probabilities for each edge, but
> I guess that transition costs cannot easily be interpreted as log
> probs. For example, I have seen quite a few positive transition values
> in my sample output, which would definitely create some problems.
>
> Anyway, what I am trying to do is use Moses output to create word lattice
> input for another translation step. Maybe the values on input lattice
> edges do not strictly have to be probabilities and I shouldn't care too
> much?
>
> Jörg
>
>
> Loïc BARRAULT wrote:
>> Hi Jörg,
>>
>> I'll take an example to explain my point of view.
>>
>> Here is an example of a recombined hypothesis:
>> 0 hyp=319 stack=3 back=1 score=-0.831512 transition=-0.641647 recombined=181 forward=3766 fscore=-205.134 covered=1-2 out=. I 'm looking for a , pC=-0.518872, c=-0.31244
>>
>> In my case, hypothesis numbers are the nodes of the graph and phrases are
>> represented on the links.
>> In this case, to preserve the graph topology, the only thing that can
>> be done is to merge node 319 with node 181, which results in creating a
>> link between node 1 (the back node) and node 181 (the recombined node).
>>
>> (X) ---------->(181)
>> (1)------------->(319)
>>
>> results in
>> (X) ---------->(181)
>> (1)---------------^
>>
>> In your example, you can't merge 5 and 1 because their history is not
>> the same (you pointed this out).
>> But if 6 is recombined and pointing to 4, then the only thing you can do
>> safely is to merge 6 and 4, which means creating a link between 5 and 4.
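
The merge described above can be sketched in Python as follows. This is an
illustration only: it assumes each search-graph line has already been parsed
into a dict with the keys hyp, back, recombined, transition and out (named
after the fields in the example line), and the parsing itself is left out.
The function name is hypothetical.

```python
def build_edges(hypotheses):
    """Turn parsed hypotheses into (src, dst, phrase, transition) edges.

    A recombined hypothesis is merged into the node it points to, so its
    incoming edge is redirected from the back node to that surviving node.
    """
    merged_into = {h["hyp"]: h["recombined"]
                   for h in hypotheses if h.get("recombined") is not None}

    def surviving(node):
        # Follow recombination pointers until a non-recombined node is found.
        while node in merged_into:
            node = merged_into[node]
        return node

    edges = []
    for h in hypotheses:
        if h.get("back") is None:  # the initial, empty hypothesis
            continue
        edges.append((h["back"], surviving(h["hyp"]), h["out"], h["transition"]))
    return edges

# Hypothesis 319 from the example, plus an invented entry for node 181.
example = [
    {"hyp": 181, "back": 7, "recombined": None,
     "transition": -0.5, "out": "I am looking for a"},
    {"hyp": 319, "back": 1, "recombined": 181,
     "transition": -0.641647, "out": ". I 'm looking for a"},
]
edges = build_edges(example)
```

The back node 7, the phrase and the score given for hypothesis 181 are made
up for the example; only the values for hypothesis 319 come from the line
quoted above.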
>>
>> Good luck.
>>
>> Loïc
>>
>>
>> 2010/3/3 Jörg Tiedemann <[email protected]>
>>
>>
>>     I am now trying to use the search graph output to produce a word
>>     lattice in PLF style. I'm still a bit confused about how to use the
>>     recombined hypotheses and their pointers to superior hypotheses. Do I
>>     have to copy the relevant parts from the superior hypotheses into
>>     the lattice, or should I join the hypotheses that point to recombined
>>     hypotheses with the existing graph? To give an example:
>>
>>       who   is    bill    ?
>>     (0)-->(1)-->(2)--->(3)-->(4)
>>      |
>>      |--->(5)------------->(6)
>>      how  |   is bill ?
>>           |
>>           |---->(7)----->(8)
>>            is the   bill
>>
>>     where (6) is a recombined hypo pointing to (4) and covering tokens 1-3
>>     and (8) is a recombined hypo that points to (3)
>>
>>     Should I copy the relevant parts of (4) that cover the same tokens
>>     to the graph as a link to (5) or can I safely join (5) and (1)?
>>     Probably not because this would produce "who is the bill" which is
>>     not necessarily an option ...
>>
>>     Thanks a lot for clarifying this to me!
>>     Jörg
>>
>>
>>
>>     Chris Dyer wrote:
>>
>>         As long as you're just splitting, keeping the weights consistent
>>         isn't too hard: just keep all the weight in one segment and make
>>         all the other segments have no impact when they multiply (i.e., a
>>         probability of 1, or a cost of 0).  The OpenFst or AT&T tools can
>>         help you manipulate lattices if you want to do more interesting
>>         things with weights, such as pushing them to the start of paths.
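
A minimal Python sketch of the splitting scheme just described, assuming
edges are (src, dst, phrase, cost) tuples and that fresh node ids can be
allocated from an integer not used elsewhere in the graph (both are
assumptions made for the illustration, as is the function name). The first
word keeps the whole cost and the remaining words get a cost of 0.

```python
import itertools

def split_phrase_edges(edges, first_new_node):
    """Replace each multi-word edge by a chain of single-word edges."""
    new_ids = itertools.count(first_new_node)
    result = []
    for src, dst, phrase, cost in edges:
        words = phrase.split()
        if len(words) <= 1:
            # Single-word (or empty) edges are kept unchanged.
            result.append((src, dst, phrase, cost))
            continue
        prev = src
        for i, word in enumerate(words):
            is_last = i == len(words) - 1
            nxt = dst if is_last else next(new_ids)
            # All the weight stays on the first segment; the rest cost 0.
            result.append((prev, nxt, word, cost if i == 0 else 0.0))
            prev = nxt
    return result

# Illustrative edge: a two-word phrase between nodes 0 and 53.
split = split_phrase_edges([(0, 53, "bill clinton", -3.23392)],
                           first_new_node=1000)
```

Because the extra segments carry a cost of 0, the total cost of every path
through the lattice is the same before and after splitting.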
>>
>>         Chris
>>
>>         On Mon, Mar 1, 2010 at 1:58 PM, Loïc BARRAULT <[email protected]> wrote:
>>
>>             Indeed, splitting is not hard, but the tricky part is how
>>             much of the probability/score to give to each part of the
>>             split. Maybe it has no real impact in the end, or does it?
>>             Loïc
>>
>>             2010/3/1 Chris Dyer <[email protected]>
>>
>>                 I guess word-graph doesn't split phrases either (I was
>>                 just guessing).  It appears to be in SLF format, which is
>>                 used by a number of tools (like HTK and the SRI tools).
>>                 SRILM can split lattices with multi-word arcs into
>>                 lattices with single-word arcs, or you can write your own
>>                 code to do it.  It's not terribly hard.
>>
>>                 Chris
>>
>>                 On Mon, Mar 1, 2010 at 12:32 PM, Joerg Tiedemann <[email protected]> wrote:
>>
>>                     Ok thanks. I will use the output-word-graph option.
>>                     However, I also get phrases with that option (in the
>>                     w attribute), for example here:
>>
>>                     ....
>>                     J=42    S=0     E=53    a=0, 0, 0, -0.693147, 0.999896  l=-13.695 r=-20, 0, -1.60944, 0, 0, 0     w=bill clinton , pC=0.0613498, c=-3.23392
>>                     ...
>>
>>                     I'm not sure if I'm using the command line argument
>>                     correctly:
>>                     echo 'who is bill clinton ?' | \
>>                     moses -f moses.ini -output-word-graph test.graph 0
>>
>>                     Jörg
>>
>>
>>                     On 3/1/10 5:35 PM, Chris Dyer wrote:
>>
>>                         I don't have such a tool, but it wouldn't be too
>>                         difficult to write one.  I think the difference
>>                         between word graph and search graph is that the
>>                         search graph has full phrases on the edges, whereas
>>                         the word graph has single words on the edges.  For
>>                         the input, you need single-word edges.
>>                         -Chris
>>
>>                         2010/3/1 Jörg Tiedemann <[email protected]>:
>>
>>                             Is there a tool to convert output search graphs
>>                             to word lattices in PLF (Moses lattice input
>>                             format)? It's the option -output-search-graph
>>                             that I should use for getting the relevant
>>                             information, right? I'm not really sure if I
>>                             understand the difference between
>>                             -output-word-graph and -output-search-graph.
>>                             Thanks!
>>
>>                             Jörg
>>
>>
>>
>>
>>                             *******/\/\/\/\/\/\/\/\/\/\/\******************************************
>>                              Jörg Tiedemann                      [email protected]
>>                              Visiting Professor                  http://stp.lingfil.uu.se/~joerg/
>>                              Dep. of Linguistics and Philology
>>                              Uppsala University                  tel: +46 (0)18 - 471 1412
>>                              Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094
>>                             *********************************/\/\/\/\/\/\/\/\/\/\/\****************
>>                             _______________________________________________
>>                             Moses-support mailing list
>>                             [email protected]
>>                             <mailto:[email protected]>
>>                             
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>>             --
>>             ---
>>             Loïc BARRAULT
>>             Post-doctoral researcher
>>             LIUM - University of Le Mans
>>             Tél. +33/0 2 43 83 38 52
>>             http://www-lium.univ-lemans.fr/~barrault
>>             MANY : Open Source MT System Combination
>>             http://www-lium.univ-lemans.fr/~barrault/MANY
>>             ---
>>
>>
>>     --
>>
>>
>>     *******/\/\/\/\/\/\/\/\/\/\/\******************************************
>>      Jörg Tiedemann                      [email protected]
>>     <mailto:[email protected]>
>>      Visiting Professor                  http://stp.lingfil.uu.se/~joerg/
>>      Dep. of Linguistics and Philology
>>      Uppsala University                  tel: +46 (0)18 - 471 1412
>>      Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094
>>     *********************************/\/\/\/\/\/\/\/\/\/\/\****************
>>
>>
>>
>>
>> --
>> ---
>> Loïc BARRAULT
>> Post-doctoral researcher
>> LIUM - University of Le Mans
>> Tél. +33/0 2 43 83 38 52
>> http://www-lium.univ-lemans.fr/~barrault
>> MANY : Open Source MT System Combination
>> http://www-lium.univ-lemans.fr/~barrault/MANY
>> ---
>
> --
>
> Hälsningar,
>
> Jörg
>
> *******/\/\/\/\/\/\/\/\/\/\/\******************************************
>  Jörg Tiedemann                      [email protected]
>  Visiting Professor                  http://stp.lingfil.uu.se/~joerg/
>  Dep. of Linguistics and Philology
>  Uppsala University                  tel: +46 (0)18 - 471 1412
>  Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094
> *********************************/\/\/\/\/\/\/\/\/\/\/\****************
>
