Moses transition costs can be converted to probabilities (i.e., you can turn a search graph into a stochastic FSA), but they need to be renormalized first. You can do this by computing the posterior probability of each edge (using the forward-backward algorithm) and then normalizing the posteriors of all outgoing edges at each node so that they sum to one.
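As a rough illustration of that recipe, here is a minimal Python sketch. It assumes the search graph has already been parsed into (src, dst, log_score) triples that form a DAG with one start node and one final node, and it treats each transition score as an unnormalized log weight. The function names, the input layout, and the `scale` knob are made up for this example; none of them are part of Moses.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over an iterable of log values."""
    values = list(values)
    m = max(values, default=NEG_INF)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def topological_order(nodes, out_edges):
    """Kahn's algorithm; assumes the graph is acyclic."""
    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for dst, _ in out_edges[n]:
            indegree[dst] += 1
    order = [n for n in nodes if indegree[n] == 0]
    for n in order:  # the list grows while we iterate over it
        for dst, _ in out_edges[n]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                order.append(dst)
    return order


def edge_probabilities(edges, start, final, scale=1.0):
    """Return {edge index: probability}; outgoing edges of each node sum to 1.

    edges: list of (src, dst, log_score) triples (hypothetical input layout).
    scale: hypothetical multiplier applied to every log score beforehand.
    """
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    nodes, seen = [], set()
    for i, (src, dst, _) in enumerate(edges):
        out_edges[src].append((dst, i))
        in_edges[dst].append((src, i))
        for n in (src, dst):
            if n not in seen:
                seen.add(n)
                nodes.append(n)
    score = [scale * w for _, _, w in edges]
    order = topological_order(nodes, out_edges)

    # Forward pass: log weight of all partial paths from start to each node.
    alpha = {n: NEG_INF for n in nodes}
    alpha[start] = 0.0
    for n in order:
        if n != start:
            alpha[n] = logsumexp(alpha[s] + score[i] for s, i in in_edges[n])

    # Backward pass: log weight of all partial paths from each node to final.
    beta = {n: NEG_INF for n in nodes}
    beta[final] = 0.0
    for n in reversed(order):
        if n != final:
            beta[n] = logsumexp(score[i] + beta[d] for d, i in out_edges[n])

    log_z = beta[start]  # total log weight of all complete paths
    posterior = {
        i: math.exp(alpha[s] + score[i] + beta[d] - log_z)
        for i, (s, d, _) in enumerate(edges)
    }

    # Normalize the posteriors of the outgoing edges at each node.
    probs = {}
    for n in nodes:
        total = sum(posterior[i] for _, i in out_edges[n])
        for _, i in out_edges[n]:
            probs[i] = posterior[i] / total if total > 0 else 0.0
    return probs


# Toy graph with two paths from node 0 to node 3; one score is positive.
edges = [(0, 1, -1.0), (0, 2, 0.5), (1, 3, -0.2), (2, 3, -2.0)]
probs = edge_probabilities(edges, start=0, final=3)
print(probs)
```

Positive transition scores are not a problem in this setup, because only the relative weights of the paths matter once everything is normalized.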
One caveat: the way Moses is usually trained (with MERT) means that the resulting transition probabilities might be scaled in odd ways (i.e., the best edge might have 99.99% of the probability mass, or it might be only a minuscule amount above the next best), so you may need to post-process them (for example, by rescaling the probabilities) to make them useful.

-C

2010/3/4 Jörg Tiedemann <[email protected]>:
>
> One more time about the conversion from search graphs to word lattices:
> In the word lattice I would like to use probabilities for each edge but
> I guess that transition costs cannot be easily interpreted as log
> prob's. For example, I have seen quite a few positive transition values
> in my sample output which would definitely create some problems.
>
> Anyway, what I try to do is to use Moses output to create word lattice
> input for another translation step. Maybe the value at input lattice
> edges do not strictly have to be probabilities and I shouldn't care too
> much?
>
> Jörg
>
>
> Loïc BARRAULT wrote:
>> Hi Jörg,
>>
>> I'll take an example to explain my point of view.
>>
>> Here is an example of a recombined hypo :
>> 0 hyp=319 stack=3 back=1 score=-0.831512 transition=-0.641647
>> recombined=181 forward=3766 fscore=-205.134 covered=1-2 out=. I 'm
>> looking for a , pC=-0.518872, c=-0.31244
>>
>> In my case, hypo number are the nodes of the graph and phrases are
>> represented on links.
>> In this case, to preserve the graph topology, the only thing which can
>> be done is to merge the nodes 319 with 181, which result in creating a
>> link between node 1 (back node) and 181 (the recombined node).
>>
>> (X) ---------->(181)
>> (1)------------->(319)
>>
>> result in
>> (X) ---------->(181)
>> (1)---------------^
>>
>> In your example, you can't merge 5 and 1 because their history is not
>> the same (you pointed this out).
>> But if 6 is recombined and pointing to 4, then the only thing you can do
>> safely is to merge 6 and 4, which means creating a link between 5 and 4.
>>
>> Good luck.
>>
>> Loïc
>>
>>
>> 2010/3/3 Jörg Tiedemann <[email protected]>
>>
>> I try to use the search graph output now for producing a word
>> lattice in PLF style. I'm still a bit confused on how to use the
>> recombined hypotheses and their pointers to superior hypo's. Do I
>> have to copy the relevant parts from the superior hypotheses into
>> the lattice or should I join the hypotheses that point to recombined
>> hypo's with the existing graph? To give an example:
>>
>>       who   is     bill   ?
>> (0)-->(1)-->(2)--->(3)-->(4)
>>  |
>>  |--->(5)------------->(6)
>>       how |  is bill ?
>>           |
>>           |---->(7)----->(8)
>>                 is    the bill
>>
>> where (6) is a recombined hypo pointing to (4) and covering tokens 1-3
>> and (8) is a recombined hypo that points to (3)
>>
>> Should I copy the relevant parts of (4) that cover the same tokens
>> to the graph as a link to (5) or can I safely join (5) and (1)?
>> Probably not because this would produce "who is the bill" which is
>> not necessarily an option ...
>>
>> Thanks a lot for clarifying this to me!
>> Jörg
>>
>>
>> Chris Dyer wrote:
>>
>> As long as you're just splitting, keeping the weights consistent isn't
>> too hard- just keep all the weight in one segment and make all the
>> rest of the segments have no impact when they multiply (i.e., a
>> probability of 1, or a cost of 0). The openFST or AT&T tools can help
>> you manipulate lattices if you want to do more interesting things with
>> weights, such as pushing them to the start of paths.
>>
>> Chris
>>
>> On Mon, Mar 1, 2010 at 1:58 PM, Loïc BARRAULT <[email protected]> wrote:
>>
>> Indeed, splitting is not hard, but the trickiest thing is how much
>> probability/score amount do you give to each part of the split ?
>> Maybe it has not any real impact in the end, or has it ?
>> Loïc
>>
>> 2010/3/1 Chris Dyer <[email protected]>
>>
>> I guess word-graph doesn't split phrases either (I was just guessing).
>> It appears to be in SLF format, which is used by a number of tools
>> (like HTK and the SRI tools). SRILM can split lattices with
>> multi-word arcs into lattices, or you can write your own code to do
>> it. It's not terribly hard.
>>
>> Chris
>>
>> On Mon, Mar 1, 2010 at 12:32 PM, Joerg Tiedemann <[email protected]> wrote:
>>
>> Ok thanks. I will use the output-word-graph option. However, I also get
>> phrases with that option (in the w attribute), for example here:
>>
>> ....
>> J=42 S=0 E=53 a=0, 0, 0, -0.693147, 0.999896 l=-13.695
>> r=-20, 0, -1.60944, 0, 0, 0 w=bill clinton , pC=0.0613498,
>> c=-3.23392
>> ...
>>
>> I'm not sure if I'm using the command line argument correctly:
>> echo 'who is bill clinton ?' | \
>> moses -f moses.ini -output-word-graph test.graph 0
>>
>> Jörg
>>
>>
>> On 3/1/10 5:35 PM, Chris Dyer wrote:
>>
>> I don't have such a tool, but it wouldn't be too difficult to write
>> one. I think the difference between word graph and search graph is
>> the search graph has full phrases on the edges, whereas the word graph
>> has single words on the edges. For the input, you need single word
>> edges.
>> -Chris
>>
>> 2010/3/1 Jörg Tiedemann <[email protected]>:
>>
>> Is there a tool to convert output search graphs to word lattices in PLF
>> (moses lattice input format)? It's the option -output-search-graph
>> that I should use for getting the relevant information, right? I'm not
>> really sure if I understand the difference between -output-word-graph
>> and -output-search-graph
>> Thanks!
>>
>> Jörg
>>
>>
>> *******/\/\/\/\/\/\/\/\/\/\/\******************************************
>> Jörg Tiedemann [email protected]
>> Visiting Professor http://stp.lingfil.uu.se/~joerg/
>> Dep. of Linguistics and Philology
>> Uppsala University tel: +46 (0)18 - 471 1412
>> Box 635, SE-751 26 Uppsala/SWEDEN fax: +46 (0)18 - 471 1094
>> *********************************/\/\/\/\/\/\/\/\/\/\/\****************
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> ---
>> Loïc BARRAULT
>> Post-doctoral researcher
>> LIUM - University of Le Mans
>> Tél. +33/0 2 43 83 38 52
>> http://www-lium.univ-lemans.fr/~barrault
>> MANY : Open Source MT System Combination
>> http://www-lium.univ-lemans.fr/~barrault/MANY
>> ---
>
> --
>
> Hälsningar,
>
> Jörg
