Hi, this is technically possible, although there is currently no good generic support for attaching additional information to phrase table entries (several other kinds of annotation would be useful as well). One problem is that some phrase pairs (think of translations of "the") occur in a very large number of sentence pairs, so storing their sentence ids would require a lot of additional memory.
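As a rough workaround outside of Moses, you could collect the ids yourself from the extract file written with --IncludeSentenceId. The sketch below is not part of Moses. It assumes that fields are separated by "|||" and that the sentence id is the last field on each line, which you should check against your own extract file. The function name and the max_ids cap are made up for illustration; the cap is there to keep very frequent pairs such as translations of "the" from using too much memory.

```python
from collections import defaultdict


def collect_sentence_ids(lines, max_ids=1000):
    """Map (source, target) -> list of sentence ids, at most max_ids per pair.

    ASSUMPTION: each line looks like 'source ||| target ||| ... ||| sentence_id',
    i.e. the sentence id is the last '|||'-separated field.
    """
    ids = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("|||")]
        if len(fields) < 3:
            continue  # not a well-formed extract line, skip it
        source, target, sent_id = fields[0], fields[1], fields[-1]
        bucket = ids[(source, target)]
        # Skip an immediate repeat of the same id and stop at the cap.
        if len(bucket) < max_ids and (not bucket or bucket[-1] != sent_id):
            bucket.append(sent_id)
    return ids
```

You would call it on an open file, for example `with open("extract.sorted") as f: ids = collect_sentence_ids(f)`. The repeat check only catches duplicates on adjacent lines, so it relies on the input being grouped by sentence or sorted.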
-phi

On Mon, Jan 14, 2013 at 9:38 AM, David Wilson-Parr <[email protected]> wrote:
> Hi,
>
> I was wondering if there was any way to get a list of sentence ids in
> the final phrase table corresponding to where that phrase occurred?
>
> I noticed that the 'extract' program used in step (5) takes the argument
> '--IncludeSentenceId' and I tried this and it does include the ID (line
> number in corpus) in the extract.sorted and extract.inv.sorted however I
> don't suppose that these are still completely valid after the final
> phrase table is calculated after the score phrases (6.6) step which
> consolidates the normal and the inverse files together. Is there any
> 'idiots' process description of what the consolidate process does? I
> found the source code quite hard to follow.
>
> Also I didn't understand why the 'aligned.grow-diag-final-and' file is
> generated earlier which is an already combined version of the normal and
> inverse word alignments (I think - at least it seems to have many to
> many relationships in it!) if the processing then needs to go back to
> using them both separately.
>
> Sorry if I misunderstood something, I am just scratching the surface at
> the moment.
>
> Kind regards,
>
> Dave
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
