Hi Hieu,

Thank you for your quick reply :-).
But then, if the decoder can handle any number of non-terminals, I still don't understand why the rules I am generating do not seem to be used properly by the decoder. I think I will try to make a minimal example of the problem and come back for more wisdom...

Thanks again for the answers,
Fabien

On Fri, Nov 9, 2012 at 12:15 AM, Hieu Hoang <[email protected]> wrote:

> hi fabien
>
> On 08/11/2012 01:25, Fabien Cromières wrote:
>
> > Dear Moses community,
> >
> > First, thanks for providing the research community with such a nice open-source tool.
> >
> > Second, I have some rather involved questions about the moses-chart decoder for syntax-based MT. Hopefully, someone with a good familiarity with it can answer me. Thanks in advance!
> >
> > **** Short version of the questions: ****
> >
> > * Is there any support for Synchronous Tree Substitution Grammar in moses-chart? (Or any trick to use such a grammar with the decoder?)
>
> No, moses-chart only does SCFG. The decoder accepts tree input for tree-to-string decoding. This can approximate STSG decoding.
>
> > * Can moses-chart handle rules with more than 2 non-terminals?
>
> Yes, the decoding algorithm is CKY+. It can handle an arbitrary number of non-terminals.
>
> > * Can moses-chart be directly given (in any way) a compact representation of a set of parses of an input sentence? (And then just do the remaining work of selecting the best parse?)
>
> I believe so, but I've never tried it. However, you can't assign probabilities to the input parse. It wasn't really designed for forest decoding, so I'm not sure it will be any good.
>
> > **** Longer, more detailed, version: ****
> >
> > I am trying to improve a tree-to-string MT system (it is actually tree-to-tree, but it will be easier for me to describe it as a tree-to-string system), and I was hoping I could somehow reuse part of the Moses toolchain. Basically, my system uses a dependency tree representation of the input sentence.
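Hieu's point that CKY+ is not limited to two non-terminals can be illustrated with a small sketch. This is not Moses code: it is a minimal memoized chart recognizer, assuming a grammar given as hypothetical (lhs, rhs) pairs with no unary non-terminal chains, and `recognize` and the toy rules are illustrative names only. Like CKY+, it matches a right-hand side one symbol at a time, trying every split point for each symbol, so a rule with three non-terminals needs no binarization. The toy rules loosely mirror the source sides of the position-based rule file in this thread, with the terminal `2` moved into a hypothetical `X2` so that the top rule has three non-terminals.

```python
from functools import lru_cache
from typing import List, Set, Tuple

# A rule is (lhs, rhs). Assumption: no unary non-terminal chains such as
# X -> Y, which this simple memoized recursion would loop on.
Rule = Tuple[str, Tuple[str, ...]]


def recognize(words: List[str], rules: List[Rule],
              nonterminals: Set[str], goal: str) -> bool:
    """Return True if `goal` can cover the whole input under `rules`."""

    @lru_cache(maxsize=None)
    def covers(sym: str, i: int, j: int) -> bool:
        # Can symbol `sym` cover words[i:j]? Terminals cover exactly one word.
        if sym not in nonterminals:
            return j == i + 1 and words[i] == sym
        return any(lhs == sym and match(rhs, 0, i, j) for lhs, rhs in rules)

    @lru_cache(maxsize=None)
    def match(rhs: Tuple[str, ...], k: int, i: int, j: int) -> bool:
        # Can rhs[k:] cover words[i:j]? Each symbol covers at least one word,
        # so try every split point for rhs[k]; the RHS length is unbounded.
        if k == len(rhs):
            return i == j
        if j - i < len(rhs) - k:
            return False
        return any(
            covers(rhs[k], i, mid) and match(rhs, k + 1, mid, j)
            for mid in range(i + 1, j + 1)
        )

    return covers(goal, 0, len(words))


rules: List[Rule] = [
    ("X1", ("1",)),
    ("X2", ("2",)),
    ("X3", ("3",)),
    ("X4", ("3", "4")),
    ("X4", ("X3", "4")),
    ("TOP", ("X1", "X2", "X4")),  # three non-terminals on one right-hand side
]
nts = {"X1", "X2", "X3", "X4", "TOP"}

print(recognize("1 2 3 4".split(), rules, nts, "TOP"))  # True
print(recognize("1 2 4 3".split(), rules, nts, "TOP"))  # False
```

A real decoder would store scored hypotheses in each chart cell rather than a boolean, but the split-point enumeration is what makes rule arity a non-issue.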
> >
> > For example: he -> is <- (a -> boy) (it is not easy to represent trees with strings in a readable way; hopefully, this notation is intuitive enough).
> >
> > I then have some synchronous tree substitution rules, e.g. for English-French (although, unlike normal TSG, here the target side is a flat string):
> >
> > R1: Y -> he | il
> > R2: X -> Y->is<-Z | Y est Z
> > R3: Z -> a->boy | un garcon
> > R4: Z -> a->boy | un enfant
> > R5: Z -> V->boy | V garcon
> > R6: V -> a | un
> >
> > The rules are already extracted, selected and "mapped" to the input by my system (in other words, the source-side parsing is already done). This means I have a somewhat compact representation of every possible alternative derivation, like this:
> >
> > R1->R2<-(R3|R4|(R5<-R6))
> >
> > Each derivation yields a target sentence, and different derivations may yield different ones. From there, the problem is extracting the best derivation/translation according to the language model and other features. This is essentially possible with cube pruning or other beam-search approaches. However, it would be interesting for me to reuse the work done for the moses-chart implementation.
> >
> > My first idea was to convert my TSG rules to the rule format expected by moses-chart. Unfortunately, this way I cannot tell moses-chart that the source side has already been parsed. However, I was hoping I could make the source-side parsing easy for moses-chart.
> >
> > Here is how I did it. For each input sentence, I use the positions of the words instead of the words themselves (to reduce parsing ambiguity). For example, the input sentence:
> >
> > he is a boy
> >
> > becomes:
> >
> > 1 2 3 4
> >
> > when I feed it to moses-chart.
> >
> > Then I generate a rule file by adapting my rules accordingly (note that each non-terminal encodes in its name the source position it should match, ensuring that the derivations moses-chart can find are the same as those given by the parsing done by my system):
> >
> > 1 [X][X1] ||| il [X][X1] ||| ||| ...
> > [X][X1] 2 [X][X4] [X][TOP] ||| [X][X1] est [X][X4] [X][TOP] ||| 0-0 2-2 |||
> > 3 4 [X][X4] ||| un garcon [X][X4] ||| ||| ...
> > 3 4 [X][X4] ||| un enfant [X][X4] ||| ||| ...
> > [X][X3] 4 [X][X4] ||| [X][X3] garcon [X][X4] ||| 0-0 ||| ...
> > 3 [X][X3] ||| un [X][X1] ||| ||| ...
> >
> > However, I quickly found this was not giving the expected result (actually, no translation is found most of the time). After analyzing a bit more, I think the problem is that I am generating some rules with more than 2 non-terminals. moses-chart does not complain about them, but it does not seem to be able to use them properly.
> >
> > I know Hiero-style parsing will not work well with rules having more than 2 NTs, so I guess this is to be expected. However, since it is not explicitly mentioned anywhere, I wanted to confirm that.
> >
> > Also, does anyone have any suggestions for doing what I want to do with moses-chart? (Since I already have the source-side parsing figured out, I don't think there are theoretical problems with having more than 2 NTs per rule in my case; however, I probably need to somehow provide the set of possible parses to moses-chart.)
> >
> > I could also consider hacking into the moses-cmd code and trying to replace the source-parsing component with my own, while reusing some of the machinery. Does that make sense, or am I better off giving up on the idea of reusing Moses code? Or is anybody aware of another open-source decoder that would be better adapted to my case?
> >
> > Thanks a lot to anyone who took the time to read my long explanations, anyway :-)
> >
> > Fabien
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
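The packed derivation R1->R2<-(R3|R4|(R5<-R6)) from the message above can be unpacked with a short sketch. This is an illustration under stated assumptions, not part of Moses: `TARGET`, `CHILDREN`, `SLOTS` and `expand` are hypothetical names, only the target sides of R1 to R6 are modeled, and each non-terminal slot is a single capital letter. It enumerates every derivation exhaustively, which grows exponentially in general; a decoder would instead score partial hypotheses with the language model and other features and prune them (e.g. with cube pruning).

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical encoding of the packed forest R1->R2<-(R3|R4|(R5<-R6)).
# Target side of each rule; Y, Z and V are non-terminal slots.
TARGET: Dict[str, List[str]] = {
    "R1": ["il"],
    "R2": ["Y", "est", "Z"],
    "R3": ["un", "garcon"],
    "R4": ["un", "enfant"],
    "R5": ["V", "garcon"],
    "R6": ["un"],
}
SLOTS = {"Y", "Z", "V"}

# For each (rule, slot), the alternative rules that can fill that slot.
CHILDREN: Dict[Tuple[str, str], List[str]] = {
    ("R2", "Y"): ["R1"],
    ("R2", "Z"): ["R3", "R4", "R5"],
    ("R5", "V"): ["R6"],
}


def expand(rule: str) -> List[Tuple[List[str], str]]:
    """Return every (rules used, target string) derivable from `rule`."""
    per_token = []
    for tok in TARGET[rule]:
        if tok in SLOTS:
            # Union of all sub-derivations of every rule that can fill the slot.
            alts = [d for child in CHILDREN[(rule, tok)] for d in expand(child)]
        else:
            alts = [([], tok)]  # a target word contributes no rules
        per_token.append(alts)
    results = []
    for combo in product(*per_token):  # one choice per token position
        used = [rule] + [r for rules, _ in combo for r in rules]
        results.append((used, " ".join(text for _, text in combo)))
    return results


for used, text in expand("R2"):
    print(" ".join(used), "=>", text)
# R2 R1 R3 => il est un garcon
# R2 R1 R4 => il est un enfant
# R2 R1 R5 R6 => il est un garcon
```

Note that R3 and R5<-R6 both produce "il est un garcon", so three derivations yield only two distinct translations; a decoder that picks the best derivation (rather than summing over derivations) simply keeps whichever scores higher.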
