Hi, Hieu

Thank you for your quick reply :-).

But then, if the decoder can handle any number of non-terminals, I still
don't understand why the rules I am generating do not seem to be used
properly by the decoder. I think I will try to make a minimal example of
the problem and come back for more wisdom...

Thanks again for the answers,
Fabien


On Fri, Nov 9, 2012 at 12:15 AM, Hieu Hoang <[email protected]> wrote:

>  hi fabien
>
>
> On 08/11/2012 01:25, Fabien Cromières wrote:
>
> Dear Moses community,
>
>  First, thanks for providing the research community with such a nice
> open-source tool.
>
>  Second, I have some rather involved questions about the moses-chart
> decoder for syntax-based MT. Hopefully, someone familiar with it can
> answer them. Thanks in advance!
>
>  **** Short version of the questions:  ****
>
>  * Is there any support for Synchronous Tree Substitution Grammar (STSG) in
> moses-chart? (Or any trick to use such a grammar with the decoder?)
>
> no, moses-chart only does SCFG.
> The decoder accepts tree input for tree-to-string decoding. This can
> approximate STSG decoding.
>
>  * Can moses-chart handle rules with more than two non-terminals?
>
> yes, the decoding algorithm is CKY+. It can handle an arbitrary number of
> non-terminals.
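To illustrate why the number of non-terminals is not a hard limit for a CKY+-style chart decoder, here is a minimal sketch of matching one rule's source side against a span: terminals must match input tokens exactly, and each non-terminal may cover any non-empty sub-span that the chart has already labelled with its name. This is a naive recursive matcher written for illustration only; the `match_rule` name, the `("NT", label)` encoding and the `chart` dictionary are assumptions, and Moses's actual implementation is organised differently.

```python
from typing import Dict, List, Set, Tuple, Union

# A source-side symbol is a terminal word (str) or a non-terminal ("NT", label).
Symbol = Union[str, Tuple[str, str]]
Span = Tuple[int, int]  # half-open [start, end) over the input tokens


def match_rule(
    symbols: List[Symbol],
    words: List[str],
    chart: Dict[Span, Set[str]],
    start: int,
    end: int,
) -> List[List[Span]]:
    """Return every way the rule's source side can cover words[start:end].

    Each result lists the sub-span assigned to each non-terminal, in order.
    Any number of non-terminals is allowed; each must cover at least one word
    and carry a label already recorded for that sub-span in `chart`.
    """
    results: List[List[Span]] = []

    def extend(k: int, pos: int, spans: List[Span]) -> None:
        if k == len(symbols):
            if pos == end:  # the whole span must be consumed
                results.append(list(spans))
            return
        sym = symbols[k]
        if isinstance(sym, str):
            # Terminal: must match the next input token exactly.
            if pos < end and words[pos] == sym:
                extend(k + 1, pos + 1, spans)
            return
        label = sym[1]
        # Non-terminal: try every sub-span starting at pos that carries `label`.
        for stop in range(pos + 1, end + 1):
            if label in chart.get((pos, stop), set()):
                spans.append((pos, stop))
                extend(k + 1, stop, spans)
                spans.pop()

    extend(0, start, [])
    return results


words = ["1", "2", "3", "4"]  # "he is a boy" fed as word positions
chart = {(0, 1): {"X1"}, (2, 4): {"X4"}}
print(match_rule([("NT", "X1"), "2", ("NT", "X4")], words, chart, 0, 4))
# [[(0, 1), (2, 4)]]
```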
>
>  * Can moses-chart be directly given (in any way) a compact
> representation of a set of parses of an input sentence? (and then just do
> the remaining work of selecting the best parse)
>
> i believe so, but i've never tried it. However, you can't assign
> probabilities to the input parses. It wasn't really designed for
> forest decoding, so i'm not sure if it will be any good.
>
>
>  **** Longer, more detailed, version: ****
>
>  I am trying to improve a tree-to-string MT system (it is actually
> tree-to-tree, but it will be easier for me to describe it as a
> tree-to-string system). And I was hoping I could somehow re-use part of the
> Moses toolchain. Basically, my system uses a dependency tree representation
> of the input sentence.
>
>  For example: he -> is <- (a -> boy) (it is not easy to represent trees as
> strings in a readable way; hopefully, this notation is intuitive enough).
>
>  I then have some Synchronous Tree Substitution rules, e.g. for
> English-French (although, unlike in a normal TSG, the target side here is a
> flat string).
>
> R1: Y -> he | il
> R2: X -> Y->is<-Z | Y est Z
> R3: Z -> a->boy | un garcon
> R4: Z -> a->boy | un enfant
> R5: Z -> V->boy | V garcon
> R6: V -> a | un
>
>  The rules are already extracted, selected and "mapped" to the input by
> my system (in other words, the source-side parsing is already done). This
> means I have a fairly compact representation of all the alternative
> derivations, like this:
> R1->R2<-(R3|R4|(R5<-R6))
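The packed representation R1->R2<-(R3|R4|(R5<-R6)) can be read as an AND/OR structure: each node (X, Y, Z, V) has alternative rules, and a rule's target side refers to child nodes. Below is a minimal sketch that enumerates every derivation and its target string, assuming a hypothetical `FOREST` dictionary encoding of rules R1 to R6 (the encoding and the `derivations` name are illustrative, not part of any existing tool).

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Hypothetical encoding: each node lists its alternative rules as
# (rule name, target tokens). A target token that is itself a node name
# (Y, Z, V) is replaced by that node's translation; others are words.
Forest = Dict[str, List[Tuple[str, List[str]]]]

FOREST: Forest = {
    "X": [("R2", ["Y", "est", "Z"])],
    "Y": [("R1", ["il"])],
    "Z": [("R3", ["un", "garcon"]), ("R4", ["un", "enfant"]), ("R5", ["V", "garcon"])],
    "V": [("R6", ["un"])],
}


def derivations(forest: Forest, node: str) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (rules used, target words) for every derivation rooted at `node`."""
    for name, target in forest[node]:
        # Per target token: all sub-derivations of a child node, or the word itself.
        options = [
            list(derivations(forest, tok)) if tok in forest else [([], [tok])]
            for tok in target
        ]
        # Every combination of child choices is a distinct derivation.
        for combo in product(*options):
            rules, words = [name], []
            for sub_rules, sub_words in combo:
                rules.extend(sub_rules)
                words.extend(sub_words)
            yield rules, words


for rules, words in derivations(FOREST, "X"):
    print(" ".join(rules), "=>", " ".join(words))
# R2 R1 R3 => il est un garcon
# R2 R1 R4 => il est un enfant
# R2 R1 R5 R6 => il est un garcon
```

R3 and R5<-R6 yield the same string, so a decoder would normally recombine those two hypotheses. Full enumeration is exponential in general; a chart decoder instead scores the shared structure with dynamic programming and beam search.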
>
>  Each derivation gives a different target sentence. From there, the
> problem is extracting the best derivation/translation according to the
> language model and other features. In principle, this can be done with
> cube pruning or other beam-search approaches. However, it would be
> interesting for me to re-use the work done for the moses-chart
> implementation.
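The core step of cube pruning is to lazily pop the best combinations of two cost-sorted lists of sub-translations from a heap, instead of scoring the whole grid. A simplified sketch of that grid exploration for one rule with two children follows; the `cube_top_k` name, the costs, and the `combine` callback standing in for the language-model interaction are all assumptions, and this is not the moses-chart code.

```python
import heapq
from typing import Callable, List, Sequence, Set, Tuple

Hyp = Tuple[float, str]  # (cost, partial translation); lower cost is better


def cube_top_k(
    left: Sequence[Hyp],
    right: Sequence[Hyp],
    combine: Callable[[str, str], Hyp],
    k: int,
) -> List[Hyp]:
    """Lazily pop up to k combinations of two cost-sorted hypothesis lists.

    `combine` returns an extra cost (standing in for the language-model
    interaction) and the joined translation. Because that extra cost is not
    monotone over the grid, the result is only approximately the k best.
    """
    if not left or not right or k <= 0:
        return []

    def score(i: int, j: int) -> Hyp:
        extra, text = combine(left[i][1], right[j][1])
        return left[i][0] + right[j][0] + extra, text

    cost, text = score(0, 0)
    heap = [(cost, 0, 0, text)]  # start from the best corner of the grid
    seen: Set[Tuple[int, int]] = {(0, 0)}
    out: List[Hyp] = []
    while heap and len(out) < k:
        cost, i, j, text = heapq.heappop(heap)
        out.append((cost, text))
        # Push the two grid neighbours that advance one list by one step.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                c, t = score(ni, nj)
                heapq.heappush(heap, (c, ni, nj, t))
    return out


# Hypothetical costs: one subject translation, two object translations.
subjects = [(1.0, "il")]
objects = [(0.5, "un garcon"), (1.5, "un enfant")]
best = cube_top_k(subjects, objects, lambda a, b: (0.0, f"{a} est {b}"), k=2)
print(best)  # [(1.5, 'il est un garcon'), (2.5, 'il est un enfant')]
```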
>
>  My first idea was to convert my TSG rules to the rule format expected
> by moses-chart. Unfortunately, this way I have no means of telling
> moses-chart that the source side has already been parsed. However, I was
> hoping that I could make the source-side parsing easy for moses-chart.
>
>  Here is how I did it. For each input sentence, I use the positions of
> the words instead of the words themselves (to reduce parsing ambiguity).
> For example, the input sentence:
> he is a boy
> becomes:
> 1 2 3 4
> when I feed it to moses-chart.
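The position trick is easy to script. A minimal sketch, using a hypothetical `to_positions` helper that also keeps a map from each position token back to the original word:

```python
from typing import Dict, Tuple


def to_positions(sentence: str) -> Tuple[str, Dict[str, str]]:
    """Replace each word by its 1-based position.

    Returns the line to feed to the decoder and a map from each position
    token back to the original word.
    """
    words = sentence.split()
    positions = [str(i) for i in range(1, len(words) + 1)]
    return " ".join(positions), dict(zip(positions, words))


line, back = to_positions("he is a boy")
print(line)       # 1 2 3 4
print(back["4"])  # boy
```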
>
>  Then I generate a rule file by adapting my rules accordingly (note that
> each non-terminal encodes in its name the source position it should match,
> ensuring that the set of derivations moses-chart can find is the same as
> the set given by the parsing done by my system):
> 1 [X][X1] ||| il [X][X1] ||| ||| ...
> [X][X1] 2 [X][X4] [X][TOP] ||| [X][X1] est [X][X4] [X][TOP] ||| 0-0 2-2 |||
> 3 4 [X][X4] ||| un garcon [X][X4] ||| ||| ...
> 3 4 [X][X4] ||| un enfant [X][X4] ||| ||| ...
> [X][X3] 4 [X][X4] ||| [X][X3] garcon [X][X4] ||| 0-0 ||| ...
> 3 [X][X3] ||| un [X][X1] ||| ||| ...
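Generating such lines from position-mapped rules can be automated. The sketch below only reproduces the layout of the hand-written lines above (non-terminals named [X][X<position>], the left-hand side appended to both sides, non-terminal alignments as source-index-target-index pairs, scores left as `...`). The `nt`, `label` and `rule_line` helpers and the in-memory encoding are hypothetical, and the exact field conventions should be checked against the Moses rule-table documentation rather than taken from this sketch.

```python
from typing import List, Tuple, Union

NT = Tuple[str, int]     # ("NT", p): non-terminal that must cover source position p
SrcTok = Union[int, NT]  # input position (terminal) or non-terminal
TgtTok = Union[str, NT]  # target word or non-terminal


def nt(pos: int) -> NT:
    return ("NT", pos)


def label(tok: NT) -> str:
    # Position-named label, e.g. [X][X4], following the hand-written file.
    return f"[X][X{tok[1]}]"


def rule_line(src: List[SrcTok], tgt: List[TgtTok], lhs: str, scores: str = "...") -> str:
    """Render one rule in the same layout as the hand-written lines."""
    src_text = " ".join(label(t) if isinstance(t, tuple) else str(t) for t in src)
    tgt_text = " ".join(label(t) if isinstance(t, tuple) else t for t in tgt)
    # Pair each source-side non-terminal index with its target-side index
    # (assumes every non-terminal appears exactly once on the target side).
    align = " ".join(
        f"{i}-{tgt.index(t)}" for i, t in enumerate(src) if isinstance(t, tuple)
    )
    align_field = f" {align} " if align else " "
    return f"{src_text} {lhs} ||| {tgt_text} {lhs} |||{align_field}||| {scores}"


# R2 and R5 from the example, mapped to input positions.
print(rule_line([nt(1), 2, nt(4)], [nt(1), "est", nt(4)], "[X][TOP]"))
print(rule_line([nt(3), 4], [nt(3), "garcon"], label(nt(4))))
```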
>
>  However, I quickly found that this was not giving the expected result
> (actually, no translation is found most of the time). After some more
> analysis, I think the problem is that I am generating some rules with
> more than two non-terminals. moses-chart does not complain about them, but
> it does not seem to be able to use them properly.
>
>  I know hiero-style parsing will not work well with rules having more
> than two NTs, so I guess this is to be expected. However, since it is not
> explicitly mentioned anywhere, I wanted to confirm that.
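One way to see why the number of non-terminals matters for a general chart parser: if a rule's source side is k adjacent non-terminals over a span of n words, every way of cutting the span into k non-empty pieces is a candidate, and there are C(n-1, k-1) of them, which for fixed k grows roughly like n^(k-1). Hiero-style grammars cap rules at two non-terminals and forbid adjacent ones on the source side, restrictions usually motivated by parsing efficiency. A quick count, with a hypothetical `split_count` helper:

```python
from math import comb


def split_count(n: int, k: int) -> int:
    """Number of ways to cut a span of n words into k adjacent, non-empty pieces."""
    return comb(n - 1, k - 1) if 1 <= k <= n else 0


for k in (1, 2, 3, 4):
    print(f"n=10, k={k}: {split_count(10, k)} candidate splits")
# n=10, k=1: 1 candidate splits
# n=10, k=2: 9 candidate splits
# n=10, k=3: 36 candidate splits
# n=10, k=4: 84 candidate splits
```

With position tokens as terminals and position-named non-terminals, as in the rule file above, most of these candidate cuts fail the chart lookup immediately.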
>
>  Also, does anyone have any suggestions for doing what I want to do with
> moses-chart? (Since I already have the source-side parsing figured out, I
> don't think there are theoretical problems with having more than two NTs
> per rule in my case; however, I probably need to somehow provide the set of
> possible parses to moses-chart.)
>
>  I could also consider hacking into the moses-cmd code and trying to replace
> the source-parsing component with my own while re-using some of the
> machinery. Does that make sense, or am I better off giving up on the idea of
> re-using Moses code? Or is anybody aware of another open-source decoder that
> would be better suited to my case?
>
>  Thanks a lot to anyone who took the time to read my long explanations,
> anyway :-)
>
>  Fabien
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
