Re: [Moses-support] help for reusing moses-chart in a different (dependency tree - to - string) MT system

Hieu Hoang Thu, 08 Nov 2012 07:18:57 -0800

hi fabien

On 08/11/2012 01:25, Fabien Cromières wrote:

Dear Moses community,
First, thanks for providing the Research community with such a nice open-source tool.
Second, I have some rather involved questions about the moses-chart decoder for syntax-based MT. Hopefully, someone having a good familiarity with it can answer me. Thanks in advance!
**** Short version of the questions:  ****
* Is there any support for Synchronized Tree Substitution Grammar in moses-chart? (or any trick to use such a grammar with the decoder).

no, moses-chart only does scfg.

The decoder accepts tree input for tree-to-string decoding. This can approximate stsg decoding.

* Can moses-chart handle rules with more than 2 Non Terminals?

yes, the decoding algorithm is CKY+. It can handle an arbitrary number of non-terms.

* Can moses-chart be directly given (in any way) a compact representation of a set of parses of an input sentence? (and then just do the remaining work of selecting the best parse)

i believe so, but i've never tried it. However, you can't assign probabilities to the input parses. It wasn't really designed for forest decoding, so i'm not sure if it will be any good.

**** Longer, more detailed, version: ****
I am trying to improve a tree-to-string MT system (it is actually tree-to-tree, but it will be easier for me to describe it as a tree-to-string system). And I was hoping I could somehow re-use part of the Moses toolchain. Basically, my system uses a dependency tree representation of the input sentence.
For example: he ->is <- (a->boy) (not easy to represent trees with strings in a readable way; hopefully, this notation is intuitive enough).
I then have some Synchronized Tree Substitution rules, eg. for English-French (although unlike normal TSG, here the target side will be a flat string).
R1: Y-> he | il
R2: X-> Y->is<-Z| Y est Z
R3: Z->a -> boy| un garcon
R4: Z->a ->boy| un enfant
R5: Z-> V->boy| V garcon
R6: V->a |un
The rules are already extracted, selected and "mapped" to the input by my system (in other words, the source side parsing is already done). This means I have a fairly compact representation of all the possible alternative derivations, like this:
R1->R2<-(R3|R4|(R5<-R6))
Each derivation gives a different target sentence. From there, the problem is extracting the best derivation/translation according to language model and other features. It is essentially possible to do that with cube pruning or other beam-search approaches. However it would be interesting for me to re-use the work done for the moses-chart implementation.
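A minimal sketch of what the packed representation above encodes, assuming the target sides of R1 to R6 as listed earlier. The names RULES, ALTERNATIVES, expand and translations are illustrative only and are not part of Moses or of the system described here. It enumerates every derivation exhaustively; a real decoder would score and prune them (for example with cube pruning) rather than list them all.

```python
from itertools import product

# Target sides of the example rules; single uppercase letters are non-terminal slots.
RULES = {
    "R1": ["il"],
    "R2": ["Y", "est", "Z"],
    "R3": ["un", "garcon"],
    "R4": ["un", "enfant"],
    "R5": ["V", "garcon"],
    "R6": ["un"],
}

# Packed forest (assumed encoding): the alternative rules that can rewrite each non-terminal.
ALTERNATIVES = {"X": ["R2"], "Y": ["R1"], "Z": ["R3", "R4", "R5"], "V": ["R6"]}


def expand(symbol):
    """Yield (rule_names, target_words) for every derivation rooted at symbol."""
    for name in ALTERNATIVES[symbol]:
        per_token = []
        for token in RULES[name]:
            if token in ALTERNATIVES:
                # Non-terminal: every sub-derivation is an option for this slot.
                per_token.append(list(expand(token)))
            else:
                # Terminal: a single option that uses no rule and emits one word.
                per_token.append([((), (token,))])
        for combo in product(*per_token):
            used = (name,) + tuple(r for rules, _ in combo for r in rules)
            words = tuple(w for _, ws in combo for w in ws)
            yield used, words


def translations(root="X"):
    """Return the target string of every derivation, in enumeration order."""
    return [" ".join(words) for _, words in expand(root)]


if __name__ == "__main__":
    for used, words in expand("X"):
        print(" ".join(used), "=>", " ".join(words))
```

Note that two of the three derivations here (through R3 and through R5 with R6) produce the same target string, so the number of derivations and the number of distinct translations differ.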
My first idea was to try and convert my TSG rules to the rule format expected by moses-chart. Unfortunately, this way, I cannot tell moses-chart in any way that the source side has already been parsed. However, I was hoping that I could make it easy for moses-chart to do the source-side parsing.
Here is how I did it. For each input sentence, I use the position of the words instead of the words (to reduce parsing ambiguity).
For example, the input sentence:
he is a boy
becomes:
1 2 3 4
when I feed it to moses-chart.
Then I generate a rule file by adapting my rules accordingly (note that each Non Terminal encodes the source position it should match in its name, ensuring that the set of possible derivations that can be found by moses-chart is the same as the one given by the parsing done by my system):
1 [X][X1] ||| il [X][X1] ||| ||| ...
[X][X1] 2 [X][X4] [X][TOP] ||| [X][X1] est [X][X4] [X][TOP] ||| 0-0 2-2 |||
3 4 [X][X4] ||| un garcon [X][X4] ||| ||| ...
3 4 [X][X4] ||| un enfant [X][X4] ||| ||| ...
[X][X3] 4 [X][X4] ||| [X][X3] garcon [X][X4] ||| 0-0 ||| ...
3 [X][X3] ||| un [X][X3] ||| ||| ...
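A rough sketch of how rule lines in the layout above could be generated, in Python. It only mirrors the field layout used in this message (source side, target side, alignment of non-terminals); the trailing score fields are left out, and nothing here has been checked against what moses-chart actually expects. NT, position_tokens and rule_line are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NT:
    """Non-terminal whose label encodes a source position, e.g. NT("X4")."""
    label: str

    def __str__(self):
        # Same bracketed notation as the rule lines in this message.
        return f"[X][{self.label}]"


def position_tokens(sentence):
    """Replace each word of the input sentence by its 1-based position."""
    return [str(i) for i, _ in enumerate(sentence.split(), start=1)]


def rule_line(source, target, lhs):
    """Build one rule line without score fields.

    source: list of int (word position) or NT
    target: list of str (target word) or NT
    lhs:    NT appended to both sides, as in the lines above
    """
    align = []
    for i, item in enumerate(source):
        if isinstance(item, NT):
            # Pair each source non-terminal with the same non-terminal on the target side.
            align.append(f"{i}-{target.index(item)}")
    src = " ".join(str(s) for s in source)
    tgt = " ".join(str(t) for t in target)
    line = f"{src} {lhs} ||| {tgt} {lhs} ||| {' '.join(align)} |||"
    # Collapse the double space left behind when the alignment field is empty.
    return " ".join(line.split())


if __name__ == "__main__":
    print(" ".join(position_tokens("he is a boy")))
    print(rule_line([1], ["il"], NT("X1")))
    print(rule_line([NT("X1"), 2, NT("X4")], [NT("X1"), "est", NT("X4")], NT("TOP")))
```

Generating the alignment field from the rule itself, rather than writing it by hand, keeps the non-terminal indices on the source and target sides consistent.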
However, I quickly found this was not giving the expected result (actually, no translation is found most of the time). After analyzing a bit more, I think I found out the problem is that I am generating some rules with more than 2 non-terminals. moses-chart does not complain about them, but it does not seem to be able to use them properly.
I know hiero-style parsing will not work well with rules having more than 2 NT, so I guess this is to be expected. However, since it is not explicitly mentioned anywhere, I wanted to confirm that.
Also, has anyone any suggestion for doing what I want to do with moses-chart? (since I have already the source-side parsing figured out, I don't think there are theoretical problems with having more than 2 NT per rule in my case; however, I probably need to somehow provide the set of possible parses to moses-chart)
I could also consider hacking into the moses-cmd code and try to replace the source-parsing component by my own, but re-use some of the machinery. (Does that make sense, or am I better off giving up on the idea of re-using moses code?) Or is anybody aware of another open-source decoder that would be better suited to my case?
Thanks a lot to anyone that took the time to read my long explanations, anyway :-)
Fabien



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
