Hi Joo-Young,

The rules created by the Moses extract program are the same as hiero rules, but the formatting is different. It's just to make the format more flexible and easier to convert to on-disk, load-on-demand files. The drawback is that the format is not very user-readable.
So the normal hiero rule:
   [X] ||| [X,1] trace ' ||| [X,1] 추적 ' ||| 0-0 ||| 0.727273 0.444625 1 0.172348 2.718
is formatted as
   [X][X] trace ' [X] ||| [X][X] 추적 ' [X] ||| 0-0 ||| 0.727273 0.444625 1 0.172348 2.718 ||| 0.366667 0.266667

A syntax rule:
   [NP] ||| all [NP,1] ||| 모든 [NP,1] ||| 0.869565 0.627907 0.645161 0.243243 2.718
is formatted as
   all [X][NP] [X] ||| 모든 [X][NP] [NP] ||| 1-1 ||| 0.869565 0.627907 0.645161 0.243243 2.718 ||| 23 31

The key features are:
1. The non-terminal index in the hiero format is replaced with the alignment information, e.g. the 1 in [NP,1] becomes 1-1 (source position-target position, counting from 0).
2. Each non-terminal has 2 symbols, e.g. [NN][NP]. The 1st is the source constraint: for example, if your input is a parse tree, the source span the non-terminal covers must be an [NN]. The 2nd symbol, [NP], is the usual symbol used to label the node in the tree.
3. The left-hand-side symbol of the rule is the last symbol in the string. Again, there is a source and a target symbol, and the source symbol acts as a constraint.

Ignoring the source constraint, the rewrite rule
   [A] --> a b c [B,1] d , x [B,1] y z
is formatted as
   a b c [X][B] d [X] ||| x [X][B] y z [A] ||| 3-1

Hope that makes some sense.
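To make the mapping concrete, here is a minimal Python sketch of the conversion described above. It is not the Moses extract code: the function name `hiero_to_moses` and its parameters are made up for illustration. It ignores the source constraint (every source symbol is written as [X]), emits only the non-terminal alignments, and leaves out the score and count fields.

```python
import re

# Matches a hiero-style non-terminal such as [NP,1]: a label plus a co-index.
NONTERM = re.compile(r"^\[([^\],]+),(\d+)\]$")


def hiero_to_moses(lhs, source, target, source_label="X"):
    """Rewrite one hiero-style rule in the layout described in this mail.

    Hypothetical helper, for illustration only.
    lhs is the left-hand-side label (e.g. "NP"); source and target are
    whitespace-separated token strings. The source constraint is ignored,
    so every source symbol becomes [source_label].
    """
    src_positions = {}  # co-index -> token position on the source side
    src_out = []
    for pos, tok in enumerate(source.split()):
        m = NONTERM.match(tok)
        if m:
            label, index = m.groups()
            src_positions[index] = pos
            src_out.append(f"[{source_label}][{label}]")
        else:
            src_out.append(tok)

    tgt_out = []
    alignments = []  # (source position, target position), counted from 0
    for pos, tok in enumerate(target.split()):
        m = NONTERM.match(tok)
        if m:
            label, index = m.groups()
            alignments.append((src_positions[index], pos))
            tgt_out.append(f"[{source_label}][{label}]")
        else:
            tgt_out.append(tok)

    # The left-hand side is the last symbol on each side:
    # the source symbol on the source side, the target symbol on the target side.
    src_out.append(f"[{source_label}]")
    tgt_out.append(f"[{lhs}]")

    align = " ".join(f"{s}-{t}" for s, t in sorted(alignments))
    return " ||| ".join([" ".join(src_out), " ".join(tgt_out), align])


print(hiero_to_moses("A", "a b c [B,1] d", "x [B,1] y z"))
# prints: a b c [X][B] d [X] ||| x [X][B] y z [A] ||| 3-1
```

The same function reproduces the first three fields of the two examples above, e.g. hiero_to_moses("NP", "all [NP,1]", "모든 [NP,1]") gives "all [X][NP] [X] ||| 모든 [X][NP] [NP] ||| 1-1".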
