hi joo-young

The rules created by the moses extract program are the same as hiero rules, but
the formatting is different. It's just to make it more flexible and easier
to convert to on-disk, load-on-demand files. The drawback of the format
is that it's not very user-readable.

So the normal hiero rule:

   [X] ||| [X,1] trace ' ||| [X,1] 추적 ' ||| 0-0 ||| 0.727273 0.444625 1 0.172348 2.718

is formatted as

   [X][X] trace ' [X] ||| [X][X] 추적 ' [X] ||| 0-0 ||| 0.727273 0.444625 1 0.172348 2.718 ||| 0.366667 0.266667

A syntax rule:

[NP] ||| all [NP,1] ||| 모든 [NP,1] ||| 0.869565 0.627907 0.645161 0.243243 2.718 

is formatted as

all [X][NP] [X] ||| 모든 [X][NP] [NP] ||| 1-1 ||| 0.869565 0.627907 0.645161 0.243243 2.718 ||| 23 31
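
To make the layout concrete, here is a minimal Python sketch that splits a line in the Moses format into its parts. The function name parse_moses_rule and the dictionary keys are made up for illustration and are not part of Moses. It assumes the whole rule is on one line, takes the last symbol on each side as the left-hand side, and keeps any fields after the scores as raw text without interpreting them.

```python
def parse_moses_rule(line):
    """Split one Moses-format rule line into its fields (illustrative only)."""
    fields = [f.strip() for f in line.split("|||")]
    source = fields[0].split()
    target = fields[1].split()
    # Alignment pairs look like "3-1": source position, then target position.
    alignment = [tuple(int(i) for i in pair.split("-"))
                 for pair in fields[2].split()]
    scores = [float(s) for s in fields[3].split()] if len(fields) > 3 else []
    return {
        "source": source[:-1],     # source-side symbols without the LHS
        "source_lhs": source[-1],  # last symbol on the source side
        "target": target[:-1],     # target-side symbols without the LHS
        "target_lhs": target[-1],  # last symbol on the target side
        "alignment": alignment,
        "scores": scores,
        "extra": fields[4:],       # anything after the scores, left as text
    }


rule = parse_moses_rule(
    "all [X][NP] [X] ||| 모든 [X][NP] [NP] ||| 1-1 ||| "
    "0.869565 0.627907 0.645161 0.243243 2.718 ||| 23 31"
)
print(rule["source"], rule["target_lhs"], rule["alignment"])
```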

The key features are:
1. The non-terminal index in the hiero format is replaced with the
alignment information, e.g. the 1 in [NP,1] becomes 1-1, the 0-based
source and target positions of the non-terminal.
2. Each non-terminal has 2 symbols, e.g. [NN][NP]. The 1st is the source
constraint: for example, if your input is a parse tree, the source span
the non-terminal covers must be an [NN]. The 2nd symbol, [NP], is the
usual symbol used to label the node in the tree.
3. The left-hand-side symbol of the rule is the last symbol in the
string. Again, there is a source and a target symbol, and the source symbol
acts as a constraint. Ignoring the source constraint, the rewrite rule
   [A] --> a b c [B,1] d , x [B,1] y z
is formatted as

   a b c [X][B] d [X] ||| x [X][B] y z [A] ||| 3-1
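
The three points above can be sketched as a small Python converter from the hiero-style layout to the Moses layout. This is only an illustration under assumptions: hiero_to_moses and convert_side are hypothetical names, not part of Moses; the source constraint is always written as [X]; only the non-terminal alignments are produced; and any scores are dropped.

```python
import re

# Matches a hiero-style non-terminal such as [NP,1]: label, then co-index.
NONTERM = re.compile(r"^\[([^\],]+),(\d+)\]$")


def convert_side(tokens, source_label="X"):
    """Rewrite one side of a rule; return (new_tokens, {co-index: position})."""
    out, positions = [], {}
    for pos, tok in enumerate(tokens):
        m = NONTERM.match(tok)
        if m:
            label, index = m.group(1), int(m.group(2))
            # Two symbols per non-terminal: source constraint, then label.
            out.append(f"[{source_label}][{label}]")
            positions[index] = pos
        else:
            out.append(tok)
    return out, positions


def hiero_to_moses(rule, source_label="X"):
    """Convert 'LHS ||| source ||| target' into the Moses layout."""
    fields = [f.strip() for f in rule.split("|||")]
    lhs, source, target = fields[0], fields[1], fields[2]
    src_tokens, src_pos = convert_side(source.split(), source_label)
    tgt_tokens, tgt_pos = convert_side(target.split(), source_label)
    # The left-hand side becomes the last symbol on each side.
    src_tokens.append(f"[{source_label}]")
    tgt_tokens.append(lhs)
    # The co-index is replaced by 0-based source-target positions.
    alignment = " ".join(
        f"{src_pos[i]}-{tgt_pos[i]}" for i in sorted(src_pos, key=src_pos.get)
    )
    return " ||| ".join(
        [" ".join(src_tokens), " ".join(tgt_tokens), alignment]
    )


print(hiero_to_moses("[A] ||| a b c [B,1] d ||| x [B,1] y z"))
```

Running it on the rewrite rule from point 3 should reproduce the formatted line shown above, and the two earlier examples come out the same way apart from their scores.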

hope that makes some sense

