Hi Steven,

Please subscribe to the Moses mailing list before posting to it. You can subscribe here: http://mailman.mit.edu/mailman/listinfo/moses-support

Have you tried the hierarchical model yet? It uses the same algorithms as the syntax model but does not need linguistic information. It also requires less CPU and memory than many syntax models.

Once you have the hierarchical model running, you can think about adding syntax.

You can set up hierarchical training, tuning and evaluation by looking at the differences between these two EMS config files:
http://www.statmt.org/moses/RELEASE-2.1/models/de-en/config.pb.recase
http://www.statmt.org/moses/RELEASE-2.1/models/de-en/config.hiero.recase
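
A minimal sketch of that comparison, assuming curl and diff are installed. The function names fetch_configs and compare_configs are made up for illustration and are not part of Moses or EMS:

```shell
# Sketch only: download the two EMS configs listed above and diff them.
# fetch_configs and compare_configs are hypothetical helper names.
base=http://www.statmt.org/moses/RELEASE-2.1/models/de-en

fetch_configs() {
    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # -O: save under the remote file name in the current directory
    curl -fsSO "$base/config.pb.recase"
    curl -fsSO "$base/config.hiero.recase"
}

compare_configs() {
    # diff exits with status 1 when the files differ, which is the
    # expected case here; only status 2 (a real error) is passed on as failure
    diff -u "$1" "$2" || [ $? -eq 1 ]
}

# Usage (needs network access):
#   fetch_configs && compare_configs config.pb.recase config.hiero.recase
```

Lines starting with "-" appear only in the phrase-based config and lines starting with "+" only in the hierarchical one, so the "+" lines are the settings to carry over.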


-------- Forwarded Message --------
Subject:        Moses-support post from [email protected] requires approval
Date:   Mon, 24 Nov 2014 07:35:29 -0500
From:   [email protected]
To:     [email protected]



As list administrator, your authorization is requested for the
following mailing list posting:

    List:    [email protected]
    From:    [email protected]
    Subject: How to train a tree-based model?
    Reason:  Post by non-member to a members-only list

At your convenience, visit:

    http://mailman.mit.edu/mailman/admindb/moses-support
to approve or deny the request.



--- Begin Message ---
Hi,

I am trying to do English-Chinese translation.
I have built a factored model successfully.
However, after reading the tutorial, I am still not clear on how to build a
tree-based model.

What I have in hand:
1. English-Chinese parallel corpus with 3 factors (surface, lemma and POS).
2. English-Chinese parallel corpus parsed with the Stanford Parser and
formatted as XML in the Moses format.
3. The training command for my factored model is shown below:

$MOSES_DIR/scripts/training/train-model.perl \
-mgiza -mgiza-cpus 20 \
--root-dir train \
--corpus $WORK_DIR/en-ch.clean \
--f en \
--e ch \
--alignment grow-diag-final-and \
--reordering msd-bidirectional-fe \
--lm 0:3:$LANG_MOD_DIR/en-ch-surface.arpa.ch:8 \
--lm 2:3:$LANG_MOD_DIR/en-ch-pos.arpa.ch:8 \
--translation-factors 1,2-1,2+0-0,2 \
--generation-factors 1,2-0+0,2-0 \
--reordering-factors 0,2-0,2 \
--decoding-steps t0,g0:t1,g1 \
--external-bin-dir $MOSES_DIR/tools > $WORK_DIR/training.out 2>&1


My questions are:
1. Can I use all 3 factors when training a tree-based model? If yes, what
should the parallel corpus look like? The XML format shown in the Moses
tutorial does not seem to accept any factor other than the surface form.
2. I want to use trees on both the source and target side. Is it correct to
add the following arguments to train-model.perl?

--ghkm \
--source-syntax \
--target-syntax \
--LeftBinarize \

3. I noticed that after using the Stanford Parser to generate trees for the
parallel corpus, the resulting trees can be one-to-many (or many-to-one) for
a particular sentence pair. For example, the source sentence is parsed into a
single tree, while the target sentence is parsed into 2 trees.
Will this break the "parallel" property of the parallel corpus?




-- 
Best regards,
Steven Huang

--- End Message ---
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support