Re: [Moses-support] phrase extraction step

Philipp Koehn Thu, 28 Mar 2013 06:31:35 -0700

Hi,

> I'm hereby attaching a file. I got it when I executed the 5th step.
> I don't know why the phrase table, extract.sorted.gz, etc. are not extracted.
> Please help me.

What do the input files to the extract step look like? Is the
word alignment file correct, and does it have the same number of
lines as the others?
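
One way to check this is sketched below. It is only a sketch under
assumptions: it takes the three files as lists of lines, assumes
whitespace-tokenized text, and assumes each alignment point has the
form "i-j" (zero-based source index, hyphen, zero-based target index).
The function name check_alignment is made up for illustration.

```python
import re

# Assumed format of one alignment point: "<source index>-<target index>".
ALIGN_POINT = re.compile(r"\d+-\d+")


def check_alignment(src_lines, tgt_lines, align_lines):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # All three files must describe the same sentence pairs, line by line.
    if not (len(src_lines) == len(tgt_lines) == len(align_lines)):
        problems.append(
            "line counts differ: source=%d target=%d alignment=%d"
            % (len(src_lines), len(tgt_lines), len(align_lines))
        )
        return problems

    for n, (src, tgt, ali) in enumerate(
        zip(src_lines, tgt_lines, align_lines), start=1
    ):
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        for point in ali.split():
            if not ALIGN_POINT.fullmatch(point):
                problems.append("line %d: malformed point %r" % (n, point))
                continue
            i, j = (int(x) for x in point.split("-"))
            # An index beyond the sentence length means the files are out of sync.
            if i >= src_len or j >= tgt_len:
                problems.append("line %d: point %s is out of range" % (n, point))
    return problems
```

An out-of-range point on an early line usually means the alignment
file and the corpus files no longer line up from that line onward.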

Do you have any forbidden characters (especially "|") in your
data that may cause problems?
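
A quick way to look for such lines, as a minimal sketch: the helper
name lines_with_pipe is hypothetical, and it only reports where "|"
occurs. It does not decide how to escape or remove it.

```python
def lines_with_pipe(lines):
    """Return the 1-based numbers of all lines that contain a '|' character."""
    return [n for n, line in enumerate(lines, start=1) if "|" in line]


def report_pipes(path, encoding="utf-8"):
    """Read a corpus file and return the numbers of the lines containing '|'."""
    with open(path, encoding=encoding) as handle:
        return lines_with_pipe(handle)
```

Running report_pipes on both sides of the corpus should return an
empty list if the data is free of the character.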

You can run each step in isolation by calling train-model.perl
with the --first-step and --last-step switches.
The numbers of the steps are listed here:
http://www.statmt.org/moses/?n=FactoredTraining.HomePage
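
As a rough sketch, the command line for a single step could be put
together like this. Only --first-step and --last-step come from the
text above; the helper name single_step_command is made up, and
other_args stands for whatever options your original training run
used (they are passed through unchanged, not invented here).

```python
def single_step_command(script_path, step, other_args):
    """Build an argument list that runs exactly one training step.

    script_path: location of train-model.perl on your system (assumption).
    step:        the step number, e.g. 5 for the step mentioned in the question.
    other_args:  the remaining options of your usual training call.
    """
    return [
        "perl",
        script_path,
        *other_args,
        "--first-step", str(step),
        "--last-step", str(step),
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True),
so that a failing step raises an error instead of passing silently.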

A common mistake is to forget to clean the parallel corpus
(throw out long sentences and length-mismatched sentence pairs).
This leads to faulty word alignment, which in turn causes
phrase extraction to fail.
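
The cleaning idea can be sketched as follows. This is an illustration,
not the script shipped with Moses: the function name clean_corpus and
the thresholds (80 tokens, length ratio 9) are assumptions you should
adapt to your data.

```python
def clean_corpus(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Keep only sentence pairs that are neither too long nor too mismatched.

    pairs: iterable of (source_sentence, target_sentence) strings,
           assumed to be whitespace-tokenized already.
    """
    kept = []
    for src, tgt in pairs:
        s = len(src.split())
        t = len(tgt.split())
        # Drop empty sentences (this also avoids dividing by zero below).
        if s == 0 or t == 0:
            continue
        # Drop pairs where either side is too short or too long.
        if not (min_len <= s <= max_len and min_len <= t <= max_len):
            continue
        # Drop pairs whose lengths differ by more than the allowed ratio.
        if max(s, t) / min(s, t) > max_ratio:
            continue
        kept.append((src, tgt))
    return kept
```

Both sides are filtered together, so the cleaned source and target
files keep the same number of lines.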

> I also want to know about the tokenization step.
> In the tokenization step, apart from dividing a sentence into tokens, is any
> extra processing done?

A typical additional step is lowercasing or truecasing, which
normalizes words that occur at the beginning of the sentence ("The")
or in all caps ("THE") to a common form ("the").
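
To make the difference concrete, here is a toy sketch of both options.
It is not the Moses truecaser: the function names are made up, and the
"model" is just the most frequent spelling of each word, counted on
tokens that are not sentence-initial.

```python
from collections import Counter, defaultdict


def lowercase(tokens):
    """Lowercasing: map every token to its lowercase form."""
    return [tok.lower() for tok in tokens]


def train_truecaser(sentences):
    """Learn the most frequent spelling of each word.

    Sentence-initial tokens are skipped, because their capitalization
    says nothing about the word's usual form.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for tok in tokens[1:]:
            counts[tok.lower()][tok] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(tokens, model):
    """Restore the usual spelling of sentence-initial and all-caps tokens."""
    out = []
    for i, tok in enumerate(tokens):
        if i == 0 or (tok.isupper() and len(tok) > 1):
            # Unknown words are left unchanged.
            out.append(model.get(tok.lower(), tok))
        else:
            out.append(tok)
    return out
```

With a model trained on a few sentences, truecase turns "The" at the
start of a sentence into "the" but leaves a sentence-initial "Paris"
capitalized, whereas lowercase would produce "paris".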

-phi

On Thu, Mar 28, 2013 at 6:14 AM, Nikhila Achukatla
<[email protected]> wrote:
> Hi,
>

>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
