Hi,

> I'm hereby attaching a file. I got it when I executed the 5th step.
> I don't know why the phrase table, extract.sorted.gz, etc. files are not extracted.
> Please help me.
What do the input files to the extract step look like? Is the word alignment file correct, and does it have the same number of lines as the others? Do you have any forbidden characters (especially "|") in your data that may cause problems?

You can run each step in isolation by running train-model.perl with the --first-step and --last-step switches. The numbers of the steps are listed here:
http://www.statmt.org/moses/?n=FactoredTraining.HomePage

A common mistake is to forget to clean the parallel corpus (throw out long sentences or length-mismatched sentence pairs). An uncleaned corpus produces faulty word alignment, which then causes phrase extraction to fail.

> And also I want to know about the tokenization step.
> In the tokenization step, other than dividing a sentence into tokens, is any
> extra processing done?

A typical additional step is lowercasing or truecasing, which normalizes words that occur at the beginning of the sentence ("The") or in all caps ("THE") to a common form ("the").

-phi

On Thu, Mar 28, 2013 at 6:14 AM, Nikhila Achukatla <[email protected]> wrote:
> Hi,
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
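The checks suggested in the reply above (matching line counts, no "|" in the data, alignment points that fit the sentences, and a cleaned corpus) can be scripted before re-running step 5. What follows is a minimal Python sketch, not part of Moses: the function names check_corpus and clean_pairs are hypothetical, the alignment lines are assumed to hold space-separated "i-j" points with 0-based source and target positions, and the length and ratio thresholds are illustrative guesses. For real runs, the cleaning script that ships with Moses (scripts/training/clean-corpus-n.perl) is the usual choice.

```python
from typing import List, Sequence, Tuple


def check_corpus(src_lines: Sequence[str], tgt_lines: Sequence[str],
                 align_lines: Sequence[str]) -> List[str]:
    """Return human-readable problems found (empty list if none).

    Assumes alignment lines hold space-separated "i-j" points, where i is a
    0-based source position and j a 0-based target position.
    """
    problems: List[str] = []
    if not (len(src_lines) == len(tgt_lines) == len(align_lines)):
        problems.append(
            f"line count mismatch: src={len(src_lines)} "
            f"tgt={len(tgt_lines)} align={len(align_lines)}")
        # Per-line checks are meaningless once the files are out of sync.
        return problems

    for lineno, (src, tgt, align) in enumerate(
            zip(src_lines, tgt_lines, align_lines), start=1):
        if "|" in src or "|" in tgt:
            problems.append(f"line {lineno}: contains '|'")
        src_len, tgt_len = len(src.split()), len(tgt.split())
        for point in align.split():
            try:
                i, j = (int(x) for x in point.split("-"))
            except ValueError:
                problems.append(f"line {lineno}: malformed point {point!r}")
                continue
            if not (0 <= i < src_len and 0 <= j < tgt_len):
                problems.append(f"line {lineno}: point {point} out of range")
    return problems


def clean_pairs(pairs: Sequence[Tuple[str, str]], max_len: int = 80,
                max_ratio: float = 9.0) -> List[Tuple[str, str]]:
    """Drop empty, overly long, or length-mismatched sentence pairs.

    The default thresholds are illustrative assumptions, not Moses defaults.
    """
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # an empty side leaves nothing to align
        if n_src > max_len or n_tgt > max_len:
            continue  # long sentences slow down and degrade alignment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # very different lengths suggest a bad pair
        kept.append((src, tgt))
    return kept
```

In this sketch, clean_pairs would run before word alignment, and check_corpus on the cleaned files plus the alignment file produced by the alignment step; an empty result only means none of these particular problems were found.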

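To illustrate the idea behind the lowercasing or truecasing mentioned above, here is a simplified Python sketch, assuming whitespace-tokenized input. It learns, for each word, the casing seen most often in non-sentence-initial positions and maps every token to that form, so "The" and "THE" both become "the" while a name like "Moses" keeps its capital. This is not a reimplementation of the Moses recaser scripts (train-truecaser.perl and truecase.perl), which handle more cases; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train_truecaser(sentences: Iterable[str]) -> Dict[str, str]:
    """Map each lowercased word to its most frequent surface form.

    Only non-sentence-initial tokens are counted, because there the
    capitalization reflects the word itself rather than its position.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for tok in sent.split()[1:]:
            counts[tok.lower()][tok] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def truecase(sentence: str, model: Dict[str, str]) -> str:
    """Replace each known token with its preferred casing; keep unknown ones."""
    return " ".join(model.get(tok.lower(), tok) for tok in sentence.split())


def lowercase(sentence: str) -> str:
    """The simpler alternative: fold everything to lowercase."""
    return sentence.lower()


if __name__ == "__main__":
    model = train_truecaser(["The cat sat on the mat", "We met Moses there"])
    print(truecase("THE cat saw Moses", model))
```

The trade-off between the two: lowercasing is trivial but throws away the capitalization of names, while truecasing needs a model trained on the corpus but keeps that information.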