Hi,

you are on the right track.

EMS is typically configured with pointers to raw data and scripts for tokenization, factorization, truecasing, etc. But you can always specify fully prepared data instead.

So, for instance, instead of specifying raw data for the parallel corpus

    raw-stem = $toy-data/my-corpus

you can specify factorized data

    factorized-stem = $toy-data/my-corpus.factored

The setting names for the different data sets (parallel corpus, LM training corpus, tuning and test sets) are in the example configuration files, or can be found in experiment.meta (which is the official authority).

-phi

On Wed, Jan 15, 2014 at 8:51 AM, burak aydın <[email protected]> wrote:
> Hi all,
>
> In the Moses web page, there are instructions for factored training. What I
> want to experiment with is giving an already factorized corpus. What will be
> the new factor definition and new LM definition in the EMS config?
>
> When I try to replicate the original config.factor experiment, I observe
> that there are files created under the corpus/ directory:
>
> factored.<expnum>.en
> factored.<expnum>.en.pos
>
> So in order to give a factorized corpus as input, should I also prepare one
> or more of the files above and write it in the EMS config as well?
>
> Regards
> Burak
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
