Re: [Moses-support] Factorized corpus in EMS

Philipp Koehn Wed, 15 Jan 2014 06:16:51 -0800

Hi,

you are on the right track.


EMS is typically configured with pointers to raw data and scripts
for tokenization, factorization, truecasing, etc. But you can
always specify fully prepared data instead.

So, for instance, instead of specifying raw data for the parallel corpus

raw-stem = $toy-data/my-corpus

you can specify factorized data

factorized-stem = $toy-data/my-corpus.factored
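
As a rough sketch, the corpus section could then look like the fragment
below. The section name [CORPUS:toy] is borrowed from the toy example
configuration, and the .fr/.en extensions are only an assumption for
illustration; check the setting name against experiment.meta for your
Moses version.

    [CORPUS:toy]
    # raw-stem is left out here, since prepared data is given instead
    # raw-stem = $toy-data/my-corpus

    # stem of the already factorized files; the language extensions
    # are appended to it, e.g. my-corpus.factored.fr and
    # my-corpus.factored.en (assumed extensions)
    factorized-stem = $toy-data/my-corpus.factored

The files behind that stem are expected to be in the usual Moses
factored format, with the factors of each token joined by "|", for
example "the|DT house|NN" for an assumed word|POS setup.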

The setting names for the different data sets (parallel corpus,
LM training corpus, tune and test set) are in the example configuration
files or can be found in experiment.meta (which is the official
authority).

-phi

On Wed, Jan 15, 2014 at 8:51 AM, burak aydın <[email protected]> wrote:
> Hi all,
>
> On the Moses web page, there are instructions for factored training. What I
> want to experiment with is supplying an already factorized corpus. What will
> the new factor definition and new LM definition be in the EMS config?
>
> When I try to replicate the original config.factor experiment, I observe
> that files are created under the corpus/ directory:
>
> factored.<expnum>.en
> factored.<expnum>.en.pos
>
> So in order to give a factorized corpus as input, should I also prepare one
> or more of the files above and specify them in the EMS config as well?
>
> Regards
> Burak
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

