Hi Sonja,

> I will also repeat my other two questions in case someone could answer them:
>
>>> Second, since my data is already tokenised, parsed, factorised and
>>> lowercased, how can I tell EMS to skip those steps and, if possible,
>>> evaluate the result without truecasing, detokenising and wrapping?
Typically, whenever you specify a corpus in the config file, you can also specify that it has already been tokenised, etc. For example, in the [CORPUS] section, instead of giving your training data as the value of 'raw-stem', you can list it as 'tokenized-stem' (already tokenised), 'lowercased-stem' (already tokenised, filtered and lowercased), etc. You may need to familiarise yourself with the experiment.meta file to work out exactly which options are available.

Best,
Suzy

-- 
Suzy Howlett
http://www.showlett.id.au/

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
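To illustrate the suggestion about the [CORPUS] section, here is a minimal sketch of a corpus section that starts from already lowercased data. The section name `my-corpus` and the path `/data/my-corpus` are hypothetical. The sketch assumes the usual EMS convention that a stem is the file path without the language extension, so EMS would look for files such as `/data/my-corpus.fr` and `/data/my-corpus.en` according to your input and output extensions. The keys `raw-stem` and `lowercased-stem` are the ones named above; check experiment.meta for the exact set your Moses version supports.

```
[CORPUS:my-corpus]

### hypothetical stem: EMS appends the input/output language extensions
### raw-stem would run tokenisation, cleaning and lowercasing first:
#raw-stem = /data/my-corpus
### data is already tokenised, filtered and lowercased, so start later:
lowercased-stem = /data/my-corpus
```

The idea is that each step in experiment.meta reads one of these stems and writes the next, so supplying a later stem lets EMS start the pipeline from that point instead of redoing the earlier steps.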
