You can give a tagged corpus to the EMS, using the format:

    word1|POS1 word2|POS2 word3|POS3

I think you have to set the variable

    factorized-stem = [filePath]

instead of

    raw-stem = [filePath]

However, when you give the EMS raw-stem, it will tokenize, escape special characters, and clean the corpus before word alignment. If you give the EMS factorized-stem, it will assume that the data is already tokenized, escaped and cleaned. You must make sure that is the case.
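Since the gold corpus in the question uses word/POS while the format above uses word|POS, a small conversion step may be all that is needed. The following is only a minimal sketch, not part of Moses or the EMS: the function name `slash_to_pipe` is hypothetical, and it assumes that the tag is whatever follows the last "/" in each token and that the text is already tokenized and escaped.

```python
def slash_to_pipe(line: str) -> str:
    """Rewrite whitespace-separated 'word/POS' tokens as 'word|POS'.

    Hypothetical helper for illustration only. Assumes the POS tag is
    everything after the LAST '/' in a token, so a word such as '1/2'
    keeps its own slash.
    """
    converted = []
    for token in line.split():
        if "|" in token:
            # '|' is the factor separator, so an unescaped one in the
            # input would corrupt the factored output.
            raise ValueError(f"unescaped '|' in token: {token!r}")
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError(f"token is not in word/POS form: {token!r}")
        converted.append(f"{word}|{tag}")
    return " ".join(converted)


if __name__ == "__main__":
    print(slash_to_pipe("the/DT cat/NN sat/VBD"))
```

The sketch only changes the delimiter. It does not tokenize, escape or clean anything, so those steps still have to be done beforehand, as described above.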

Also, you must make sure the input sentences you give to the decoder are tokenized and escaped using the same method as your gold standard data.

On 22 April 2013 06:15, jayendra rakesh <[email protected]> wrote:

> Hi,
>
> I have a gold POS tagged parallel corpus available for usage, which is the
> format
>
> word1/POS1 word2/POS2 word3/POS3
>
> Is there a way to use the gold corpus directly (and in what specific
> format should it be used ) from the EMS config file instead of writing
> intermediate factor generation scripts.
>
> Also is it possible to add morphological analysis as factors alongside to
> the POS tagged corpus, directly to the corpus ?
>
> --
> - Jayendra Rakesh.
> BTech CSD.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
