You can give a tagged corpus to the EMS, using the format:
  word1|POS1 word2|POS2 word3|POS3

I think you have to set the variable
  factorized-stem = [filePath]

instead of
  raw-stem = [filePath]

However, when you give the EMS raw-stem, it will tokenize, escape special
characters, and clean the corpus before word alignment. If you give the
EMS factorized-stem, it will assume that the data is already tokenized,
escaped and cleaned. You must make sure that is the case.
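
Since the corpus in question uses word/POS rather than word|POS, a small
conversion step may be all that is needed before pointing factorized-stem
at it. The following is a minimal sketch, not part of Moses: the names
slash_to_factored and escape are hypothetical, the tag is assumed to follow
the last slash of each token, the input is assumed to be tokenized but not
yet escaped, and the replacement table is an assumption about what the
Moses escape-special-chars.perl script does.

```python
# Hypothetical helper, not part of Moses: rewrite "word/POS" tokens as
# "word|POS" and escape characters that are special to Moses.
import sys

# Replacements ASSUMED to mirror Moses' escape-special-chars.perl.
# "&" is handled first so the entities added later are not escaped twice.
MOSES_ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape(text):
    """Replace special characters in one factor with their entities."""
    for raw, entity in MOSES_ESCAPES:
        text = text.replace(raw, entity)
    return text


def slash_to_factored(line):
    """Convert one line of word/POS tokens to word|POS tokens."""
    out = []
    for token in line.split():
        # Split on the LAST slash so that words containing "/" keep it.
        word, sep, tag = token.rpartition("/")
        if not sep or not word or not tag:
            raise ValueError("token without a word/POS pair: %r" % token)
        out.append(escape(word) + "|" + escape(tag))
    return " ".join(out)


if __name__ == "__main__":
    # Read the slash-separated corpus on stdin, write the factored one.
    for line in sys.stdin:
        print(slash_to_factored(line))
```

If the corpus has already been escaped, the escape step should be dropped,
otherwise existing entities such as &amp; would be escaped a second time.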

Also, you must make sure the input sentences you give to the decoder are
tokenized and escaped using the same method as your gold standard data.
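
One quick way to spot a mismatch is to scan the decoder input for characters
that the training-side escaping would have replaced. This is only a rough
sketch: unescaped_lines is a hypothetical name, and the set of characters
and entities is an assumption about the escaping used on the training data.
The "|" character is deliberately not checked, on the assumption that the
input is factored and uses it as the factor separator.

```python
import re

# Characters ASSUMED to be replaced by the training-side escaping.
# A "&" only counts as unescaped when it does not start a known entity.
RAW_SPECIAL = re.compile(
    r"[<>\[\]'\"]|&(?!(?:amp|lt|gt|apos|quot|#124|#91|#93);)"
)


def unescaped_lines(lines):
    """Return (line_number, line) pairs that still contain raw special characters."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if RAW_SPECIAL.search(line):
            bad.append((number, line.rstrip("\n")))
    return bad
```

An empty result does not prove the tokenization matches, only that no raw
special characters were found.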




On 22 April 2013 06:15, jayendra rakesh <[email protected]> wrote:

> Hi,
>
> I have a gold POS-tagged parallel corpus available, which is in the
> format
>
> *word1/POS1 word2/POS2 word3/POS3*
>
> Is there a way to use the gold corpus directly (and in what specific
> format should it be used) from the EMS config file, instead of writing
> intermediate factor generation scripts?
>
> Also, is it possible to add morphological analysis as factors alongside
> the POS tags, directly in the corpus?
>
> --
> - Jayendra Rakesh.
>    BTech CSD.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu