Dear Wang,

Here are the links to the publicly available Persian-English corpora:

   - TEP: Tehran English-Persian parallel corpus, built on subtitles. It is
   free and you can download it here:
   <http://opus.lingfil.uu.se/download.php?f=OpenSubtitles2011/xml/en-fa.xml.gz>
   - ELRA-W0051: a generic-domain corpus. To obtain it, see
   <http://catalog.elra.info/product_info.php?products_id=1111>.
   - PEN: Parallel English-Persian News corpus, a small corpus built on news
   stories. It is not publicly available yet, but I am going to release it
   soon. Paper: <http://world-comp.org/p2011/ICA4953.pdf>

For tokenization you can use any available tokenizer, such as the Moses
tokenizer.
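
As a rough illustration of what a generic tokenizer does on Persian text, here
is a minimal Python sketch. It is not the Moses tokenizer and does not
reproduce its rules; the character mappings, the punctuation set, and the
function name tokenize are assumptions chosen for the example. It maps the
Arabic yeh and kaf code points to their Persian forms and splits punctuation
(including the Persian comma, semicolon and question mark) off as separate
tokens.

```python
import re

# Assumption: map Arabic code points that often appear in Persian text
# to the Persian forms, so the same word is not counted as two types.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# Assumption: a small punctuation set to split off as separate tokens:
# ASCII marks, Arabic comma/semicolon/question mark, and guillemets.
PUNCT = re.compile(r'([.,!?;:()"\u060C\u061B\u061F\u00AB\u00BB])')


def tokenize(line):
    """Normalize characters, isolate punctuation, split on whitespace."""
    line = line.translate(ARABIC_TO_PERSIAN)
    line = PUNCT.sub(r" \1 ", line)
    # str.split() does not treat the zero-width non-joiner (U+200C) as
    # whitespace, so it stays inside the token that contains it.
    return line.split()


if __name__ == "__main__":
    print(tokenize("Hello, world!"))
    print(tokenize("\u0633\u0644\u0627\u0645\u060C \u062F\u0646\u06CC\u0627"))
```

Whatever tokenizer is used, applying the same one to both the training data
and the text to be translated keeps the token inventories consistent.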

If you have more questions, feel free to ask.

Regards,
Amin


On 04/18/2013 10:45 AM, Wang, JinPeng(AWF) wrote:

Hi, everyone

Have you got any Persian and English parallel text or related
corpus links? And how to tokenize the Persian language?

Thanks

Regards

Wang, JinPeng(AWF)
eBay, Inc.
Stubhub


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support