For basic language tools for Persian, check http://stp.lingfil.uu.se/~mojgan/
Jörg

On Thu, Apr 18, 2013 at 12:38 PM, amin farajian <[email protected]> wrote:

> Dear Wang,
>
> Here are the links to the publicly available Persian-English corpora:
>
> - TEP: Tehran English-Persian parallel corpus, built from subtitles. It is
>   free, and you can download it here:
>   http://opus.lingfil.uu.se/download.php?f=OpenSubtitles2011/xml/en-fa.xml.gz
> - ELRA-W0051, generic domain. To obtain this corpus, see
>   http://catalog.elra.info/product_info.php?products_id=1111
> - PEN: Parallel English-Persian News corpus, a small corpus built from news
>   stories. It is not publicly available yet, but I am going to release it
>   soon. (Paper: http://world-comp.org/p2011/ICA4953.pdf)
>
> For tokenization you can use any available tokenizer, such as the Moses
> tokenizer.
>
> If you have more questions, feel free to ask.
>
> Regards,
> Amin
>
> On 04/18/2013 10:45 AM, Wang, JinPeng(AWF) wrote:
>
>> Hi, everyone
>>
>> Do you have any Persian-English parallel texts or links to related
>> corpora? And how should Persian be tokenized?
>>
>> Thanks
>>
>> Regards
>>
>> Wang, JinPeng(AWF)
>> eBay, Inc.
>> Stubhub
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
Jörg Tiedemann
http://stp.lingfil.uu.se/~joerg/
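Amin suggests using any available tokenizer, such as the Moses tokenizer. To show which Persian-specific details such a tokenizer has to handle, here is a minimal Python sketch. It is not the Moses tokenizer. The function name `tokenize`, the regex, and the choice to map Arabic yeh/kaf to their Persian forms are all assumptions made for this example. The sketch keeps the zero-width non-joiner (U+200C) inside a token, since Persian uses it within words such as plural forms. It also splits off punctuation, including the Arabic question mark (؟) and comma (،).

```python
import re

# Assumed normalization: map Arabic code points that often appear in Persian
# text to their Persian counterparts (hypothetical choice for this sketch).
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})

# A token is either a run of word characters that may contain ZWNJ (U+200C),
# or a single character that is neither a word character nor whitespace
# (this covers ASCII punctuation as well as Arabic-script punctuation).
TOKEN_RE = re.compile(r"[\w\u200c]+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Normalize a few Arabic letters, then split into words and punctuation."""
    normalized = text.translate(ARABIC_TO_PERSIAN)
    return TOKEN_RE.findall(normalized)


if __name__ == "__main__":
    # "ketab<ZWNJ>ha ra" followed by the Arabic question mark
    print(tokenize("\u06a9\u062a\u0627\u0628\u200c\u0647\u0627 \u0631\u0627\u061f"))
    print(tokenize("Hello, world!"))
```

The pattern relies on Python's Unicode-aware `\w`, so Persian digits (U+06F0 to U+06F9) stay inside word tokens. Arabic diacritics (harakat) are not word characters, however, so this pattern would split them off as separate tokens. A real pipeline would likely strip or handle them before tokenizing.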
