Thank you all for your comments and resources. I will go through them and let you know if I stumble upon something interesting during the process.
Vito Mandorino

2016-03-28 9:47 GMT+02:00 Graham Neubig <[email protected]>:
> Hello Vishal,
>
> Yes, that's what pre-ordering means. Specifically, it means re-ordering the
> source side.
>
> Graham
>
> On Mon, Mar 28, 2016 at 2:16 PM, Vishal Goyal (विशाल गोयल) <
> [email protected]> wrote:
>
>> Dear Graham,
>> Greetings.
>> Please clarify: does pre-ordering, as used in your reply, mean making the
>> word order of the source-language and target-language sentences similar
>> before training, so that the pair behaves like a closely related language
>> pair?
>>
>> On Sat, Mar 26, 2016 at 10:57 AM, Graham Neubig <[email protected]>
>> wrote:
>>
>>> Hi Vito,
>>>
>>> English-Japanese and Japanese-English translation are very difficult due
>>> to the grammatical differences between the languages.
>>>
>>> You have a couple of options to overcome this problem:
>>> 1) If you want to use phrase-based Moses, you will have to perform some
>>> variety of pre-ordering, in which you rearrange the words in the source
>>> sentence before training/testing.
>>> 2) You can use a syntax-based system, either using the functionality in
>>> Moses (http://www.statmt.org/moses/?n=Moses.SyntaxTutorial), or using
>>> another decoder specifically designed for syntax-based MT such as my
>>> Travatar decoder (http://www.phontron.com/travatar/). I have released
>>> the setup for training our strongest Japanese-English and English-Japanese
>>> systems here: https://github.com/neubig/wat2014
>>>
>>> Regarding the different types of characters, I would leave them as-is.
>>> It is possible to perform normalization, which will help in a limited
>>> number of cases, but if you're just starting out this is really the least
>>> of your problems.
>>>
>>> Graham
>>>
>>>
>>> On Fri, Mar 25, 2016 at 7:51 PM, Vito Mandorino <
>>> [email protected]> wrote:
>>>
>>>> Dear all,
>>>>
>>>> has anyone ever done experiments with English-Japanese and
>>>> Japanese-English translation? Do you know of any useful resources for this
>>>> language pair, or any specific gotchas one should be aware of?
>>>>
>>>> More specifically, what is the best policy for dealing with the writing
>>>> systems? Do you think it is a good idea to keep the different scripts
>>>> (Kanji, Hiragana, Katakana, ...) in the corpus, or should one try to
>>>> convert Kanji into one of the other scripts?
>>>>
>>>> Best regards,
>>>>
>>>> Vito Mandorino
>>>>
>>>> --
>>>> *M. Vito MANDORINO -- Chief Scientist*
>>>>
>>>> *The Translation Trustee*
>>>>
>>>> *1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux*
>>>>
>>>> *Tel : +33 1 30 44 04 23 Mobile : +33 6 84 65 68 89*
>>>>
>>>> *Email :* *[email protected]*
>>>>
>>>> *Website :* *www.linguacustodia.finance <http://www.linguacustodia.com/>*
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>>
>>>
>>
>>
>> --
>> *Regards,*
>> Vishal Goyal,
>> Ph.D., M.Tech., MCA, M.C.S.D.
>> Associate Professor (Stage IV),
>> Department of Computer Science,
>> Punjabi University Patiala-147002.
>>
>>
>> *Machine Translation Systems:*
>> [*Online Hindi to Punjabi Machine Translation Tool -*
>> http://h2p.learnpunjabi.org ]
>> [*Statistical Approach Based Hindi to Punjabi Machine Translation System*
>> - http://statmt.org/~vishal/hp/index.cgi
>> - http://tdil-dc.in/hi2pu/index.cgi
>> ]
>> *Online Journal: [Research Cell: An International Journal of Engineering
>> Sciences, http://ijoes.vidyapublications.com]*
>> *Book: A Simplified Approach to Data Structures, Shroff Publications and
>> Distributors*
>> http://www.shroffpublishers.com/detail.aspx?title=6163
>>
>
>
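To make Graham's first option a little more concrete: pre-ordering rearranges each source sentence toward the target language's word order before training and testing, so that phrase-based Moses has less long-distance reordering to learn. The sketch below is a deliberately naive illustration for English-to-Japanese, assuming the input is already POS-tagged with Penn Treebank tags. The function name `preorder_verb_final` and its single "move the verbs to the end" rule are hypothetical; they are not taken from the wat2014 setup, Travatar, or any Moses script.

```python
from typing import List, Tuple

# (word, POS tag) pair; ASSUMPTION: Penn Treebank-style tags from some tagger.
Token = Tuple[str, str]

# Hypothetical choice of what counts as a "verb": Penn verb tags plus modals.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"}


def preorder_verb_final(tokens: List[Token]) -> List[str]:
    """Naive verb-final reordering of a single simple clause.

    Moves all verb/modal tokens after the other tokens (keeping the relative
    order inside each group) and keeps sentence-final punctuation last, to
    approximate Japanese subject-object-verb order. Toy example only.
    """
    # Sentence-final punctuation (Penn tag ".") stays at the very end.
    punct = [w for w, t in tokens if t == "."]
    body = [(w, t) for w, t in tokens if t != "."]
    # Split the remaining tokens into non-verbs and verbs, preserving order.
    non_verbs = [w for w, t in body if t not in VERB_TAGS]
    verbs = [w for w, t in body if t in VERB_TAGS]
    return non_verbs + verbs + punct


if __name__ == "__main__":
    sentence = [("John", "NNP"), ("ate", "VBD"), ("an", "DT"),
                ("apple", "NN"), (".", ".")]
    print(" ".join(preorder_verb_final(sentence)))  # John an apple ate .
```

Here "John ate an apple ." becomes "John an apple ate .", which roughly mirrors Japanese SOV order. Practical pre-ordering systems reorder parse trees (handling embedded clauses, prepositions versus postpositions, and so on), so a flat rule like this one breaks down on anything beyond a single simple clause.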
