Hi,

Concerning the building of a Japanese-English translation engine, I have some questions about the handling of Japanese numbers and digits, in particular about whether the placeholder approach carries over to this language pair. Japanese has several ways of expressing numbers (for example Arabic digits, kanji numerals, and mixtures of the two), so neither detection nor conversion to the English number representation is straightforward. This makes the placeholder approach somewhat more involved. Did you experience any problems when translating numbers in Japanese-English machine translation? Do you think it is feasible to adapt the placeholder approach?
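To make the difficulty concrete, here is a minimal sketch of the detection and conversion step, written in Python. It is only an illustration under stated assumptions, not a tested preprocessing pipeline: the function names `parse_japanese_number` and `mask_numbers` are hypothetical, the token `@num@` is used purely as an example placeholder, and the regular expression only covers ASCII digits, full-width digits, and the common kanji numerals up to 兆.

```python
import re
import unicodedata

# Kanji and ASCII digits mapped to their values (full-width digits are
# folded to ASCII by NFKC normalization before lookup).
DIGITS = {c: i for i, c in enumerate("〇一二三四五六七八九")}
DIGITS.update({str(i): i for i in range(10)})
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}

# Assumption: a "number" is any run of these characters. This over-matches
# ordinary words that contain numeral kanji (e.g. 一緒), so real detection
# needs more context than this regex provides.
NUMBER_RE = re.compile(r"[0-9０-９〇一二三四五六七八九十百千万億兆]+")


def parse_japanese_number(text):
    """Convert one numeral string (digits, kanji, or mixed) to an int."""
    text = unicodedata.normalize("NFKC", text)
    total = 0    # sum of completed groups (those closed by 万, 億, 兆)
    section = 0  # value accumulated below the next large unit
    number = 0   # pending run of plain digits, read positionally
    for ch in text:
        if ch in DIGITS:
            number = number * 10 + DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (number or 1) * SMALL_UNITS[ch]
            number = 0
        elif ch in LARGE_UNITS:
            total += ((section + number) or 1) * LARGE_UNITS[ch]
            section = 0
            number = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + section + number


def mask_numbers(sentence, token="@num@"):
    """Replace each detected number by `token`; return the text and values."""
    values = []

    def _mask(match):
        values.append(parse_japanese_number(match.group(0)))
        return token

    return NUMBER_RE.sub(_mask, sentence), values
```

For instance, `mask_numbers("売上は3万5千円で、二〇一六年に百二十三件")` yields the masked sentence `売上は@num@円で、@num@年に@num@件` together with the values `[35000, 2016, 123]`. The sketch deliberately leaves open how the values should be rendered in English (2016 as a year versus 35,000 as an amount), which is part of what makes the conversion step harder than for European languages.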
Best regards,
Vito Mandorino

2016-03-29 13:59 GMT+02:00 Vito Mandorino <[email protected]>:

> Thank you all for your comments and resources. I will go through them and
> let you know if I stumble upon something interesting during the process.
>
> Vito Mandorino
>
> 2016-03-28 9:47 GMT+02:00 Graham Neubig <[email protected]>:
>
>> Hello Vishal,
>>
>> Yes, that's what pre-ordering means. Specifically, it means re-ordering
>> the source side.
>>
>> Graham
>>
>> On Mon, Mar 28, 2016 at 2:16 PM, Vishal Goyal (विशाल गोयल)
>> <[email protected]> wrote:
>>
>>> Dear Graham,
>>> Greetings.
>>> Could you please clarify whether pre-ordering in your reply means making
>>> the word order of the source-language and target-language sentences
>>> similar before training, so that the situation resembles that of a
>>> closely related language pair?
>>>
>>> On Sat, Mar 26, 2016 at 10:57 AM, Graham Neubig <[email protected]>
>>> wrote:
>>>
>>>> Hi Vito,
>>>>
>>>> English-Japanese and Japanese-English translation are very difficult
>>>> due to the grammatical differences between the languages.
>>>>
>>>> You have a couple of options to overcome this problem:
>>>> 1) If you want to use phrase-based Moses, you will have to perform some
>>>> variety of pre-ordering, in which you rearrange the words in the source
>>>> sentence before training/testing.
>>>> 2) You can use a syntax-based system, either using the functionality in
>>>> Moses (http://www.statmt.org/moses/?n=Moses.SyntaxTutorial), or using
>>>> another decoder specifically designed for syntax-based MT such as my
>>>> Travatar decoder (http://www.phontron.com/travatar/). I have released
>>>> the setup for training our strongest Japanese-English and
>>>> English-Japanese systems here: https://github.com/neubig/wat2014
>>>>
>>>> Regarding the different types of characters, I would leave them as-is.
>>>> It is possible to perform normalization, which will help in a limited
>>>> number of cases, but if you're just starting out this is really the
>>>> least of your problems.
>>>>
>>>> Graham
>>>>
>>>> On Fri, Mar 25, 2016 at 7:51 PM, Vito Mandorino
>>>> <[email protected]> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> Has anyone ever done experiments on English-Japanese and
>>>>> Japanese-English translation? Do you know of useful resources for
>>>>> this language pair, or of specific gotchas one should be aware of?
>>>>>
>>>>> More specifically, what is the best policy for dealing with alphabets?
>>>>> Do you think it is a good idea to keep the different alphabets (Kanji,
>>>>> Hiragana, Katakana, ...) in the corpus, or should one try to convert
>>>>> Kanji into one of the other alphabets?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Vito Mandorino
>>>>>
>>>>> --
>>>>> M. Vito MANDORINO -- Chief Scientist
>>>>> The Translation Trustee
>>>>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>>>> Tel : +33 1 30 44 04 23 Mobile : +33 6 84 65 68 89
>>>>> Email : [email protected]
>>>>> Website : www.linguacustodia.finance <http://www.linguacustodia.com/>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>> --
>>> Regards,
>>> Vishal Goyal,
>>> Ph.D., M.Tech., MCA, M.C.S.D.
>>> Associate Professor (Stage IV),
>>> Department of Computer Science,
>>> Punjabi University Patiala-147002.
>>>
>>> Machine Translation Systems:
>>> [Online Hindi to Punjabi Machine Translation Tool -
>>> http://h2p.learnpunjabi.org ]
>>> [Statistical Approach Based Hindi to Punjabi Machine Translation System
>>> - http://statmt.org/~vishal/hp/index.cgi
>>> - http://tdil-dc.in/hi2pu/index.cgi ]
>>> Online Journal: [Research Cell: An International Journal of Engineering
>>> Sciences, http://ijoes.vidyapublications.com ]
>>> Book: A Simplified Approach to Data Structures, Shroff Publications and
>>> Distributors
>>> http://www.shroffpublishers.com/detail.aspx?title=6163
