On 22 March 2013 21:58, Bernard Chardonneau <[email protected]> wrote:
> It is not an answer to sphinx, but rather an occasion to speak about
> several GSOC question I found strangely formulated.
>
I find some of the questions a little strange too, but... not the ones
you've chosen.

>
>> Date: Wed, 20 Mar 2013 22:09:51 +0800
>> From: sphinx jiang <[email protected]>
>> To: [email protected], [email protected]
>> Reply-To: [email protected]
>> Subject: Re: [Apertium-stuff] Idea for GSOC
>>
>> Dear Fran:
>>
>> I have been going through the "HOWTO", and the papers about translation
>> project.
>> And for the questions:
>>
>> (a) Are there existing machine translation (MT) systems for this pair?
>>
>> *The package of it don't contain MT.*
>> https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-zh_CN-zh_TW/
>>
>> *But if you mean the MT outside apertium, yes, google can translate this
>> pair.*
>
> The question should be "Are there existing machine translation (MT) systems
> MAKING DIRECT TRANSLATION for this pair?"
>
> According to Google, it may translate zh_CN to English and then English to
> zh_TW (and the same on the other side). So, if Chinesse and English are not
> close languages, the result may be strange or in any case degraded.
>

First of all... That Google use English as an interlingua is undeniable,
but not to the extent that you suggest. They definitely use triangulation
of phrase tables to 'fill in the blanks' - they specifically mention it in
this paper
(http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/33430.pdf),
which outlines a method for finding the best language to use as a bridge.

Now, just because they've written a research paper about it does not
necessarily mean that they _use_ that method, but it seems much more
likely to me than using English as an interlingua: for one thing, Franz
Och (who leads Google Translate) is a co-author; for another, running two
translation instances - which would need to be coordinated across data
centres at Google's scale - is simply less efficient than running one.
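
For anyone unfamiliar with the term: triangulation means combining a
source-to-bridge phrase table with a bridge-to-target one, so that a single
source-to-target table can be used at translation time. Below is a minimal
Python sketch of the general idea, not of Google's actual system or of the
method in the paper. The function name `triangulate` and the toy tables with
their probabilities are invented for illustration.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two phrase tables through a shared bridge (pivot) language.

    Each table maps a phrase to {translation: probability}. The result
    approximates p(tgt | src) as the sum over pivot phrases of
    p(pivot | src) * p(tgt | pivot).
    """
    src_to_tgt = {}
    for src, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot, p_pivot in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt in pivot_to_tgt.get(pivot, {}).items():
                scores[tgt] += p_pivot * p_tgt
        if scores:
            src_to_tgt[src] = dict(scores)
    return src_to_tgt


# Toy tables with made-up probabilities (assumptions, not real data).
zh_cn_to_en = {"软件": {"software": 0.9, "program": 0.1}}
en_to_zh_tw = {
    "software": {"軟體": 1.0},
    "program": {"程式": 0.8, "軟體": 0.2},
}

table = triangulate(zh_cn_to_en, en_to_zh_tw)
# Pick the highest-scoring target phrase for the one source phrase.
best = max(table["软件"], key=table["软件"].get)
print(best, table["软件"])
```

The combined table is built once, offline, so only one translation pass is
needed at run time, which is the efficiency point made above.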
Consider the following:
* Open source translations (.po files, etc.) will quite often include
  partly translated strings, with a mix of the correct language, and
  English.
* Even with a method to find better bridge languages, the vast majority
  of existing translations on the internet involve English, so it will be
  dominant.
* Language models (which influence which potential translation is
  selected) for any other language are highly likely to be contaminated
  with English, *particularly* when they are based on data gathered from
  the internet (count how many times you need to reload
  http://fr.wikipedia.org/wiki/Sp%C3%A9cial:Page_au_hasard before you see
  something in English).

That said... The answer mentions Google, not the question. The question
prompts basic research into the language pair - the kind of information
the student will need to mention in their proposal.

>>
>> (b) If there are existing systems, how good are they? -- Could you do
>> better in three months?
>>
>> *Sorry, I can't overcome the engineers in Google in 3 months.*
>>
>
> The question should exclude commercial products.
>
> So, the question a) about existing systems should be
> "Are there FREE SOFTWARE existing machine translation (MT) systems ..."
>
> For instance, there are several French <-> English commercial translators.
>
> But for me, the fact a French entreprise sells its translator for few
> dozens of euro in Windows version, but for thousands of dollars in a
> UNIX based version is a good enough reason to develop a free software
> alternative.

Um... do we need to have the 'free as in freedom, not free as in beer'
conversation? :)
(Price is not a concern. The freedom to modify and share is.)

Again, the answer mentions Google, not the question. And this is a
follow-on from the first part of the question. Investigating the state of
existing machine translation systems is a good way for a student to see
what can be done, and what remains to be done.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
