Re: [off topic] A new project - automatic translation

Uri Even-Chen Sun, 13 Nov 2005 10:05:30 -0800

I'm replying to a few people together.

Orna Agmon wrote:

It is not so much a question of implementation, free or not, website or
program, but of algorithms. Before you run to implement, you need to
research the subject.  Many researchers are already researching it,
and as you can see, the results are far from perfect, since this is a
problem of natural languages.


Yes, I'm aware of it.

Nadav Har'El wrote:

The idea of machine translation is obviously a good one, and it would be
even better if we had one that was free, both in price and in freedom to
inspect and to improve the code.

BUT, between saying that it's a good idea, and actually being able to
implement it, there's a VERY VERY LONG road.


I agree.

Machine translation, or at least good machine translation, has long been an
open question in AI research. Better and better algorithms are appearing, and
your first course of action should probably be to read up on known approaches
(in linguistic journals, books, university courses, etc.). What will very
likely NOT WORK is any naive approach, including the one which you seem to
imply above (some sort of simplistic "machine learning" approach) - people
already tried these simplistic approaches, long ago, and they just didn't work.


I want to have both an algorithm and a database of languages (words,
phrases etc) that will improve over time.  That is, start with a simple
algorithm, and feed data into it.  The data will be sources and
translations in any language.  When there is enough data for a given
pair of languages, the software will be able to try to translate.
People will correct & improve the translations and feed them back into
the system.  This will improve the quality of translation for the given
languages.  In addition, the algorithms will be improved over time.

All feedback & improvements will be done by volunteers, in the spirit of
Wikipedia and similar projects.  Using the website for all users will be
free of charge.

When we started Hspell (http://ivrix.org.il/projects/spell-checker/), we
envisioned it as the first step toward more sophisticated linguistic
applications, including machine translation. But it was only the first step,
in the journey of many miles :(


I want to consider using existing databases, which are free to use, such
as your Hspell project or Wikipedia - to feed initial data into the
system.  The main goal is to have (for each pair of languages) a list of
translations of words, phrases and maybe even sentences.  Then, the
algorithm will just do "search and replace" - it will replace every word,
phrase or sentence with its equivalent in the target language.
I think it's quite a simple algorithm to start with.  And then it will
be improved in the future.  (Even Linux was not written in one day!).
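
A minimal sketch of the "search and replace" idea described above, in
Python.  Everything here is an assumption for illustration: the function
name translate, the max_phrase_len parameter, and the tiny English-to-French
table EN_FR are hypothetical, and a real system would load its table from a
database.  The sketch tries the longest phrase first, so that a stored
phrase wins over its individual words, and leaves unknown words unchanged.

```python
def translate(text, phrase_table, max_phrase_len=4):
    """Greedy longest-match replacement over whitespace-separated tokens."""
    tokens = text.lower().split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrase_table:
                output.append(phrase_table[candidate])
                i += n
                break
        else:
            # No entry found: keep the source word unchanged.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


# Hypothetical English -> French entries, for illustration only.
EN_FR = {
    "good morning": "bonjour",
    "good": "bon",
    "the": "le",
    "cat": "chat",
}

print(translate("Good morning the cat", EN_FR))  # prints: bonjour le chat
```

Note that this ignores word order, inflection and ambiguity, so it only
illustrates the starting point, not a usable translator.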

My estimate (based on nothing but pure guesswork) is that you can get something
sort-of-working in 5 person-years. This is about 10 times more work than went
into Hspell so far... But then again, I'm not a translation expert (or even
a novice) and maybe I'm grossly underestimating the complexity involved.


I hope that writing the first version (alpha) will require less than one
person-year, not including feeding the data into the system.

I also suggest you take a look at http://www.mila.cs.technion.ac.il/,
which is the Knowledge Center for Processing Hebrew. This is a cooperation
of people from academia who work in the field of Computational
Linguistics, in Hebrew, and they finally started to cooperate in building
basic building blocks that are necessary to advance Hebrew linguistic
research. These building blocks will be released as free software, and
they include (or will include) a morphological analyzer (similar in purpose
to Hspell), word-sense disambiguators, tagged texts, grammar analyzers,
and so on. I assume that they are also interested in advancing their
toolset to the point that they will also have translation tools, text
understanding and generation tools, and so on. But they are also quite
far from this goal.


I want the first algorithm (alpha) to be independent of language - it
should work for any pair of languages.  Of course I want to support
Hebrew, but many other languages too.

I am not sure I understand the "quality of translation will improve over time"
thing. What makes you think that a translator, whether computerized or even
human, can learn to improve his translation capabilities merely by translating
more texts (without any feedback)? And even if there's feedback, how will it
be used? Are you aware of any papers on machine-translation that actually can
learn from past experience?


In order for the quality of translation to improve over time, there is a
need for feedback.  People (who understand both languages) should
correct translations and feed them back into the system.  The algorithm
should remember the feedback and update the database.  The next time the
same sentence (or phrase, or word) is translated by the system, the
corrected translation will be used.  Of course, there is also a need to
prevent abuse and incorrect translations, and I have been thinking about it.
There are a few ways to distinguish between good and bad translations.
But of course, as long as it's a free system it will be vulnerable to
abuse.  I just want to minimize the abuse as much as possible.
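
One possible way to sketch this feedback loop in Python is shown below.
It is only an illustration under stated assumptions: the class name
TranslationMemory, its methods submit and best, and the simple vote
counting are all hypothetical, and voting is just one of several ways
to separate good translations from bad ones.  Each correction adds to a
score for that candidate, other users can vote a candidate down, and
the highest-scoring candidate is the one used next time.

```python
from collections import defaultdict


class TranslationMemory:
    """Stores candidate translations per language pair with vote scores."""

    def __init__(self):
        # (source_lang, target_lang, source_text) -> {candidate: score}
        self._scores = defaultdict(lambda: defaultdict(int))

    def submit(self, src_lang, tgt_lang, source, translation, vote=1):
        """Record a correction (vote=1) or a down-vote (vote=-1)."""
        key = (src_lang, tgt_lang, source.strip().lower())
        self._scores[key][translation] += vote

    def best(self, src_lang, tgt_lang, source, min_score=1):
        """Return the highest-scoring candidate, or None if none qualifies."""
        key = (src_lang, tgt_lang, source.strip().lower())
        candidates = self._scores.get(key)
        if not candidates:
            return None
        translation, score = max(candidates.items(), key=lambda kv: kv[1])
        return translation if score >= min_score else None


tm = TranslationMemory()
tm.submit("en", "fr", "cat", "chat")            # a volunteer's translation
tm.submit("en", "fr", "cat", "chien")           # an incorrect submission
tm.submit("en", "fr", "cat", "chat")            # a second volunteer agrees
tm.submit("en", "fr", "cat", "chien", vote=-1)  # someone flags the bad one

print(tm.best("en", "fr", "cat"))  # prints: chat
```

The min_score threshold is a crude guard against abuse: a candidate that
has been voted down to zero is never returned, even if it is the only one.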

Danny Lieberman wrote:

This is a ripe research area, as Orna has pointed out. I might add that
it has been around for almost 30 years with no significant breakthroughs.


Many problems were not solved for many years until somebody solved them.
Also a free operating system and a free encyclopedia didn't exist
until somebody created them.  I think the idea of an automatic
translation system which is improved by many volunteers all over the
world has never been tried before.  The idea is similar to Wikipedia
(which also didn't exist until somebody created it).

Having just finished the first phase of translation of our main
project's Web site to French, German and Italian - I can vouch that
there is an enormous amount of quality translation resources available
online - go to http://www.proz.com/, which has leveled the playing field
of pricing and translation service providers.

In other words, IMHO, you don't have a business proposition.


Currently, it's a free project without business intentions.  Much like
GNU and Linux were at the beginning.  In the future, if it succeeds, I'm
sure it will be possible to make money out of it (like all the companies
making money from Linux).

Best Regards,

Uri Even-Chen
Speedy Net
Raanana, Israel.

E-mail: [EMAIL PROTECTED]
Phone: +972-9-7715013
Website: www.uri.co.il