Date: 03/04/2011
URL: http://www.thehindu.com/2011/04/03/stories/2011040354101000.htm
National
Get Sampark, go multilingual
Swathi V
Basic version translates four Indian language pairs
Within a year, 14 other language pairs will be covered
Inbuilt Morphological Analyser and Parser do the translation
HYDERABAD: It is generally understood that translation from one language to
another requires an adaptive human brain, not the rule-based rigidity of
a machine. Even for a human being it poses the trickiest of problems, and a
successful translator, however jubilant over the result, will only have
brought it closer to the original.
However, in a 'Robotesque' effort to infuse 'thought' into a gizmo, a
consortium of 11 academic and research institutions across the country came
together to design the 'Sampark Machine Translation Systems for Indian
Languages,' which was launched here on Wednesday by the former President,
A.P.J. Abdul Kalam.
It was the most successful among the three machine translation systems released
at the World Wide Web International (W3I) Conference, the others being AnglaMT
and Anuvadaksh, which translate from English to Indian languages.
Conceived to deliver translation in 18 Indian language pairs, Sampark is ready
in its basic version for four among them - Punjabi to Hindi, Hindi to Punjabi,
Urdu to Hindi and Telugu to Tamil.
Within a year, translation between 14 other bi-directional language pairs
will also be launched. These include Tamil-Hindi, Telugu-Hindi, Hindi-Urdu,
Kannada-Hindi, Punjabi-Hindi, Marathi-Hindi, Bengali-Hindi, Tamil-Telugu and
Malayalam-Tamil, said Rajeev Sangal, Director of the International Institute
of Information Technology, Hyderabad, which was part of the consortium. The
project was executed under the Technology Development for Indian Languages
(TDIL) Programme of the Department of Information Technology.
The programme is aimed at multiplying web content in Indian languages and
improving Internet usage among these language speakers. In short, Sampark is a
web application that translates content available in one Indian language into
another. Translation quality is better when the input text conforms to the
standard form of the language, say the developers. To address the syntactic
differences between the languages, Computational Paninian Grammar is used as
the unifying logical framework, Professor Sangal said.
"To begin with, large chunks of data are taken, and each word is tagged with
the respective part of speech to enable the machine to learn. Then, the machine
is fed with data to allow it to tag the words on its own. The work is then
analysed to discover conflict areas and address them," Rahmat Yousufzai, the
IIIT-H professor who spearheaded the Urdu-Hindi team, said, explaining the
'machine learning' process.
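The tagging loop Professor Yousufzai describes can be pictured with a toy
example. The Python sketch below is an illustration only, not Sampark's actual
tagger: it assumes a simple most-frequent-tag model, romanised Hindi words and
made-up tag labels (NN, VM, VAUX and so on). It learns from hand-tagged
sentences, tags new text on its own, and lists the words seen with more than
one tag, which are the 'conflict areas' a human would then review.

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Count how often each word carries each tag in hand-tagged data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts

def tag_words(counts, words, default_tag="NNP"):
    """Give each word its most frequent tag; unseen words get default_tag."""
    tagged = []
    for word in words:
        if word in counts:
            tagged.append((word, counts[word].most_common(1)[0][0]))
        else:
            tagged.append((word, default_tag))
    return tagged

def conflict_words(counts):
    """Words seen with more than one tag: candidates for manual review."""
    return sorted(word for word, tags in counts.items() if len(tags) > 1)

# Hypothetical hand-tagged data ("sona" can mean "gold" or "to sleep").
TRAINING = [
    [("sona", "NN"), ("mehnga", "JJ"), ("hai", "VAUX")],
    [("mujhe", "PRP"), ("sona", "VM"), ("hai", "VAUX")],
    [("sona", "NN"), ("chamakta", "VM"), ("hai", "VAUX")],
]

counts = train_tagger(TRAINING)
auto_tagged = tag_words(counts, ["sona", "hai", "hyderabad"])
conflicts = conflict_words(counts)
```

In this toy run the ambiguous word is tagged with its more frequent reading and
also reported as a conflict, while the unseen word falls back to the default
proper-noun tag.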
Understanding the meaning, performing a dictionary look-up and carrying out
structure transfer are the components of machine translation that lead to
the target language output.
As soon as the text is fed in, the in-built Morphological Analyser begins
identifying the verb in each sentence, and the Parser uses Paninian grammar
rules to zero in on the kinds of nouns the verb can support and arrive at the
apt one. Long names such as those of institutions (e.g. University of
Hyderabad) are identified as proper nouns through recognition of repeated
collocations. All unidentified words are considered proper nouns and
transliterated. However, literature is a big no-no for translation on this
system, as it cannot identify metaphors.
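The fallback for unidentified words can be sketched in a few lines of Python.
This is a hypothetical illustration, not Sampark's code: the one-entry
Hindi-to-Punjabi lexicon is invented for the example, and the transliteration
simply shifts each Devanagari character to the parallel Gurmukhi code point in
Unicode, a crude approximation that ignores spelling conventions and code
points that have no Gurmukhi counterpart.

```python
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
# The Devanagari and Gurmukhi Unicode blocks are laid out in parallel.
GURMUKHI_OFFSET = 0x0A00 - 0x0900

# Toy lexicon (assumption for illustration): Hindi word -> Punjabi word.
LEXICON = {"अच्छा": "ਚੰਗਾ"}  # "good"

def transliterate(word):
    """Naively map Devanagari characters to Gurmukhi by code point offset."""
    return "".join(
        chr(ord(ch) + GURMUKHI_OFFSET)
        if DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END
        else ch
        for ch in word
    )

def translate_word(word, lexicon=LEXICON):
    """Look the word up; if it is unknown, treat it as a proper noun."""
    if word in lexicon:
        return lexicon[word]
    # Unknown word: assume it is a name and carry it over in the target script.
    return transliterate(word)

# "Hyderabad" in Devanagari is not in the lexicon, so it is transliterated.
hyderabad_out = translate_word("हैदराबाद")
```

A word found in the lexicon is translated; anything else is carried over
character by character, which is why names survive but unknown common words
would come out untranslated.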
"We are at present focussing only on comprehensibility and not fluency. So,
there may be errors of grammar at times. We hope to bring in future improvements
based on user feedback," Professor Sangal said.
The AnglaMT system translates from English to Bengali, Malayalam, Punjabi and
Urdu, while Anuvadaksh translates from English to Hindi, Bengali, Marathi,
Oriya, Urdu and Tamil. The other institutions involved in the development of
Sampark include IIT-Bombay and IIT-Kharagpur, C-DAC Noida and C-DAC Pune, the
University of Hyderabad, Jadavpur University, Anna University-KBC Research
Centre, Tamil University, IIIT-Allahabad, and IISc-Bangalore.
In all, 200 researchers worked on the project, which began in 2006. The three
systems are available on www.tdil-dc.in.