[AI] Get Sampark, go multilingual

Renuka Warriar Sun, 03 Apr 2011 07:04:55 -0700

Date:03/04/2011 URL: 
http://www.thehindu.com/2011/04/03/stories/2011040354101000.htm


National

Get Sampark, go multilingual

Swathi V

Basic version translates four Indian language pairs

Within a year, 14 other language pairs will be covered

Inbuilt Morphological Analyser and Parser do the translation

HYDERABAD: It is generally understood that translation from one language to 
another requires an adaptive human brain, and not the rule-based rigidity of
a machine. Even for a human being, it poses the trickiest of problems, and 
even a successful translator, jubilant over the result, will only have brought
it closer to the original.

However, in a 'Robotesque' effort to infuse 'thought' into a gizmo, a 
consortium of 11 academic and research institutions across the country came 
together to design the 'Sampark Machine Translation Systems for Indian 
Languages,' which was launched here on Wednesday by the former President, 
A.P.J. Abdul Kalam.

It was the most successful among the three machine translation systems released 
at the World Wide Web International (W3I) Conference, the others being AnglaMT
and Anuvadaksh, both of which translate from English to Indian languages.

Conceived to deliver translation in 18 Indian language pairs, Sampark is ready 
in its basic version for four among them - Punjabi to Hindi, Hindi to Punjabi,
Urdu to Hindi and Telugu to Tamil.

Within a year, translational capabilities in 14 other bi-directional language 
pairs too will be launched. These include Tamil-Hindi, Telugu-Hindi, Hindi-Urdu,
Kannada-Hindi, Punjabi-Hindi, Marathi-Hindi, Bengali-Hindi, Tamil-Telugu and 
Malayalam-Tamil, said Rajeev Sangal, Director of the International Institute
of Information Technology, Hyderabad, which was part of the consortium. The 
project was executed under the Technology Development for Indian Languages
(TDIL) Programme of the Department of Information Technology.

The programme is aimed at multiplying web content in Indian languages and 
improving Internet usage among these language speakers. In short, Sampark is a
web application that translates content available in one Indian language into 
another. It can offer better quality in translation if the input text conforms
to standard language, say the developers. To address the syntactic differences 
between the languages, Computational Paninian Grammar is used as
the unifying logical framework, Professor Sangal said.

"To begin with, large chunks of data are taken, and each word is tagged with 
the respective part of speech to enable the machine to learn. Then, the machine
is fed with data to allow it to tag the words on its own. The work is then 
analysed to discover conflict areas and address them," Rahmat Yousufzai, the
IIIT-H professor who spearheaded the Urdu-Hindi team, said, explaining the 
'machine learning' process.
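
The tag, learn and review loop described above can be illustrated with a very
small sketch. It is not Sampark's tagger: it assumes a simple "most frequent
tag per word" model, and the training pairs, tag names and function names are
all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training data: (word, part-of-speech) pairs.
TAGGED = [
    ("ram", "NOUN"), ("eats", "VERB"), ("fruit", "NOUN"),
    ("sita", "NOUN"), ("reads", "VERB"), ("book", "NOUN"),
    ("book", "VERB"), ("book", "NOUN"),
]

def train(tagged):
    """Count how often each word was seen with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return counts

def tag_words(words, counts, default="UNK"):
    """Give each word its most frequent tag; unseen words get `default`."""
    result = []
    for word in words:
        if word in counts:
            result.append((word, counts[word].most_common(1)[0][0]))
        else:
            result.append((word, default))
    return result

def conflicts(counts):
    """Words seen with more than one tag: candidates for human review."""
    return sorted(word for word, tags in counts.items() if len(tags) > 1)

model = train(TAGGED)
tagged_sentence = tag_words(["sita", "reads", "book", "today"], model)
ambiguous = conflicts(model)
```

Here `ambiguous` plays the role of the "conflict areas": words the training
data tagged in more than one way, which a person would then examine.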

Understanding the meaning, performing a dictionary look-up and structure 
transfer will be the components of the machine translation towards generating 
the target language output.

As soon as the text is fed, the in-built Morphological Analyser begins 
identifying the verb in each sentence, and the Parser uses Paninian grammar 
rules to zero in on the kind of nouns it can support and arrive at the apt 
one. Long names such as those of institutions (e.g. University of Hyderabad) 
are made out to be proper nouns through recognition of repeated collocations. 
All unidentified words are considered proper nouns and transliterated. However,
literature is a big no-no for translation on this system, as it cannot identify 
metaphors.
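
The fallback for unidentified words can be sketched as follows. This is only
an illustration of "look the word up, otherwise treat it as a proper noun and
transliterate it"; the lexicon entries are placeholders, and `transliterate`
is a hypothetical stand-in for a real script converter, not a Sampark
component.

```python
# Hypothetical lexicon with placeholder entries; not real Sampark data.
LEXICON = {"ghar": "TGT(house)", "bada": "TGT(big)"}

def transliterate(word):
    """Stand-in for a real script converter: marks the word as carried over by sound."""
    return f"TRANSLIT({word})"

def transfer(tokens, lexicon=LEXICON):
    """Translate known words via the lexicon; transliterate everything else."""
    output = []
    for token in tokens:
        if token in lexicon:
            output.append((lexicon[token], "lexicon"))
        else:
            # Unknown word: assume it is a proper noun and carry it over as-is.
            output.append((transliterate(token), "proper-noun"))
    return output

result = transfer(["hyderabad", "bada", "ghar"])
```

One consequence of such a fallback, if the real system behaves similarly, is
that a misspelt or non-standard common word would also be transliterated
rather than translated, which fits the developers' remark that standard input
gives better output.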

"We are at present focussing only on comprehensibility and not fluency. So, 
there may be errors of grammar at times. We hope to bring in future improvements
based on user feedback," Professor Sangal said.

AnglaMT System translates from English to Bengali, Malayalam, Punjabi and Urdu, 
while Anuvadaksh does it from English to Hindi, Bengali, Marathi, Oriya,
Urdu and Tamil. The other institutions involved in the development of Sampark 
include IIT-Bombay and IIT-Kharagpur, C-DAC Noida and C-DAC Pune, the University of
Hyderabad, Jadavpur University, Anna University-KBC Research Centre, Tamil 
University, IIIT-Allahabad, and IISc-Bangalore.

In all, 200 researchers worked on the project, which began in 2006. The three 
systems are available on www.tdil-dc.in.