I appreciate the efforts of Mr. Prashant and the XRCVC team for this
comprehensive presentation. Good luck to the team.
Dr. Kalpana
----- Original Message ----- 
From: "Prashant Naik" <[EMAIL PROTECTED]>
To: <accessindia@accessindia.org.in>
Sent: Sunday, August 03, 2008 1:12 PM
Subject: [AI] White Paper: OCR Softwares for Indian languages


> Dear Access India Members,
>
> During the Daisy Forum of India meeting held in Mumbai on 11th and 12th
> April 2008, I was given the responsibility of finding information on the
> status of OCR software for Indian languages, so here I am presenting the
> findings of my research. I have prepared a white paper on the subject,
> which I posted in PDF format on the Daisy Forum of India's mailing list
> three days ago. For the benefit and awareness of others, I am pasting its
> content below this message. This will also help those who had posted
> queries on AI regarding this.
>
> White Paper: OCR Softwares for Indian languages
>
> Date: July 31st, 2008
>
> Introduction :
>
> OCR software is available for English and other foreign languages, but
> what is the status of OCR software for Indian languages?
>
> During the Daisy Forum of India meeting held in Mumbai on 11th and 12th
> April 2008, I was given the responsibility of finding information on this.
> Here I am presenting the findings of my research.
>
> Definitions :
>
> OCR: Optical character recognition, usually abbreviated to OCR, is the
> mechanical or electronic translation of images of handwritten, typewritten
> or printed text (usually captured by a scanner) into machine-editable text.
>
> OCR Software: OCR software converts paper documents into electronic data,
> so that you can handle the information (electronic text) in your computer
> system.
>
> Indian Languages: The Constitution of India recognizes Hindi in the
> Devanāgarī script as the official language of the central government of
> India. In all, the Constitution recognizes 22 languages, spoken in
> different parts of the country.
>
> (Source for all definitions: Wikipedia)
>
> Findings :
>
> As per the research on the web highlighted one workshop / seminar 
> organized
> by
>
> Rediff Centre for Indian Language Content Management
>
> On the theme of "Brainstorming Workshop on OCR for Indian Languages" on
>
> 16-17 March, 2007, at Hotel Regalis, Mysore.
>
> Reference. Link: http://www.isim.ac.in/RCILCM/index.htm
>
> Further research on Access India (mailing group for the blind) querying 
> more
> on
>
> this and contact with NAB Karnataka to get more info on this theme did not
> throw
>
> up anything significant.
>
>
>
> Visit by Mr. Venki, rediff.com Technical Head :
>
> During a meeting with Mr. Venki at the XRCVC in June 2008, we obtained
> some more information about the conference, because Mr. Venki himself was
> one of the members of the organizing team from Rediff. He made the
> following observation: "Overall the conference was good. Speakers shared
> new ideas on developing Indian OCR."
>
> However, on further follow-up regarding this conference, it seems that no
> significant progress has been made since.
>
> Chennai "Print Access" Seminar Findings :
>
> Our XRCVC team member Neha learned about many technological developments
> at the "Print Access" conference held in Chennai on April 19th, 2008. She
> shared a lot of information, contacts and links, for example the Acharya
> website (http://acharya.iitm.ac.in), a TTS translator in 22 languages,
> Ravi TTS for Telugu, C-DAC software such as Mantra, Shruti Drishti and
> Shrut Lekhan, and a very important lead on Indian OCR software developed
> by C-DAC Pune.
>
> Visit to C-DAC Pune :
>
> On May 14th and 15th, the XRCVC team visited C-DAC Pune. The visit was
> very fruitful. The team saw CHITRANKAN, a fully developed, off-the-shelf
> OCR product for Hindi in Devanagari script, developed by the GIST
> Development Team, C-DAC, Pune, Maharashtra. C-DAC demonstrated the
> product and the results were very good. CHITRANKAN is used commercially
> by two or three organizations in Pune.
>
> Other C-DAC resources :
>
> C-DAC has developed OCR software for Hindi (CHITRANKAN), Marathi
> (CHITRAKSHARIKA) and Malayalam (NAYANA).
>
> About NAYANA :
>
> Source: http://www.malayalamresourcecentre.org/Mrc/products/nayana.html
>
> NAYANA is a product that enables the user to convert printed Malayalam
> documents to editable computer files. This system is very simple to use
> and requires no prior expertise.
>
> FEATURES
>
> - NAYANA processes all types of printed Malayalam documents.
>
> - Supports TIFF and BMP image formats.
>
> - Supports document images with a resolution of 300 dpi and above.
>
> - Detection and correction of document skew from -5° to +5°.
>
> - The output document can be stored in both ISCII and ISFOC form.
>
> - The output document can be saved in TXT, RTF, HTML or ACI file format.
>
> - User-friendly interface.
>
> - Recognition speed of 50 characters/sec.
>
> - Conversion of printed documents to editable text.
>
> - Optical Character Recognition combined with Text-to-Speech technology
> can be used to build a text reading system.
>
> EXPANDABILITY
>
> - A layout analyzer can be added to the system to reproduce the input
> document in its original layout.
>
> - Can be expanded to cater to handwritten and old documents.
>
> The linguistic resource generation tools such as Prabandhika and Vishleshika
>
> Source: http://delnet.nic.in/news-naclin-report.htm
>
> About CHITRANKAN :
>
> Source: http://www.cdac.in/html/gist/products/chitra.asp
>
> CHITRANKAN - the first OCR (Optical Character Recognition) system for
> Indian languages.
>
> The OCR process involves:
>
> • Conversion of printed matter into an electronic image: the printed
> matter can be converted into an image using a scanner or a digital camera.
>
> • Electronic image processing: this involves identifying text information
> by analyzing the image for noise and skew. Once the text information is
> available, another algorithm reads and recognizes the printed matter.
>
> • Storing the extracted text information as electronic data: the
> recognized input is converted to a standard format that can be opened in
> any word processing application, allowing the user to edit the text.
>
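The second step, analyzing the image for skew, can be illustrated with one common textbook technique: projection-profile skew estimation. This is only a minimal sketch under stated assumptions. The white paper does not say which algorithm CHITRANKAN or NAYANA uses. The function name `estimate_skew`, the ±15° search range (borrowed from CHITRANKAN's quoted correction range) and the 0.5° step are our own choices. The page is modelled as a list of (x, y) ink-pixel coordinates, with y growing downward.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=15.0, step=0.5):
    """Estimate text skew in degrees with a projection-profile search.

    Hypothetical helper, not the algorithm used by CHITRANKAN or NAYANA.
    `points` holds (x, y) coordinates of ink pixels, with y growing downward;
    a positive result means text lines slope downward from left to right.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        theta = math.radians(angle)
        # Row coordinate of each pixel after rotating the page by -angle.
        # At the true skew, every pixel of one text line lands in the same row.
        rows = Counter(
            round(y * math.cos(theta) - x * math.sin(theta)) for x, y in points
        )
        # Tall, narrow peaks in the row histogram give a large sum of squares.
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


if __name__ == "__main__":
    # Synthetic page: three text lines drawn with a 3-degree downward slope.
    slope = math.tan(math.radians(3.0))
    page = [(x, round(y0 + x * slope)) for y0 in (10, 30, 50) for x in range(200)]
    print(estimate_skew(page))  # prints an angle close to 3.0
```

Rotating the page by the negative of the returned angle straightens the text lines. The brute-force search over every candidate angle is slow on full-page scans but easy to follow. Setting `max_angle=5.0` would mirror NAYANA's stated ±5° range.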
> Chitrankan archives Indian language content in electronic form through
> OCR. It enables the user to take a book, magazine or printed text in an
> Indian language, feed it directly into an electronic computer file, and
> edit the file using a word processor. Once the data is in the form of
> electronic text, it can be searched, sorted and indexed.
>
> Chitrankan saves the user the effort of typing an entire document.
>
> Chitrankan scans a document to the screen, recognizing the text and
> other images as objects. These scanned images are flawless and can be
> stored or printed time and again.
>
> Exceedingly user-friendly, with features to edit, move, resize or
> duplicate the scanned document, Chitrankan also provides a spell check
> facility.
>
> The potential of Chitrankan is enormous, as it enables users to harness
> the power of computers to access printed documents in Indian languages.
>
> Software Advantage:
>
> • Recognizes Hindi and Marathi along with embedded English text
>
> • Skew detection and correction for input images up to ±15°
>
> • Grabs images directly from the scanner for processing
>
> • Automatic text and picture region detection
>
> • Supports all TWAIN-compatible scanners and digital cameras
>
> • Supports 256-level grayscale/color .bmp/.tiff images scanned at 300 dpi
> as input for recognition
>
> • Ideal for font sizes between 10 pt and 36 pt, and all popular fonts
>
> • Saves scanned/modified images as .BMP files
>
> • Saves recognized text in ISCII format, or exports it as .RTF for editing
> with the GIST range of software
>
> • Uses advanced DSP (Digital Signal Processing) algorithms to remove
> "noise" and "back page reflection"
>
> • Enables printing of both the input image and the recognized text
>
> • Provides built-in Flip, Rotate and Negate options for the input image
>
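The white paper says only that CHITRANKAN uses "advanced DSP algorithms" to remove noise and does not name them. The following is a generic illustration of the idea, not C-DAC's method. It sketches a 3x3 majority (binary median) filter that erases isolated specks from a black-and-white page image. The function name `remove_specks` and the list-of-lists image format (1 = ink, 0 = paper) are assumptions.

```python
def remove_specks(image):
    """Apply a 3x3 majority (binary median) filter to a 0/1 image.

    Hypothetical illustration only, not CHITRANKAN's noise removal. A pixel
    stays ink (1) when more than half of the pixels in its 3x3 neighbourhood
    (clipped at the image border) are ink, so isolated specks disappear
    while thick strokes survive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            ink = total = 0
            # Count ink pixels in the neighbourhood that lies inside the image.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += 1
                        ink += image[rr][cc]
            result[r][c] = 1 if 2 * ink > total else 0
    return result
```

The design has a visible trade-off. The same majority rule that removes specks also erodes one-pixel-wide strokes and sharp corners. A practical OCR pre-processor would therefore likely need a gentler filter than this sketch.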
> User Advantage:
>
> • Allows deletion of associated pictures from the image using the ERASE
> option
>
> • Provides painting tools to join breaks in characters for better results
>
> • Allows OCR to be applied to an image that is rotated by 180° or flipped
>
> • Applies OCR to images with text in reverse using the INVERT option
>
> • Provides a built-in spell-checking facility
>
> • Provides editing tools such as cut, copy, paste, and find and replace
> for use on the recognized text
>
> System Requirements:
>
> • Minimum configuration:
>
> Pentium II with 64 MB RAM
>
> Virtual memory requirement: 300 MB (swap file space on the hard disk)
>
> • Recommended configuration:
>
> Pentium III with 128 MB RAM or more
>
> Virtual memory requirement: 400 MB
>
> • Operating systems supported:
>
> Windows NT 4.0 with Service Pack 6.0 and above, Windows 9x and above,
> Windows 2000 and Windows XP.
>
> Price: Rs. 10,000/- for a single-user license of CHITRANKAN
>
> Contacts: channel partner list at http://www.cdac.in/html/gist/ch_part.asp
>
> A CHITRANKAN demo (file size: 45 MB) can be downloaded from
> http://www.cdac.in/html/gist/down/chtri_d.asp
>
> Experimenting with CHITRANKAN at the XRCVC – findings :
>
> At the XRCVC, the CHITRANKAN demo was installed and put through tests.
> Web pages from the Rajya Sabha website, typeset in the Yogesh font, were
> used to test recognition of Hindi in Devanagari script. The accuracy was
> good, approximately 70%, and can be improved using the font training
> mode.
>
> Additional documents in Hindi and Marathi were also tested. The results
> for those were fair, with an accuracy level of approximately 40%. Here
> too, the font training module can increase the accuracy.
>
> The software supports the Yogesh Hindi font by default. More fonts can be
> added by training the OCR using the font recognition module.
>
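The white paper does not say how the 70% and 40% figures were measured. One common way to put a number on OCR quality, offered here only as an assumption, is character accuracy. This is one minus the Levenshtein edit distance between the OCR output and a correctly typed reference, divided by the reference length. Readers who test CHITRANKAN or NAYANA and send feedback to C-DAC could report figures computed this way. The function names below are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # delete ref_char
                current[j - 1] + 1,                        # insert hyp_char
                previous[j - 1] + (ref_char != hyp_char),  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Assumed metric: fraction of reference characters recovered, in [0, 1].

    Python strings are sequences of Unicode code points, so a Devanagari
    consonant with a virama or vowel sign counts as several characters.
    """
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    print(character_accuracy("abcdefghij", "abcxefghi"))  # 0.8
```

Because the function counts code points, a dropped virama or vowel sign in Hindi, Marathi or Malayalam output counts as one error. Scores can therefore differ from a count of visible letters.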
> Font training modules enable the user to train the software to recognize
> documents in particular fonts. To make the software even more useful,
> CHITRANKAN incorporates a set of application programming interfaces
> (APIs) that allow software developers to build CHITRANKAN's features into
> their own applications.
>
> You can save the recognized output in RTF format and choose either Hindi
> or Marathi as the recognition language.
>
> Screen reader access with Chitrankan :
>
> The graphical user interface of Chitrankan is very friendly, with menus
> and shortcuts available for all important options.
>
> The workspace area has three main windows: the input image window, the
> recognized output text window and the digitized image window. However,
> the screen reader (SAFA) is not able to read the recognized text.
>
> Conclusion :
>
> One can definitely contribute to the development of Indian language OCR
> by downloading and testing the software and giving feedback to C-DAC,
> which would help in product enhancement. Those who are familiar with
> Malayalam would do well to test the NAYANA OCR software.
>
> Prashant Naik
>
> The Xavier's Resource Centre for the Visually Challenged (XRCVC)
>
> St. Xavier's College, Mumbai.
> ----
> VISION WITHOUT ACTION IS MERELY A DREAM,
> ACTION WITHOUT VISION JUST PASSES THE TIME,
> VISION WITH ACTION CAN CHANGE THE WORLD.
> Join Access India convention: For updates on it visit: 
> http://accessindia.org.in/harish/convention.htm
> Registration is now open!
>
> To unsubscribe send a message to [EMAIL PROTECTED] 
> with the subject unsubscribe.
>
> To change your subscription to digest mode or make any other changes, 
> please visit the list home page at
>  http://accessindia.org.in/mailman/listinfo/accessindia_accessindia.org.in 

