Hello everyone,

  Here are notes on the broad objectives and the immediate
deliverables of the software development projects that people
from Sarai will be discussing at the IndLinux meet this weekend.

A. Spell-checking, and dictionaries:
1. Objectives
   (a) Incorporate existing phonetic rules that have been made
       for various Indian languages at
       <http://workouts.foss.in/2008/index.php/Sorting_in_Indic_locales/Indian_language_spell-checking_enhancements>
   (b) Demonstrate building an aspell dictionary distribution (Assamese).
   (c) Take input on phonetic rules for other languages.
   (d) New word lists for any language.
   (e) Santhosh: Convert aspell phonetic rules to Hunspell.
2. Deliverables from meet:
   (a) (Offline?) aspell dictionary distributions for all languages.
       Submit on return from Pune.
   (b) Hunspell dictionary distributions? I do not know how to make
       these.
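
As a rough illustration of objective (d), a word list can be
bootstrapped from any plain-text corpus. The Python sketch below is
only that, a sketch: the function name, the min_count threshold, and
the choice of character range (the Bengali Unicode block, which
Assamese uses, with digits left out) are assumptions for illustration
and are not part of any existing tooling.

```python
import re
from collections import Counter

# Assumption: Assamese text lives in the Bengali Unicode block.
# The range below skips the Bengali digits (U+09E6-U+09EF) but keeps
# the Assamese letters ra (U+09F0) and wa (U+09F1).
WORD_RE = re.compile("[\u0980-\u09E3\u09F0\u09F1]+")


def build_word_list(text, min_count=1):
    """Return the sorted unique words seen at least min_count times."""
    counts = Counter(WORD_RE.findall(text))
    return sorted(word for word, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    sample = "\u0985\u09b8\u09ae \u0985\u09b8\u09ae \u09ad\u09be\u09f0\u09a4 hello"
    # One word per line, the usual shape of a spell-checker word list.
    print("\n".join(build_word_list(sample)))
```

Raising min_count is a crude way to drop one-off typos from a noisy
corpus. Words containing zero-width joiners are split by this pattern,
so real data would need a more careful tokenizer.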

B. Font converters:
1. Objectives: Desktop GUI front-end for:
   (a) JavaScript-based converters from Google technical groups
   (b) Padma (the Firefox plugin for on-the-fly conversion)
   (c) Homegrown m17n-based converter
   (d) Others?
   Needs to be extensible, i.e., end users should be able to
   classify additional fonts easily.
2. Immediate deliverables:
   (a) Demo of what is working so far.
   (b) Gather input on needs of various Indian languages. Is
       reordering of some matras the only requirement?
   (c) Plan GUI for classification of fonts. The current
       stumbling block for most converters seems to be the need
       for some technical expertise to classify each individual
       font.
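
To make the matra-reordering question in 2(b) concrete: many legacy
fonts store text in visual order, so a pre-base matra such as the
Devanagari short i (U+093F) comes before its consonant, whereas
Unicode stores it after. The Python sketch below assumes the legacy
bytes have already been mapped one-to-one to Unicode code points and
that only single consonants are involved. The function name and the
set of pre-base matras are illustrative, not taken from any of the
converters listed above.

```python
# Assumption: only the Devanagari short-i matra is treated as pre-base.
# Other scripts (Bengali, Oriya, Tamil, Malayalam, ...) have more such signs.
PRE_BASE_MATRAS = {"\u093F"}


def visual_to_logical(text):
    """Move each pre-base matra to just after the character following it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] in PRE_BASE_MATRAS and i + 1 < len(text):
            # Visual order is matra + consonant; emit consonant + matra.
            out.append(text[i + 1])
            out.append(text[i])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

With a conjunct, the matra has to move past the whole consonant
cluster rather than a single character, which this sketch does not
handle. That is exactly the kind of per-language detail we need
input on.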

C. Simple OCR:
1. Objectives:
   (a) Start with Oriya and Devanagari.
   (b) Minimalistic approach: Identify characters from clean image
       of a known font in a known script at a known size.
   (c) Extend a step at a time:
       * Automatically classify glyphs in font, so font does not
         need to be known a priori.
       * Automatically identify font size.
       * Automatically identify script (low priority)
2. Deliverables: This work is behind schedule, so it will be only
   partially complete:
   (a) Describe technique. Demo existing code which shows steps
       of simple OCR.
   (b) Get input on letter-segmentation techniques, and complicating
       factors for other languages.
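
For readers unfamiliar with the minimalistic approach in 1(b), one
common way to identify characters from a known font at a known size
is nearest-template matching on binary bitmaps. The Python sketch
below shows that idea only; it is not the existing Sarai code, and
the toy 3x3 templates and their labels are made up for illustration.

```python
def hamming(a, b):
    """Count differing pixels between two equal-sized binary bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, templates):
    """Return the label of the template with the fewest differing pixels."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


# Hypothetical templates; real ones would be rendered from the known font.
TEMPLATES = {
    "bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "box": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}

if __name__ == "__main__":
    noisy_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # one flipped pixel
    print(classify(noisy_bar, TEMPLATES))
```

This presumes the glyphs have already been segmented and scaled to
the template size, which is where the letter-segmentation input asked
for in 2(b) comes in.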

D. Other development work, with our inputs here:
   (a) Spell-checking bindings using enchant.
   (b) Overview of Indic text rendering using Harfbuzz and FontMatrix,
       leading up to Indic text in Scribus.
   (c) Portal for IndLinux code development.
   (d) Review of projects on FedoraHosted.

Regards,
Gora
