Hello everyone,
Here are notes on the broad objectives and the immediate
deliverables of the software development projects that people
from Sarai will be discussing at the IndLinux meet this weekend.
A. Spell-checking, and dictionaries:
1. Objectives
(a) Incorporate existing phonetic rules that have been made
for various Indian languages at
<http://workouts.foss.in/2008/index.php/Sorting_in_Indic_locales/Indian_language_spell-checking_enhancements>
(b) Demo of how to build an aspell dictionary distribution (Assamese).
(c) Take inputs for phonetic rules for other languages
(d) New word lists for any languages
(e) Santhosh: Convert aspell phonetic rules to Hunspell.
2. Deliverables from meet:
(a) (Offline?) aspell dictionary distributions for all languages.
Submit on return from Pune.
(b) Hunspell dictionary distributions? I do not know how to make
these.
B. Font converters:
1. Objectives: Desktop GUI front-end for:
(a) Javascript-based converters from Google technical groups
(b) Padma (the Firefox plugin for on-the-fly conversion)
(c) Homegrown m17n-based converter
(d) Others?
Needs to be extensible, i.e., an end user should be able to
classify additional fonts easily.
2. Immediate deliverables:
(a) Demo of what is working so far.
(b) Gather input on needs of various Indian languages. Is
reordering of some matras the only requirement?
(c) Plan GUI for classification of fonts. The current
stumbling block for most converters seems to be the need
of some technical expertise to classify each individual
font.
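To make the matra-reordering question in (b) concrete, here is a rough
sketch in Python. It is not taken from any of the converters listed
above. The mapping table is a made-up example and does not describe any
real legacy font; the names LEGACY_TO_UNICODE, PREBASE_MATRAS and
convert() are only for illustration. The sketch assumes a legacy font
that stores the Devanagari vowel sign I before its consonant, while
Unicode stores it after.

```python
# Hypothetical legacy-font table, for illustration only.
# A real font would need its own table, built by classifying its glyphs.
LEGACY_TO_UNICODE = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "t": "\u0924",  # DEVANAGARI LETTER TA
    "f": "\u093F",  # DEVANAGARI VOWEL SIGN I (typed before the consonant)
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
}

# Matras that legacy fonts store before the base consonant.
PREBASE_MATRAS = {"\u093F"}


def convert(legacy_text):
    """Map legacy codes to Unicode, then move pre-base matras after the
    following character. Simplification: a conjunct would need the matra
    moved past the whole consonant cluster, which is not handled here."""
    mapped = [LEGACY_TO_UNICODE.get(ch, ch) for ch in legacy_text]
    out = []
    i = 0
    while i < len(mapped):
        if mapped[i] in PREBASE_MATRAS and i + 1 < len(mapped):
            # Emit the following character first, then the matra.
            out.append(mapped[i + 1])
            out.append(mapped[i])
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return "".join(out)
```

With this table, convert("fk") gives KA followed by VOWEL SIGN I. If
reordering really is the only requirement beyond a per-font table, then
the classification GUI only has to produce the two data structures at
the top.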
C. Simple OCR:
1. Objectives:
(a) Start with Oriya, and Devanagari
(b) Minimalistic approach: Identify characters from clean image
of a known font in a known script at a known size.
(c) Extend a step at a time:
* Automatically classify glyphs in font, so font does not
need to be known a priori.
* Automatically identify font size.
* Automatically identify script (low priority)
2. Deliverables: This work is behind schedule, so it will be only
partially complete:
(a) Describe technique. Demo existing code which shows steps
of simple OCR.
(b) Get input on letter-segmentation techniques, and complicating
factors for other languages.
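As a toy illustration of the minimalistic approach in 1(b), the sketch
below matches fixed-width cells of a clean image against known glyph
templates. This is not the existing code to be demoed. It assumes a
known font at a known size with every glyph the same width, and the
template names and 3x3 bitmaps are invented for the example. Real
letter segmentation, as asked about in 2(b), is exactly the part it
leaves out.

```python
# Invented 3x3 glyph bitmaps ('#' is ink, '.' is background).
TEMPLATES = {
    "ring":  ["###", "#.#", "###"],
    "cross": ["#.#", ".#.", "#.#"],
    "bar":   [".#.", ".#.", ".#."],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def split_cells(image, width):
    """Cut the image into side-by-side cells of the given width."""
    count = len(image[0]) // width
    return [[row[i * width:(i + 1) * width] for row in image]
            for i in range(count)]


def recognise(image, templates, width):
    """Label each cell with the template that differs in the fewest pixels."""
    return [min(templates, key=lambda name: pixel_distance(cell, templates[name]))
            for cell in split_cells(image, width)]


# Three glyphs side by side; the last one has one pixel of noise.
IMAGE = [
    "###.#.#.#",
    "#.#.#..#.",
    "###.#.#..",
]

labels = recognise(IMAGE, TEMPLATES, 3)
```

Picking the nearest template instead of requiring an exact match is
what lets the noisy third cell still come out as "cross".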
D. Other development work: Our inputs here:
(a) Spell-checking bindings using enchant.
(b) Overview of Indic text rendering using Harfbuzz and FontMatrix,
leading up to Indic text in Scribus.
(c) Portal for IndLinux code development.
(d) Review of projects on FedoraHosted.
Regards,
Gora