On Thu, Jan 18, 2007 at 09:25:34AM +0100, Hugo Coolens wrote:
> Even though I know most written Arabic doesn't use diacritic marks, it
> is very helpful for beginners. My problem is that I have a lot of words
> written with diacritics, when trying to spellcheck them with aspell,
> aspell refuses them as correct. I thought it would be just a matter of
> telling aspell not to look at the diacritics as follows:
> cat tekst.ar |aspell -a -d ar --ignore-accents=true
>
> but it seems not the right way to do this
> I also thought of using a filter something like:
> cat tekst.ar | tr 'harakaat' -d |aspell -a -d ar
>
> but I don't know what to use for 'harakaat'
>

I once wrote a small program to do this. I've cleaned it up a bit, and
here it is:

http://home.foolab.org/cgi-bin/viewcvs.cgi/src/clean_arabic.c?rev=1.1&view=auto

You'll need glib 2.x. To compile it:

gcc -o clean_arabic clean_arabic.c `pkg-config glib-2.0 --cflags --libs`

Pass it a file as an argument; otherwise it reads from standard input.
It writes the "cleaned" Arabic to standard output.

Good luck.

-- 
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Member.
Life powered by Debian, Homepage: www.foolab.org
-- 
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature
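
For readers who cannot fetch the file, here is a minimal sketch of the same
idea in plain C, without glib. It is not the code of clean_arabic.c; the
function names strip_harakat and is_haraka are made up for illustration. It
assumes the input is valid UTF-8 and treats only U+064B through U+0652
(fathatan to sukun) as harakat. Whether the original program removes other
marks as well (tatweel, superscript alef) is not known from this message,
so the sketch leaves them alone.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if the byte pair is the UTF-8 encoding of U+064B..U+0652
   (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
   These code points encode as 0xD9 0x8B through 0xD9 0x92.
   Assumption: this range is what "harakat" means here. */
static int is_haraka(unsigned char b0, unsigned char b1)
{
    return b0 == 0xD9 && b1 >= 0x8B && b1 <= 0x92;
}

/* Copy the NUL-terminated UTF-8 string `in` to `out`, dropping harakat.
   `out` needs room for strlen(in) + 1 bytes. Returns the number of bytes
   written, not counting the terminating NUL. */
size_t strip_harakat(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != '\0') {
        /* p[1] is always readable here: at worst it is the final NUL,
           which is_haraka() rejects. */
        if (is_haraka(p[0], p[1])) {
            p += 2;                 /* skip the two-byte diacritic */
        } else {
            out[n++] = (char)*p++;  /* keep every other byte as-is */
        }
    }
    out[n] = '\0';
    return n;
}
```

Matching on raw bytes is safe in valid UTF-8 because 0xD9 can only be a lead
byte, never a continuation byte. To use it as a filter, call strip_harakat()
on each line read from standard input and print the result, then pipe that
output into `aspell -a -d ar` as in the original question.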
_______________________________________________
Aspell-user mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/aspell-user
