Thanks for your reply, Jon.

> Thanks for asking. All the words are in
> tab-separated text files, as in noun.lex, verb.lex,
> etc. They get converted to a kimmo-usable file such
> as fa-noun.lex, fa-verb.lex, etc. using the db2lex perl scripts in the
> scripts directory. The verb and adjective files use a specific script
> written for them; all others use the plain script. Also see the
> orthography.txt file for the romanization scheme. It also has some
> other goodies.
>
> I would love to add any additions you might make to the lexicon in the
> next release.

I suppose I can use roman2unicode to convert the roman encoding into readable plain text (I'm not fast at reading the roman notation). That way, I can import the data into Excel, sort it alphabetically, and start adding new stuff...

> As you can see, it needs a little more work on the morphophonemic
> rules, but it should work fine for stemming purposes.

Yes, it's pretty good at recognizing the stem of a word.

> Hans Nelson is the man to talk to. He's working on a Kimmo output to
> XML program. I don't know much about
> it, but here's his email: [EMAIL PROTECTED]

Thanks for the hint. I'll try to contact him. If you're interested, I can send you the final result of our discussion off-list.

-------------
Ehsan Akhgari
Farda Technology (http://www.farda-tech.com/)
List Owner: [EMAIL PROTECTED]
[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]
_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing