Thanks for your reply, Jon.

> Thanks for asking.  All the words are in
> tab-separated text files, such as noun.lex, verb.lex,
> etc.  They get converted to Kimmo-usable files such
> as fa-noun.lex, fa-verb.lex, etc. using the db2lex Perl scripts in the
> scripts directory.  The verb and adjective files use scripts
> written specifically for them; all the others use the plain script.  Also see the
> orthography.txt file for the romanization scheme.  It also has some
> other goodies.
>
> I would love to include any additions you might make to the lexicon in the
> next release.

I suppose I can use roman2unicode to convert the romanized entries into
readable Unicode text (I'm not fast at reading the roman notation).  That way,
I can import the data into Excel, sort it alphabetically, and start adding
new entries.
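If sorting in Excel turns out to be awkward, the same step could be done with a short script.  A rough sketch, assuming (hypothetically) that each line of the converted file is one entry with the headword in the first tab-separated column; sort_lexicon is just an illustrative name, not one of the db2lex or roman2unicode scripts.  Note that Python sorts by Unicode code point, which does not match Persian alphabetical order for letters such as پ, چ, ژ and گ, so a proper Persian collation would still be needed for a final word list.

```python
def sort_lexicon(tsv_text: str) -> str:
    """Sort tab-separated lexicon lines by their first field (the headword).

    Hypothetical layout: one entry per line, headword in the first column.
    Blank lines are dropped; entries with the same headword keep their
    original relative order because list.sort() is stable.
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    # Key on the first tab-separated field only; plain code-point order.
    lines.sort(key=lambda line: line.split("\t", 1)[0])
    return ("\n".join(lines) + "\n") if lines else ""
```

For example, sort_lexicon("xar\tN\nab\tN\n") returns "ab\tN\nxar\tN\n".  Reading and writing the files with encoding="utf-8" keeps the converted Persian text intact.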

> As you can see, it needs a little more work on the morphophonemic
> rules, but it should work fine for stemming purposes.

Yes, it's pretty good at recognizing the stem of the word.

> Hans Nelson is the man to talk to.  He's working on a program that
> converts Kimmo output to XML.  I don't know much about
> it, but here's his email:   [EMAIL PROTECTED]

Thanks for the tip.  I'll try to contact him.  If you're interested,
I can send you the outcome of our discussion off-list.

-------------
Ehsan Akhgari

Farda Technology (http://www.farda-tech.com/)

List Owner: [EMAIL PROTECTED]

[ Email: [EMAIL PROTECTED] ]
[ WWW: http://www.beginthread.com/Ehsan ]



_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
