OK, it's time to spill the beans: my goal in learning parsing in Clean has to do with Sanskrit.
Sanskrit is written in the Devanagari script <http://en.wikipedia.org/wiki/Devanagari>. But before modern (Unicode) typography, people approximated that script with ASCII character sets <http://en.wikipedia.org/wiki/Devanagari_transliteration>, and over time several such systems evolved. My goals are:

1 - bidirectional transliteration between the various ASCII schemes:

    // Given Harvard-Kyoto, produce Velthuis encoding:
    translit Harvard Velthuis "ajJAna"
    // output will be aj~naana

2 - unidirectional translation from any ASCII scheme to Unicode:

    // Given Harvard-Kyoto, produce Unicode encoding
    // (an expansion on http://www.iit.edu/~laksvij/language/sanskrit.html):
    translit Harvard Unicode "ajJAna"
    // output will be अज्ञान

I'm thinking of using the Velthuis encoding <http://en.wikipedia.org/wiki/Devanagari_transliteration#Velthuis> as the "Abstract Syntax Tree" (the common intermediate representation) for the whole project: whatever ASCII scheme I get, convert it to Velthuis, then convert the Velthuis to the requested target.

I still have a few more days of banging my head against the MetarParser, but I wanted to at least let people know where I'm heading with all these questions.

Errata:
====

A major hitch in converting ASCII to Unicode is that all of the ASCII schemes are purely linear: you read them the way you would read English, left to right. Devanagari, however, is non-linear in at least two places:

* short "i" is written before the consonants it is pronounced after. In other words, "agni" is written in Devanaagarii with the "i" between the "a" and the "g" ("aign"), even though it is pronounced "agni".
* "r" goes to the far right of the consonants it _precedes_. In other words, "rgo" is written in Devanaagarii with the "r" after the "go".

There is already a good converter from Harvard-Kyoto to Devanaagarii <http://www.iit.edu/~laksvij/language/sanskrit.html>, so I may just focus on bidirectional ASCII translation and, when I need Unicode, simply use that online tool.
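For illustration, here is a minimal sketch of the Velthuis-pivot idea. It is written in Python rather than Clean and is not the poster's code. `translit`, `parse`, `render_ascii`, `render_unicode` and the scheme names "Harvard", "Velthuis" and "Unicode" are hypothetical, modelled on the `translit Harvard Velthuis "ajJAna"` calls above. Input is split into phoneme tokens named by their Velthuis spelling by longest match, then rendered to the target. The tables cover only the basic Sanskrit alphabet (no avagraha, digits or Vedic signs). One fact worth knowing: Unicode stores Devanagari in logical (spoken) order. The short-i sign is stored after the consonant cluster, and the repha is stored as र plus virama before it; the font's shaping engine moves them visually. So the sketch emits code points left to right without reordering.

```python
# Hypothetical sketch: each ASCII scheme is parsed into phoneme tokens named by
# their Velthuis spelling (the pivot "AST"), which are then rendered to a target.
# Covers only the basic Sanskrit alphabet (no avagraha, digits, Vedic signs).

VIRAMA = "\u094d"  # Devanagari sign virama: suppresses the inherent "a"

# (Velthuis, Harvard-Kyoto, independent vowel, dependent vowel sign)
VOWELS = [
    ("a", "a", "अ", ""), ("aa", "A", "आ", "ा"), ("i", "i", "इ", "ि"),
    ("ii", "I", "ई", "ी"), ("u", "u", "उ", "ु"), ("uu", "U", "ऊ", "ू"),
    (".r", "R", "ऋ", "ृ"), ("e", "e", "ए", "े"), ("ai", "ai", "ऐ", "ै"),
    ("o", "o", "ओ", "ो"), ("au", "au", "औ", "ौ"),
]
# (Velthuis, Harvard-Kyoto, consonant letter with inherent "a")
CONSONANTS = [
    ("k", "k", "क"), ("kh", "kh", "ख"), ("g", "g", "ग"), ("gh", "gh", "घ"), ('"n', "G", "ङ"),
    ("c", "c", "च"), ("ch", "ch", "छ"), ("j", "j", "ज"), ("jh", "jh", "झ"), ("~n", "J", "ञ"),
    (".t", "T", "ट"), (".th", "Th", "ठ"), (".d", "D", "ड"), (".dh", "Dh", "ढ"), (".n", "N", "ण"),
    ("t", "t", "त"), ("th", "th", "थ"), ("d", "d", "द"), ("dh", "dh", "ध"), ("n", "n", "न"),
    ("p", "p", "प"), ("ph", "ph", "फ"), ("b", "b", "ब"), ("bh", "bh", "भ"), ("m", "m", "म"),
    ("y", "y", "य"), ("r", "r", "र"), ("l", "l", "ल"), ("v", "v", "व"),
    ('"s', "z", "श"), (".s", "S", "ष"), ("s", "s", "स"), ("h", "h", "ह"),
]
# (Velthuis, Harvard-Kyoto, sign): anusvara and visarga
MARKS = [(".m", "M", "ं"), (".h", "H", "ः")]

SCHEMES = {"Velthuis": 0, "Harvard": 1}  # column of each ASCII scheme above


def spellings(scheme):
    """Map each spelling in `scheme` to a (kind, Velthuis) token."""
    col = SCHEMES[scheme]
    table = {}
    for kind, rows in (("V", VOWELS), ("C", CONSONANTS), ("M", MARKS)):
        for row in rows:
            table[row[col]] = (kind, row[0])
    return table


def parse(scheme, text):
    """Greedy longest-match tokenizer, so that "kh" wins over "k" + "h"."""
    table = spellings(scheme)
    longest = max(len(s) for s in table)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in table:
                tokens.append(table[text[i:i + size]])
                i += size
                break
        else:  # not a letter of this scheme (space, punctuation): pass through
            tokens.append(("X", text[i]))
            i += 1
    return tokens


def render_ascii(scheme, tokens):
    col = SCHEMES[scheme]
    spell = {row[0]: row[col] for row in VOWELS + CONSONANTS + MARKS}
    return "".join(v if kind == "X" else spell[v] for kind, v in tokens)


def render_unicode(tokens):
    """Emit code points in logical (spoken) order; the font does the reordering."""
    letter = {row[0]: row[2] for row in VOWELS + CONSONANTS + MARKS}
    sign = {row[0]: row[3] for row in VOWELS}
    out, after_consonant = [], False
    for kind, v in tokens:
        if kind == "V":
            # after a consonant a vowel becomes a sign ("a" is inherent, so "")
            out.append(sign[v] if after_consonant else letter[v])
        else:
            if after_consonant:
                out.append(VIRAMA)  # consonant with no vowel: join into a cluster
            out.append(v if kind == "X" else letter[v])
        after_consonant = kind == "C"
    if after_consonant:
        out.append(VIRAMA)  # text ends in a bare consonant
    return "".join(out)


def translit(source, target, text):
    """Hypothetical API modelled on `translit Harvard Velthuis "ajJAna"`."""
    tokens = parse(source, text)  # the Velthuis-token intermediate form
    if target == "Unicode":
        return render_unicode(tokens)
    return render_ascii(target, tokens)


print(translit("Harvard", "Velthuis", "ajJAna"))  # aj~naana
print(translit("Harvard", "Unicode", "ajJAna"))   # अज्ञान
```

Longest match matters because the schemes use multi-character spellings ("kh", "aa", ".th"). Like the ASCII schemes themselves, the sketch cannot tell "kh" (aspirate) from "k" followed by "h". With this design, adding another ASCII scheme means adding one more column to the tables.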
It would be nice to have all resources available in a Clean program, though.

--
View this message in context: http://www.nabble.com/Sanskrit-Transliteration---Parsing-into-Abstract-Syntax-Trees-tp19187901p19187901.html
Sent from the Clean mailing list archive at Nabble.com.

_______________________________________________
clean-list mailing list
http://mailman.science.ru.nl/mailman/listinfo/clean-list
