Salam Idris,
I do think that this is an excellent idea, and it is something that I have often thought about myself. Hopefully this could be a useful resource that might be incorporated into the Quranic Arabic Corpus at a later stage. I've sent this e-mail to the wider discussion list since, as you say below, this may be useful and interesting to discuss with other researchers.

A good way to approach this might be through word-sense tagging. I have proposed something similar in the past, a related project that we might consider one day: "Quranic Arabic WordNet". This could be similar to the English WordNet at Princeton University: http://en.wikipedia.org/wiki/WordNet

Since the original WordNet was developed, similar projects have been produced for several other languages, including a good attempt for Modern Standard Arabic. The idea for the Quran would be to group related words into sets, and to show how these different sets are related semantically. You are indeed correct - this could result in a new and interesting kind of search of the Quran for students and researchers.

I don't think that this would be a very difficult task for someone else to pick up. But we don't need to start from scratch. We already have a word-by-word dictionary, of sorts, through the interlinear translation on the website: http://corpus.quran.com/qurandictionary.jsp

In addition, the words in the Quranic Arabic Corpus are already organized according to root, and then further subdivided by lemma. According to the Quranic Arabic Corpus website, the current tagging of the Quran indicates that there are 3,673 unique lemmas: http://corpus.quran.com/lemmas.jsp

For the Quran, this is only about 2.5% of the 150,000 unique words modelled by English WordNet. It is still not a trivial task, although I think it is achievable within a reasonable time frame, as part of an interesting and useful future research project.
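To make the idea of grouping related words into sets a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names SYNSETS, build_index, related_lemmas and expand_query are hypothetical, the single set shown is just the shukr/hamd example from the message quoted below, and none of this reflects how the corpus actually stores its data.

```python
from collections import defaultdict

# Hypothetical sample data: one hand-written set of related lemmas,
# using the transliterations from this thread. A real resource would
# cover the lemmas listed on the corpus website.
SYNSETS = {
    "gratitude_praise": {"shukr", "hamd"},
}


def build_index(synsets):
    """Map each lemma to the names of the sets it belongs to."""
    index = defaultdict(set)
    for name, lemmas in synsets.items():
        for lemma in lemmas:
            index[lemma].add(name)
    return index


def related_lemmas(lemma, synsets, index):
    """Return the other lemmas that share at least one set with `lemma`."""
    related = set()
    for name in index.get(lemma, ()):
        related |= synsets[name]
    related.discard(lemma)
    return sorted(related)


def expand_query(lemma, synsets, index):
    """Return the searched lemma followed by its related lemmas."""
    return [lemma] + related_lemmas(lemma, synsets, index)


INDEX = build_index(SYNSETS)
print(expand_query("shukr", SYNSETS, INDEX))
```

A search front end could run the original query first and then offer the extra lemmas as suggestions, so that a lemma with no related set simply falls back to an ordinary search.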
This might be part of a wider deep dictionary of the Quran, which not only lists word definitions by root and then lemma (as other Arabic dictionaries do), but also relates sets of lemmas in the style of WordNet. There is also the possibility of saving time by relating word sets automatically through statistical analysis. This could form the basis of a Quranic Arabic WordNet, which could then be verified and compiled manually into its final form. Certainly something interesting to think about.

Kind Regards,

--
Kais Dukes
Language Research Group
School of Computing
University of Leeds

http://corpus.quran.com - The Quranic Arabic Corpus
comp-quran@comp.leeds.ac.uk - Computational Quranic Arabic discussion list
2010/2/3 Idris Mokhtarzada <idr...@gmail.com>:

Salam Kais,

I just thought of another idea that may be useful to look into. It would be useful if words could somehow be linked, similar to what you did with the ontology, but linked by related meanings. For example, shukr (شكر) and hamd (حمد) would be "related (or similar) words". This would be useful when we're thinking about or searching for something involving شكر, so that a program could suggest results with حمد as well.

What do you think? I'm sure you're piled up with work, but I like to throw ideas out there just in case something sounds useful or interesting to others.

Wasalam,
Idris