Hello Ted,

Great to hear from the University of Minnesota. I must confess I am not an expert on clustering. However, you may be interested in the following references, which I had previously looked up as part of my general background reading on computational techniques applied to the Quran:

[1] Abdul-Baquee M. Sharaf
    Report on Text Mining the Quran
    - http://www.comp.leeds.ac.uk/scsams/transfer/TransferReport-Sharaf.pdf

[2] Hermann Moisl
    Sura Length and Lexical Probability Estimation in Cluster Analysis of the Qur’an
    - http://portal.acm.org/citation.cfm?id=1644886&dl=ACM&coll=PORTAL&CFID=71864491&CFTOKEN=23791269
    - http://www.staff.ncl.ac.uk/hermann.moisl/MoislRevised.doc

[3] Naglaa Thabet
    The thematic structure of the Qur'an: an exploratory multivariate approach
    - http://acl.ldc.upenn.edu/P/P05/P05-2002.pdf

I see from your message below that you are already familiar with the Thabet reference [3].

Abdul-Baquee at the Language Research Group, University of Leeds, has found some interesting preliminary results and has kindly made his transfer report available online [1]. Although it does not focus specifically on clustering, you may find this report an interesting read with regard to computational analysis in general, and to feature-based machine learning applied to aspects of the Quran.

With regard to the specific research interest mentioned below, you may also find the recently published paper by Hermann Moisl [2] relevant, especially given its title and content.

Speaking for myself, I would be quite excited about discussing how to apply clustering (or any other suitable statistical or machine-learning techniques) to the recently tagged Quranic Arabic Corpus (http://corpus.quran.com). Whereas previous approaches have worked with the features then available (e.g. verse or text length), we now have a whole new set of linguistic features to work with (root, part-of-speech, lemma, etc.). Perhaps these could now reveal more interesting analyses? I think that the recently tagged and verified part-of-speech and lemma features alone could add a whole new set of dimensions to this sort of investigation, especially as they may be more closely related to semantics and meaning than other features (see [1]).

I would imagine that this sort of investigation would also depend on exactly what a particular statistical analysis is aiming to model.

Please feel free to ask any further questions related to the tagged Quranic Arabic Corpus. I am sure that either I or other researchers involved in the project would be more than happy to provide any further information required.

Kind Regards,

--
Kais Dukes
Language Research Group
School of Computing
University of Leeds
http://corpus.quran.com - The Quranic Arabic Corpus

On Sun, Jan 17, 2010 at 10:12 PM, Ted Pedersen <tpede...@d.umn.edu> wrote:
> Greetings all,
>
> I am especially interested in applications of clustering techniques to
> the Quran. One example I'm aware of (and I guess it's the only example
> I know of truth be told) is the following:
>
> Naglaa Thabet
> Understanding the Thematic Structure of the Qur’an: An Exploratory
> Multivariate Approach
> ACL 2002 Student Research Workshop
> http://www.aclweb.org/anthology/P/P05/P05-2002.pdf
>
> Can anyone direct me to other work that applies clustering (or other
> unsupervised learning techniques) to the Quran?
>
> Cordially,
> Ted
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse