Hello Ted,

Great to hear from the University of Minnesota. I must confess I am not an 
expert on clustering. However, you may be interested in the following 
references that I had previously looked up as part of my general background 
reading on computational techniques applied to the Quran:

[1] Abdul-Baquee M. Sharaf
Report on Text Mining the Quran:
- http://www.comp.leeds.ac.uk/scsams/transfer/TransferReport-Sharaf.pdf

[2] Hermann Moisl
Sura Length and Lexical Probability Estimation in Cluster Analysis of the Qur’an
- http://portal.acm.org/citation.cfm?id=1644886&dl=ACM&coll=PORTAL&CFID=71864491&CFTOKEN=23791269
- http://www.staff.ncl.ac.uk/hermann.moisl/MoislRevised.doc

[3] Naglaa Thabet
The thematic structure of the Qur'an: an exploratory multivariate approach
- http://acl.ldc.upenn.edu/P/P05/P05-2002.pdf

I see from your message below that you are already familiar with the Thabet 
reference [3]. Abdul-Baquee at the Language Research Group, University of Leeds 
has found some quite interesting and exciting preliminary results and has 
kindly made his transfer report available on-line [1]. Although not 
specifically focusing on clustering, you may find this report an interesting 
read with regard to possible computational analysis in general, and for 
feature-based machine learning applied to aspects of the Quran. With regard to 
the specific research interest mentioned below, you may also find the recently 
published paper by Hermann Moisl [2] quite relevant, especially given its title 
and content.

Speaking for myself, I would be quite excited about discussing how to apply 
clustering (or any other suitable statistical or machine-learning techniques) 
to the recently tagged Quranic Arabic Corpus (http://corpus.quran.com). Whereas 
previous approaches have worked with available features (e.g. verse or text 
length), we now have a whole new set of linguistic features to work with 
(root, part-of-speech, lemma, etc). Perhaps this could now reveal more 
interesting analyses? I think that the recently tagged and verified 
part-of-speech and lemma features alone could add a whole new set of dimensions 
to this sort of investigation, especially as they may be more closely related 
to semantics and meaning than other features (see [1]).
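
To make the idea a little more concrete, here is a rough sketch in Python of 
what clustering on part-of-speech features might look like. Everything in it 
is an assumption made for illustration: the chapter labels, the tag names and 
the toy tag sequences are invented and are not taken from the corpus, and 
average-linkage clustering over cosine distance is just one plausible choice, 
not a method used in any of the references above. Each chapter is turned into 
a vector of relative tag frequencies, so that a long chapter and a short one 
with the same tag distribution end up with the same profile.

```python
from collections import Counter
from math import sqrt

# Toy input: chapter label -> sequence of part-of-speech tags.
# Labels, tag names and sequences are invented; they are NOT corpus data.
chapters = {
    "A": ["N", "V", "N", "P", "N", "V"],
    "B": ["N", "V", "N", "P", "N", "V", "N", "V", "N", "P", "N", "V"],
    "C": ["P", "P", "ADJ", "N", "P", "ADJ"],
    "D": ["P", "ADJ", "P", "N", "ADJ", "P", "P", "ADJ"],
}


def profile(tags, vocabulary):
    """Relative frequency of each tag, so chapter length does not dominate."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[tag] / total for tag in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerate(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters."""
    clusters = [[name] for name in sorted(vectors)]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        # Find and merge the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


vocabulary = sorted({tag for tags in chapters.values() for tag in tags})
vectors = {name: profile(tags, vocabulary) for name, tags in chapters.items()}
clusters = agglomerate(vectors, 2)
print(clusters)
```

With real data, the same kind of profile could presumably be built from 
lemmas or roots instead of part-of-speech tags, and the choice of distance 
measure and number of clusters would need to be justified by what the 
analysis is aiming to model.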

I would imagine that this sort of investigation would also depend on exactly 
what a particular statistical analysis is aiming to model. Please feel free to 
ask any further questions related to the tagged Quranic Arabic Corpus. I am 
sure that I or other researchers involved in the project would be 
more than happy to provide any further information required.

Kind Regards,

-- Kais Dukes

Language Research Group
School of Computing
University of Leeds

http://corpus.quran.com - The Quranic Arabic Corpus

On Sun, Jan 17, 2010 at 10:12 PM, Ted Pedersen <tpede...@d.umn.edu> wrote:
> Greetings all,
>
> I am especially interested in applications of clustering techniques to
> the Quran. One example I'm aware of (and I guess it's the only example
> I know of truth be told) is the following:
>
> Naglaa Thabet
> Understanding the Thematic Structure of the Qur’an: An Exploratory
> Multivariate Approach
> ACL 2002 Student Research Workshop
> http://www.aclweb.org/anthology/P/P05/P05-2002.pdf
>
> Can anyone direct me to other work that applies clustering (or other
> unsupervised learning techniques) to the Quran?
>
> Cordially,
> Ted
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
