This topic is sometimes called wordprinting or stylometry.  The spring
2003 issue of Chance magazine had several articles on the topic.

A colleague and I, along with several graduate students, have been
working on a Perl program that extracts many of the common statistics
used in wordprinting (counts and percentages of non-contextual words,
word pattern ratios, vocabulary richness).  The data can then be loaded
into R (or any other stats package) for analysis.
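
A minimal sketch of what such statistics might look like, written in
Python rather than Perl.  The short function-word list and the name
`style_features` are assumptions made for illustration only; they are
not taken from the program described above, which may compute these
quantities differently.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of non-contextual (function) words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "but", "by")

def style_features(text):
    """Return function-word percentages and a type-token ratio for one text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        raise ValueError("text contains no words")
    counts = Counter(words)
    # Percentage of all tokens accounted for by each function word.
    features = {f"pct_{w}": 100.0 * counts[w] / total for w in FUNCTION_WORDS}
    # Type-token ratio: distinct words / total words, a crude richness measure.
    features["type_token_ratio"] = len(counts) / total
    return features

sample = "The cat sat on the mat, and the dog sat by the door."
feats = style_features(sample)
print(feats["pct_the"], feats["type_token_ratio"])
```

Each essay would give one row of such features; the rows could be
written to a CSV file and read into R with read.csv() for analysis.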

The program is currently in beta (usable, but we would like to add more
features and documentation).  I can send a copy to anyone who is
interested; please specify whether you have Perl or need a standalone
copy (Windows only).

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
[EMAIL PROTECTED]
(801) 408-8111

>>> Werner Bier <[EMAIL PROTECTED]> 06/12/05 01:29PM >>>
Hi R-help,
 
I have a database of 10 students who have written a total of 78
essays.
The challenge? I would like to identify who wrote the 79th essay.
 
Has anybody used R in this context? 
 
Even if not, could you suggest which pattern recognition technique I
might apply?
 
Thanks a lot and regards,
Tom 


                

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
