This topic is sometimes called wordprinting or stylometry. The spring 2003 issue of Chance magazine had several articles on the topic.
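As a rough illustration of the kind of features used in stylometry, here is a minimal Python sketch. It is not the Perl program described in this message, and everything in it is an assumption made for the example: the short `FUNCTION_WORDS` list, the function names, and the choice of two simple vocabulary richness measures (type-token ratio and proportion of words used only once). Real studies typically use much longer word lists and essays far longer than the sample sentence.

```python
import re
from collections import Counter

# Illustrative (hypothetical) list of non-contextual "function" words.
# Published studies use longer, carefully chosen lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "upon"]


def tokenize(text):
    """Lowercase the text and return its word tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def stylometric_features(text):
    """Return a dict of simple stylometric features for one text."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        raise ValueError("text contains no words")
    counts = Counter(tokens)

    features = {}
    # Percentage of all tokens accounted for by each function word.
    for word in FUNCTION_WORDS:
        features["pct_" + word] = 100.0 * counts[word] / n
    # Vocabulary richness: distinct words divided by total words.
    features["type_token_ratio"] = len(counts) / n
    # Vocabulary richness: share of tokens that occur exactly once.
    features["hapax_ratio"] = sum(1 for c in counts.values() if c == 1) / n
    return features


if __name__ == "__main__":
    sample = "The cat sat on the mat, and the dog sat upon the rug."
    for name, value in stylometric_features(sample).items():
        print(f"{name}\t{value:.3f}")
```

Computing one such feature row per essay and writing the rows to a CSV file would give a table that R can read with `read.csv()` for whatever classification method is then applied.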
A colleague and I (along with various graduate students) have been working on a Perl program that extracts many of the common statistics used in wordprinting: counts and percentages of non-contextual words, word pattern ratios, and vocabulary richness. The data can then be loaded into R (or any other stats package) for analysis. The program is currently in beta (usable, but we may still add features and documentation), and I can send a copy to anyone who is interested. Please specify whether you have Perl or need a standalone copy (Windows only).

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
[EMAIL PROTECTED]
(801) 408-8111

>>> Werner Bier <[EMAIL PROTECTED]> 06/12/05 01:29PM >>>
Hi R-help,

I have a database of 10 students who have written a total of 78 essays. The challenge? I would like to identify who wrote the 79th essay.

Has anybody used R in this context? Even if not, could you suggest which pattern recognition technique I might apply?

Thanks a lot and regards,
Tom

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html