On Tue, 2004-10-12 at 08:36, Prof Brian Ripley wrote:
> > Luís Torgo, Data Mining with R. Learning by case
> > studies, Maggio 2003
> > http://www.liacc.up.pt/~ltorgo/DataMiningWithR/
>
> Please note that that reference is not about large datasets, nor about
> `data mining' in the generally used sense. It has two studies, one
> incomplete, on linear regression (with 200 samples) and on time series.

I would like to add some information in response to these incomplete comments on the book I'm writing. The book is unfinished, as mentioned on its Web page. It currently has two reasonably finished chapters: an introduction to R and MySQL, and a case study.

As mentioned in the book, the first case study is small by data mining standards (200 observations). Its goal is to illustrate techniques that are shared by data mining and other disciplines, as well as to smoothly introduce the reader to R and its power. It addresses data pre-processing techniques, data visualization, model construction (yes, linear regression, but also regression trees), and model evaluation, selection and combination. So I think it is a bit incorrect to say that it is about linear regression, which corresponds to 5 of the 50 pages of that chapter.

The third (unfinished) chapter (the 2nd case study) is about financial trading. It includes topics like connections to databases as well as many other components of a knowledge discovery process. Among those components is model construction, which obviously involves time series models given the nature of the data. The chapter will include other steps, such as moving from predictions into actions, the creation of variables from the original time series, etc. It is currently being rewritten and I expect to upload a revised version of this chapter soon.

The book will include at least two further case studies that will be larger. Still, I would note that the financial trading case study is potentially very large, as it is a problem where data is constantly growing. The final version of that chapter addresses this issue of having a system that is online, in the sense that it receives new data in real time (also known as mining data streams in the data mining field).

I'm sorry for being so long, but I think it is dangerous to try to summarize around 200 pages of an unfinished work in two lines of text.

Still, all comments on this ongoing project are very welcome, and I would like to take this opportunity to thank all the people who have been sending me encouraging comments and emails.

Luis Torgo

-- 
Luis Torgo
    FEP/LIACC, University of Porto   Phone : (+351) 22 607 88 30
    Machine Learning Group           Fax   : (+351) 22 600 36 54
    R. Campo Alegre, 823             email : [EMAIL PROTECTED]
    4150 PORTO - PORTUGAL            WWW   : http://www.liacc.up.pt/~ltorgo

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html