For merging, selecting and aggregating, R is not too bad: I believe the following code is more or less equivalent to your Vilno example, isn't it?

# To read real data, replace the following two lines with calls to 'read.table'.
# visit.num cycles 1..5 within each patient, so every patient has a visit 2.
labresults <- data.frame(patient.id=gl(10,5), visit.num=gl(5,1,50), sodium=rnorm(50))
demo <- data.frame(patient.id=gl(10,1), gender=gl(2,5,labels=c('F','M')))

data <- merge(labresults, demo)
# The Vilno code deletes the rows where VISIT_NUM != 2, so keep visit 2 only.
data <- subset(data, visit.num==2)
with(data, aggregate(sodium, list(gender=gender), mean))

If the data sets are very large, then doing merges/selection/aggregation outside of R can be a good idea.

Christophe

On 6/27/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
>
> In response to those who asked for a better explanation of what the
> Vilno software does, here's a simple example that gives some idea of
> what it does.
>
> LABRESULTS is a dataset with multiple rows per patient , with lab
> sodium measurements. It has columns: PATIENT_ID, VISIT_NUM, and
> SODIUM.
>
> DEMO is a dataset with one row per patient, with demographic data.
> It has columns: PATIENT_ID, GENDER.
>
> Here's a simple example, the following paragraph of code is a
> data processing function (dpf) :
>
>
> inlist LABRESULTS DEMO ;
> mergeby PATIENT_ID ;
> if (SODIUM == -9) SODIUM = NULL ;
> if (VISIT_NUM != 2) deleterow ;
> select AVERAGE_SODIUM = avg(SODIUM) by GENDER ;
> sendoff(RESULTS_DATASET) GENDER AVERAGE_SODIUM ;
> turnoff; // just means end-of-paragraph , version 1.0 won't need this.
>
> Can you guess what it does? The lab result rows are merged with the
> demographic rows, just to get the gender information merged in.
> Obviously, they are merged by patient. The code -9 is used to denote
> "missing", so convert that to NULL. I'm about to take a statistic for
> visit 2, so rows with visit 0 or 1 must be deleted. I'm assuming, for
> visit 2, each patient has at most one row. Now, for each sex group,
> take the average sodium level. After the select statement, I have just
> two rows, for male and female, with the average sodium level in the
> AVERAGE_SODIUM column.
> Now the sendoff statement just stores the
> current data table into a datafile, called RESULTS_DATASET.
>
> So you have a sequence of data tables, each calculation reading in the
> current table , and leaving a new data table for the next calculation.
>
> So you have input datasets, a bunch of intermediate calculations, and
> one or more output datasets. Pretty simple idea.
>
> *****************************************
>
> Some caveats:
>
> LABRESULTS and DEMO are binary datasets. The asciitobinary and
> binarytoascii statements are used to convert between binary datasets
> and comma-separated ascii data files. (You can use any delimiter:
> comma, vertical bar , etc). An asciitobinary statement is typically
> just two lines of code.
>
> The dpf begins with the inlist statement , and , for the moment ,
> needs "turnoff ;" as the last line. Version 1.0 won't require the use
> of "turnoff;", but version 0.85 does. It only means this paragraph of
> code ends here ( a program can , of course , contain many paragraphs:
> data processing functions, print statements, asciitobinary statements,
> etc.).
>
> If you've worked with lab data, you know lab data does not look so
> simplistic. I need a simple example.
>
> Vilno has a lot of functionality, many-to-many joins, adding columns,
> firstrow() and lastrow() flags, and so forth. A fair amount of complex
> data manipulations have already been tested with test programs ( in
> the tarball ). Of course a simple example cannot show you that, it's
> just a small taste.
>
> *********************************************
>
> If you've never used SPSS or SAS before, you won't care, but this
> programming language falls in the same family as the SPSS and SAS
> programming languages. All three programming languages have a fair
> amount in common, but are quite different from the S programming
> language. The vilno data processing function can replace the SAS
> datastep.
> (It can also replace PROC TRANSPOSE and much of PROC MEANS,
> except standard deviation calculations still need to be included in
> the select statement).
>
> ********************************************
>
> I hope that helps.
>
> http://code.google.com/p/vilno
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Christophe Pallier (http://www.pallier.org)
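
A closer R sketch of the quoted Vilno paragraph would also recode the -9 "missing" marker before averaging, a step the short R reply leaves out. The version below is only an illustration under stated assumptions: the toy data, the lowercase column names and the `average.sodium` name are made up here, and it assumes that Vilno's avg() skips NULL values (the quoted message does not say so explicitly).

```r
# Hypothetical toy data: 10 patients with visits 1..5 each.
set.seed(1)
labresults <- data.frame(patient.id = gl(10, 5),
                         visit.num  = rep(1:5, times = 10),
                         sodium     = round(rnorm(50, mean = 140, sd = 3), 1))
# Flag two visit-2 measurements as missing, using the -9 convention.
labresults$sodium[c(2, 27)] <- -9
demo <- data.frame(patient.id = gl(10, 1),
                   gender     = gl(2, 5, labels = c("F", "M")))

data <- merge(labresults, demo)          # mergeby PATIENT_ID
data$sodium[data$sodium == -9] <- NA     # if (SODIUM == -9) SODIUM = NULL
data <- subset(data, visit.num == 2)     # if (VISIT_NUM != 2) deleterow

# The formula interface of aggregate() drops rows with NA by default
# (na.action = na.omit), so missing values do not enter the mean.
results <- aggregate(sodium ~ gender, data = data, FUN = mean)
names(results)[2] <- "average.sodium"
results
```

The result has one row per gender, which matches the description of the select statement. With the non-formula form of aggregate() the mean would need `na.rm = TRUE` passed explicitly instead.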