On Thu, Jan 31, 2013 at 10:51 AM, Weijia Wang <wwang....@gmail.com> wrote:
> On Thu, Jan 31, 2013 at 10:29 AM, Weijia Wang <wwang....@gmail.com> wrote:
>
>> Hi,
>>
>> I have a new question about subsetting in R.
>>
>> Say we have this data frame:
>>
>>     PT_ID Blood_Pressure OBS_TYPE
>> 92   1900           90.0      DBP
>> 94   1900           90.0      DBP
>> 174  2900          140.0      SBP
>> 176  2900          130.0      SBP
>> 180  3900          120.0      SBP
>> 268  3900          150.0      SBP
>> 268  3900           90.0      DBP
>>
>> I need to obtain those with 2+ DBP>=90 or 2+ SBP>=140.
>>
>> PT_ID=1900, he has 2 DBP>=90, so he will be included.
>> PT_ID=2900, he has 1 SBP>=140, so he will NOT be included.
>> PT_ID=3900, he has 1 SBP>=140 and 1 DBP>=90, so he will still NOT be
>> included.
>>
>> So, the condition requires TWO OR MORE values higher than the threshold.
>> It could be either SBP or DBP or both of them.
>>
>> I have tried ddply, but I don’t know how to add the condition 2+ inside
>> ddply.
>>
This can be specified in a reasonably natural fashion using SQL. Here DF is the input data frame:

> library(sqldf)
> sqldf("select
+   PT_ID,
+   sum(Blood_Pressure >= 90 and OBS_TYPE == 'DBP') DBP,
+   sum(Blood_Pressure >= 140 and OBS_TYPE == 'SBP') SBP
+   from DF
+   group by PT_ID
+   having DBP >= 2 or SBP >= 2")
  PT_ID DBP SBP
1  1900   2   0

Each sum() counts the rows for which its condition is true, so DBP and SBP are the per-patient counts of readings at or above the respective threshold, and the having clause keeps only patients with at least two such readings of one type.
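For comparison, the same per-patient counting can be sketched in base R without SQL. This is only an illustration, not part of the original reply: it assumes DF has the columns shown in the question, rebuilds the example data without its row names, and the names high_DBP, high_SBP, counts, and selected are introduced here for the example.

```r
# Rebuild the example data from the question (row names omitted).
DF <- data.frame(
  PT_ID          = c(1900, 1900, 2900, 2900, 3900, 3900, 3900),
  Blood_Pressure = c(90, 90, 140, 130, 120, 150, 90),
  OBS_TYPE       = c("DBP", "DBP", "SBP", "SBP", "SBP", "SBP", "DBP"),
  stringsAsFactors = FALSE
)

# Flag each reading that meets the threshold for its own type.
# (high_DBP and high_SBP are illustrative column names.)
DF$high_DBP <- DF$OBS_TYPE == "DBP" & DF$Blood_Pressure >= 90
DF$high_SBP <- DF$OBS_TYPE == "SBP" & DF$Blood_Pressure >= 140

# Count flagged readings per patient; summing a logical vector counts its TRUEs.
counts <- aggregate(DF[c("high_DBP", "high_SBP")],
                    by = list(PT_ID = DF$PT_ID),
                    FUN = sum)

# Keep patients with two or more high readings of a single type.
selected <- counts[counts$high_DBP >= 2 | counts$high_SBP >= 2, ]
print(selected)
```

With the example data only PT_ID 1900 remains in selected. If the original rows are wanted rather than the counts, DF[DF$PT_ID %in% selected$PT_ID, ] returns them.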