Perhaps I am missing something, but I do not get the same result:

x <- read.table(textConnection("Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 4 mother
3064 5 sibling
3064 86 sibling
3064 87 sibling"), header = TRUE)
closeAllConnections()
xs <- with(x, split(x, Family.ID))
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  if(sum(father) == 0)
    l$PID[l$Relationship == 'sibling'] <- 0
  else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  if(sum(mother) == 0)
    l$MID[l$Relationship == 'sibling'] <- 0
  else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
res
#        Family.ID Sample.ID Relationship MID PID
# 2702.1      2702       349       mother   0   0
# 2702.2      2702      3456      sibling 349   0
# 2702.3      2702      9980      sibling 349   0
# 3064.4      3064         3       father   0   0
# 3064.5      3064         4       mother   0   0
# 3064.6      3064         5      sibling   4   3
# 3064.7      3064        86      sibling   4   3
# 3064.8      3064        87      sibling   4   3

HTH,
Jorge.-

On Sun, Aug 17, 2014 at 11:47 AM, Kate Ignatius <kate.ignat...@gmail.com> wrote:
> Yep - you're right - missing parents are indicated as zero in the M/PID
> field.
>
> The above code worked with a few errors:
>
> 1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>   number of items to replace is not a multiple of replacement length
> 2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>   number of items to replace is not a multiple of replacement length
> 3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
>   number of items to replace is not a multiple of replacement length
> 4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
>   number of items to replace is not a multiple of replacement length
>
> Looking at the output, I get numbers where the father/mother ID should
> be in the M/PID field.
> For example:
>
> 2702 349 mother 0 0
> 2702 3456 sibling 0 842
> 2702 9980 sibling 0 842
> 3064 3 father 0 0
> 3064 4 mother 0 0
> 3064 5 sibling 879 880
> 3064 86 sibling 879 880
> 3064 87 sibling 879 880
>
> On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanve...@gmail.com>
> wrote:
> > Dear Kate,
> >
> > Try this:
> >
> > res <- do.call(rbind, lapply(xs, function(l){
> >   l$PID <- l$MID <- 0
> >   father <- with(l, Relationship == 'father')
> >   mother <- with(l, Relationship == 'mother')
> >   if(sum(father) == 0)
> >     l$PID[l$Relationship == 'sibling'] <- 0
> >   else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
> >   if(sum(mother) == 0)
> >     l$MID[l$Relationship == 'sibling'] <- 0
> >   else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
> >   l
> > }))
> >
> > It is assumed that when either parent is not available, the M/PID is 0.
> >
> > Best,
> > Jorge.-
> >
> > On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius <kate.ignat...@gmail.com>
> > wrote:
> >>
> >> Actually - I didn't check this before, but these are not all nuclear
> >> families (as I assumed they were). That is, some don't have a father
> >> or don't have a mother. Usually, if this is the case, PID or MID will
> >> become 0, respectively, for the child. How can the code be edited to
> >> account for this?
> >>
> >> On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius <kate.ignat...@gmail.com>
> >> wrote:
> >> > Thanks!
> >> >
> >> > I think I know what is being done here, but I am not sure how to fix
> >> > the following error:
> >> >
> >> > Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
> >> >   replacement has length zero
> >> >
> >> > On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez
> >> > <jorgeivanve...@gmail.com> wrote:
> >> >> Dear Kate,
> >> >>
> >> >> Assuming you have nuclear families, one option would be:
> >> >>
> >> >> x <- read.table(textConnection("Family.ID Sample.ID Relationship
> >> >> 14 62 sibling
> >> >> 14 94 father
> >> >> 14 63 sibling
> >> >> 14 59 mother
> >> >> 17 6004 father
> >> >> 17 6003 mother
> >> >> 17 6005 sibling
> >> >> 17 368 sibling
> >> >> 130 202 mother
> >> >> 130 203 father
> >> >> 130 204 sibling
> >> >> 130 205 sibling
> >> >> 130 206 sibling
> >> >> 222 9 mother
> >> >> 222 45 sibling
> >> >> 222 34 sibling
> >> >> 222 10 sibling
> >> >> 222 11 sibling
> >> >> 222 18 father"), header = TRUE)
> >> >> closeAllConnections()
> >> >>
> >> >> xs <- with(x, split(x, Family.ID))
> >> >> res <- do.call(rbind, lapply(xs, function(l){
> >> >>   l$PID <- l$MID <- 0
> >> >>   father <- with(l, Relationship == 'father')
> >> >>   mother <- with(l, Relationship == 'mother')
> >> >>   l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
> >> >>   l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
> >> >>   l
> >> >> }))
> >> >> res
> >> >>
> >> >> HTH,
> >> >> Jorge.-
> >> >>
> >> >> Best regards,
> >> >> Jorge.-
> >> >>
> >> >> On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius
> >> >> <kate.ignat...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> I have a data.table question (as well as an if/else statement query).
> >> >>>
> >> >>> I have a large list of families (the file has 935 individuals that are
> >> >>> sorted by family, and the families vary in size).
> >> >>> At the moment, the file has the following columns:
> >> >>>
> >> >>> SampleID FamilyID Relationship
> >> >>>
> >> >>> To avoid having to make a pedigree file by hand (i.e., adding a
> >> >>> PaternalID and a MaternalID one by one), I want to try to write a
> >> >>> script that will quickly do this for me (I eventually want to run this
> >> >>> through a program such as plink). Is there a way to use data.table
> >> >>> (maybe in conjunction with ifelse) to do this effectively?
> >> >>>
> >> >>> An example of the file is something like:
> >> >>>
> >> >>> Family.ID Sample.ID Relationship
> >> >>> 14 62 sibling
> >> >>> 14 94 father
> >> >>> 14 63 sibling
> >> >>> 14 59 mother
> >> >>> 17 6004 father
> >> >>> 17 6003 mother
> >> >>> 17 6005 sibling
> >> >>> 17 368 sibling
> >> >>> 130 202 mother
> >> >>> 130 203 father
> >> >>> 130 204 sibling
> >> >>> 130 205 sibling
> >> >>> 130 206 sibling
> >> >>> 222 9 mother
> >> >>> 222 45 sibling
> >> >>> 222 34 sibling
> >> >>> 222 10 sibling
> >> >>> 222 11 sibling
> >> >>> 222 18 father
> >> >>>
> >> >>> But the goal is to have a file like this:
> >> >>>
> >> >>> Family.ID Sample.ID Relationship PID MID
> >> >>> 14 62 sibling 94 59
> >> >>> 14 94 father 0 0
> >> >>> 14 63 sibling 94 59
> >> >>> 14 59 mother 0 0
> >> >>> 17 6004 father 0 0
> >> >>> 17 6003 mother 0 0
> >> >>> 17 6005 sibling 6004 6003
> >> >>> 17 368 sibling 6004 6003
> >> >>> 130 202 mother 0 0
> >> >>> 130 203 father 0 0
> >> >>> 130 204 sibling 203 202
> >> >>> 130 205 sibling 203 202
> >> >>> 130 206 sibling 203 202
> >> >>> 222 9 mother 0 0
> >> >>> 222 45 sibling 18 9
> >> >>> 222 34 sibling 18 9
> >> >>> 222 10 sibling 18 9
> >> >>> 222 11 sibling 18 9
> >> >>> 222 18 father 0 0
> >> >>>
> >> >>> I've tried searching for this, but with no luck. I would greatly
> >> >>> appreciate any help, even if it's just a link to a great
> >> >>> example/solution!
> >> >>>
> >> >>> Thanks!
> >> >>>
> >> >>> ______________________________________________
> >> >>> R-help@r-project.org mailing list
> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >>> PLEASE do read the posting guide
> >> >>> http://www.R-project.org/posting-guide.html
> >> >>> and provide commented, minimal, self-contained, reproducible code.
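A minimal base R sketch, not code posted in this thread, that folds together the fixes discussed above. The helper names `parent_id()` and `fill_parents()` are hypothetical. It assumes, without confirmation from the thread, that the unexpected values such as 842, 879 and 880 are factor level codes: if `Sample.ID` was read as a factor (the `read.table()` default before R 4.0.0 whenever a column contains any non-numeric text), assigning `l$Sample.ID[father]` into the numeric `PID` column stores the factor's integer code instead of its label. Keeping IDs as character sidesteps that. The helper also returns "0" when a parent is missing, avoiding the "replacement has length zero" error, and warns when a family has more than one father or mother, which is one way to trigger the "not a multiple of replacement length" warning.

```r
# Example data from the thread: family 2702 has no father.
# stringsAsFactors = FALSE keeps text columns as character (the default since R 4.0.0).
x <- read.table(text = "Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 4 mother
3064 5 sibling
3064 86 sibling
3064 87 sibling", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: the Sample.ID (as character) of the family member with
# the given role, or "0" when nobody in the family has that role.
parent_id <- function(fam, role) {
  ids <- as.character(fam$Sample.ID[fam$Relationship == role])
  if (length(ids) == 0) return("0")
  if (length(ids) > 1)
    warning("family ", fam$Family.ID[1], " has ", length(ids),
            " rows with role '", role, "'; using the first")
  ids[1]
}

# Hypothetical helper: fill PID/MID for the siblings of one family;
# parents keep "0" in both columns.
fill_parents <- function(fam) {
  sib <- fam$Relationship == "sibling"
  fam$PID <- "0"
  fam$MID <- "0"
  fam$PID[sib] <- parent_id(fam, "father")
  fam$MID[sib] <- parent_id(fam, "mother")
  fam
}

res <- do.call(rbind, lapply(split(x, x$Family.ID), fill_parents))
rownames(res) <- NULL
res
```

Because `split()` groups by `Family.ID`, the rows come back ordered by family even if the input file was not sorted that way.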
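Since the original question asked about data.table, here is a hedged sketch of the same logic using data.table's grouped `:=` assignment. It is an illustration only, not code from the thread; the names `dt`, `f`, `m` and `sib` are arbitrary. `:=` with `by = Family.ID` adds `PID` and `MID` by reference within each family and leaves the original row order untouched. IDs are stored as character, so factor codes cannot leak into the output.

```r
library(data.table)

# Same example data as above; family 2702 has no father.
dt <- data.table(
  Family.ID    = c(2702, 2702, 2702, 3064, 3064, 3064, 3064, 3064),
  Sample.ID    = c("349", "3456", "9980", "3", "4", "5", "86", "87"),
  Relationship = c("mother", "sibling", "sibling", "father", "mother",
                   "sibling", "sibling", "sibling")
)

# Within each family: siblings get the father's and mother's Sample.ID;
# parents, and siblings whose parent is missing, get "0".
# If a family lists two fathers (or two mothers), the first is used silently.
dt[, c("PID", "MID") := {
  f <- Sample.ID[Relationship == "father"]
  m <- Sample.ID[Relationship == "mother"]
  sib <- Relationship == "sibling"
  list(ifelse(sib & length(f) > 0, f[1], "0"),
       ifelse(sib & length(m) > 0, m[1], "0"))
}, by = Family.ID]

dt
```

Unlike the base R sketch, this version does not warn about duplicated parents; adding a check such as `dt[Relationship == "father", .N, by = Family.ID][N > 1]` before the assignment would flag those families.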