Hi, (apologies for initial html posting)
I have a data frame where one column is a list of lists. I would like to subset the data frame based on membership of the lists in that column and be able to 'denormalise' the data frame so that a row is duplicated for each of its list elements. Example code follows: # The data is read in in this form with the c2 list values in single strings which I then split to give lists: > f1 <- data.frame(c1=0:2, c2=c("A,B,C", "A,E", "F,G,H")) > f1$Split <- strsplit(as.character(f1$c2), ",") > f1 c1 c2 Split 1 0 A,B,C A, B, C 2 1 A,E A, E 3 2 F,G,H F, G, H # So f1$Split is the list of lists column I want to denormalise or use as the subject for subsetting # f2 is data to use to select subsets from f1 > f2 <- data.frame(c1=LETTERS[0:8], c2=c("Apples", "Badger","Camel","Dog","Elephants","Fish","Goat","Horse")) > f2 c1 c2 1 A Apple 2 B Badger 3 C Camel 4 D Dog 5 E Elephant 6 F Fish 7 G Goat 8 H Horse # I was able to find which rows of f2 are represented in the f1 lists (not entirely sure if this is the best way to do this): > f3 <- f2[f2$c1 %in% unlist(f1$Split),] > f3 c1 c2 1 A Apple 2 B Badger 3 C Camel 5 E Elephant 6 F Fish 7 G Goat 8 H Horse # Note that 'D' is missing from f3 because it is not in any of the f1$Split lists # f4 is a subset of f3 and I want to find the rows of f1 where f1$Split contains any of f4$c1: > f4 <- f3[c(1,3),] > f4 c1 c2 1 A Apple 3 C Camel # I tried this and it didn't work, presumably because it's trying to match against each list object rather than the list elements, but unlist doesn't do the trick here because I need the individual rows, I need to unlist on a row by row basis. > f1[f1$Split %in% f4$c1,] [1] c1 c2 Split <0 rows> (or 0-length row.names) > f1[f4$c1 %in% f1$Split,] [1] c1 c2 Split <0 rows> (or 0-length row.names) > f1[match(f4$c1, f1$Split),] c1 c2 Split NA NA <NA> NULL NA.1 NA <NA> NULL I also looked at reshape which I don't think helps. I thought I might be able to create a new data frame with the f1$Split denormalised and use that, but couldn't find a way to do this, the result I'd want there is something like: > f1_denorm c1 c2 Split SplitDenorm 1 0 A,B,C A, B, C A 2 0 A,B,C A, B, C B 3 0 A,B,C A, B, C C 4 1 A,E A, E A 5 1 A,E A, E E 6 2 F,G,H F, G, H F 7 2 F,G,H F, G, H G 8 2 F,G,H F, G, H H I thought perhaps for loops would be the next thing to try, but there must be a better way! Thanks for any help. Tim ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.