Correction:

df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
df[!duplicated(df[,2:3]), ]  ## Note the ! sign

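To see why the sign matters, here is a small sketch on the same toy data frame. The names dup, wrong, right and n_unique are only for illustration and are not part of the original example. Note that duplicated() returns a logical vector (one TRUE/FALSE per row), not row indices.

```r
# Toy data from the example above: rows 1 and 2 share the same (b, d) pair.
df <- data.frame(a = 1:3, b = letters[c(1, 1, 2)], d = LETTERS[c(1, 1, 2)])

# duplicated() flags every row that repeats an earlier row: FALSE TRUE FALSE
dup <- duplicated(df[, 2:3])

# Unary minus coerces the logicals to 0 / -1, so R reads the index as
# "drop row 1", which has nothing to do with which rows are duplicates.
wrong <- df[-dup, ]

# Logical negation keeps exactly the rows that are not repeats,
# with all columns of the original data frame.
right <- df[!dup, ]

# Count the distinct (b, d) combinations without pasting columns together.
n_unique <- sum(!dup)

print(wrong)
print(right)
print(n_unique)
```

On this data, wrong keeps rows 2 and 3 (the duplicate survives and row 1 is lost), while right keeps rows 1 and 3. For a data.table such as dt, I believe unique(dt, by = c("chr", "pos", "gene_id")) gives the same result without converting to a data frame, but that is not run here.
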
Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Nov 8, 2019 at 7:59 AM Bert Gunter <bgunter.4...@gmail.com> wrote:

> Sorry, but you ask basic questions. You really need to spend some more time
> with an R tutorial or two. This list is not meant to replace your own
> learning efforts.
>
> You also do not seem to be reading the docs carefully. Under ?unique, it
> links ?duplicated and tells you that it gives indices of duplicated rows of
> a data frame. These then can be used by subscripting to remove those rows
> from the data frame. Here is a reproducible example:
>
> df <- data.frame(a = 1:3, b = letters[c(1,1,2)], d = LETTERS[c(1,1,2)])
> df[-duplicated(df[,2:3]), ]  ## Note the - sign
>
> If you prefer, the "Tidyverse" world has what are purported to be more
> user-friendly versions of such data handling functionality that you can use
> instead.
>
> Bert
>
> On Fri, Nov 8, 2019 at 7:38 AM Ana Marija <sokovic.anamar...@gmail.com>
> wrote:
>
>> would you know how would I extract from my original data frame, just
>> these unique rows?
>> because this gives me only those 3 columns, and I want all columns
>> from the original data frame
>>
>> > head(udt)
>>     chr   pos         gene_id
>> 1 chr1 54490 ENSG00000227232
>> 2 chr1 58814 ENSG00000227232
>> 3 chr1 60351 ENSG00000227232
>> 4 chr1 61920 ENSG00000227232
>> 5 chr1 63671 ENSG00000227232
>> 6 chr1 64931 ENSG00000227232
>>
>> > head(dt)
>>     chr   pos         gene_id pval_nominal pval_ret       wl      wr      META
>> 1: chr1 54490 ENSG00000227232     0.608495 0.783778 31.62278 21.2838 0.7475480
>> 2: chr1 58814 ENSG00000227232     0.295211 0.897582 31.62278 21.2838 0.6031214
>> 3: chr1 60351 ENSG00000227232     0.439788 0.867959 31.62278 21.2838 0.6907182
>> 4: chr1 61920 ENSG00000227232     0.319528 0.601809 31.62278 21.2838 0.4032200
>> 5: chr1 63671 ENSG00000227232     0.237739 0.988039 31.62278 21.2838 0.7482519
>> 6: chr1 64931 ENSG00000227232     0.276679 0.907037 31.62278 21.2838 0.5974800
>>
>> On Fri, Nov 8, 2019 at 9:30 AM Ana Marija <sokovic.anamar...@gmail.com>
>> wrote:
>> >
>> > Thank you so much! Converting it to data frame resolved the issue!
>> >
>> > On Fri, Nov 8, 2019 at 9:19 AM Gerrit Eichner
>> > <gerrit.eich...@math.uni-giessen.de> wrote:
>> > >
>> > > It seems as if dt is not a (base R) data frame but a
>> > > data table. I assume, you will have to transform dt
>> > > into a data frame (maybe with as.data.frame) to be
>> > > able to apply unique in the suggested way. However,
>> > > I am not familiar with data tables. Perhaps somebody
>> > > else can provide a more profound guess.
>> > >
>> > > Regards -- Gerrit
>> > >
>> > > ---------------------------------------------------------------------
>> > > Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>> > > gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
>> > > Tel: +49-(0)641-99-32104             Arndtstr. 2, 35392 Giessen, Germany
>> > > http://www.uni-giessen.de/eichner
>> > > ---------------------------------------------------------------------
>> > >
>> > > On 08.11.2019 at 16:02, Ana Marija wrote:
>> > > > I tried it but I got this error:
>> > > > > udt <- unique(dt[c("chr", "pos", "gene_id")])
>> > > > Error in `[.data.table`(dt, c("chr", "pos", "gene_id")) :
>> > > >   When i is a data.table (or character vector), the columns to join by
>> > > >   must be specified using 'on=' argument (see ?data.table), by keying x
>> > > >   (i.e. sorted, and, marked as sorted, see ?setkey), or by sharing
>> > > >   column names between x and i (i.e., a natural join). Keyed joins might
>> > > >   have further speed benefits on very large data due to x being sorted
>> > > >   in RAM.
>> > > >
>> > > > On Fri, Nov 8, 2019 at 8:58 AM Gerrit Eichner
>> > > > <gerrit.eich...@math.uni-giessen.de> wrote:
>> > > >>
>> > > >> Hi, Ana,
>> > > >>
>> > > >> doesn't
>> > > >>
>> > > >> udt <- unique(dt[c("chr", "pos", "gene_id")])
>> > > >> nrow(udt)
>> > > >>
>> > > >> get close to what you want?
>> > > >>
>> > > >> Hth -- Gerrit
>> > > >>
>> > > >> ---------------------------------------------------------------------
>> > > >> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>> > > >> gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
>> > > >> Tel: +49-(0)641-99-32104             Arndtstr. 2, 35392 Giessen, Germany
>> > > >> http://www.uni-giessen.de/eichner
>> > > >> ---------------------------------------------------------------------
>> > > >>
>> > > >> On 08.11.2019 at 15:38, Ana Marija wrote:
>> > > >>> Hello,
>> > > >>>
>> > > >>> I have a data frame like this:
>> > > >>>
>> > > >>> > head(dt,20)
>> > > >>>      chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>> > > >>>  1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>> > > >>>  2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>> > > >>>  3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>> > > >>>  4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>> > > >>>  5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>> > > >>>  6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>> > > >>>  7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>> > > >>>  8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>> > > >>>  9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
>> > > >>> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>> > > >>>
>> > > >>> it is very big,
>> > > >>> > dim(dt)
>> > > >>> [1] 73719122        8
>> > > >>>
>> > > >>> To count number of unique rows for all 3 columns: chr, pos and gene_id
>> > > >>> I could just join those 3 columns and then count. But how would I find
>> > > >>> unique number of rows for these 4 columns without joining them?
>> > > >>>
>> > > >>> Thanks
>> > > >>> Ana
>> > > >>>
>> > > >>> ______________________________________________
>> > > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > >>> and provide commented, minimal, self-contained, reproducible code.