Re: [R] Issue with results from 'summary' function in R
Hi Thierry,

Thank you for the response. I should have looked at the help page.

Kind Regards,
Praveen.

On 18/09/2015 15:51, "Thierry Onkelinx" <thierry.onkel...@inbo.be> wrote:

> This is described in ?summary
>
> > x <- 22072
> > getOption("digits")
> [1] 7
> > summary(x)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   22070   22070   22070   22070   22070   22070
> > options(digits = 10)
> > summary(x)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   22072   22072   22072   22072   22072   22072
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no
> more than asking him to perform a post-mortem examination: he may be
> able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. ~ John Tukey
>
> 2015-09-18 14:08 GMT+02:00 Praveen Surendran <ps...@medschl.cam.ac.uk>:
>> Hi all,
>>
>> Attached table (that contains summary for a genetic association study)
>> was read using the command:
>>
>> test <- read.table('testDat.txt',header=FALSE,stringsAsFactors=FALSE)
>>
>> Results from the summary of the attached table is provided below:
>>
>> > summary(test$V5)
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>   22070   22070   22070   22070   22070   22070
>>
>> As we can see column 5 of this table contains only one value - 22072
>> I am confused as to why I am getting a value 22070 in the summary of
>> this column.
>>
>> I tested this using versions of R including - R version 3.2.1
>> (2015-06-18) -- "World-Famous Astronaut"
>>
>> Thank you for looking at this issue.
>> Kind Regards,
>>
>> Praveen.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
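The rounding Thierry points to can be reproduced without summary() at all. This is a minimal sketch, assuming the behaviour described in ?summary for R 3.x, where summary.default() rounds to max(3, getOption("digits") - 3) significant digits (4 with the default of 7). Newer R releases may format the result differently, so treat the first half as an illustration of the arithmetic rather than of current output.

```r
x <- 22072

# Significant digits used by summary.default() in R 3.x
# (assumption: as documented in ?summary at the time of the thread)
sig <- max(3, 7 - 3)   # 7 is the default value of getOption("digits")

# Rounding to that many significant digits drops the trailing 2
rounded <- signif(x, sig)
print(rounded)

# Asking for more digits in the call avoids changing the global option
s <- summary(x, digits = 10)
print(s)
```

Passing digits to summary() directly keeps the change local to one call, whereas options(digits = 10) affects all later printing in the session.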
[R] Issue with results from 'summary' function in R
Hi all,

The attached table (which contains a summary for a genetic association study) was read using the command:

test <- read.table('testDat.txt',header=FALSE,stringsAsFactors=FALSE)

The results from the summary of column 5 of this table are provided below:

> summary(test$V5)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22070   22070   22070   22070   22070   22070

As we can see, column 5 of this table contains only one value: 22072. I am confused as to why I am getting the value 22070 in the summary of this column.

I tested this using several versions of R, including R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut".

Thank you for looking at this issue.

Kind Regards,
Praveen.

1 762320 C T 22072 0.0169445 0.0169445 748 1 0.00350047 21324 748 0 0.38843753.9888 0.000133264 0.994259
1 785989 T C 22072 0.7928370.79283722182 0.6337890.0166803 647 45028840-84.0255137.518 -0.00444316 0.54119
1 865545 G A 22072 0.00021447 0.00021447 6 0.6337441 13982 6 0 11.2623 4.91838 0.465569 0.0220305
1 865584 G A 22072 9.06125e-05 9.06125e-05 4 1 1 22068 4 0 0.82775 4.01634 0.0513144 0.836716
1 865628 G A 22072 0.00236503 0.00236503 58 0.451 12204 58 0 -0.662004 15.2589 -0.00284324 0.965395
1 865662 G A 22072 0.00493838 0.00493838 218 1 1 21854 218 0 -97.186229.5061 -0.11163 0.000988553
Re: [R] Matrix Multiplication using R.
Dear Doran, Bert and Roger,

Thank you for attending to my query and for your valuable responses. The task is slightly more complex. Here's the real case...

I have genetic variation data (40,000 single nucleotide polymorphisms) from 90,000 individuals. This gives a matrix with 90,000 rows (samples) and 40,000 columns (SNPs). The matrix entries are genetic variations with values 0, 1, 2 or 3, where 0 is missing. There will be very few individuals with missing data.

The task is to identify the relatedness between these 90,000 individuals using their genetic data (0, 1, 2 or 3). These values need to be standardised before matrix multiplication. This will make the matrix much larger than the 0/1/2/3 matrix, as most of the entries will be real numbers with decimals.

Bert, I will not be doing a 90,000 x 40,000 %*% 40,000 x 90,000. The plan is to load this 90,000 x 40,000 matrix into R, then standardise it and multiply it in batches of 90,000 samples against 500 samples using these 40,000 variants, and process these batches in parallel to get the 90,000 x 90,000 comparisons. Does that clarify the situation?

I tried loading a 90,000 x 40,000 matrix as a matrix in R this morning on the cluster with the specifications described in my previous e-mail. This crashed due to memory overflow. I am looking into other possibilities.

Any comments or thoughts will be greatly appreciated.

Regards,
Praveen.

-----Original Message-----
From: Roger Koenker [mailto:rkoen...@illinois.edu]
Sent: 14 August 2013 23:06
To: Praveen Surendran
Cc: r-help@r-project.org
Subject: Re: [R] Matrix Multiplication using R.

In the event that these are moderately sparse matrices, you could try Matrix or SparseM.

Roger Koenker
rkoen...@illinois.edu

On Aug 14, 2013, at 10:40 AM, Praveen Surendran wrote:

> Dear all,
>
> I am exploring ways to perform multiplication of a 90,000 x 40,000 matrix with its transpose. As expected even a 4 x 100 %*% 100x4 didn't work on my desktop... giving the error
>
> Error: cannot allocate vector of length 16
>
> However I am trying to run this on one node (64GB RAM; 2.60 GHz processor) of a high performance computing cluster. I would appreciate any comments on whether it's advisable to perform a matrix multiplication of this size using R, and also on any better ways to handle this task.
>
> Kind Regards,
> Praveen.
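The batch plan described above (standardise, then multiply all samples against a block of samples at a time) can be sketched on toy sizes. Everything here is an assumption for illustration: the dimensions, the use of scale() for standardisation, setting missing entries to 0 after standardising, and dividing by the number of SNPs. It is not the poster's actual pipeline.

```r
set.seed(42)
n <- 200      # samples (90,000 in the real data)
p <- 50       # SNPs (40,000 in the real data)
batch <- 40   # samples per block (500 in the plan above)

# Toy genotype matrix with values 0..3, where 0 codes a missing call
G <- matrix(sample(0:3, n * p, replace = TRUE,
                   prob = c(0.01, 0.33, 0.33, 0.33)),
            nrow = n, ncol = p)
G[G == 0] <- NA

# Standardise each SNP column; missing entries become 0 (the column mean)
Z <- scale(G)
Z[is.na(Z)] <- 0

# Fill the n x n result one block of columns at a time
K <- matrix(0, n, n)
for (start in seq(1, n, by = batch)) {
  idx <- start:min(start + batch - 1, n)
  K[, idx] <- tcrossprod(Z, Z[idx, , drop = FALSE]) / p
}

# The blocks reproduce the full product
stopifnot(isTRUE(all.equal(K, tcrossprod(Z) / p)))
```

Each iteration only needs Z and one block of its rows, so the iterations are independent of each other and could be handed to separate workers, with each block written to disk instead of held in one large matrix.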
[R] Matrix Multiplication using R.
Dear all,

I am exploring ways to perform multiplication of a 90,000 x 40,000 matrix with its transpose. As expected even a 4 x 100 %*% 100x4 didn't work on my desktop... giving the error

Error: cannot allocate vector of length 16

However I am trying to run this on one node (64GB RAM; 2.60 GHz processor) of a high performance computing cluster. I would appreciate any comments on whether it's advisable to perform a matrix multiplication of this size using R, and also on any better ways to handle this task.

Kind Regards,
Praveen.
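A back-of-the-envelope check of the memory involved helps to judge whether a single 64GB node is enough. This sketch assumes dense double-precision storage (8 bytes per element) and ignores the temporary copies R makes during computation, so the real requirement is likely higher.

```r
bytes_per_double <- 8
n <- 90000   # samples
p <- 40000   # SNPs

# Size of the standardised n x p input and of the n x n product, in GB
input_gb  <- n * p * bytes_per_double / 1e9
result_gb <- n * n * bytes_per_double / 1e9
print(c(input = input_gb, result = result_gb))

# More elements than a 32-bit index can address, so the input matrix
# needs long-vector support (available from R 3.0.0 onwards)
needs_long_vector <- n * p > .Machine$integer.max
print(needs_long_vector)
```

On these assumptions the full 90,000 x 90,000 result alone is about as large as the node's RAM, which is why computing and storing it block by block is the more realistic route.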
[R] Opening SAS file using read.sas7bdat() function in sas7bdat library.
Hi,

I have a file in .sas7bdat format. I tried to open this file using the read.sas7bdat() function. This gave me an error:

Error in read.sas7bdat("bnp_genetic.sas7bdat") : unknown host X64_7PRO

Could someone tell me what this error means?

Thank you,
Praveen.
Re: [R] Opening SAS file using read.sas7bdat() function in sas7bdat library.
Dear all,

Thank you for the responses, and thanks Marc. It works with the source file which Matt has at
https://github.com/BioStatMatt/sas7bdat/blob/master/R/sas7bdat.R
which is also attached.

Cheers,
Praveen.

-----Original Message-----
From: Marc Schwartz [mailto:marc_schwa...@me.com]
Sent: 29 October 2012 19:14
To: Duncan Murdoch
Cc: Praveen Surendran; r-help@r-project.org
Subject: Re: [R] Opening SAS file using read.sas7bdat() function in sas7bdat library.

On Oct 29, 2012, at 2:04 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

> On 29/10/2012 2:54 PM, Marc Schwartz wrote:
>> On Oct 29, 2012, at 1:28 PM, Praveen Surendran praveen.surend...@ucd.ie wrote:
>>
>>> Hi,
>>>
>>> I have a file in .sas7bdat format. I tried to open this file using read.sas7bdat() function. This gave me an error -
>>>
>>> Error in read.sas7bdat("bnp_genetic.sas7bdat") : unknown host X64_7PRO
>>>
>>> Could someone tell me what this error means?
>>>
>>> Thank you,
>>> Praveen.
>>
>> More than likely, a similar problem as in this recent thread, but for 64 bit, rather than 32 bit:
>>
>> https://stat.ethz.ch/pipermail/r-help/2012-October/325257.html
>
> If you look at the source for the function, it checks the SAS_host against a known list, containing
>
> # Host systems known to work
> KNOWNHOST <- c("WIN_PRO", "WIN_NT", "WIN_NTSV", "WIN_SRV", "WIN_ASRV",
>                "XP_PRO", "XP_HOME", "NET_ASRV", "NET_DSRV", "NET_SRV",
>                "WIN_98", "W32_VSPR", "WIN", "WIN_95", "X64_VSPR", "X64_ESRV")
>
> Praveen's host is not in that list, so the package author has never tested it. But nothing else in the code appears to depend on the host, so it's a good guess that adding another host string to that list (or changing the error to a warning) will make it work properly.
>
> Duncan Murdoch

As per that prior thread, Matt has added those to the source on GitHub:

https://github.com/BioStatMatt/sas7bdat/blob/master/R/sas7bdat.R

at line 86:

# Host systems known to work
KNOWNHOST <- c("WIN_PRO", "WIN_NT", "WIN_NTSV", "WIN_SRV", "WIN_ASRV",
               "XP_PRO", "XP_HOME", "NET_ASRV", "NET_DSRV", "NET_SRV",
               "WIN_98", "W32_VSPR", "WIN", "WIN_95", "X64_VSPR", "X64_ESRV",
               "W32_ESRV", "W32_7PRO", "W32_VSHO", "X64_7HOM", "X64_7PRO",
               "X64_SRV0")

It's presumably just a matter of Matt releasing an updated version of the package. There were some comments in that prior thread of communication issues with Matt, so not sure what is going on there relative to time frame.

Regards,
Marc
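The two workarounds Duncan suggests (extending the list, or downgrading the error to a warning) can be illustrated with a small stand-alone check. The helper check_host() and its strict argument are hypothetical names made up for this sketch; they are not part of the sas7bdat package, whose actual code should be read from the GitHub source linked above.

```r
# Host strings from the older list quoted in the thread
KNOWNHOST <- c("WIN_PRO", "WIN_NT", "WIN_NTSV", "WIN_SRV", "WIN_ASRV",
               "XP_PRO", "XP_HOME", "NET_ASRV", "NET_DSRV", "NET_SRV",
               "WIN_98", "W32_VSPR", "WIN", "WIN_95", "X64_VSPR", "X64_ESRV")

# Hypothetical helper: TRUE for a known host; otherwise an error
# (strict = TRUE) or a warning plus FALSE (strict = FALSE)
check_host <- function(host, known = KNOWNHOST, strict = TRUE) {
  if (host %in% known) return(TRUE)
  msg <- paste("unknown host", host)
  if (strict) stop(msg) else warning(msg)
  FALSE
}

# Option 1: add the missing host string to the list
ok_extended <- check_host("X64_7PRO", known = c(KNOWNHOST, "X64_7PRO"))

# Option 2: keep the list but only warn, so reading can continue
ok_lenient <- suppressWarnings(check_host("X64_7PRO", strict = FALSE))
```

The first option records that the host has now been tried, while the second lets any untested host through at the user's own risk.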
[R] create string with " using paste.
Hi,

I am trying to create a string - Double quotes : " using paste. A command something like

paste('Double','quotes : \"',sep=" ")

prints "Double quotes : \"" where the backslash is also printed. Is there a way to print just " ?

Regards,
Praveen.
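The backslash in the question is only part of how print() displays the string, not part of the string itself. A short sketch with base R functions shows the difference; the variable name s is just for illustration.

```r
# Inside single quotes a double quote needs no escaping
s <- paste('Double', 'quotes : "', sep = " ")

print(s)        # display form: adds surrounding quotes and escapes the inner one
cat(s, "\n")    # raw characters, no backslash
writeLines(s)   # same, one string per line

# The stored string has no backslash in it
n_chars <- nchar(s)
has_backslash <- grepl("\\", s, fixed = TRUE)
```

For writing to a file or building a command line, cat() or writeLines() therefore emit exactly the intended characters.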
[R] Getting dates in an SPSS file in right format.
Dear all,

I am trying to read an SPSS file into a data frame in R using the function read.spss():

sample <- read.spss(file.name,to.data.frame=TRUE)

But the dates in the data frame 'sample' are coming out as integers and not in the actual date format given in the SPSS file.

I would appreciate it if anyone could help me solve this problem.

Kind Regards,

Praveen Surendran
2G, Complex and Adaptive Systems Laboratory (UCD CASL)
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.

Office : +353-(0)1716 5334
Mobile : +353-(0)8793 13071
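The numbers seen here are most likely SPSS's internal date representation, which counts seconds from 14 October 1582. A minimal sketch of the conversion, assuming that origin applies to the file in question; the helper name spss_to_date and the example date are made up for illustration.

```r
spss_origin <- "1582-10-14"

# Hypothetical helper: seconds since the SPSS origin -> R Date
spss_to_date <- function(x) as.Date(x / 86400, origin = spss_origin)

# Build an SPSS-style value for a known date, then convert it back
secs <- as.numeric(difftime(as.Date("2009-07-08"), as.Date(spss_origin),
                            units = "secs"))
d <- spss_to_date(secs)
print(d)
```

Dividing by 86400 turns seconds into days, which is the unit as.Date() expects for numeric input.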
[R] Reading .sav (SPSS) files.
Hi,

I used the function read.spss() in library(foreign) to read a .sav file with the following commands:

library(foreign)
data.sav <- read.spss('masterfile.sav')
mydat <- as.data.frame(data.sav)

It's throwing some warnings:

Warning messages:
1: In read.spss("masterfile.sav") :
  masterfile.sav: File-indicated character representation code (1252) looks like a Windows codepage
2: In read.spss("masterfile.sav") :
  masterfile.sav: Unrecognized record type 7, subtype 20 encountered in system file

In the resulting data frame 'mydat' all date variables are represented by integers.

Q. Are there any other options which can be used with read.spss() to get date variables in the right format?

Thanks in advance.

Kind Regards,

Praveen Surendran
2G, Complex and Adaptive Systems Laboratory (UCD CASL)
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
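One way to handle this after import is to convert the affected columns of the data frame in one step. The sketch below builds a stand-in for 'mydat' rather than reading a real .sav file, and the column names id and visit are hypothetical; it assumes the date columns hold seconds counted from 14 October 1582, the origin SPSS uses for dates.

```r
spss_origin <- as.Date("1582-10-14")

# Stand-in for the imported data: dates stored as seconds since the origin
to_spss_secs <- function(d) as.numeric(as.Date(d) - spss_origin) * 86400
mydat <- data.frame(
  id    = 1:3,
  visit = to_spss_secs(c("2009-01-15", "2009-03-02", "2009-07-08"))
)

# Names of the columns known to hold dates (assumed; check the SPSS file)
date_cols <- "visit"

# Convert every listed column from seconds to Date
mydat[date_cols] <- lapply(mydat[date_cols], function(x) {
  as.Date(x / 86400, origin = "1582-10-14")
})
str(mydat)
```

Listing the date columns explicitly avoids converting ordinary numeric variables by mistake.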
[R] Replacing multiple elements in a vector !
Hi,

I have a vector with elements

rs.id=c('rs100','rs101','rs102','rs103')

And a dataframe 'snp.id':

1 SNP_100 rs100
2 SNP_101 rs101
3 SNP_102 rs102
4 SNP_103 rs103

The task is to replace the rs.id vector with the corresponding 'SNP_' ids in snp.id.

Thanks in advance.

Praveen Surendran
2G, Complex and Adaptive Systems Laboratory (UCD CASL)
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
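A lookup of this kind is commonly done with match(). This is a sketch only: the column names snp and rs are assumptions, since the post does not show the names used in 'snp.id'.

```r
rs.id <- c('rs100', 'rs101', 'rs102', 'rs103')

# Lookup table from the post; column names are hypothetical
snp.id <- data.frame(
  snp = c('SNP_100', 'SNP_101', 'SNP_102', 'SNP_103'),
  rs  = c('rs100', 'rs101', 'rs102', 'rs103'),
  stringsAsFactors = FALSE
)

# Position of each rs id in the lookup column, then take the SNP id there
new.id <- snp.id$snp[match(rs.id, snp.id$rs)]
print(new.id)

# Ids without an entry in the table come back as NA
missing.id <- snp.id$snp[match('rs999', snp.id$rs)]
```

Because match() works on whole vectors, the same line handles rs.id in any order and with repeated ids.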
[R] Getting indices of intersecting elements.
Hi,

Is there a command to get the indices of the intersecting elements of two vectors? intersect() gives the elements and not their indices.

Thanks in advance.

Praveen Surendran
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
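One common approach combines %in% with which(). The vectors below are made-up examples.

```r
x <- c("a", "b", "c", "d")
y <- c("d", "b", "e")

# Positions in x whose value also occurs in y, and vice versa
ix <- which(x %in% y)
iy <- which(y %in% x)

print(x[ix])   # the shared elements, in the order they appear in x
print(y[iy])   # the same elements, in the order they appear in y
```

Note that the two index vectors follow the order of their own vector, so x[ix] and y[iy] hold the same values but not necessarily in the same order.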
[R] index of intersect()
Hi all,

Is there a way to get the index of elements in intersect(x,y), where x and y are vectors with a few common elements?

I appreciate your response.

Praveen Surendran.
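If the indices should line up with the output of intersect(x, y) itself, match() can be applied to that output. A small sketch with made-up vectors:

```r
x <- c(10, 20, 30, 40, 50)
y <- c(50, 20, 70)

common <- intersect(x, y)

# Position of each common element in x and in y, in the order of `common`
idx.x <- match(common, x)
idx.y <- match(common, y)

# Both index vectors point at the same values
stopifnot(identical(x[idx.x], y[idx.y]))
```

match() returns the first position only, so duplicated values in x or y would need which(x %in% common) instead.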
[R] Finding missing elements by comparing vectors
Hi,

Is there a function in R to do a-b, where a and b are two non-numeric sets (or the complement of the intersection of these two sets)?

Kind Regards,
Praveen.
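Base R's set functions cover both readings of the question. The example vectors are made up.

```r
a <- c("rs100", "rs101", "rs102")
b <- c("rs101", "rs103")

# a - b: elements of a that are not in b
only.a <- setdiff(a, b)

# Complement of the intersection: elements in exactly one of the two sets
not.shared <- union(setdiff(a, b), setdiff(b, a))

print(only.a)
print(not.shared)
```

setdiff() is not symmetric, so setdiff(a, b) and setdiff(b, a) generally differ.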
[R] R regular expression to extract words with the query string.
Hi,

Is there a way in R to get the string which matches an expression, where the expression is a substring of the parent string?

Let's say I have

i <- "transcript:ENST112334 pid:ENSP12345"

What I need is the string pid:ENSP12345 from i, using the query ENSP.

I appreciate your comments.

Praveen Surendran
School of Medicine and Medical Sciences
University College Dublin
Belfield, Dublin 4
Ireland.
[R] R regular expression to extract words with the query string.
Thanks Henrique.

This is indeed short and quite simple compared to what I was using, which goes like...

unlist(strsplit(i,split=" "))[grep("ENSP",unlist(strsplit(i,split=" ")))]

:)

Cheers,
Praveen.

From: Henrique Dallazuanna [mailto:www...@gmail.com]
Sent: 08 July 2009 14:18
To: praveen.surend...@ucd.ie
Cc: r-help@r-project.org
Subject: Re: [R] R regular expression to extract words with the query string.

Try this:

sapply(strsplit(i, ' '), grep, pattern='ENSP', value = T)

On Wed, Jul 8, 2009 at 10:04 AM, Praveen Surendran praveen.surend...@ucd.ie wrote:

> Hi,
>
> Is there a way in R to get the string which matches the expression, where the expression is a substring of the parent string.
>
> Lets say, I have
>
> i <- "transcript:ENST112334 pid:ENSP12345"
>
> What I need is the string pid:ENSP12345 from i using the query ENSP.
>
> Appreciate your comments.
>
> Praveen Surendran
> School of Medicine and Medical Sciences
> University College Dublin
> Belfield, Dublin 4
> Ireland.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
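For comparison, the same word can be pulled out without splitting the string first, using regmatches() with a pattern for "a run of non-space characters containing ENSP". This is an alternative sketch, not the approach suggested in the thread.

```r
i <- "transcript:ENST112334 pid:ENSP12345"

# Any non-space characters, then ENSP, then any non-space characters
pattern <- "[^ ]*ENSP[^ ]*"

# First match only
hit <- regmatches(i, regexpr(pattern, i))
print(hit)

# All matches, for strings that may contain several such words
hits <- regmatches(i, gregexpr(pattern, i))[[1]]
```

Both calls are vectorised over i, so they also work when i is a character vector with one record per element.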
[R] Alternate ways of finding number of occurrence of an element in a vector.
Hi,

I have a vector v and would like to find the number of occurrences of an element x in it. Is there a way to do this other than

sum(as.integer(v==x))

or

length(which(x==v))

I have a huge file to process. Both of the methods described above are pretty slow when dealing with a large vector.

Comments are welcome.

Praveen Surendran.
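Two base R variants are worth trying. sum() can add up a logical vector directly, which skips the as.integer() and which() steps, and table() counts every distinct value in one pass when counts for many different elements are needed. Whether either is fast enough for the file in question would have to be measured; the vector below is a made-up example.

```r
v <- c("a", "b", "a", "c", "a")
x <- "a"

# Count one element: TRUE counts as 1 when summed
n.x <- sum(v == x)

# Count all elements once, then look up as many values as needed
counts <- table(v)
n.a <- counts[["a"]]
n.b <- counts[["b"]]

print(n.x)
print(counts)
```

If v can contain missing values, sum(v == x, na.rm = TRUE) avoids an NA result.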