[R] csv file with two header rows
Is there a way to use read.csv() on such a file without deleting one of the header rows? Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
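A minimal workaround sketch (file name and layout are assumptions, so the example writes its own toy file): read the first line separately for the column names, then skip both header rows when reading the data.

```r
# Hypothetical file with two header rows, written here so the sketch runs.
writeLines(c("id,score", "num,pts", "1,10", "2,20"), "two_header.csv")

hdr <- read.csv("two_header.csv", nrows = 1, header = FALSE,
                stringsAsFactors = FALSE)          # first header row only
dat <- read.csv("two_header.csv", skip = 2, header = FALSE)  # data rows only
names(dat) <- as.character(unlist(hdr))            # reattach the names
```

The second header row is simply never read, so nothing has to be deleted from the file.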
[R] C50 package in R
Hi All, I am trying to use the C50 package to build classification trees in R. Unfortunately there is not enough documentation around its use. Can anyone explain to me how to prune the decision trees? Regards, Indrajit
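A sketch of one answer, assuming the CRAN C50 package and the built-in iris data: C5.0 trees are not pruned with a separate prune() call; pruning is controlled through C5.0Control(), mainly via the confidence factor CF (smaller values prune more) and minCases.

```r
library(C50)

# Lower CF prunes more aggressively; minCases forces larger leaf nodes.
# noGlobalPruning = TRUE would disable the final global pruning pass.
fit <- C5.0(Species ~ ., data = iris,
            control = C5.0Control(CF = 0.10, minCases = 5))
summary(fit)   # the summary shows the pruned tree and its size
```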
[R] Help with dataEllipse function
Hi Everyone, I am working with the R function dataEllipse. I plot the 95% confidence ellipses for several different samples in the same plot and I color-code the ellipse of each sample, but I do not know how to specify a different line pattern for each ellipse. I can only modify the pattern for all ellipses with the lty argument. Any help will be highly appreciated. Thanks in advance! Jana -- Jana Makedonska, B.Sc. Biology, Universite Paul Sabatier Toulouse III; M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II; Ph.D. candidate in Physical Anthropology and part-time lecturer, Department of Anthropology, College of Arts & Sciences, State University of New York at Albany, 1400 Washington Avenue, Albany, NY. Office phone: 518-442-4699 http://electricsongs.academia.edu/JanaMakedonska
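A sketch of one way to get a different lty per ellipse (car package assumed; the data are made up): call dataEllipse once per sample, overlaying the later calls with add = TRUE, each with its own col and lty.

```r
library(car)
set.seed(1)
x1 <- rnorm(50); y1 <- x1 + rnorm(50)        # sample 1 (toy data)
x2 <- rnorm(50, 2); y2 <- x2 + rnorm(50)     # sample 2 (toy data)

# First call sets up the plot; subsequent calls overlay onto it,
# each with its own color and line type.
dataEllipse(x1, y1, levels = 0.95, col = "blue", lty = 1)
dataEllipse(x2, y2, levels = 0.95, col = "red",  lty = 2, add = TRUE)
```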
[R] nls: example code throws error
Greets, I'm trying to learn to use nls and was running the example code for an exponential model: x <- -(1:100)/10 y <- 100 + 10 * exp(x / 2) + rnorm(x)/10 nlmod <- nls(y ~ Const + A * exp(B * x)) Error in B * x : non-numeric argument to binary operator In addition: Warning message: In nls(y ~ Const + A * exp(B * x)) : No starting values specified for some parameters. Initializing 'Const' to '1.'. Consider specifying 'start' or using a selfStart model Presumably, the code should work if it is part of an example on the help page. In perusing various help forums for similar problems, it also appears that others believe this syntax should work in the model formula. Any ideas? Perhaps also, a pointer to a comprehensive and correct document that details model formula syntax, if someone has one? Thanks & Best Regards, Steven
Re: [R] nls: example code throws error
On Thu, Apr 25, 2013 at 7:16 PM, Steven LeBlanc ores...@gmail.com wrote: Greets, I'm trying to learn to use nls and was running the example code for an exponential model: x <- -(1:100)/10 y <- 100 + 10 * exp(x / 2) + rnorm(x)/10 nlmod <- nls(y ~ Const + A * exp(B * x)) Error in B * x : non-numeric argument to binary operator In addition: Warning message: In nls(y ~ Const + A * exp(B * x)) : No starting values specified for some parameters. Initializing 'Const' to '1.'. Consider specifying 'start' or using a selfStart model Presumably, the code should work if it is part of an example on the help page. In perusing various help forums for similar problems, it also appears that others believe this syntax should work in the model formula. Any ideas? Try running in a clean session. Having B <- x in your workspace would cause such an error. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
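A quick sketch of how to check for the masking objects Gabor describes (object names assumed): if an object named B already exists in the workspace, then B * x inside the formula uses that object as data instead of treating B as a parameter to estimate.

```r
# TRUE here would explain the "non-numeric argument" error.
exists("B")

# Remove any leftover objects that clash with the parameter names,
# or simply restart R with a clean workspace.
to_drop <- intersect(c("Const", "A", "B"), ls())
rm(list = to_drop)
```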
Re: [R] split number into array
Hi, Not sure about the criteria for deciding the number of zeros. vec1 <- c(23,244,1343,45,153555,5468999999,75) lst1 <- strsplit(as.character(vec1), "") m1 <- max(sapply(lst1, length)) res <- t(sapply(lst1, function(x) as.numeric(c(rep(0, m1 - length(x)), x)))) res # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] #[1,] 0 0 0 0 0 0 0 0 2 3 #[2,] 0 0 0 0 0 0 0 2 4 4 #[3,] 0 0 0 0 0 0 1 3 4 3 #[4,] 0 0 0 0 0 0 0 0 4 5 #[5,] 0 0 0 0 1 5 3 5 5 5 #[6,] 5 4 6 8 9 9 9 9 9 9 #[7,] 0 0 0 0 0 0 0 0 7 5 A.K. hi, I'm a new R user. I have a for cycle which generates numbers from 0 to N, and I want to put each number into an array: e.g. number 23 into an array of int: a(0,0,0,0,2,3). Can you help me? Federico
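An equivalent sketch that zero-pads with sprintf() instead of prepending zeros by hand (the vector below mirrors the posted example; the 10-digit sixth value is inferred from the posted output):

```r
vec1 <- c(23, 244, 1343, 45, 153555, 5468999999, 75)
w <- max(nchar(sprintf("%.0f", vec1)))   # common width = longest number

# Pad each number to width w with leading zeros, then split into digits.
res <- t(sapply(strsplit(sprintf("%0*.0f", w, vec1), ""), as.numeric))
res[1, ]   # 0 0 0 0 0 0 0 0 2 3
```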
Re: [R] Distance matrices Combinations
Hi, Do you want this? el <- matrix(1:100, ncol=20) set.seed(25) el1 <- matrix(sample(1:100, 20, replace=TRUE), ncol=1) indx <- sort(el1, index.return=TRUE)$ix[1:3] list(el[, indx], sort(el1)[1:3]) #[[1]] # [,1] [,2] [,3] #[1,] 41 21 11 #[2,] 42 22 12 #[3,] 43 23 13 #[4,] 44 24 14 #[5,] 45 25 15 # #[[2]] #[1] 7 13 15 A.K. From: eliza botto eliza_bo...@hotmail.com To: smartpink...@yahoo.com smartpink...@yahoo.com Sent: Thursday, April 25, 2013 4:45 PM Subject: RE: [R] Distance matrices Combinations dear arun, I will see through it thoroughly if you give me some 10 mins. Meanwhile can you please tell me how we can change the following of your codes so that in el1 we could see the values, not the indexes? thanks, Elisa el1 <- matrix(o, ncol=1) indx <- sort(el1, index.return=TRUE)$ix[1:3] list(el[, indx], indx)
Re: [R] Reading data from a text file conditionally skipping lines
Hi, It would be better to give an example. If your dataset is like the one attached: con <- file("Trial1.txt") Lines1 <- readLines(con) close(con) # If the data you wanted to extract is numeric and the header and footer are characters: dat1 <- read.table(text=Lines1[-grep("[A-Za-z]", Lines1)], sep="\t", header=FALSE) dat1 # V1 V2 V3 V4 V5 #1 38 43 39 44 45 #2 39 44 36 49 46 #3 42 45 47 49 37 #4 34 43 39 45 45 #5 38 42 39 44 47 #6 43 44 46 42 37 #7 32 49 38 42 45 #8 34 45 35 49 46 #9 44 45 46 49 37 #10 34 43 39 48 49 #11 38 42 39 47 47 #12 43 44 46 42 37 #13 37 43 39 44 45 #14 39 42 36 49 46 #15 42 45 47 49 37 # Or: you mentioned that the data is repeated every so many lines. Here too there is a repeating pattern. head(Lines1, 10) #[1] "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat." #[2] "Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis" #[3] "38\t43\t39\t44\t45" #[4] "39\t44\t36\t49\t46" #[5] "42\t45\t47\t49\t37" #[6] "Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat." #[7] "Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi." #[8] "34\t43\t39\t45\t45" #[9] "38\t42\t39\t44\t47" #[10] "43\t44\t46\t42\t37" dat2 <- read.table(text=Lines1[rep(rep(c(FALSE,TRUE), times=c(2,3)), 5)], sep="\t", header=FALSE) identical(dat1, dat2) #[1] TRUE A.K. I have a text file that is nicely formatted (tab separated). However, it has some header and footer information after every so many lines. I do not want to read this information into my dataframe. What is the best way to read this data into R? Thanks for all the help! Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.
Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis 38 43 39 44 45 39 44 36 49 46 42 45 47 49 37 Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat. Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. 34 43 39 45 45 38 42 39 44 47 43 44 46 42 37 Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis 32 49 38 42 45 34 45 35 49 46 44 45 46 49 37 Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat. Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. 34 43 39 48 49 38 42 39 47 47 43 44 46 42 37 Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis 37 43 39 44 45 39 42 36 49 46 42 45 47 49 37 Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat. Vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
Re: [R] Scheirer-Ray-Hare
You can take a look at this; it is in Vietnamese, but you can run it through Google Translate: http://www.ytecongcong.com/2013/04/scheirer-ray-hare-test-kiem-dinh-phi-tham-so-two-way-anova/
[R] Looping through names of both dataframes and column-names
Hello all, This seems like a pretty standard question - suppose I want to loop through a set of similar data frames, with similar variables, and create new variables within them: nl <- seq(1,5); for (i in nl) { assign(paste0("df_", nl[i]), data.frame(x=seq(1:10), y=rnorm(10))) }; ls()[grep("df_", ls())]; nls <- ls()[grep("df_", ls())]; for (df in nls) { print(df); for (var in names(get(df))) { print(var); assign(paste0(df, "$", paste0(var, "_cs")), cumsum(get(df)[[var]])) } }; ls()[grep("df_", ls())] The code above *almost* works, except that it creates a whole bunch of objects of the form df_1$x_cs, df_1$y_cs. What I want is 5 dataframes, with the $ elements enclosed, as usual. Any help or guidance would be appreciated. Much thanks, Dan
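The assign(paste0(df, "$", ...)) calls create oddly named top-level objects because the "$" is just a character in the object's name, not an extraction. A sketch of the usual alternative: keep the data frames in a named list and add the columns there.

```r
set.seed(42)
# Build the five data frames inside a named list rather than via assign().
dfs <- setNames(lapply(1:5, function(i)
          data.frame(x = 1:10, y = rnorm(10))),
        paste0("df_", 1:5))

# Add a cumulative-sum column for every existing variable of every frame.
dfs <- lapply(dfs, function(d) {
  for (var in names(d)) d[[paste0(var, "_cs")]] <- cumsum(d[[var]])
  d
})

names(dfs$df_1)   # "x" "y" "x_cs" "y_cs"
```

Everything stays inside real data frames, so dfs$df_1$x_cs works as expected.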
[R] Error installing boss package
I am trying to install the package boss but I am getting the error below. Please advise. install.packages("boss") --- Please select a CRAN mirror for use in this session --- Selection: 86 also installing the dependency 'ncdf' trying URL 'http://cran.mirrors.hoobly.com/src/contrib/ncdf_1.6.6.tar.gz' Content type 'application/x-gzip' length 79403 bytes (77 Kb) opened URL == downloaded 77 Kb trying URL
'http://cran.mirrors.hoobly.com/src/contrib/boss_1.2.tar.gz' Content type 'application/x-gzip' length 9702 bytes opened URL == downloaded 9702 bytes * installing *source* package 'ncdf' ... ** package 'ncdf' successfully unpacked and MD5 sums checked checking for nc-config... no checking for gcc... gcc -std=gnu99 [standard compiler and header checks all succeed] checking netcdf.h usability... no checking netcdf.h presence... no checking for netcdf.h... no configure: error: netcdf header netcdf.h not found ERROR: configuration failed for package 'ncdf' * removing '/share/apps/R-2.15.3/lib64/R/library/ncdf' ERROR: dependency 'ncdf' is not available for package 'boss' * removing '/share/apps/R-2.15.3/lib64/R/library/boss' The downloaded source packages are in '/tmp/RtmppOWF74/downloaded_packages' Updating HTML index of packages in '.Library' Making packages.html ... done Warning messages: 1: In install.packages("boss") : installation of package 'ncdf' had non-zero exit status 2: In install.packages("boss") : installation of package 'boss' had non-zero exit status
Re: [R] Selecting and then joining data blocks
In addition, if your matrix names do not follow any particular pattern: tiger <- matrix(1:20, ncol=5) cat <- matrix(21:40, ncol=5) dog <- matrix(41:60, ncol=5) wolf <- matrix(61:80, ncol=5) vec <- c(1,2,4,3,2,3,1) vec2 <- c("tiger","cat","dog","wolf") # Suppose you wanted the order to be tiger, cat, dog, wolf vec2 <- factor(vec2, levels=vec2) vec2 #[1] tiger cat dog wolf #Levels: tiger cat dog wolf res3 <- do.call(rbind, lapply(vec, function(i) get(as.character(vec2[as.numeric(vec2)==i])))) res3 # [,1] [,2] [,3] [,4] [,5] #[1,] 1 5 9 13 17 #[2,] 2 6 10 14 18 #[3,] 3 7 11 15 19 #[4,] 4 8 12 16 20 #[5,] 21 25 29 33 37 #[6,] 22 26 30 34 38 #[7,] 23 27 31 35 39 #[8,] 24 28 32 36 40 #[9,] 61 65 69 73 77 #[10,] 62 66 70 74 78 #[11,] 63 67 71 75 79 #[12,] 64 68 72 76 80 #[13,] 41 45 49 53 57 #[14,] 42 46 50 54 58 #[15,] 43 47 51 55 59 #[16,] 44 48 52 56 60 #[17,] 21 25 29 33 37 #[18,] 22 26 30 34 38 #[19,] 23 27 31 35 39 #[20,] 24 28 32 36 40 #[21,] 41 45 49 53 57 #[22,] 42 46 50 54 58 #[23,] 43 47 51 55 59 #[24,] 44 48 52 56 60 #[25,] 1 5 9 13 17 #[26,] 2 6 10 14 18 #[27,] 3 7 11 15 19 #[28,] 4 8 12 16 20 A.K. - Original Message - From: arun smartpink...@yahoo.com To: Preetam Pal lordpree...@gmail.com Cc: Sent: Thursday, April 25, 2013 9:03 AM Subject: Re: [R] Selecting and then joining data blocks HI Preetam, I created the matrices in a list because it was easier to create. If you look at the second solution: B1 <- lst1[[1]] B2 <- lst1[[2]] B3 <- lst1[[3]] B4 <- lst1[[4]] Consider that B1, B2, B3, B4 are your actual matrices and apply the solution below: paste0("B", vec) # gives the names of the matrices #[1] "B1" "B2" "B4" "B3" "B2" "B3" "B1" Using get() will fetch the matrices stored under those names. res2 <- do.call(rbind, lapply(vec, function(i) get(paste0("B", i)))) If the names of the matrices are different, you need to change it accordingly. I programmed it based on the information you gave. I hope this helps.
Arun From: Preetam Pal lordpree...@gmail.com To: arun smartpink...@yahoo.com Sent: Thursday, April 25, 2013 8:53 AM Subject: Re: [R] Selecting and then joining data blocks Hi Arun, Thanks for your solution. But there is only one thing which I could not understand: in my case, the 4 matrices (B1,B2,B3,B4) were already specified and I have to work with these only... how do I accommodate that (instead of letting R produce the big matrix by random sampling)? This might be very trivial, but I am a starter with R... I shall really appreciate if you could advise me on this. Again thanks, Preetam On Thu, Apr 25, 2013 at 5:44 PM, arun smartpink...@yahoo.com wrote: HI, set.seed(24) # creating the four matrices in a list lst1 <- lapply(1:4, function(x) matrix(sample(1:40, 20, replace=TRUE), ncol=5)) names(lst1) <- paste0("B", 1:4) vec <- c(1,2,4,3,2,3,1) res <- do.call(rbind, lapply(vec, function(i) lst1[[i]])) dim(res) #[1] 28 5 #or B1 <- lst1[[1]] B2 <- lst1[[2]] B3 <- lst1[[3]] B4 <- lst1[[4]] res2 <- do.call(rbind, lapply(vec, function(i) get(paste0("B", i)))) identical(res, res2) #[1] TRUE A.K. - Original Message - From: Preetam Pal lordpree...@gmail.com To: r-help@r-project.org Cc: Sent: Thursday, April 25, 2013 7:51 AM Subject: [R] Selecting and then joining data blocks Hi all, I have 4 matrices, each having 5 columns and 4 rows, denoted by B1,B2,B3,B4. I have generated a vector of 7 indices, say (1,2,4,3,2,3,1), which refers to the index of the matrices to be chosen and then appended one on top of the next: like, in this case, I wish to have the following mega matrix: B1 over B2 over B4 over B3 over B2 over B3 over B1. 1) How can I achieve this? 2) I don't want to manually identify and arrange the matrices for each vector of index values generated (for which the code I used is: index=sample(4,7,replace=T)). How can I automate the process? Basically, I am doing bootstrapping, but the observations are actually 4x5 matrices. Appreciate your help.
Thanks, Preetam -- Preetam Pal (+91)-9432212774 M-Stat 2nd Year, Room No. N-114 Statistics Division, C.V.Raman Hall Indian Statistical Institute, B.H.O.S. Kolkata.
Re: [R] connecting matrices
Dear Elisa, Try this: el <- matrix(1:100, ncol=20) set.seed(25) el1 <- matrix(sample(1:100, 20, replace=TRUE), ncol=1) In the example you showed, there were no column names. list(el[, sort(el1)[1:3]], sort(el1, index.return=TRUE)$ix[1:3]) #[[1]] # [,1] [,2] [,3] #[1,] 31 61 71 #[2,] 32 62 72 #[3,] 33 63 73 #[4,] 34 64 74 #[5,] 35 65 75 # #[[2]] #[1] 9 5 3 A.K. From: eliza botto eliza_bo...@hotmail.com To: smartpink...@yahoo.com smartpink...@yahoo.com Sent: Thursday, April 25, 2013 9:54 AM Subject: connecting matrices Dear Arun, [text file contains the exact format] Although the last codes were absolutely correct and worked the way I want them to, I have an additional follow-up question. Suppose I have a matrix el... here I show you only some part of that matrix so that the codes run faster. el [,595586] [,595587] [,595588] [,595589] [,595590] [,595591] [,595592] [,595593] [,595594] [,595595] [,595596] [,595597] [,595598] [,595599] [,595600] [,595601] [1,] 55 55 55 55 55 55 55 55 55 55 56 56 56 56 56 56 [2,] 59 59 59 59 59 59 60 60 60 61 57 57 57 57 57 57 [3,] 60 60 60 61 61 62 61 61 62 62 58 58 58 58 58 59 [4,] 61 62 63 62 63 63 62 63 63 63 59 60 61 62 63 60 [,595602] [,595603] [,595604] [,595605] [,595606] [,595607] [,595608] [,595609] [,595610] [,595611] [,595612] [,595613] [,595614] [,595615] [,595616] [,595617] [1,] 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 56 [2,] 57 57 57 57 57 57 57 57 57 58 58 58 58 58 58 58 [3,] 59 59 59 60 60 60 61 61 62 59 59 59 59 60 60 60 [4,] 61 62 63 61 62 63 62 63 63 60 61 62 63 61 62 63 In connection to this matrix, there is another matrix which contains coordination values for each of the columns of matrix el: el1 [595586,] 5.67 [595587,] 55.90 [595588,] 515 [595589,] 755 [595590,] 955 [595591,] 5.95 [595592,] 575 [595593,] 505 [595594,] 505 [595595,] 515 [595596,] 5612 [595597,] 506 [595598,] 576 [595599,] 5126 [595600,] 5216 [595601,] 5666 [595602,] 526 [595603,] 5.6 [595604,] 156 [595605,] 4556 [595606,] 5556 [595607,] 1256 [595608,]
1256 [595609,] 8756 [595610,] 5906 [595611,] 789 [595612,] 5006 [595613,] 1256 [595614,] 3356 [595615,] 7756 [595616,] 4456 [595617,] 3356 What I want in the end is a list of two elements containing the 10 columns of el which have the lowest values in matrix el1. More precisely [[1]] [,595603][,595586][595591,] 56 575959 596062 626163 [[2]] 5.65.675.95 Is it possible to carry out such an operation? Thanks for your help, Elisa
Re: [R] connecting matrices
HI Elisa, I guess there is a mistake. Check whether this is what you wanted. indx <- sort(el1, index.return=TRUE)$ix[1:3] list(el[, indx], indx) #[[1]] # [,1] [,2] [,3] #[1,] 41 21 11 #[2,] 42 22 12 #[3,] 43 23 13 #[4,] 44 24 14 #[5,] 45 25 15 # #[[2]] #[1] 9 5 3 A.K.
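A compact sketch of the full request (toy sizes assumed in place of the 595,000-column matrix): order() gives the positions of the 10 smallest values in el1, which then select both the columns of el and the values themselves.

```r
set.seed(25)
el  <- matrix(1:100, ncol = 20)                        # toy stand-in for el
el1 <- matrix(sample(1:100, 20, replace = TRUE), ncol = 1)

idx <- order(el1)[1:10]        # columns holding the 10 smallest el1 values
out <- list(el[, idx],         # element 1: the matching columns of el
            el1[idx])          # element 2: the values themselves, ascending
```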
[R] Vectorized code for generating the Kac (Clement) matrix
Hi, I am generating large Kac matrices (also known as Clement matrices). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code: n <- 1000 Kacmat <- matrix(0, n+1, n+1) for (i in 1:n) Kacmat[i, i+1] <- n - i + 1 for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1 The above code is fast, but I am curious about vectorized ways to do this. Thanks in advance. Best, Ravi Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health, Division of Geriatric Medicine & Gerontology, Johns Hopkins University rvarad...@jhmi.edu 410-502-2619
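One vectorized sketch: R's two-column matrix indexing fills each off-diagonal in a single assignment, with no explicit loop.

```r
n <- 1000
Kacmat2 <- matrix(0, n + 1, n + 1)

# cbind(rows, cols) indexes one cell per row of the two-column matrix:
Kacmat2[cbind(1:n, 2:(n + 1))] <- n:1   # superdiagonal: n - i + 1
Kacmat2[cbind(2:(n + 1), 1:n)] <- 1:n   # subdiagonal:   i - 1
```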
Re: [R] Make R 3.0 open .RData files
Another thing that you can try is changing the Path. Make sure the PATH environment variable has the path to R 3.0 before R 2.15.3 in the string. Regards, Indrajit On Thu, 25 Apr 2013 22:10:52 +0530, Jeff Newmiller wrote: a) See FAQ 2.17 b) Methods for configuring operating systems are off topic here. I will say there is a REGEDIT program in Windows, but there are potential permissions complications (you may not have them) and possible collateral damage (don't touch it if you don't understand it) that mean you should study up on this topic with an appropriate resource (book, forum, expert, system administrator, etc.) before attempting it. --- Jeff Newmiller, Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity. Dimitri Liakhovitski wrote: Brian, how do I remove the relevant old Registry entries? Thank you! Dimitri On Thu, Apr 25, 2013 at 10:29 AM, Prof Brian Ripley wrote: On 25/04/2013 14:00, Duncan Murdoch wrote: On 13-04-25 8:33 AM, Dimitri Liakhovitski wrote: Hello! I have Windows 7 Enterprise and two versions of R installed: 2.15.3 and 3.0.0. Before I had R 3.0, I made it a setting that all .RData files - when I double-click on them - were opened by R 2.15.3. Now I want them to be opened by R 3.0 instead of R 2.15.3 (but I don't want to remove R 2.15.3 yet). I right-click on some .RData file, select Open with - Choose default program and then click on Browse. I browse to the folder where my R 3.0 is installed, then to the folder bin, then to the folder x64, and select Rgui.exe. However, when R opens - or after I shut R down and then double-click on some .RData file and R opens - it is again R 2.15.3, not R 3.0. What am I doing wrong? Of course, when I open R 3.0 directly, then it opens no problem.
This is really a question about Windows 7, not about R, but I would guess you aren't telling it to make your choice permanent, or perhaps you are not allowed by your administrator to make permanent changes to file associations. You should ask for local help. We've encountered this for our student accounts, and think it is a bug in Windows 7. If you remove the relevant old Registry entries first it should work. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA) 1 South Parks Road, Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] time series plot: x-axis problem
Hi, I'm trying to plot a simple time series. I'm running into an issue with the x-axis. The code below will produce a plot with a correct x-axis showing Jan to Dec: rr=c(3,2,4,5,4,5,3,3,6,2,4,2) (rr=ts(rr,start=c(2012,1),frequency=12)) win.graph(width=6.5, height=2.5, pointsize=8) plot(rr, xlab="2012", ylab="event freq", xaxt="n", col="blue") axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)), cex.axis = .9, tcl = -.5, las = 2) However, if I change the start point from Jan 2012 to May 2012, which is (rr=ts(rr,start=c(2012,5),frequency=12)), then run the code below: plot(rr, xlab="2012 - 2013", ylab="event freq", xaxt="n", col="blue") axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)), cex.axis = .9, tcl = -.5, las = 2) In the new plot produced, the x-axis still shows Jan to Dec, not May to April as I desired. How do I fix the x-axis? Is it possible to fix it WITHOUT modifying the object rr? Also, ideally, I would like to have each time point on the x-axis showing month/year, not just month. How to do that? Any help and input will be much appreciated! Thanks, Jerry
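A sketch of one answer (same toy data, standard graphics device assumed instead of win.graph): build the labels from the series itself with cycle() and time(), so the axis follows whatever start date rr has, and each tick shows month/year.

```r
rr <- ts(c(3, 2, 4, 5, 4, 5, 3, 3, 6, 2, 4, 2),
         start = c(2012, 5), frequency = 12)

# cycle(rr) gives the month number of each observation (5, 6, ..., 4),
# floor(time(rr)) gives its calendar year, so labels track the true start.
labs <- paste(month.abb[cycle(rr)], floor(time(rr)), sep = "/")

plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, at = time(rr), labels = labs, cex.axis = 0.9, tcl = -0.5, las = 2)
```

The original rr object is untouched; only the labels are computed from it.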
[R] Can a column of a list be called?
Hello Everyone, I would like to know if I can call one of the columns of a list, to use it as a variable in a function. Thanks in advance for any advice! Jana -- Jana Makedonska, B.Sc. Biology, Universite Paul Sabatier Toulouse III; M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II; Ph.D. candidate in Physical Anthropology and part-time lecturer, Department of Anthropology, College of Arts & Sciences, State University of New York at Albany, 1400 Washington Avenue, Albany, NY. Office phone: 518-442-4699 http://electricsongs.academia.edu/JanaMakedonska
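A minimal sketch of the two usual ways (the list and its element names are made up): extract the element with $ or [[ ]] and pass the result to any function.

```r
lst <- list(height = c(1.6, 1.7), mass = c(55, 70))   # hypothetical list

mean(lst$height)      # extract the "column" by name with $
mean(lst[["mass"]])   # or by name (or position) with [[ ]]
```

Both forms return the element itself (here a numeric vector), so it can be used anywhere a variable is expected; single brackets, lst["mass"], would instead return a one-element list.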
Re: [R] nls: example code throws error
Hi Try

x <- -(1:100)/10
set.seed(1)
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10

## shortcut to starting values
lm(log(y) ~ -log(x+10))

Call:
lm(formula = log(y) ~ -log(x + 10))

Coefficients:
(Intercept)
      4.624

nlmod <- nls(y ~ A + B * exp(C * x), start = list(A = 90, B = 5, C = 0.1))

Formula: y ~ A + B * exp(C * x)

Parameters:
   Estimate Std. Error t value Pr(>|t|)
A 100.009079   0.017797  5619.4   <2e-16
B   9.93        0.042718  234.1   <2e-16
C   0.499529    0.004495  111.1   <2e-16

Residual standard error: 0.09073 on 97 degrees of freedom

Number of iterations to convergence: 5
Achieved convergence tolerance: 0.0002475

I will leave you to plot the results as a check.

Duncan

Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au

At 09:16 26/04/2013, you wrote: Greets, I'm trying to learn to use nls and was running the example code for an exponential model:

x <- -(1:100)/10
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10
nlmod <- nls(y ~ Const + A * exp(B * x))

Error in B * x : non-numeric argument to binary operator
In addition: Warning message:
In nls(y ~ Const + A * exp(B * x)) :
  No starting values specified for some parameters.
  Initializing 'Const' to '1.'. Consider specifying 'start' or using a selfStart model

Presumably, the code should work if it is part of an example on the help page. In perusing various help forums for similar problems, it also appears that others believe this syntax should work in the model formula. Any ideas? Perhaps also, a pointer to a comprehensive and correct document that details model formula syntax, if someone has one? Thanks Best Regards, Steven
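Duncan's closing suggestion can be carried out directly; a minimal sketch (not from the original thread) that re-runs his exact simulation and overlays the fitted curve as the check:

```r
# Re-create the example data exactly as in the thread
set.seed(1)
x <- -(1:100)/10
y <- 100 + 10 * exp(x / 2) + rnorm(x)/10

# Fit with explicit starting values, as recommended above
nlmod <- nls(y ~ A + B * exp(C * x), start = list(A = 90, B = 5, C = 0.1))

# Visual check: data plus fitted curve
plot(x, y, pch = 16, cex = 0.6)
lines(x, predict(nlmod), col = "red", lwd = 2)

round(coef(nlmod), 3)  # should be near the true A = 100, B = 10, C = 0.5
```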
Re: [R] Looping through names of both dataframes and column-names
Here are two possible ways to do it. This would simplify your code a bit, but it changes the names of x_cs to cs.x:

for (df in nls) {
  assign(df, cbind(get(df), cs = apply(get(df), 2, cumsum)))
}

This is closer to what you have done:

for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(df, within(get(df), assign(paste0(var, "_cs"), cumsum(get(df)[[var]]))))
  }
}
ls()[grep("df_", ls())]

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Daniel Egan Sent: Donnerstag, 25. April 2013 22:19 To: r-help@r-project.org Subject: [R] Looping through names of both dataframes and column-names

Hello all, This seems like a pretty standard question - suppose I want to loop through a set of similar data frames, with similar variables, and create new variables within them:

nl <- seq(1, 5)
for (i in nl) {
  assign(paste0("df_", nl[i]), data.frame(x = seq(1:10), y = rnorm(10)))
}
ls()[grep("df_", ls())]

nls <- ls()[grep("df_", ls())]
for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(paste0(df, "$", paste0(var, "_cs")), cumsum(get(df)[[var]]))
  }
}
ls()[grep("df_", ls())]

The code above *almost* works, except that it creates a whole bunch of objects of the form df_1$x_cs, df_1$y_cs. What I want is 5 dataframes, with the $ elements enclosed, as usual. Any help or guidance would be appreciated. Much thanks, Dan
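As an aside, the assign()/get() pattern usually becomes unnecessary if the data frames are kept in a named list; a minimal sketch (the names df_1 ... df_5 follow the original post, the data are simulated):

```r
set.seed(42)
# Build the five example data frames in a named list instead of the global env
dfs <- setNames(
  lapply(1:5, function(i) data.frame(x = 1:10, y = rnorm(10))),
  paste0("df_", 1:5)
)

# Add a cumulative-sum column for every existing column, keeping x_cs / y_cs names
dfs <- lapply(dfs, function(d) {
  cs <- lapply(d, cumsum)
  names(cs) <- paste0(names(d), "_cs")
  cbind(d, cs)
})

names(dfs$df_1)  # "x" "y" "x_cs" "y_cs"
```

Keeping everything in one list also makes later per-data-frame operations a single lapply() rather than a loop over ls().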
Re: [R] Transferring R to another computer, R_HOME_DIR
This is really an R-devel topic: it is not about using R. R is usually (but not always) built so that everything except Rscript is relocatable by editing the 'R' script (and R_HOME and R_HOME_DIR are ignored in the environment, intentionally). So you could edit the script, but not having Rscript working is a limitation. Having said that, not all packages play by the same rules and e.g. some use -rpath to hardcode paths in package DSOs. On 26/04/2013 06:13, lcn wrote: Well, to my understanding, you planned to rsync the original compiled folder from one machine to somewhere on another machine, and work with it. Then how about creating a file link on the second machine for /usr/lib64/R? Or maybe I misunderstand your purpose? If you have write permission there, you could install the R RPM. On Thu, Apr 25, 2013 at 5:57 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Hello, I was looking at the R (installed on RHEL6) shell script and saw R_HOME_DIR=/usr/lib64/R. Nowhere (and I could have got it wrong) does it read in the environment value R_HOME_DIR. I need to rsync the entire folder below /usr/lib64/R to another computer, into another directory location. Without changing the R shell script, how can I force it to read in R_HOME_DIR? Or maybe I misunderstood the bash source? (Note: I cannot recompile on the target machine.) Cheers Saptarshi 1. I also realize Rscript will not work (I think the path is hard coded in the source). No, it is compiled in when Rscript is compiled.
Beginning of /usr/lib64/R/bin/R:

R_HOME_DIR=/usr/lib64/R
if test "${R_HOME_DIR}" = "/usr/lib64/R"; then
  case "linux-gnu" in
  linux*)
    run_arch=`uname -m`
    case "$run_arch" in
    x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
      libnn=lib64
      libnn_fallback=lib
      ;;
    *)
      libnn=lib
      libnn_fallback=lib64
      ;;
    esac
    if [ -x /usr/${libnn}/R/bin/exec/R ]; then
      R_HOME_DIR=/usr/lib64/R
    elif [ -x /usr/${libnn_fallback}/R/bin/exec/R ]; then
      R_HOME_DIR=/usr/lib64/R
    ## else -- leave alone (might be a sub-arch)
    fi
    ;;
  esac
fi

-- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Sum up column values according to row id
Thank you very much Dr. Carlson! The function you suggested works perfectly! Thanks a lot again, best wishes, sincerely, Mt M

2013/4/24 David Carlson dcarl...@tamu.edu

Something like this?

mean6 <- function(x) {
  if (length(x) < 6) {
    mn <- mean(x)
  } else {
    mn <- mean(x[1:6])
  }
  return(mn)
}
aggregate(g ~ id, ipso, mean6)

- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77840-4352

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Matteo Mura Sent: Wednesday, April 24, 2013 7:57 AM To: r-help@r-project.org Subject: [R] Sum up column values according to row id

Dear All, here is a problem I think many of you can solve in a few minutes. I have a dataframe which contains values of plot id, diameters, heights and basal area of trees; the column names are: id | dbh | h | g

head(ipso, n=10)
        id dbh     h          g
1  FPE0164  36 13.62 0.10178760
2  FPE0164  31 12.70 0.07547676
21 FPE1127  57 18.85 0.25517586
13 FPE1127  39 15.54 0.11945906
12 FPE1127  34 14.78 0.09079203
6  FPE1127  32 15.12 0.08042477
5  FPE1127  28 14.13 0.06157522
15 FPE1127  27 13.50 0.05725553
19 FPE1127  25 13.28 0.04908739
11 FPE1127  19 11.54 0.02835287

From here I need to calculate the mean of the six greatest g for each id. The clauses are: if length(id) >= 6, take the mean of the six greatest g; else take the mean of all the g in that id (in the head() printed above, e.g. for id == "FPE0164", take the mean of just those two values of g). The g are already ordered by id ascending and g descending using:

ipso <- ipso[with(ipso, order(ipso$id, -ipso$g)), ]  # Order by id ascending and g descending

I tried a lot of for loops and tapply() without results. Can anyone help me solve this?
Thanks for your attention. Best wishes, Matteo
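Since the data are already sorted by id ascending and g descending, the same result can also be had without the explicit if/else by taking head() of each sorted group; a minimal sketch on a made-up toy version of ipso (not the poster's real data):

```r
# Mean of the (up to) six largest g per id; sort() makes it independent of row order
mean6 <- function(g) mean(head(sort(g, decreasing = TRUE), 6))

# Hypothetical toy data: id "A" has 8 trees, id "B" only 3
ipso <- data.frame(id = rep(c("A", "B"), c(8, 3)),
                   g  = c(8:1, 3:1))

aggregate(g ~ id, data = ipso, FUN = mean6)
#   id   g
# 1  A 5.5   (mean of the six largest: 8,7,6,5,4,3)
# 2  B 2.0   (fewer than six values, so mean of all)
```

head(x, 6) simply returns the whole vector when it has fewer than six elements, which is what makes the if/else unnecessary.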
[R] Remove reciprocal data from a grouped animal social contact dataset
Hi r-help forum, I have been collecting contact data (with proximity logger collars) between a few different species of animal. All animals wear the collars, and any contact between the animals should be detected and recorded by both collars. However, this isn't always the case, and more contacts may be recorded on one collar of the two. This is fine; it depends on battery life and other things that I will have to discuss! I now have each contact recorded as a 'group' in 4 columns:

head(data)
  record            start duration      pair
1      1 27/05/2012 04:40     4948  CO1 CO12
2      2 31/05/2012 04:48      278  CO1 CO12
3      3 31/05/2012 05:30        3  CO1 CO12
4      4 31/05/2012 05:51      159  CO1 CO12
5      5 31/05/2012 05:56       47  CO1 CO12
6      6 31/05/2012 06:02      107  CO1 CO12

The first column shows the record number, the second shows the start date and time of the contact, the third shows the contact duration and the fourth shows the pair of animals involved in the contact. In this case the top 6 contacts are all between animals CO1 and CO12. There are nearly 100,000 records. There were many animals that could have contacted each other: animals animals 1 CO1 2 CO2 3 CO3 4 CO4 5 CO5 6 CO6 7 CO7 8 CO8 9 CO9 10 CO10 11 CO11 12 CO12 13 CO13 14 CO14 15 CO15 16 CO16 17 CO17 18 PO1 19 PO2 20 PO3 21 PO4 22 PO5 23 PO6 24 PO7 25 PO8 26 PO9 27 PO10 28 PO11 29 PO12 30 PO13 31 PI1 32 PI2 33 PI3 34 PI4 35 PI5 36 PI6 37 PI7 38 PI8 39 RD1 40 RD2 41 WB1 42 WB2 Because both collars may have recorded the single contact, I need to remove the reciprocal contacts from this dataset. For example, you may have records for CO1 CO2 that are mirrored by records for CO2 CO1. If there are the same number of records it doesn't matter which of these you select, as long as only one set is used for further analysis. Where there is an unequal number of contacts recorded on the two collars between a pair, I would like to select the records which have the most contacts.
So, if there were 10 records recorded for CO1 CO2 and 15 for CO2 CO1 I would like to reject the first 10 contacts and retain the 15. There are some cases where only one version of the group is recorded, e.g. just CO1 CO3, with no reciprocal CO3 CO1. In this case I would like to retain the data that I have. I would normally like to present you with my attempts so far but as a relatively new (but enthusiastic!) R user I am struggling to know where to start. I present more data here... sadly dput(head(data, 200) is printing all the dates (all nearly 100,000 of them, regardless of using head()!) so I hope this is ok for now: head(data,300) recordstart duration pair 11 27/05/2012 04:40 4948 CO1 CO12 22 31/05/2012 04:48 278 CO1 CO12 33 31/05/2012 05:303 CO1 CO12 44 31/05/2012 05:51 159 CO1 CO12 55 31/05/2012 05:56 47 CO1 CO12 66 31/05/2012 06:02 107 CO1 CO12 77 31/05/2012 06:08 86 CO1 CO12 88 31/05/2012 06:11 194 CO1 CO12 99 31/05/2012 06:20 87 CO1 CO12 10 10 31/05/2012 06:24 12 CO1 CO12 11 11 31/05/2012 06:32 11 CO1 CO12 12 12 31/05/2012 06:40 227 CO1 CO12 13 13 31/05/2012 06:47 115 CO1 CO12 14 14 12/04/2011 13:39 109 CO1 CO15 15 15 12/04/2011 22:293 CO1 CO15 16 16 12/04/2011 22:45 44 CO1 CO15 17 17 12/04/2011 23:20 55 CO1 CO15 18 18 13/04/2011 02:50 58 CO1 CO15 19 19 13/04/2011 03:15 11 CO1 CO15 20 20 13/04/2011 05:38 65 CO1 CO15 21 21 13/04/2011 08:55 122 CO1 CO15 22 22 13/04/2011 11:064 CO1 CO15 23 23 13/04/2011 13:47 53 CO1 CO15 24 24 13/04/2011 13:57 32 CO1 CO15 25 25 13/04/2011 14:32 16 CO1 CO15 26 26 13/04/2011 14:414 CO1 CO15 27 27 13/04/2011 21:53 33 CO1 CO15 28 28 14/04/2011 01:00 41 CO1 CO15 29 29 14/04/2011 01:075 CO1 CO15 30 30 14/04/2011 01:462 CO1 CO15 31 31 14/04/2011 06:433 CO1 CO15 32 32 14/04/2011 08:443 CO1 CO15 33 33 14/04/2011 08:51 64 CO1 CO15 34 34 14/04/2011 13:596 CO1 CO15 35 35 14/04/2011 14:11 11 CO1 CO15 36 36 14/04/2011 14:36 169 CO1 CO15 37 37 14/04/2011 14:42 19 CO1 CO15 38 38 14/04/2011 15:04 48 CO1 CO15 39 39 14/04/2011 15:102 CO1 CO15 
40 40 14/04/2011 17:41 58 CO1 CO15 41 41 14/04/2011 18:333 CO1 CO15 42 42 15/04/2011 16:26 50 CO1 CO15 43 43 15/04/2011 20:123 CO1 CO15 44 44 16/04/2011 23:042 CO1 CO15 45 45 17/04/2011 02:577 CO1 CO15 46 46 17/04/2011 03:08 32 CO1 CO15 47 47 17/04/2011
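One possible approach, sketched on a tiny hypothetical data set (not the poster's real file): build a direction-independent key by sorting the two IDs in each pair, then within each key keep only the rows from whichever direction has the most records:

```r
# Hypothetical miniature of the contact data
d <- data.frame(
  record = 1:6,
  pair   = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1", "CO1 CO3"),
  stringsAsFactors = FALSE
)

# Canonical, direction-independent key: the two IDs in alphabetical order
d$key <- sapply(strsplit(d$pair, " "),
                function(p) paste(sort(p), collapse = " "))

# For each key, keep the rows of the direction with the most records;
# one-sided pairs (no reciprocal) are kept as-is
keep <- unlist(lapply(split(seq_len(nrow(d)), d$key), function(i) {
  tab  <- table(d$pair[i])
  best <- names(tab)[which.max(tab)]
  i[d$pair[i] == best]
}))
d2 <- d[sort(keep), ]
d2$pair  # "CO2 CO1" kept (3 records > 2), plus the one-sided "CO1 CO3"
```

On a tie, which.max() simply takes the first direction alphabetically, which matches the "doesn't matter which" requirement.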
Re: [R] time series plot: x-axis problem
Hello, Try the following.

(rr = ts(rr, start = c(2012, 5), frequency = 12))
plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
labs <- format(as.Date(time(rr)), "%b-%Y")
axis(1, time(rr), labs, cex.axis = .9, tcl = -.5, las = 2)

Hope this helps, Rui Barradas

Em 25-04-2013 19:11, Jerry escreveu: snip
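An alternative way to build the month/year labels, without converting the ts time base at all, is a plain Date sequence starting at the series' first month; a minimal sketch (not from the original thread):

```r
rr <- ts(c(3,2,4,5,4,5,3,3,6,2,4,2), start = c(2012, 5), frequency = 12)

# One month/year label per observation, starting May 2012
labs <- format(seq(as.Date("2012-05-01"), by = "month",
                   length.out = length(rr)), "%b/%Y")

plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, time(rr), labs, cex.axis = .9, tcl = -.5, las = 2)
```

This leaves the object rr untouched, as the original question asked.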
[R] sample size in box plot labels
Hi, I would like to put the sample number beside each label in a boxplot. How do I do this? Essentially, I need to count the sample size for each factor; see below. Thanks

boxplot(DATA$K_Merge ~ factor(DATA$UnitName_1), axes = FALSE, col = colours)
title(main = list("Tukey Boxplot by Geology:\n K(%)", cex = cexlb))
axis(1, 1:21, labels = FALSE, las = 2)
text(seq(1, 21, by = 1), par("usr")[3], labels = levels(factor(DATA$UnitName_1)),
     srt = 45, adj = c(1.03, 1.03), xpd = TRUE, cex = 1.8)
axis(2, seq(-1, 5, 1), seq(-1, 5, 1))

-- Shane
Re: [R] Error installing boss package
On 04/25/2013 11:42 PM, Pramod Anugu wrote: I am trying to install the package boss but I am getting the error below. Please advise ...

checking netcdf.h usability... no
checking netcdf.h presence... no
checking for netcdf.h... no
configure: error: netcdf header netcdf.h not found
ERROR: configuration failed for package 'ncdf'
...

Hi Pramod, I would suggest installing the netcdf packages: yum install netcdf or use whatever package management system you prefer. Jim
Re: [R] sample size in box plot labels
Hi

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Shane Carey Sent: Friday, April 26, 2013 11:49 AM To: r-help@r-project.org Subject: [R] sample size in box plot labels

snip

Does not work without data. Do you want something like this?

boxplot(Sepal.Length ~ Species, data = iris)
mtext(as.character(table(iris$Species)), 1, at = 1:3)

Regards Petr
Re: [R] sample size in box plot labels
Hello, To count the sample sizes for each factor try

tapply(DATA$K_Merge, DATA$UnitName_1, FUN = length)

Hope this helps, Rui Barradas

Em 26-04-2013 10:48, Shane Carey escreveu: snip
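Those counts can then be pasted into the axis labels themselves; a minimal sketch using the built-in iris data (boxplot()'s names argument accepts multi-line strings):

```r
# Count per group, then build "label\nn=count" strings
counts <- table(iris$Species)
labs <- paste0(names(counts), "\nn=", counts)  # e.g. "setosa\nn=50"

boxplot(Sepal.Length ~ Species, data = iris, names = labs)
```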
Re: [R] How to make a raster image in R from my own data set
Hi Kristi, it takes a few extra steps to create a raster layer from your example data set, as it is not a gridded map in lat/lon (it is probably in some projection, though). How exactly to do it depends on your data, but here are some hints:

1. If you actually need to read the data set from a link, then read.table.url is deprecated; just use read.table. You might need to call setInternet2(TRUE) first, as the example data is on an https-url.

2. raster can read a range of inputs, but I am not sure if .csv is one of them, and definitely not if the data is not gridded. You can then first do interpolation with a Spatial*-object. Set the coordinates of your object (this creates a Spatial*-object) and add the projection:

coordinates(pts) = ~ lon + lat
proj4string(pts) = CRS("+proj=longlat +datum=WGS84")

3. You will have to create your reference grid (spsample, from raster, or another existing grid you have available) and interpolate to this grid, using one of the many interpolation packages, such as geoR, automap, gstat, intamap.

4. The resulting object can easily be converted to raster through raster(interpolationResult[, "resultname"])

I hope this can help you get started. Generally you get quicker responses to spatial questions from the r-sig-geo list. Jon

On 24-Apr-13 16:56, Kristi Glover wrote: Hi R-user, I was trying to make a raster map with WGS84 projection in R, but I could not make it. I found one data set in Google whose data is almost the same format as mine. I wanted to make a raster map of temperature with 1 degree spatial resolution for the global scale. I could make it in GIS software, but I have many variables (to become many raster images) and ultimately I am importing them into R for further analysis. Therefore, I wanted to make them in R, if possible. It would be great if you could give some hints on what the script would look like for creating a raster map from my own data set (I have provided a link for your reference; this is an example data set).
I really appreciate your help.

#--
# create a raster map from scratch
install.packages("raster", dependencies = TRUE)
library(raster)   # raster data
install.packages("rgdal", dependencies = TRUE)
library(rgdal)    # input/output, projections
install.packages("rgeos", dependencies = TRUE)
library(rgeos)    # geometry ops
install.packages("spdep", dependencies = TRUE)
library(spdep)    # spatial dependence
install.packages("pastecs", dependencies = TRUE)
library(pastecs)
pts <- read.table.url("https://www.betydb.org//miscanthusyield.csv", header = TRUE, sep = ",")
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")
#---

Cheers, Kristi

-- Jon Olav Skøien Joint Research Centre - European Commission Institute for Environment and Sustainability (IES) Land Resource Management Unit Via Fermi 2749, TP 440, I-21027 Ispra (VA), ITALY jon.sko...@jrc.ec.europa.eu Tel: +39 0332 789206 Disclaimer: Views expressed in this email are those of the individual and do not necessarily represent official views of the European Commission.
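For step 4 above (and for the simpler case where the points already lie on a regular lon/lat grid), rasterFromXYZ() from the raster package converts an x/y/value table directly; a sketch on made-up gridded data (the real data would need the interpolation step Jon describes first):

```r
library(raster)  # assumes the raster package is installed

# Hypothetical 1-degree grid with a fake temperature value per cell
pts <- expand.grid(lon = seq(-5, 5, by = 1), lat = seq(40, 45, by = 1))
pts$temp <- rnorm(nrow(pts))

# Columns are taken as x, y, z; crs sets the WGS84 projection
r <- rasterFromXYZ(pts, crs = "+proj=longlat +datum=WGS84")
```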
Re: [R] sample size in box plot labels
This works great. Cheers

On Fri, Apr 26, 2013 at 12:02 PM, Rui Barradas ruipbarra...@sapo.pt wrote: snip
Re: [R] Vectorized code for generating the Kac (Clement) matrix
On 25-04-2013, at 17:18, Ravi Varadhan ravi.varad...@jhu.edu wrote: Hi, I am generating large Kac matrices (also known as Clement matrices). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code:

n <- 1000
Kacmat <- matrix(0, n+1, n+1)
for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1

The above code is fast, but I am curious about vectorized ways to do this.

You could vectorize like this:

Kacmat <- matrix(0, n+1, n+1)
Kacmat[row(Kacmat) == col(Kacmat) - 1] <- n - (1:n) + 1
Kacmat[row(Kacmat) == col(Kacmat) + 1] <- 1:n

But this shows that your version is pretty quick:

f1 <- function(n) {
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1
  Kacmat
}

f2 <- function(n) {
  Kacmat <- matrix(0, n+1, n+1)
  Kacmat[row(Kacmat) == col(Kacmat) - 1] <- n - (1:n) + 1
  Kacmat[row(Kacmat) == col(Kacmat) + 1] <- 1:n
  Kacmat
}

library(compiler)
f1.c <- cmpfun(f1)
f2.c <- cmpfun(f2)

n <- 5000
system.time(K1 <- f1(n))
system.time(K2 <- f2(n))
system.time(K3 <- f1.c(n))
system.time(K4 <- f2.c(n))
identical(K2, K1)
identical(K3, K1)
identical(K4, K1)

# system.time(K1 <- f1(n))
#    user  system elapsed
#   0.386   0.120   0.512
# system.time(K2 <- f2(n))
#    user  system elapsed
#   3.779   1.141   4.940
# system.time(K3 <- f1.c(n))
#    user  system elapsed
#   0.323   0.119   0.444
# system.time(K4 <- f2.c(n))
#    user  system elapsed
#   3.607   0.852   4.472
# identical(K2,K1)
# [1] TRUE
# identical(K3,K1)
# [1] TRUE
# identical(K4,K1)
# [1] TRUE

Berend
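A third option, not tried above, is matrix indexing with a two-column index matrix, which touches only the 2n band entries instead of testing all (n+1)^2 positions with row()/col(); a sketch (verified against f1 from the thread for small n):

```r
f3 <- function(n) {
  Kacmat <- matrix(0, n + 1, n + 1)
  Kacmat[cbind(1:n, 2:(n + 1))] <- n:1  # superdiagonal: n, n-1, ..., 1
  Kacmat[cbind(2:(n + 1), 1:n)] <- 1:n  # subdiagonal: 1, 2, ..., n
  Kacmat
}

# Reference loop version from the original post
f1 <- function(n) {
  Kacmat <- matrix(0, n + 1, n + 1)
  for (i in 1:n) Kacmat[i, i + 1] <- n - i + 1
  for (i in 2:(n + 1)) Kacmat[i, i - 1] <- i - 1
  Kacmat
}

all(f3(10) == f1(10))  # TRUE
```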
Re: [R] nls: example code throws error
On 26/04/2013 00:16, Steven LeBlanc wrote: Greets, I'm trying to learn to use nls and was running the example code for an exponential model: snip Perhaps also, a pointer to a comprehensive and correct document that details model formula syntax, if someone has one? Thanks Best Regards, Steven

Others have pointed out that the error is probably from an unclean environment. For model formula syntax, see ?nls; under 'Arguments: formula', follow the link to ?formula.
[R] labeling
Hi, I have a dataset as follows:

Name                                          N
Visean limestone calcareous shale             2
Visean sandstone, mudstone evaporite          2
Westphalian shale, sandstone, siltstone coal

How do I combine them so that I can label a plot with

Visean limestone calcareous shale
N=2

for example, on two lines, with N=2 centered on the length of the Name label? Thanks -- Shane
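One way, sketched on hypothetical data with the same column names: paste Name and N together with a newline. text() centers each line of a multi-line label, so N=2 lands in the middle of the width of the Name line automatically:

```r
# Hypothetical stand-in for the poster's data frame
d <- data.frame(
  Name = c("Visean limestone calcareous shale",
           "Visean sandstone, mudstone evaporite"),
  N = c(2, 2),
  stringsAsFactors = FALSE
)

# Two-line label: Name on the first line, N=... on the second
labs <- paste0(d$Name, "\nN=", d$N)

plot(1:2, 1:2, type = "n")
text(1:2, 1:2, labels = labs)
```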
Re: [R] Weighted Principle Components analysis
The reason for my asking is that I have to replicate the same analysis done in SPSS and SAS. Again, to make it clear: it's respondent-weighted Factor Analysis with a desired number of factors. Method of extraction: Principal Components. Rotation: Varimax. The only solution I can think of is to multiply my respondent weight by 10 (or by 100) and round it so that the new weight has no decimals, then repeat every row as many times as the new weight says and run a regular, unweighted principal() on the new data. I've done it - but again, this does not match the Factor Scores from SPSS and SAS exactly. Any other ideas? Thank you!

On Thu, Apr 25, 2013 at 9:21 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Hello! I am doing Principal Components Analysis using the psych package:

mypc <- principal(mydata, 5, scores = TRUE)

However, I was asked to run a case-weighted PCA - using an individual weight for each case. I could use corr from the boot package to calculate the case-weighted intercorrelation matrix. But if I use the intercorrelation matrix as input (instead of the raw data), I am not going to get factor scores, which I do need. Any advice? Thank you very much! -- Dimitri Liakhovitski
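One base-R route, offered only as a sketch (it will not necessarily reproduce SPSS/SAS factor scores either, and all variable names here are made up): compute the weighted correlation matrix with cov.wt() and take its eigendecomposition; scores can then be formed by projecting the weighted-standardized data onto the eigenvectors:

```r
set.seed(1)
X <- matrix(rnorm(200), 50, 4)   # 50 respondents, 4 variables (fake data)
w <- runif(50)                   # respondent weights (fake)

cw <- cov.wt(X, wt = w, cor = TRUE)   # weighted covariance + correlation
e  <- eigen(cw$cor)

# Unrotated principal-component loadings for the first 2 components
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Standardize with the *weighted* center/sd, then project for component scores
Z <- scale(X, center = cw$center, scale = sqrt(diag(cw$cov)))
scores <- Z %*% e$vectors[, 1:2]
```

varimax() in base stats could then supply the rotation step on L.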
Re: [R] Trouble Computing Type III SS in a Cox Regression
Sigh.

Date: Fri, 26 Apr 2013 10:13:52 +1200 From: Rolf Turner rolf.tur...@xtra.co.nz To: Terry Therneau thern...@mayo.edu Cc: r-help@r-project.org, Achim Zeileis achim.zeil...@uibk.ac.at Subject: Re: [R] Trouble Computing Type III SS in a Cox Regression

On 26/04/13 03:40, Terry Therneau wrote: (In response to a question about computing type III sums of squares in a Cox regression): SNIP If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning, this is a re-education issue, and I can't much help with that. Fortune nomination! cheers, Rolf

--- On Thu, 4/25/13, Terry Therneau thern...@mayo.edu wrote: From: Terry Therneau thern...@mayo.edu Subject: Re: Trouble Computing Type III SS in a Cox Regression To: Paul Miller pjmiller...@yahoo.com, r-help@R-project.org Received: Thursday, April 25, 2013, 10:40 AM

You've missed the point of my earlier post, which is that type III is not an answerable question.

1. There are lots of ways to compare Cox models; the LRT is normally considered the most reliable by serious authors. There is usually not much difference between score, Wald, and LRT tests, though, and the other two are more convenient in many situations.

2. Type III is a question that can't be addressed. SAS prints something out with that label, but since they don't document what it is, and people with in-depth knowledge of Cox models (like me) cannot figure out what a sensible definition could actually be, there is nowhere to go. How to do this in R can't be answered. (It has nothing to do with interactions.)

3. If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning, this is a re-education issue, and I can't much help with that.

Terry T.

On 04/25/2013 07:59 AM, Paul Miller wrote: Hi Dr. Therneau, Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them, but often produce output for others who see value in type III SS. You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here, though, so my type III SS could just as easily be called type II SS, I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model? People using type III SS generally want a measure of whether or not a variable is contributing something to their model or could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS? A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors? Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good? Thanks, Paul
Re: [R] Help with dataEllipse function
Dear Jana, The lty argument to dataEllipse() (in the car package) isn't vectorized. It could be, and I'll add that as a feature request. Actually, lty isn't an explicit argument to dataEllipse(); it's simply passed through to the lines() function, which draws the ellipses. You should be able to do what you want by adding the ellipses one at a time to your plot (see the argument add in ?dataEllipse), or by using the coordinates of the ellipses, returned by dataEllipse(), to draw a customized graph. I hope that this helps, John John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ On Thu, 25 Apr 2013 20:00:20 -0400 Jana Makedonska jmakedon...@gmail.com wrote: SNIP
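A minimal sketch of John's first suggestion, with made-up data (x1/y1, x2/y2 are invented here) and assuming the car package is installed: call dataEllipse() once per sample with add = TRUE, so each ellipse gets its own lty and col.

```r
library(car)  # provides dataEllipse()

set.seed(1)
x1 <- rnorm(50); y1 <- x1 + rnorm(50)     # sample 1 (invented data)
x2 <- rnorm(50, 2); y2 <- x2 + rnorm(50)  # sample 2 (invented data)

# Empty plot spanning both samples, then one dataEllipse() call per
# sample, so that lty and col can differ between the ellipses.
plot(c(x1, x2), c(y1, y2), xlab = "x", ylab = "y", type = "n")
dataEllipse(x1, y1, levels = 0.95, add = TRUE, plot.points = FALSE,
            col = "red", lty = 1)
dataEllipse(x2, y2, levels = 0.95, add = TRUE, plot.points = FALSE,
            col = "blue", lty = 2)
```

Per John's other suggestion, dataEllipse() also returns the ellipse coordinates, which could instead be re-drawn with lines(..., lty = ...) for full control.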
Re: [R] labeling
On 04/26/2013 10:15 PM, Shane Carey wrote: Hi, I have a dataset as follows:
Name                                          N
Visean limestone calcareous shale             2
Visean sandstone, mudstone evaporite          2
Westphalian shale, sandstone, siltstone coal
How do I combine them so that I can label a plot with "Visean limestone calcareous shale N=2", for example, on two lines, with N=2 centered on the length of the Name label? Hi Shane, Look at the title function (graphics) and the main and sub arguments. Jim
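Following Jim's pointer, one way to get "N=2" centered on a second line under each name is to build the label strings with paste0() and a newline; text() centers each line of a multi-line label. A small sketch with invented data standing in for Shane's dataset:

```r
nm <- c("Visean limestone calcareous shale",
        "Visean sandstone, mudstone evaporite")
N  <- c(2, 2)
labs <- paste0(nm, "\nN=", N)   # "name\nN=2": two lines, second centered

boxplot(list(rnorm(10), rnorm(10)), xaxt = "n")  # invented data
# Draw the two-line labels below the axis; xpd = TRUE lets them
# escape the plot region.
text(x = 1:2, y = par("usr")[3] - 0.3, labels = labs,
     xpd = TRUE, cex = 0.7)
```

The same strings work in title(), e.g. title(main = labs[1]).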
Re: [R] csv file with two header rows
I don't think so. read.csv() is a stripped-down version of read.table(). You should be able to do this with the skip option there. John Kane Kingston ON Canada -Original Message- From: analys...@hotmail.com Sent: Thu, 25 Apr 2013 18:35:42 -0700 (PDT) To: r-help@r-project.org Subject: [R] csv file with two header rows Is there a way to use read.csv() on such a file without deleting one of the header rows? Thanks.
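A sketch of John's suggestion: take the column names from the first header row, then skip both header rows when reading the data. A throwaway example file is created first so the snippet is self-contained.

```r
# Create a small csv with two header rows
tf <- tempfile(fileext = ".csv")
writeLines(c("id,value",   # header row 1: the names we want
             "ID,VAL",     # header row 2: to be ignored
             "1,10",
             "2,20"), tf)

# Read the first header row for the names, then skip both header rows
hdr <- scan(tf, what = "", sep = ",", nlines = 1, quiet = TRUE)
dat <- read.csv(tf, header = FALSE, skip = 2, col.names = hdr)
dat
##   id value
## 1  1    10
## 2  2    20
```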
Re: [R] Can a column of a list be called?
If you are using the list as simply a collection of data frames a simple example to accomplish what you are describing is this: data(iris) data(mtcars) y=list(iris, mtcars) #return Sepal.Length column from first data frame in list #list[[number of list component]][number of column] y[[1]][1] Cheers, On Thu, Apr 25, 2013 at 7:24 PM, Jana Makedonska jmakedon...@gmail.comwrote: Hello Everyone, I would like to know if I can call one of the columns of a list, to use it as a variable in a function. Thanks in advance for any advice! Jana -- Jana Makedonska, B.Sc. Biology, Universite Paul Sabatier Toulouse III M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II Ph.D. candidate in Physical Anthropology and Part-time lecturer Department of Anthropology College of Arts Sciences State University of New York at Albany 1400 Washington Avenue 1 Albany, NY Office phone: 518-442-4699 http://electricsongs.academia.edu/JanaMakedonska http://www.youtube.com/watch?v=OHbT9VvtonM http://www.youtube.com/watch?v=jRoMoLjzpf4list=PL5BF6ACDCC2E4AAA0index=7 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Charles Determan Integrated Biosciences PhD Student University of Minnesota [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
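For passing a column to a function, note that Charles's y[[1]][1] keeps the column as a one-column data frame; use [[ or $ to get the vector itself. A short sketch:

```r
y <- list(iris = iris, cars = mtcars)  # named list of data frames

head(y[[1]][1])        # one-column *data frame* (Sepal.Length)
head(y[[1]][[1]])      # the same column as a plain numeric vector
mean(y$iris$Sepal.Length)  # names work too: list $ data frame $ column
## [1] 5.843333
```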
[R] How to export graph value in R
Dear experts, I have created a hypsometric curve (area-elevation curve) for my watershed by using the simple command hypsometric(X, main = "Hypsometric Curve", xlab = "Relative Area above Elevation, (a/A)", ylab = "Relative Elevation, (h/H)", col = "blue"). It plots the hypsometric curve in the R graphics window. My question is: how can I export the values used to create this plot? I mean, I want to know the value on the y axis for a certain x value. Thanks in advance! Anup Khanal Norwegian University of Science and Technology (NTNU) Trondheim, Norway Mob: (+47) 45174313
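The post doesn't say which package hypsometric() comes from, so as a first guess: many R plotting functions return their plotting data invisibly, so try capturing the result, e.g. h <- hypsometric(X, ...); str(h). Failing that, you can rebuild the curve values yourself and interpolate with approx(). A sketch with an entirely invented curve standing in for the real one:

```r
# Invented stand-in for the hypsometric curve data
a_over_A <- seq(0, 1, by = 0.01)   # relative area above elevation
h_over_H <- (1 - a_over_A)^1.5     # hypothetical relative elevation

plot(a_over_A, h_over_H, type = "l", col = "blue",
     xlab = "Relative Area above Elevation, (a/A)",
     ylab = "Relative Elevation, (h/H)")

# y value of the curve at a chosen x, by linear interpolation
approx(a_over_A, h_over_H, xout = 0.5)$y
## [1] 0.3535534
```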
[R] Splitting data.frame and saving to csv files
Dear R Forum, I have a data.frame as
df = data.frame(date = c("2013-04-15", "2013-04-14", "2013-04-13", "2013-04-12", "2013-04-11"),
  ABC_f = c(62.80739769, 81.04525895, 84.65712455, 12.78237251, 57.61345256),
  LMN_d = c(21.16794336, 54.6580401, 63.8923307, 87.59880367, 87.07693716),
  XYZ_p = c(55.8885464, 94.1358684, 84.0089114, 98.99746696, 64.71083712),
  LMN_a = c(56.6768395, 25.81530198, 40.12268441, 35.74175237, 47.95892209),
  ABC_e = c(11.36783959, 62.29651784, 47.63481552, 32.27820673, 52.12561419),
  LMN_c = c(45.4484695, 17.72362438, 36.7690054, 68.58912931, 35.80767235),
  XYZ_zz = c(85.74755089, 63.48582415, 81.61107212, 58.1572924, 27.44132817),
  PQR = c(71.22867519, 95.09994812, 83.62437819, 30.18524735, 25.81804865),
  ABC_d = c(38.71089816, 93.48216193, 93.14432203, 78.2738731, 31.87170019),
  ABC_m = c(40.28473769, 43.97076327, 47.38761559, 97.33573412, 22.06884976))
df
        date    ABC_f    LMN_d    XYZ_p    LMN_a    ABC_e
1 2013-04-15 62.80740 21.16794 55.88855 56.67684 11.36784
2 2013-04-14 81.04526 54.65804 94.13587 25.81530 62.29652
3 2013-04-13 84.65712 63.89233 84.00891 40.12268 47.63482
4 2013-04-12 12.78237 87.59880 98.99747 35.74175 32.27821
5 2013-04-11 57.61345 87.07694 64.71084 47.95892 52.12561
     LMN_c   XYZ_zz      PQR    ABC_d    ABC_m
1 45.44847 85.74755 71.22868 38.71090 40.28474
2 17.72362 63.48582 95.09995 93.48216 43.97076
3 36.76901 81.61107 83.62438 93.14432 47.38762
4 68.58913 58.15729 30.18525 78.27387 97.33573
5 35.80767 27.44133 25.81805 31.87170 22.06885
I need to identify columns with the same labels and, along with the dates in the first column, save the columns in different csv files. E.g. in the above data frame, I have 4 columns beginning with ABC, so I need to save these four columns, with the date in the first column, as ABC.csv; then LMN_d, LMN_a, LMN_c in the LMN.csv file as date, LMN_a, LMN_c, LMN_d; and so on. In my actual data.frame, I won't be aware how many such rates combinations are available. If there is no matching column, as with PQR, the PQR.csv file should have only the date and PQR columns.
Kindly guide how do I split the data.frame and save the respective csv files. Regards Katherine [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with dataEllipse function
On 4/25/2013 8:00 PM, Jana Makedonska wrote: SNIP lty is not an argument of car::dataEllipse, but is passed via ... So, when you use the groups= argument, only a single value gets used. You would have to modify the function to allow what you want. -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Re: [R] sample size in box plot labels
Hi, actually that gives the same result as table(DATA$UnitName_1). Neither approach works if there are NAs in your data. tapply(DATA$K_Merge, DATA$UnitName_1, FUN = function(x) sum(!is.na(x))) takes NA values into account. Regards Petr -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Shane Carey Sent: Friday, April 26, 2013 1:09 PM To: Rui Barradas Cc: r-help@r-project.org Subject: Re: [R] sample size in box plot labels This works, great. Cheers On Fri, Apr 26, 2013 at 12:02 PM, Rui Barradas ruipbarra...@sapo.pt wrote: Hello, To count the sample sizes for each factor try tapply(DATA$K_Merge, DATA$UnitName_1, FUN = length) Hope this helps, Rui Barradas Em 26-04-2013 10:48, Shane Carey escreveu: Hi, I would like to put the sample number beside each label in a boxplot. How do I do this? Essentially, I need to count the sample size for each factor, see below: Thanks
boxplot(DATA$K_Merge ~ factor(DATA$UnitName_1), axes = FALSE, col = colours)
title(main = list("Tukey Boxplot by Geology:\n K(%)", cex = cexlb))
axis(1, 1:21, labels = FALSE, las = 2)
text(seq(1, 21, by = 1), par("usr")[3],
     labels = levels(factor(DATA$UnitName_1)),
     srt = 45, adj = c(1.03, 1.03), xpd = TRUE, cex = 1.8)
axis(2, seq(-1, 5, 1), seq(-1, 5, 1))
-- Shane
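Putting the pieces of this thread together — the NA-safe counts pasted into the axis labels — with hypothetical data standing in for DATA:

```r
DATA <- data.frame(K_Merge    = c(rnorm(9), NA),           # one NA
                   UnitName_1 = rep(c("A", "B"), each = 5))

# Non-missing sample size per factor level (the NA-safe count)
n <- tapply(DATA$K_Merge, DATA$UnitName_1,
            function(x) sum(!is.na(x)))

labs <- paste0(names(n), " (n=", n, ")")   # "A (n=5)" "B (n=4)"
boxplot(K_Merge ~ UnitName_1, data = DATA, names = labs)
```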
Re: [R] Vectorized code for generating the Kac (Clement) matrix
On Thu, 25 Apr 2013, Ravi Varadhan ravi.varad...@jhu.edu writes: Hi, I am generating large Kac matrices (also known as the Clement matrix). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code:
n <- 1000
Kacmat <- matrix(0, n+1, n+1)
for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1
The above code is fast, but I am curious about vectorized ways to do this. Thanks in advance. Best, Ravi
This may be a bit faster; but as Berend and you said, the original function seems already fast.
n <- 5000
f1 <- function(n) {
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1
  Kacmat
}
f3 <- function(n) {
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  bw <- n:1L  ## bw = backward, fw = forward
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- bw
  res[cbind(fw + 1L, fw)] <- fw
  res
}
system.time(K1 <- f1(n))
##  user  system elapsed
## 0.132   0.028   0.161
system.time(K3 <- f3(n))
##  user  system elapsed
## 0.024   0.048   0.071
identical(K3, K1)
-- Enrico Schumann Lucerne, Switzerland http://enricoschumann.net
Re: [R] Trouble Computing Type III SS in a Cox Regression
Seconded John Kane Kingston ON Canada -Original Message- From: rolf.tur...@xtra.co.nz Sent: Fri, 26 Apr 2013 10:13:52 +1200 To: thern...@mayo.edu Subject: Re: [R] Trouble Computing Type III SS in a Cox Regression SNIP
Re: [R] Can a column of a list be called?
Please read An Introduction to R or other basic R tutorial to learn basic R operations before posting. Please read the posting guide (link at bottom) or other similar online guides for how to post a coherent question that will elicit an accurate and helpful answer. -- Bert On Thu, Apr 25, 2013 at 5:24 PM, Jana Makedonska jmakedon...@gmail.com wrote: Hello Everyone, I would like to know if I can call one of the columns of a list, to use it as a variable in a function. Thanks in advance for any advice! Jana -- Jana Makedonska, B.Sc. Biology, Universite Paul Sabatier Toulouse III M.Sc. Paleontology, Paleobiology and Phylogeny, Universite de Montpellier II Ph.D. candidate in Physical Anthropology and Part-time lecturer Department of Anthropology College of Arts Sciences State University of New York at Albany 1400 Washington Avenue 1 Albany, NY Office phone: 518-442-4699 http://electricsongs.academia.edu/JanaMakedonska http://www.youtube.com/watch?v=OHbT9VvtonMhttp://www.youtube.com/watch?v=jRoMoLjzpf4list=PL5BF6ACDCC2E4AAA0index=7 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Splitting data.frame and saving to csv files
Hint: nm <- substring(names(df), 1, 3) gives the first 3 letters of the names, assuming this is the info needed for classifying the names -- you were not explicit about this. If some sort of pattern is used, ?grep may be what you need. You can then pick columns from df by e.g. looping through unique(nm)... etc. -- Bert On Fri, Apr 26, 2013 at 6:21 AM, Katherine Gobin katherine_go...@yahoo.com wrote: SNIP -- Bert Gunter Genentech Nonclinical Biostatistics
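A sketch implementing Bert's hint end-to-end, taking the group to be the part of the name before the underscore (as Katherine's example suggests) — a variant of the fixed-width substring(..., 1, 3) that also handles names like PQR with no suffix. Shown on a cut-down, invented version of the data frame, writing to tempdir() rather than the working directory:

```r
df <- data.frame(date  = c("2013-04-15", "2013-04-14"),
                 ABC_f = c(62.81, 81.05), LMN_d = c(21.17, 54.66),
                 ABC_e = c(11.37, 62.30), PQR   = c(71.23, 95.10))

grp <- sub("_.*", "", names(df))  # "date" "ABC" "LMN" "ABC" "PQR"

# One csv per group, always carrying the date column along
for (g in setdiff(unique(grp), "date")) {
  out <- df[, c("date", names(df)[grp == g]), drop = FALSE]
  write.csv(out, file.path(tempdir(), paste0(g, ".csv")),
            row.names = FALSE)
}
# Writes ABC.csv (date, ABC_f, ABC_e), LMN.csv and PQR.csv
```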
Re: [R] Remove reciprocal data from a grouped animal social contact dataset
Cat, It seems risky to me to assume that one collar is always outperforming another one. I would think there would be some cases where one collar picked up on a contact that the other one missed AND that the other picked up on a contact that the one missed. If so, it may be best to keep all of the records, and use the time information to update the start and duration information (taking into account overlapping observations). Defining a unique pair is pretty easy.
data <- data.frame(
  record = 1:5,
  start = c("31/05/2012 04:48", "31/05/2012 05:30", "31/05/2012 05:51",
            "31/05/2012 05:56", "31/05/2012 06:02"),
  duration = c(278, 3, 159, 47, 107),
  pair = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1"),
  stringsAsFactors = FALSE)
data$uniqpair <- sapply(strsplit(data$pair, " "),
                        function(x) paste(sort(x), collapse = " "))
Detecting overlapping observations will be more challenging. As for your trouble with dput(head(data, 200)), perhaps the variable start is defined as a factor in your data frame, so dput() was listing all of the levels of start? Jean On Fri, Apr 26, 2013 at 4:08 AM, Cat Cowie cat.e.co...@gmail.com wrote: Hi r-help forum, I have been collecting contact data (with proximity logger collars) between a few different species of animal. All animals wear the collars, and any contact between the animals should be detected and recorded by both collars. However, this isn't always the case and more contacts may be recorded on one collar of the two. This is fine, it depends on battery life and other things that I will have to discuss!
I now have each contact recorded as a 'group in 4 columns': head(data) recordstart duration pair 1 1 27/05/2012 04:40 4948 CO1 CO12 2 2 31/05/2012 04:48 278 CO1 CO12 3 3 31/05/2012 05:303 CO1 CO12 4 4 31/05/2012 05:51 159 CO1 CO12 5 5 31/05/2012 05:56 47 CO1 CO12 6 6 31/05/2012 06:02 107 CO1 CO12 The first column shows the record number, the second shows the start date and time of the contact, the third shows the contact duration and the fourth shows the pair of animals involved in the contact. In this case the top 6 contacts are all between animals CO1 and CO12. There are nearly 100,000 records. There were many animals that could have contacted each other: animals animals 1 CO1 2 CO2 3 CO3 4 CO4 5 CO5 6 CO6 7 CO7 8 CO8 9 CO9 10CO10 11CO11 12CO12 13CO13 14CO14 15CO15 16CO16 17CO17 18 PO1 19 PO2 20 PO3 21 PO4 22 PO5 23 PO6 24 PO7 25 PO8 26 PO9 27PO10 28PO11 29PO12 30PO13 31 PI1 32 PI2 33 PI3 34 PI4 35 PI5 36 PI6 37 PI7 38 PI8 39 RD1 40 RD2 41 WB1 42 WB2 Because both collars may have recorded the single contact, I need to remove the reciprocal contacts from this dataset. For example, you may have records for CO1 CO2 that are mirrored by records for CO2 CO1. If there are the same number of records it doesn't matter which of these you select, as long as only one set is used for further analysis. Where there is an unequal number of contacts recorded on the two collars between a pair, I would like to select the records which have the most contacts. So, if there were 10 records recorded for CO1 CO2 and 15 for CO2 CO1 I would like to reject the first 10 contacts and retain the 15. There are some cases where only one version of the group is recorded, e.g. just CO1 CO3, with no reciprocal CO3 CO1. In this case I would like to retain the data that I have. I would normally like to present you with my attempts so far but as a relatively new (but enthusiastic!) R user I am struggling to know where to start. I present more data here... 
sadly dput(head(data, 200) is printing all the dates (all nearly 100,000 of them, regardless of using head()!) so I hope this is ok for now: head(data,300) recordstart duration pair 11 27/05/2012 04:40 4948 CO1 CO12 22 31/05/2012 04:48 278 CO1 CO12 33 31/05/2012 05:303 CO1 CO12 44 31/05/2012 05:51 159 CO1 CO12 55 31/05/2012 05:56 47 CO1 CO12 66 31/05/2012 06:02 107 CO1 CO12 77 31/05/2012 06:08 86 CO1 CO12 88 31/05/2012 06:11 194 CO1 CO12 99 31/05/2012 06:20 87 CO1 CO12 10 10 31/05/2012 06:24 12 CO1 CO12 11 11 31/05/2012 06:32 11 CO1 CO12 12 12 31/05/2012 06:40 227 CO1 CO12 13 13 31/05/2012 06:47 115 CO1 CO12 14 14 12/04/2011 13:39 109 CO1 CO15 15 15 12/04/2011 22:293 CO1 CO15 16 16 12/04/2011 22:45 44 CO1 CO15 17 17 12/04/2011 23:20 55 CO1 CO15 18 18 13/04/2011 02:50 58 CO1 CO15 19 19 13/04/2011 03:15 11 CO1 CO15 20 20 13/04/2011 05:38 65 CO1 CO15 21
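Extending Jean's uniqpair idea, one way to keep, for each unordered pair, only the ordered direction with the most records (illustrated on a small invented frame shaped like Jean's example):

```r
data <- data.frame(
  record = 1:5,
  pair   = c("CO1 CO2", "CO1 CO2", "CO2 CO1", "CO2 CO1", "CO2 CO1"),
  stringsAsFactors = FALSE)

# Direction-free pair id
data$uniqpair <- sapply(strsplit(data$pair, " "),
                        function(x) paste(sort(x), collapse = " "))

# Within each unordered pair, keep the direction with the most rows;
# if only one direction was recorded, all its rows are kept as-is.
keep <- unlist(lapply(split(data, data$uniqpair), function(d) {
  tab <- table(d$pair)
  d$record[d$pair == names(tab)[which.max(tab)]]
}))
data[data$record %in% keep, ]
# rows 3, 4 and 5 (the more frequent "CO2 CO1" direction) are kept
```

Note that when the two directions tie, which.max() simply picks the first, which matches the "it doesn't matter which" requirement.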
[R] Stepwise regression for multivariate case in R?
Hi! I am trying to do stepwise regression in the multivariate case, using Wilks' lambda test. I've tried this: greedy.wilks(cbind(Y1, Y2) ~ ., data = my.data) But it only returns: Error in model.frame.default(formula = X[, j] ~ grouping, drop.unused.levels = TRUE) : variable lengths differ (found for 'grouping') What can be wrong here? I have checked and all variables in my.data are of the same length. //Jonathan
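Assuming this is klaR::greedy.wilks(): it does stepwise variable selection for *discriminant* analysis, so the left-hand side of the formula must be a single grouping factor, not a cbind() of responses — which is likely why the error mentions 'grouping'. A minimal sketch of the expected call shape (assuming the klaR package is installed; the niveau significance threshold is illustrative):

```r
library(klaR)  # assumed installed; provides greedy.wilks()

# Grouping factor on the left, candidate predictors on the right
gw <- greedy.wilks(Species ~ ., data = iris, niveau = 0.1)
gw$results  # variables entered and Wilks' lambda at each step
```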
Re: [R] Remove reciprocal data from a grouped animal social contact dataset
Hi See https://github.com/hongqin/RCompBio/blob/master/48states/48states-permutation-igraph.r and http://www.youtube.com/watch?v=GE2l3LYDQG0 Hope they are useful, Hong Qin On Fri, Apr 26, 2013 at 5:08 AM, Cat Cowie cat.e.co...@gmail.com wrote: SNIP
Re: [R] Looping through names of both dataframes and column-names
Much thanks Blaser. That worked perfectly. This will improve my code considerably. Greatly appreciated. Regards, Dan On Fri, Apr 26, 2013 at 3:48 AM, Blaser Nello nbla...@ispm.unibe.ch wrote: Here are two possible ways to do it: This would simplify your code a bit. But it changes the names of x_cs to cs.x.
for (df in nls) {
  assign(df, cbind(get(df), cs = apply(get(df), 2, cumsum)))
}
This is closer to what you have done.
for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(df, within(get(df), assign(paste0(var, "_cs"), cumsum(get(df)[[var]]))))
  }
}
ls()[grep("df_", ls())]
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Daniel Egan Sent: Donnerstag, 25. April 2013 22:19 To: r-help@r-project.org Subject: [R] Looping through names of both dataframes and column-names Hello all, This seems like a pretty standard question - suppose I want to loop through a set of similar data frames, with similar variables, and create new variables within them:
nl <- seq(1, 5)
for (i in nl) {
  assign(paste0("df_", nl[i]), data.frame(x = seq(1:10), y = rnorm(10)))
}
ls()[grep("df_", ls())]
nls <- ls()[grep("df_", ls())]
for (df in nls) {
  print(df)
  for (var in names(get(df))) {
    print(var)
    assign(paste0(df, "$", paste0(var, "_cs")), cumsum(get(df)[[var]]))
  }
}
ls()[grep("df_", ls())]
The code above *almost* works, except that it creates a whole bunch of objects of the form df_1$x_cs, df_1$y_cs. What I want is 5 dataframes, with the $ elements enclosed, as usual. Any help or guidance would be appreciated. Much thanks, Dan
-- Daniel Egan | Director of Behavioral Finance and Investing at Betterment.com http://betterment.com/ | contact: d...@betterment.com - Office: 212.228.1328 - Mobile: 347-931-4897
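For the archives: the assign()/get() pattern can be avoided entirely by keeping the data frames in a list, which is the more idiomatic R approach. A sketch:

```r
# Five similar data frames in one list instead of df_1 ... df_5
dfs <- lapply(1:5, function(i) data.frame(x = 1:10, y = rnorm(10)))

# Add a cumulative-sum version of every column to each data frame
dfs <- lapply(dfs, function(d) {
  cs <- lapply(d, cumsum)               # cumsum of each column
  names(cs) <- paste0(names(d), "_cs")  # x_cs, y_cs, ...
  cbind(d, cs)
})

names(dfs[[1]])
# -> "x" "y" "x_cs" "y_cs"
```

Individual frames are then dfs[[1]], dfs[[2]], etc., and any further per-frame operation is another lapply().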
Re: [R] time series plot: x-axis problem
Hi,
labs <- format(as.Date(time(rr)), "%b-%Y")
# Error in as.Date.default(time(rr)) :
#   do not know how to convert 'time(rr)' to class "Date"
# I guess this needs library(zoo)
library(zoo)
labs <- format(as.Date(time(rr)), "%b-%Y")
sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-unknown-linux-gnu (64-bit)
# or
z <- zoo(rr)
lab1 <- as.yearmon(index(z))
plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, time(rr), lab1, cex.axis = .9, tcl = -.5, las = 2)
A.K. - Original Message - From: Rui Barradas ruipbarra...@sapo.pt To: Jerry i89...@gmail.com Cc: r-help@r-project.org Sent: Friday, April 26, 2013 5:25 AM Subject: Re: [R] time series plot: x-axis problem Hello, Try the following.
(rr = ts(rr, start = c(2012, 5), frequency = 12))
plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
labs <- format(as.Date(time(rr)), "%b-%Y")
axis(1, time(rr), labs, cex.axis = .9, tcl = -.5, las = 2)
Hope this helps, Rui Barradas Em 25-04-2013 19:11, Jerry escreveu: Hi, I'm trying to plot a simple time series. I'm running into an issue with the x-axis. The code below will produce a plot with a correct x-axis showing from Jan to Dec:
rr = c(3, 2, 4, 5, 4, 5, 3, 3, 6, 2, 4, 2)
(rr = ts(rr, start = c(2012, 1), frequency = 12))
win.graph(width = 6.5, height = 2.5, pointsize = 8)
plot(rr, xlab = "2012", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
     cex.axis = .9, tcl = -.5, las = 2)
However, if I change the start point from Jan 2012 to May 2012, which is
(rr = ts(rr, start = c(2012, 5), frequency = 12))
Then run the code below
plot(rr, xlab = "2012 - 2013", ylab = "event freq", xaxt = "n", col = "blue")
axis(1, time(rr), rep(substr(month.abb, 1, 3), length = length(rr)),
     cex.axis = .9, tcl = -.5, las = 2)
In the new plot produced, the x-axis is still showing Jan to Dec, not May to April as I desired. How to fix the x-axis? Is it possible to fix it WITHOUT modifying the object rr?
Also, ideally, I would like to have each time point on x-axis showing month/year, not just month. How to do that? Any help and input will be much appreciated! Thanks Jerry [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
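Putting Rui's and arun's pieces together, a self-contained version of the fix (zoo supplies the as.Date() method used on the ts time index):

```r
library(zoo)  # provides as.Date() for the time index of a ts

rr <- ts(c(3, 2, 4, 5, 4, 5, 3, 3, 6, 2, 4, 2),
         start = c(2012, 5), frequency = 12)
plot(rr, xlab = "2012 - 2013", ylab = "event freq",
     xaxt = "n", col = "blue")
labs <- format(as.Date(time(rr)), "%b-%Y")  # "May-2012" ... "Apr-2013"
axis(1, time(rr), labs, cex.axis = .9, tcl = -.5, las = 2)
```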
Re: [R] Vectorized code for generating the Kac (Clement) matrix
On 26-04-2013, at 14:42, Enrico Schumann e...@enricoschumann.net wrote:

On Thu, 25 Apr 2013, Ravi Varadhan ravi.varad...@jhu.edu writes:

Hi, I am generating large Kac matrices (also known as Clement matrices). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code:

n <- 1000
Kacmat <- matrix(0, n+1, n+1)
for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1

The above code is fast, but I am curious about vectorized ways to do this. Thanks in advance.

Best, Ravi

This may be a bit faster; but as Berend and you said, the original function is already fast.

n <- 5000
f1 <- function(n) {
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 2:(n+1)) Kacmat[i, i-1] <- i-1
  Kacmat
}
f3 <- function(n) {
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  bw <- n:1L  ## bw = backward, fw = forward
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- bw
  res[cbind(fw + 1L, fw)] <- fw
  res
}
system.time(K1 <- f1(n))
##   user  system elapsed
##  0.132   0.028   0.161
system.time(K3 <- f3(n))
##   user  system elapsed
##  0.024   0.048   0.071
identical(K3, K1)

Using some of your code in my function I was able to speed up my function f2. Complete code:

f1 <- function(n) { # Ravi
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 1:n) Kacmat[i+1, i] <- i
  Kacmat
}
f2 <- function(n) { # Berend 1, modified to use 1L
  Kacmat <- matrix(0, n+1, n+1)
  Kacmat[row(Kacmat) == col(Kacmat) - 1L] <- n:1L
  Kacmat[row(Kacmat) == col(Kacmat) + 1L] <- 1L:n
  Kacmat
}
f3 <- function(n) { # Enrico
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  bw <- n:1L  ## bw = backward, fw = forward
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- bw
  res[cbind(fw + 1L, fw)] <- fw
  res
}
f4 <- function(n) { # Berend 2, using which() with arr.ind=TRUE
  Kacmat <- matrix(0, n+1, n+1)
  k1 <- which(row(Kacmat) == col(Kacmat) - 1L, arr.ind = TRUE)
  k2 <- which(row(Kacmat) == col(Kacmat) + 1L, arr.ind = TRUE)
  Kacmat[k1] <- n:1L
  Kacmat[k2] <- 1L:n
  Kacmat
}
library(compiler)
f1.c <- cmpfun(f1); f2.c <- cmpfun(f2); f3.c <- cmpfun(f3); f4.c <- cmpfun(f4)

n <- 5000
system.time(K1 <- f1(n))     #  user 0.387  system 0.120  elapsed 0.511
system.time(K2 <- f2(n))     #  user 3.541  system 0.702  elapsed 4.250
system.time(K3 <- f3(n))     #  user 0.108  system 0.089  elapsed 0.199
system.time(K4 <- f4(n))     #  user 1.975  system 0.355  elapsed 2.336
system.time(K1c <- f1.c(n))  #  user 0.323  system 0.120  elapsed 0.445
system.time(K2c <- f2.c(n))  #  user 3.374  system 0.422  elapsed 3.807
system.time(K3c <- f3.c(n))  #  user 0.107  system 0.098  elapsed 0.205
system.time(K4c <- f4.c(n))  #  user 1.816  system 0.384  elapsed 2.203
identical(K2, K1); identical(K3, K1); identical(K4, K1)  # all TRUE
identical(K1c, K1); identical(K2c, K2); identical(K3c, K3); identical(K4c, K4)  # all TRUE

So Ravi's original and Enrico's versions are the quickest. Using which() with arr.ind made my version run a lot quicker. All in all an interesting exercise.

Berend

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
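For a concrete check of the structure being built, Enrico's f3 on a small n shows the Kac (Clement) tridiagonal pattern:

```r
f3 <- function(n) {
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- n:1L  # superdiagonal: n, n-1, ..., 1
  res[cbind(fw + 1L, fw)] <- fw    # subdiagonal: 1, 2, ..., n
  res
}
f3(3)
#      [,1] [,2] [,3] [,4]
# [1,]    0    3    0    0
# [2,]    1    0    2    0
# [3,]    0    2    0    1
# [4,]    0    0    3    0
```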
Re: [R] nls: example code throws error
Keith Jewell k.jewell at campden.co.uk writes: Others have pointed out that the error is probably from an unclean environment. Completely OT, but an unclean environment sounds sort of scary to me. Like it contains zombies or something. I don't know a better, short way to express the idea though. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted Principle Components analysis
When you run an unweighted analysis on all three systems, do the scores agree? I would have expected that replicating the observations would give you similar results. You might be able to run the weighted analysis using princomp() instead of principal(), since you can supply data and a covariance matrix (but the manual page does not specifically mention supplying a correlation matrix - you might have to run the analysis on standardized variables).

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dimitri Liakhovitski
Sent: Friday, April 26, 2013 6:32 AM
To: r-help
Subject: Re: [R] Weighted Principle Components analysis

The reason for my asking is that I have to replicate the same analysis done in SPSS and SAS. Again, to make it clear - it's respondent-weighted Factor Analysis with a desired number of factors. Method of extraction: Principal Components. Rotation: Varimax. The only solution I can think of is to multiply my respondent weight by 10 (or by 100) and round it so that the new weight has no decimals, then repeat every row as many times as the new weight says and run a regular, unweighted principal() on the new data. I've done it - but again, this does not match the Factor Scores from SPSS and SAS exactly. Any other ideas? Thank you!

On Thu, Apr 25, 2013 at 9:21 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote:

Hello! I am doing Principal Components Analysis using the psych package:

mypc <- principal(mydata, 5, scores = TRUE)

However, I was asked to run a case-weighted PCA - using an individual weight for each case. I could use corr() from the boot package to calculate the case-weighted intercorrelation matrix. But if I use the intercorrelation matrix as input (instead of the raw data), I am not going to get factor scores, which I do need. Any advice? Thank you very much!

-- Dimitri Liakhovitski

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
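One possible direction, not suggested in the thread: stats::cov.wt() computes a case-weighted correlation matrix directly, and scores can then be obtained by projecting the standardized data onto its eigenvectors. A minimal sketch with made-up data (the weights and data here are illustrative only):

```r
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)      # 100 cases, 5 variables
w <- runif(100); w <- w / sum(w)         # case weights summing to 1

cw <- cov.wt(X, wt = w, cor = TRUE)      # weighted center, cov, and cor
Z  <- scale(X, center = cw$center,
            scale = sqrt(diag(cw$cov))) # standardize with weighted moments
e  <- eigen(cw$cor)
scores <- Z %*% e$vectors[, 1:2]         # scores on the first 2 components
```

This gives unrotated principal components; matching SPSS/SAS exactly would still require their rotation (e.g. Varimax) and scoring conventions.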
Re: [R] nls: example code throws error
On 13-04-26 10:14 AM, Ben Bolker wrote: Keith Jewell k.jewell at campden.co.uk writes: Others have pointed out that the error is probably from an unclean environment. Completely OT, but an unclean environment sounds sort of scary to me. Like it contains zombies or something. Isn't that accurate? Undead objects causing your code to be full of bugs? Duncan Murdoch I don't know a better, short way to express the idea though. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Print occurrence / positions of words
I have tried some different packages in order to build an R program which will take a text file as input and produce a list of the words inside that file. Each word should have a vector with all the positions at which that word occurs in the file. How about

txt <- paste(rep("this is a nice text with nice characters", 3), "But this is not", collapse = " ")
library(stringr)
txt.vec <- str_split(txt, "[^[:alnum:]_]+")[[1]]
# vector of all the words in their original sequence
tapply(1:length(txt.vec), txt.vec, c)
# Returns a list of vectors of locations of each word, sorted alphabetically

S Ellison

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
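The same idea works in base R without stringr - split() maps each word directly to its positions (a sketch using a text like the example above):

```r
txt <- paste(rep("this is a nice text with nice characters", 3),
             collapse = " ")
words <- strsplit(txt, "[^[:alnum:]_]+")[[1]]  # words in original order
pos <- split(seq_along(words), words)          # word -> vector of positions
pos[["nice"]]                                  # all positions of "nice"
```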
Re: [R] time series plot: x-axis problem
Hello,

Em 26-04-2013 14:30, arun escreveu:

labs <- format(as.Date(time(rr)), "%b-%Y")
# Error in as.Date.default(time(rr)) :
#   do not know how to convert 'time(rr)' to class "Date"
# I guess this needs library(zoo)

You're right, I forgot because it was already loaded prior to running the code. Apologies to the OP.

Rui Barradas

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
One might wonder if the Excel error was indeed THAT, or perhaps a way to get the desired results, given the other issues in their analysis? The prior for the incompetence/malice question is usually best set pretty heavily in favour of incompetence ... S __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
The prior for the incompetence/malice question is usually best set pretty heavily in favour of incompetence ...

The following comment on economic research is from a 2010 article in the Atlantic reviewing John Ioannidis' work: http://www.theatlantic.com/magazine/print/2010/11/lies-damned-lies-and-medical-science/308269/

"Medical research is not especially plagued with wrongness. Other meta-research experts have confirmed that similar issues distort research in all fields of science, from physics to economics (where the highly regarded economists J. Bradford DeLong and Kevin Lang once showed how a remarkably consistent paucity of strong evidence in published economics studies made it unlikely that any of them were right)."

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of S Ellison
Sent: Friday, April 26, 2013 9:08 AM
To: Thomas Adams; peter dalgaard
Cc: r-help
Subject: Re: [R] the joy of spreadsheets (off-topic)

One might wonder if the Excel error was indeed THAT, or perhaps a way to get the desired results, given the other issues in their analysis? The prior for the incompetence/malice question is usually best set pretty heavily in favour of incompetence ...

S

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
From a quick read, the Excel error prior for incompetence looks high, but some of the other issues hint that the prior for the overall findings was remarkably in favor of malice.

John Kane
Kingston ON Canada

-----Original Message-----
From: s.elli...@lgcgroup.com
Sent: Fri, 26 Apr 2013 17:07:55 +0100
To: tea...@gmail.com, pda...@gmail.com
Subject: Re: [R] the joy of spreadsheets (off-topic)

One might wonder if the Excel error was indeed THAT, or perhaps a way to get the desired results, given the other issues in their analysis? The prior for the incompetence/malice question is usually best set pretty heavily in favour of incompetence ...

S

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
From a quick read, the Excel error prior for incompetence looks high, but some of the other issues hint that the prior for the overall findings was remarkably in favor of malice.

That's p(malice|evidence), not p(malice); surely that must be the posterior? ;-)

'tain't a great advert for economics either way, though, however much fun it may be to apply Bayes' theorem (badly, in my case) to analyse it.

Steve E

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Decomposing a List
You might add vapply() to your repertoire, as it is quicker than sapply() but also does some error checking on your input data. E.g., your f2 returns a matrix whose columns are the elements of the list l, and you assume that each element of l contains 2 character strings.

f2 <- function(l) matrix(unlist(l), nrow = 2)

Here is a function based on vapply() that returns the same thing but also verifies that each element of l really is a 2-long character vector:

f2v <- function(l) vapply(l, function(x) x, FUN.VALUE = character(2))

and a function to generate datasets of various sizes:

makeL <- function(n) strsplit(paste(sample(LETTERS, n, rep = TRUE), sample(1:10, n, rep = TRUE), sep = "+"), "+", fixed = TRUE)

Timing the functions on a million-long list I get:

l <- makeL(n = 10^6)
system.time(r2 <- f2(l))
#   user  system elapsed
#  0.088   0.000   0.090
system.time(r2v <- f2v(l))
#   user  system elapsed
#   0.92    0.00    0.92
identical(r2, r2v)
# [1] TRUE

vapply() is ten times slower than unlist() but three times faster than sapply(x, function(x) x). However, when you give it data that doesn't meet your expectations, which is common when using strsplit(), f2v tells you about the problem while f2 gives you an incorrect result:

l[[10]] <- c("a", "b", "c", "d")
system.time(r2v <- f2v(l))
# Error in vapply(l, function(x) x, FUN.VALUE = character(2)) :
#   values must be length 2, but FUN(X[[10]]) result is length 4
system.time(rv <- f2(l))
dim(rv)  # you will have alignment problems later
# [1]       2 1000001

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
Sent: Thursday, April 25, 2013 7:54 AM
To: ted.hard...@wlandres.net
Cc: R mailing list
Subject: Re: [R] Decomposing a List

Well, what you really want to do is convert the list to a matrix, and it can be done directly and considerably faster than with the (implicit) looping of sapply:

f1 <- function(l) sapply(l, "[", 1)
f2 <- function(l) matrix(unlist(l), nrow = 2)
l <- strsplit(paste(sample(LETTERS, 1e6, rep = TRUE), sample(1:10, 1e6, rep = TRUE), sep = "+"), "+", fixed = TRUE)

## Then you get these results:
system.time(x1 <- f1(l))
#   user  system elapsed
#   1.92    0.01    1.95
system.time(x2 <- f2(l))
#   user  system elapsed
#   0.06    0.02    0.08
system.time(x2 <- f2(l)[1, ])
#   user  system elapsed
#    0.1     0.0     0.1
identical(x1, x2)
# [1] TRUE

Cheers, Bert

On Thu, Apr 25, 2013 at 3:32 AM, Ted Harding ted.hard...@wlandres.net wrote:

Thanks, Jorge, that seems to work beautifully! (Now to try to understand why ... but that's for later.) Ted.

On 25-Apr-2013 10:21:29 Jorge I Velez wrote:

Dear Dr. Harding, Try

sapply(L, "[", 1)
sapply(L, "[", 2)

HTH, Jorge.-

On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding ted.hard...@wlandres.net wrote:

Greetings! For some reason I am not managing to work out how to do this (in principle) simple task! As a result of applying strsplit() to a vector of character strings, I have a long list L (N elements), where each element is a vector of two character strings, like:

L[[1]] = c("A1", "B1")
L[[2]] = c("A2", "B2")
L[[3]] = c("A3", "B3")
[etc.]

From L, I wish to obtain (as directly as possible, e.g. avoiding a loop) two vectors, each of length N, where one contains the strings that are first in the pair and the other contains the strings which are second, i.e. from L (as above) I would want to extract:

V1 = c("A1", "A2", "A3", ...)
V2 = c("B1", "B2", "B3", ...)

Suggestions? With thanks, Ted.

- E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 25-Apr-2013 Time: 11:16:46 This message was sent by XFMail

-- Bert Gunter
Genentech Nonclinical Biostatistics

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
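The safety difference Bill describes can be seen on a tiny example: sapply() silently falls back to returning a list when element lengths differ, while vapply() stops with an informative error (a sketch, not from the thread):

```r
l <- list(c("A", "1"), c("B", "2"))
vapply(l, function(x) x, character(2))       # a 2 x 2 character matrix

l[[2]] <- c("B", "2", "extra")               # a malformed element
str(sapply(l, function(x) x))                # silently a list now
try(vapply(l, function(x) x, character(2)))  # error: result is length 3
```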
Re: [R] Stepwise regression for multivariate case in R?
Since stepwise methods do not work as advertised in the univariate case, I'm wondering why they should work in the multivariate case. Frank

Jonathan Jansson wrote:

Hi! I am trying to do stepwise regression in the multivariate case, using Wilks' Lambda test. I've tried this:

greedy.wilks(cbind(Y1, Y2) ~ ., data = my.data)

But it only returns:

Error in model.frame.default(formula = X[, j] ~ grouping, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'grouping')

What can be wrong here? I have checked, and all variables in my.data are of the same length.

//Jonathan

-
Frank Harrell
Department of Biostatistics, Vanderbilt University

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NMDS in Vegan: problems in stressplot, best solution
Hello, I can draw a basic stress plot for NMDS with the following code in package vegan:

stressplot(parth.mds, parth.dis)

When I try to specify the line and point types, it gives me an error message:

stressplot(parth.mds, parth.dis, pch = 1, p.col = "gray", lwd = 2, l.col = "red")
# Error in plot.xy(xy, type, ...) : invalid plot type

In the above code, if I remove the line arguments, it does give me the plot, with points of my chosen type:

stressplot(parth.mds, parth.dis, pch = 1, p.col = "gray")

Why can't I define both line and point at the same time? Also, if I have 100 iterations for metaMDS, then when I plot the result, does it give me the result from the best solution? How do I know that? Can you plot the stress by iteration number?

parth.mds <- metaMDS(WorldPRSenv, distance = "bray", k = 2, trymax = 100, engine = c("monoMDS", "isoMDS"), autotransform = TRUE, wascores = TRUE, expand = TRUE, trace = 2)
plot(parth.mds, type = "p")

Thanks in advance, Kumar

-- Section of Integrative Biology, University of Texas at Austin, Austin, Texas 78712, USA

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Read big data (3G ) methods ?
Hi all scientists, Recently I have been dealing with big data (> 3 GB txt or csv files) on my desktop (Windows 7, 64-bit version), but I cannot read it in quickly, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but it is not very fast.] Could you share your methods for reading big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
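Two approaches that usually help (the file name and column layout below are made up for illustration): give read.table enough hints to skip its type detection, or use data.table::fread, which is typically much faster on files of this size:

```r
# base R: declare column classes and an upper bound on rows up front
dat1 <- read.table("big.csv", sep = ",", header = TRUE,
                   colClasses = c("character", rep("numeric", 9)),
                   nrows = 3e6, comment.char = "")

# data.table's fread: fast C-level reader that guesses types from a sample
library(data.table)
dat2 <- fread("big.csv")
```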
Re: [R] Splitting data.frame and saving to csv files
Hi,

You can do this:

lst1 <- lapply(split(colnames(df)[-1], gsub("_.*", "", colnames(df)[-1])), function(x) {x1 <- cbind(date = df[, 1], df[, x]); colnames(x1)[-1] <- x; x1})
lst1
#$ABC
#        date    ABC_f    ABC_e    ABC_d    ABC_m
#1 2013-04-15 62.80740 11.36784 38.71090 40.28474
#2 2013-04-14 81.04526 62.29652 93.48216 43.97076
#3 2013-04-13 84.65712 47.63482 93.14432 47.38762
#4 2013-04-12 12.78237 32.27821 78.27387 97.33573
#5 2013-04-11 57.61345 52.12561 31.87170 22.06885
#
#$LMN
#        date    LMN_d    LMN_a    LMN_c
#1 2013-04-15 21.16794 56.67684 45.44847
#2 2013-04-14 54.65804 25.81530 17.72362
#3 2013-04-13 63.89233 40.12268 36.76901
#4 2013-04-12 87.59880 35.74175 68.58913
#5 2013-04-11 87.07694 47.95892 35.80767
#
#$PQR
#     date      PQR
#[1,]    5 71.22868
#[2,]    4 95.09995
#[3,]    3 83.62438
#[4,]    2 30.18525
#[5,]    1 25.81805
#
#$XYZ
#        date    XYZ_p   XYZ_zz
#1 2013-04-15 55.88855 85.74755
#2 2013-04-14 94.13587 63.48582
#3 2013-04-13 84.00891 81.61107
#4 2013-04-12 98.99747 58.15729
#5 2013-04-11 64.71084 27.44133

lapply(seq_along(lst1), function(i) write.csv(lst1[[i]], file = paste0(names(lst1[i]), ".csv"), row.names = FALSE))

A.K.

----- Original Message -----
From: Katherine Gobin katherine_go...@yahoo.com
To: r-help@r-project.org
Sent: Friday, April 26, 2013 9:21 AM
Subject: [R] Splitting data.frame and saving to csv files

Dear R Forum,

I have a data.frame as

df = data.frame(date = c("2013-04-15", "2013-04-14", "2013-04-13", "2013-04-12", "2013-04-11"),
  ABC_f = c(62.80739769, 81.04525895, 84.65712455, 12.78237251, 57.61345256),
  LMN_d = c(21.16794336, 54.6580401, 63.8923307, 87.59880367, 87.07693716),
  XYZ_p = c(55.8885464, 94.1358684, 84.0089114, 98.99746696, 64.71083712),
  LMN_a = c(56.6768395, 25.81530198, 40.12268441, 35.74175237, 47.95892209),
  ABC_e = c(11.36783959, 62.29651784, 47.63481552, 32.27820673, 52.12561419),
  LMN_c = c(45.4484695, 17.72362438, 36.7690054, 68.58912931, 35.80767235),
  XYZ_zz = c(85.74755089, 63.48582415, 81.61107212, 58.1572924, 27.44132817),
  PQR = c(71.22867519, 95.09994812, 83.62437819, 30.18524735, 25.81804865),
  ABC_d = c(38.71089816, 93.48216193, 93.14432203, 78.2738731, 31.87170019),
  ABC_m = c(40.28473769, 43.97076327, 47.38761559, 97.33573412, 22.06884976))

df
#        date    ABC_f    LMN_d    XYZ_p    LMN_a    ABC_e
#1 2013-04-15 62.80740 21.16794 55.88855 56.67684 11.36784
#2 2013-04-14 81.04526 54.65804 94.13587 25.81530 62.29652
#3 2013-04-13 84.65712 63.89233 84.00891 40.12268 47.63482
#4 2013-04-12 12.78237 87.59880 98.99747 35.74175 32.27821
#5 2013-04-11 57.61345 87.07694 64.71084 47.95892 52.12561
#     LMN_c   XYZ_zz      PQR    ABC_d    ABC_m
#1 45.44847 85.74755 71.22868 38.71090 40.28474
#2 17.72362 63.48582 95.09995 93.48216 43.97076
#3 36.76901 81.61107 83.62438 93.14432 47.38762
#4 68.58913 58.15729 30.18525 78.27387 97.33573
#5 35.80767 27.44133 25.81805 31.87170 22.06885

I need to identify columns with the same labels and, along with the dates in the first column, save those columns in different csv files. E.g. in the above data frame I have 4 columns beginning with ABC, so I need to save these four columns, with the date in the first column, as ABC.csv; then LMN_d, LMN_a, LMN_c in LMN.csv as date, LMN_a, LMN_c, LMN_d; and so on.

In my actual data.frame, I won't know in advance how many such rate combinations are available. If a column has no matching columns, such as PQR, the PQR.csv file should have only the date and PQR columns. Kindly guide me on how to split the data.frame and save the respective csv files.

Regards
Katherine

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Splitting data.frame and saving to csv files
Hi,

Just noticed a mistake: lst1 should be (using df[x] so that single-column groups keep their data.frame structure and column name):

lst1 <- lapply(split(colnames(df)[-1], gsub("_.*", "", colnames(df)[-1])), function(x) cbind(date = df[, 1], df[x]))
lst1
#$ABC
#        date    ABC_f    ABC_e    ABC_d    ABC_m
#1 2013-04-15 62.80740 11.36784 38.71090 40.28474
#2 2013-04-14 81.04526 62.29652 93.48216 43.97076
#3 2013-04-13 84.65712 47.63482 93.14432 47.38762
#4 2013-04-12 12.78237 32.27821 78.27387 97.33573
#5 2013-04-11 57.61345 52.12561 31.87170 22.06885
#
#$LMN
#        date    LMN_d    LMN_a    LMN_c
#1 2013-04-15 21.16794 56.67684 45.44847
#2 2013-04-14 54.65804 25.81530 17.72362
#3 2013-04-13 63.89233 40.12268 36.76901
#4 2013-04-12 87.59880 35.74175 68.58913
#5 2013-04-11 87.07694 47.95892 35.80767
#
#$PQR
#        date      PQR
#1 2013-04-15 71.22868
#2 2013-04-14 95.09995
#3 2013-04-13 83.62438
#4 2013-04-12 30.18525
#5 2013-04-11 25.81805
#
#$XYZ
#        date    XYZ_p   XYZ_zz
#1 2013-04-15 55.88855 85.74755
#2 2013-04-14 94.13587 63.48582
#3 2013-04-13 84.00891 81.61107
#4 2013-04-12 98.99747 58.15729
#5 2013-04-11 64.71084 27.44133

A.K.
- Original Message - From: arun smartpink...@yahoo.com To: Katherine Gobin katherine_go...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, April 26, 2013 9:45 AM Subject: Re: [R] Splitting data.frame and saving to csv files Hi, You can do this: lst1-lapply(split(colnames(df)[-1],gsub(_.*,,colnames(df)[-1])),function(x) {x1-cbind(date=df[,1],df[,x]);colnames(x1)[-1]- x;x1}) lst1 #$ABC # date ABC_f ABC_e ABC_d ABC_m #1 2013-04-15 62.80740 11.36784 38.71090 40.28474 #2 2013-04-14 81.04526 62.29652 93.48216 43.97076 #3 2013-04-13 84.65712 47.63482 93.14432 47.38762 #4 2013-04-12 12.78237 32.27821 78.27387 97.33573 #5 2013-04-11 57.61345 52.12561 31.87170 22.06885 # #$LMN # date LMN_d LMN_a LMN_c #1 2013-04-15 21.16794 56.67684 45.44847 #2 2013-04-14 54.65804 25.81530 17.72362 #3 2013-04-13 63.89233 40.12268 36.76901 #4 2013-04-12 87.59880 35.74175 68.58913 #5 2013-04-11 87.07694 47.95892 35.80767 # #$PQR # date PQR #[1,] 5 71.22868 #[2,] 4 95.09995 #[3,] 3 83.62438 #[4,] 2 30.18525 #[5,] 1 25.81805 # #$XYZ # date XYZ_p XYZ_zz #1 2013-04-15 55.88855 85.74755 #2 2013-04-14 94.13587 63.48582 #3 2013-04-13 84.00891 81.61107 #4 2013-04-12 98.99747 58.15729 #5 2013-04-11 64.71084 27.44133 lapply(seq_along(lst1),function(i) write.csv(lst1[[i]],file=paste0(names(lst1[i]),.csv),row.names=FALSE)) A.K. 
- Original Message - From: Katherine Gobin katherine_go...@yahoo.com To: r-help@r-project.org Cc: Sent: Friday, April 26, 2013 9:21 AM Subject: [R] Splitting data.frame and saving to csv files Dear R Forum, I have a data.frame as df = data.frame(date = c(2013-04-15, 2013-04-14, 2013-04-13, 2013-04-12, 2013-04-11), ABC_f = c(62.80739769,81.04525895,84.65712455,12.78237251,57.61345256), LMN_d = c(21.16794336,54.6580401,63.8923307,87.59880367,87.07693716), XYZ_p = c(55.8885464,94.1358684,84.0089114,98.99746696,64.71083712), LMN_a = c(56.6768395,25.81530198,40.12268441,35.74175237,47.95892209), ABC_e = c(11.36783959,62.29651784,47.63481552,32.27820673,52.12561419), LMN_c = c(45.4484695,17.72362438,36.7690054,68.58912931,35.80767235), XYZ_zz = c(85.74755089,63.48582415,81.61107212,58.1572924,27.44132817), PQR = c(71.22867519,95.09994812,83.62437819,30.18524735,25.81804865), ABC_d = c(38.71089816,93.48216193,93.14432203,78.2738731,31.87170019), ABC_m = c(40.28473769,43.97076327,47.38761559,97.33573412,22.06884976)) df date ABC_f LMN_d XYZ_p LMN_a ABC_e 1 2013-04-15 62.80740 21.16794 55.88855 56.67684 11.36784 2 2013-04-14 81.04526 54.65804 94.13587 25.81530 62.29652 3 2013-04-13 84.65712 63.89233 84.00891 40.12268 47.63482 4 2013-04-12 12.78237 87.59880 98.99747 35.74175 32.27821 5 2013-04-11 57.61345 87.07694 64.71084 47.95892 52.12561 LMN_c XYZ_zz PQR ABC_d ABC_m 1 45.44847 85.74755 71.22868 38.71090 40.28474 2 17.72362 63.48582 95.09995 93.48216 43.97076 3 36.76901 81.61107 83.62438 93.14432 47.38762 4 68.58913 58.15729 30.18525 78.27387 97.33573 5 35.80767 27.44133 25.81805 31.87170 22.06885 I need to identify columns with same labels and along-with the dates in the first column, save the columns in different csv files. E.g. in the above data frame, I have 4 columns beginning with ABC so I need to save these four columns with the date in the first column as ABC.csv, then LMN_d, LMN_a, LMN_c in the LMN.csv file as date, LMN_a, LMN_c, LMN_d and so on. 
In my actual data.frame I won't know in advance how many such rate combinations are present. If a column has no matching columns, as with PQR, the PQR.csv file should have only the date and PQR columns. Kindly guide me on how to split the data.frame and save the respective csv files. Regards, Katherine
[R] speed of a vector operation question
Hello, I am dealing with numeric vectors 10^5 to 10^6 elements long. The values in the vector (v) are sorted (with duplicates). I am obtaining the length of vectors such as v[v < c] or v[v > c1 & v < c2], where c, c1, c2 are some scalar variables. What is the most efficient way to do this? I am using sum(v < c), since TRUE's are 1's and FALSE's are 0's. This seems to me more efficient than length(which(v < c)), but please correct me if I'm wrong. So, is there anything faster than what I already use? I'm running R 2.14.2 on Linux kernel 3.4.34. I appreciate your time, Mikhail [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] converting character matrix to POSIXct matrix
I thought this was a common question, but rseek/Google searches don't yield any relevant hits. I have a matrix of character strings, which are time stamps:

time.m[1:5,1:5]
     [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799
[2,] 08:00:21.996 08:00:22.071 08:00:23.821 08:00:24.370 08:00:25.573
[3,] 08:00:29.200 08:00:29.200 08:00:29.591 08:00:30.368 08:00:30.536
[4,] 08:00:31.073 08:00:31.372 08:00:31.384 08:00:31.403 08:00:31.867
[5,] 08:00:31.867 08:00:31.867 08:00:31.971 08:00:34.571 08:00:34.571

And I would like to convert it to a POSIXct matrix. I tried this,

time1 = lapply(time.m, function(tt) strptime(tt, "%H:%M:%OS"))

but it yields a list. Any tip is appreciated. Horace
[R] Questions about out-of-sample forecast using random walk
Hi there, I'm a bit confused about which command I should use when performing an out-of-sample forecast using a random walk. I have some time series data from 1957Q1 to 2011Q4. I want to use the fraction of the data from 1960Q1 to 1984Q4 to forecast the data from 1985Q1 onwards using a random walk model, and evaluate the forecasting performance against the true data I have. I used the rwf command from the 'forecast' package to do this. However, the results I obtained are all around the value of 1984Q4, which is quite different from the true data, which shows an increasing trend with time. Could you give me some suggestions on which command I should choose to perform the random walk forecast and get the Mean Squared Forecast Error and Mean Absolute Error of the forecast? It would be really helpful if you could reply as soon as possible, since I urgently need this. Thanks a lot. Kind Regards, Lavender
[R] Help with merge function
Dear all,

I'm trying to merge 2 dataframes, but I'm not being entirely successful and I can't understand why.

Dataframe x1:

 State_prov   Shape_name  bob2009 bob2010 bob2011
 Nova Scotia  Annapolis         0       0       1
 Nova Scotia  Antigonish        0       0       0
 Nova Scotia  Gly              NA      NA      NA

Dataframe x2 has 2 rows and 193 variables, contains one important field, FID, that is a link to a shapefile (this is not in x1), and shares common columns with x1, like this:

 FID State_prov   Shape_name  bob2009 bob2010 coy2009
 0   Nova Scotia  Annapolis         0       0      10
 1   Nova Scotia  Antigonish        0       0       1
 2   Nova Scotia  Gly               0       0       1

So when I do

x3 <- merge(x1, x2, by = intersect(names(x1), names(x2)), all = TRUE)

it should do the trick. The thing is that it works for the columns (it adds all the new columns not common to both dataframes), but it also adds the rows. This is what I get (x3):

 FID State_prov   Shape_name  bob2009 bob2010 coy2009 bob2011
 0   Nova Scotia  Annapolis         0       0      10      NA
 NA  Nova Scotia  Annapolis        NA      NA      NA       1
 1   Nova Scotia  Antigonish        0       0       1      NA
 NA  Nova Scotia  Antigonish       NA      NA      NA       0
 2   Nova Scotia  Gly               0       0       1      NA
 NA  Nova Scotia  Gly              NA      NA      NA      NA

What I want to get is a true merge, like this:

 FID State_prov   Shape_name  bob2009 bob2010 coy2009 bob2011
 0   Nova Scotia  Annapolis         0       0      10       1
 1   Nova Scotia  Antigonish        0       0       1       0
 2   Nova Scotia  Gly               0       0       1      NA

Can anybody please help me understand what I'm doing wrong? Any help will be much appreciated!!

-- Catarina C. Ferreira, PhD
Re: [R] Error Installing packages
I am trying to install the package boss, but I am getting the error below. Please advise.

install.packages("boss")
--- Please select a CRAN mirror for use in this session ---
[92-entry CRAN mirror menu snipped]
Selection: 86
also installing the dependency 'ncdf'
trying URL 'http://cran.mirrors.hoobly.com/src/contrib/ncdf_1.6.6.tar.gz'
Content type 'application/x-gzip' length 79403 bytes (77 Kb)
opened URL
==
downloaded 77 Kb
trying URL
'http://cran.mirrors.hoobly.com/src/contrib/boss_1.2.tar.gz' Content type 'application/x-gzip' length 9702 bytes opened URL == downloaded 9702 bytes * installing *source* package 'ncdf' ... ** package 'ncdf' successfully unpacked and MD5 sums checked checking for nc-config... no checking for gcc... gcc -std=gnu99 checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc -std=gnu99 accepts -g... yes checking for gcc -std=gnu99 option to accept ISO C89... none needed checking how to run the C preprocessor... gcc -std=gnu99 -E checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking netcdf.h usability... no checking netcdf.h presence... no checking for netcdf.h... no configure: error: netcdf header netcdf.h not found ERROR: configuration failed for package 'ncdf' * removing '/share/apps/R-2.15.3/lib64/R/library/ncdf' ERROR: dependency 'ncdf' is not available for package 'boss' * removing '/share/apps/R-2.15.3/lib64/R/library/boss' The downloaded source packages are in '/tmp/RtmppOWF74/downloaded_packages' Updating HTML index of packages in '.Library' Making packages.html ... done Warning messages: 1: In install.packages(boss) : installation of package 'ncdf' had non-zero exit status 2: In install.packages(boss) : installation of package 'boss' had non-zero exit status [[alternative HTML version deleted]] __ R-help@r-project.org mailing list
Re: [R] Read big data (> 3G) methods?
Have you thought of building a database and then letting R read the data through that DB, instead of loading it all on your desktop?

On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote: Hi all scientists, Recently I have been dealing with big data (> 3G, txt or csv format) on my desktop (Windows 7, 64-bit version), but I cannot read it in quickly, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but they are not fast enough.] Could you share your methods for reading big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin
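A minimal sketch of the database approach, assuming the RSQLite package is installed; the file names ("bigdata.csv", "bigdata.sqlite") and column names are hypothetical. Note that older RSQLite releases let dbWriteTable() import a csv file path directly, as shown here; newer releases expect a data frame instead.

```r
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "bigdata.sqlite")

# One-off import: stream the csv into a table without loading it all into R
# (file-path import worked in the RSQLite versions current at the time).
dbWriteTable(con, "big", "bigdata.csv", header = TRUE, sep = ",")

# Afterwards, pull only the rows/columns you actually need:
res <- dbGetQuery(con, "SELECT col1, col2 FROM big WHERE col1 > 100")

dbDisconnect(con)
```

The payoff is that subsequent sessions skip the slow csv parse entirely and let SQLite do the filtering.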
Re: [R] C50 package in R
There isn't much out there. Quinlan didn't open-source the code until about a year ago. I've been through the code line by line and we have a fairly descriptive summary of the model in our book (that's almost out): http://appliedpredictivemodeling.com/

I will say that the pruning is mostly the same as described in Quinlan's C4.5 book. The big differences between C4.5 and C5.0 are boosting and winnowing. The former is very different mechanically from gradient boosting machines and is more similar to the re-weighting approach of the original AdaBoost algorithm (but is still pretty different).

I've submitted a talk on C5.0 for this year's UseR! conference. If there is enough time I will be able to go through some of the technical details.

Two other related notes:

- the J48 implementation in Weka lacks one or two of C4.5's features, which makes the results substantially different from what C4.5 would have produced. The differences are significant enough that Quinlan asked us to call the results of that function J48 and not C4.5. Using C5.0 with a single tree is much more similar to C4.5 than J48 is.

- the differences between model trees and Cubist are also substantial and largely undocumented.

HTH,

Max

On Thu, Apr 25, 2013 at 9:40 AM, Indrajit Sen Gupta indrajit...@rediffmail.com wrote: Hi All, I am trying to use the C50 package to build classification trees in R. Unfortunately there is not enough documentation around its use. Can anyone explain to me how to prune the decision trees? Regards, Indrajit
-- Max
Re: [R] Help with merge function
Hello,

The following seems to do the trick.

x1 <- structure(list(State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"),
    Shape_name = c("Annapolis", "Antigonish", "Gly"),
    bob2009 = c(0L, 0L, NA), bob2010 = c(0L, 0L, NA),
    bob2011 = c(1L, 0L, NA)),
    .Names = c("State_prov", "Shape_name", "bob2009", "bob2010", "bob2011"),
    class = "data.frame", row.names = c(NA, -3L))

x2 <- structure(list(FID = 0:2,
    State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"),
    Shape_name = c("Annapolis", "Antigonish", "Gly"),
    bob2009 = c(0L, 0L, 0L), bob2010 = c(0L, 0L, 0L),
    coy2009 = c(10L, 1L, 1L)),
    .Names = c("FID", "State_prov", "Shape_name", "bob2009", "bob2010", "coy2009"),
    class = "data.frame", row.names = c(NA, -3L))

x3 <- merge(x1, x2, all.y = TRUE)

Note also that since by = intersect(names(x1), names(x2)) is the default behavior, you really don't need to specify it.

Hope this helps,

Rui Barradas

Em 26-04-2013 18:10, Catarina Ferreira escreveu: Dear all, I'm trying to merge 2 dataframes, but I'm not being entirely successful and I can't understand why.

Dataframe x1:

 State_prov   Shape_name  bob2009 bob2010 bob2011
 Nova Scotia  Annapolis         0       0       1
 Nova Scotia  Antigonish        0       0       0
 Nova Scotia  Gly              NA      NA      NA

Dataframe x2 has 2 rows and 193 variables, contains one important field, FID, that is a link to a shapefile (this is not in x1), and shares common columns with x1, like this:

 FID State_prov   Shape_name  bob2009 bob2010 coy2009
 0   Nova Scotia  Annapolis         0       0      10
 1   Nova Scotia  Antigonish        0       0       1
 2   Nova Scotia  Gly               0       0       1

So when I do

x3 <- merge(x1, x2, by = intersect(names(x1), names(x2)), all = TRUE)

it should do the trick. The thing is that it works for the columns (it adds all the new columns not common to both dataframes), but it also adds the rows.
This is what I get (x3):

 FID State_prov   Shape_name  bob2009 bob2010 coy2009 bob2011
 0   Nova Scotia  Annapolis         0       0      10      NA
 NA  Nova Scotia  Annapolis        NA      NA      NA       1
 1   Nova Scotia  Antigonish        0       0       1      NA
 NA  Nova Scotia  Antigonish       NA      NA      NA       0
 2   Nova Scotia  Gly               0       0       1      NA
 NA  Nova Scotia  Gly              NA      NA      NA      NA

What I want to get is a true merge, like this:

 FID State_prov   Shape_name  bob2009 bob2010 coy2009 bob2011
 0   Nova Scotia  Annapolis         0       0      10       1
 1   Nova Scotia  Antigonish        0       0       1       0
 2   Nova Scotia  Gly               0       0       1      NA

Can anybody please help me understand what I'm doing wrong? Any help will be much appreciated!!
Re: [R] converting character matrix to POSIXct matrix
Hello,

Use sapply instead.

Hope this helps,

Rui Barradas

Em 26-04-2013 18:51, hh wt escreveu: I thought this was a common question, but rseek/Google searches don't yield any relevant hits. I have a matrix of character strings, which are time stamps:

time.m[1:5,1:5]
     [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799
[2,] 08:00:21.996 08:00:22.071 08:00:23.821 08:00:24.370 08:00:25.573
[3,] 08:00:29.200 08:00:29.200 08:00:29.591 08:00:30.368 08:00:30.536
[4,] 08:00:31.073 08:00:31.372 08:00:31.384 08:00:31.403 08:00:31.867
[5,] 08:00:31.867 08:00:31.867 08:00:31.971 08:00:34.571 08:00:34.571

And I would like to convert it to a POSIXct matrix. I tried this,

time1 = lapply(time.m, function(tt) strptime(tt, "%H:%M:%OS"))

but it yields a list. Any tip is appreciated. Horace
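A minimal sketch of the sapply approach with toy data. One caveat, not mentioned above: sapply simplifies the POSIXct results to a plain numeric vector (seconds since the epoch), so the dimensions and the time class have to be reattached afterwards, and printing such a "POSIXct matrix" may not keep the matrix layout.

```r
# toy 2x2 character matrix of time stamps
time.m <- matrix(c("08:00:20.799", "08:00:21.996",
                   "08:00:22.071", "08:00:23.821"), nrow = 2)

time1 <- sapply(time.m, function(tt) as.POSIXct(tt, format = "%H:%M:%OS"))
dim(time1)   <- dim(time.m)              # restore the matrix shape
class(time1) <- c("POSIXct", "POSIXt")   # reattach the time class
```

If element-wise class handling matters more than the matrix shape, keeping the result as a single POSIXct vector with as.POSIXct(c(time.m), format = "%H:%M:%OS") is often simpler.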
[R] Stratified Random Sampling Proportional to Size
Hello R Experts,

I kindly request your assistance in figuring out how to get a stratified random sample, proportional to size, with a total sample of 100. Below is my R code showing what I did and the error I'm getting with sampling::strata.

# First I summarized the count of records by the two variables I want to use as strata
library(RODBC)
library(sqldf)
library(sampling)

# After establishing a connection I query the data, sort it by the strata APPT_TYP_CD_LL and EMPL_TYPE, and store it in a data frame
CURRPOP <- sqlQuery(ch, "SELECT APPT_TYP_CD_LL, EMPL_TYPE, ASOFDATE, EMPLID, NAME, DEPTID, JOBCODE, JOBTITLE, SAL_ADMIN_PLAN, RET_TYP_CD_LL FROM PS_EMPLOYEES_LL WHERE EMPL_STATUS NOT IN ('R','T') ORDER BY APPT_TYP_CD_LL, EMPL_TYPE")

# ROWID is a dummy ID I added and repositioned after the strata columns for later use
CURRPOP$ROWID <- seq(nrow(CURRPOP))
CURRPOP <- CURRPOP[, c(1:2, 11, 3:10)]

# My strata. stratp is how many I want sampled from each stratum. NOTE THERE ARE SOME 0's, which just means I won't sample from that group.
stratum_cp <- sqldf("SELECT APPT_TYP_CD_LL, EMPL_TYPE, count(*) HC FROM CURRPOP GROUP BY APPT_TYP_CD_LL, EMPL_TYPE")
stratum_cp$stratp <- round(stratum_cp$HC / nrow(CURRPOP) * 100)
stratum_cp
   APPT_TYP_CD_LL EMPL_TYPE   HC stratp
1              FA         S    1      0
2              FC         S    5      0
3              FP         S  173      3
4              FR         H  170      3
5              FX         H   49      1
6              FX         S   57      1
7              IN         H 1589     25
8              IN         S 3987     63
9              IP         H    7      0
10             IP         S   53      1
11             SA         H    8      0
12             SE         S   43      1
13             SF         H   14      0
14             SF         S    1      0
15             SG         S   10      0
16             ST         H  107      2
17             ST         S    6      0

# Then I attempted to use sampling::strata following the instructions in that package, and got an error.
# I use stratum_cp$stratp for my sizes.
s <- strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp, method = "srswor")
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 0, 1
traceback()
5: stop("arguments imply differing number of rows: ", paste(unique(nrows), collapse = ", "))
4: data.frame(..., check.names = FALSE)
3: cbind(deparse.level, ...)
2: cbind(r, i)
1: strata(CURRPOP, c("APPT_TYP_CD_LL", "EMPL_TYPE"), size = stratum_cp$stratp, method = "srswor")

# In lieu of a reproducible sample, here is some info regarding most of my data
dim(CURRPOP)
[1] 6280 11
# Columns with personal info have been removed from this output
str(CURRPOP[, c(1:3, 7:11)])
'data.frame': 6280 obs. of 8 variables:
 $ APPT_TYP_CD_LL: Factor w/ 12 levels "FA","FC","FP",..: 1 2 2 2 2 2 3 3 3 3 ...
 $ EMPL_TYPE     : Factor w/ 2 levels "H","S": 2 2 2 2 2 2 2 2 2 2 ...
 $ ROWID         : int 1 2 3 4 5 6 7 8 9 10 ...
 $ DEPTID        : int 9825 9613 9613 9852 9772 9852 9853 9853 9853 9854 ...
 $ JOBCODE       : Factor w/ 325 levels "055.2","055.3",..: 311 112 112 112 112 112 298 299 299 300 ...
 $ JOBTITLE      : Factor w/ 325 levels "Accounting Assistant",..: 227 192 192 192 192 192 190 191 191 153 ...
 $ SAL_ADMIN_PLAN: Factor w/ 40 levels "ADE","AME","ASE",..: 36 38 38 38 38 38 31 31 31 31 ...
 $ RET_TYP_CD_LL : Factor w/ 2 levels "TCP1","TCP2": 2 2 2 2 2 2 2 2 2 2 ...

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics Metrics
Strategic Human Resources Management
wf-analytics-metr...@lists.llnl.gov
(925) 422-0814
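One hedged guess at the cause, sketched with toy data: strata of size 0 appear to trigger exactly this "differing number of rows: 0, 1" error in some versions of sampling::strata (srswor of 0 units returns a zero-row object that cbind then rejects), so a workaround is to drop the zero-size strata before sampling. The data and sizes below are hypothetical.

```r
library(sampling)
set.seed(1)

# toy population, already sorted by the stratum variable
toy <- data.frame(grp = rep(c("A", "B", "C"), c(50, 30, 20)), x = rnorm(100))

sizes <- c(A = 5, B = 3, C = 0)          # stratum C would get 0 draws
keep  <- names(sizes)[sizes > 0]         # drop zero-size strata first
toy2  <- toy[toy$grp %in% keep, ]

s <- strata(toy2, "grp", size = sizes[keep], method = "srswor")
sampled <- getdata(toy2, s)              # the sampled rows
nrow(sampled)                            # 5 + 3 = 8
```

In the original code this would mean subsetting CURRPOP to the strata where stratum_cp$stratp > 0 (and subsetting the size vector to match) before calling strata().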
Re: [R] How to export graph value in R
Anup,

You should have provided some additional information, such as that the function 'hypsometric' is found in the hydroTSM contributed package. Nevertheless, here's what I did (maybe not elegant, but it works):

(1) At the R command prompt, simply type hypsometric -- the source code for the function 'hypsometric' will be printed.

(2) Copy this source code into a text file and save it as hypsometric2.R.

(3) Edit it like this, or just copy this:

hypsometric2 <- function(x, band = 1, main = "Hypsometric Curve",
                         xlab = "Relative Area above Elevation, (a/A)",
                         ylab = "Relative Elevation, (h/H)",
                         col = "blue", ...)
{
    if (class(x) != "SpatialGridDataFrame")
        stop("Invalid argument: 'class(x)' must be 'SpatialGridDataFrame'")
    band.error <- FALSE
    if (is.numeric(band) | is.integer(band)) {
        if ((band < 1) | (band > length(colnames(x@data))))
            band.error <- TRUE
    }
    else if (is.character(band))
        if (!(band %in% colnames(x@data)))
            band.error <- TRUE
    if (band.error)
        stop("Invalid argument: 'band' does not exist in 'x' !")
    mydem <- x@data[band]
    z.min <- min(mydem, na.rm = TRUE)
    z.max <- max(mydem, na.rm = TRUE)
    x.dim <- x@grid@cellsize[1]
    y.dim <- x@grid@cellsize[2]
    max.area <- length(which(!is.na(mydem))) * x.dim * y.dim
    res <- plot.stepfun(ecdf(as.matrix(mydem)), lwd = 0, cex.points = 0)
    z.mean.index <- which(round(res$y, 3) == 0.5)[1]
    z.mean <- res$t[z.mean.index]
    relative.area <- (1 - res$y[-1])
    relative.elev <- (res$t[-c(1, length(res$t))] - z.min)/(z.max - z.min)
    plot(relative.area, relative.elev, xaxt = "n", yaxt = "n",
        main = main, xlim = c(0, 1), ylim = c(0, 1), type = "l",
        ylab = ylab, xlab = xlab, col = col, ...)
    Axis(side = 1, at = seq(0, 1, by = 0.05), labels = TRUE)
    Axis(side = 2, at = seq(0, 1, by = 0.05), labels = TRUE)
    f <- splinefun(relative.area, relative.elev, method = "monoH.FC")
    hi <- integrate(f = f, lower = 0, upper = 1)
    legend("topright",
        c(paste("Min Elev. :", round(z.min, 2), " [m.a.s.l.]", sep = ""),
          paste("Mean Elev.:", round(z.mean, 1), " [m.a.s.l.]", sep = ""),
          paste("Max Elev. :", round(z.max, 1), " [m.a.s.l.]", sep = ""),
          paste("Max Area :", round(max.area/1e+06, 1), " [km2]", sep = ""),
          "",
          paste("Integral value :", round(hi$value, 3), sep = ""),
          paste("Integral error :", round(hi$abs.error, 3), sep = "")),
        bty = "n", cex = 0.9, col = c("black", "black", "black"),
        lty = c(NULL, NULL, NULL, NULL))
    curve_data <- data.frame(relative.area, relative.elev)
    return(curve_data)
}

(4) Rather than calling hypsometric(dem), for example, first do this: source("hypsometric2.R")

(5) Then call: data <- hypsometric2(dem)

(6) You can see the x,y pairs by typing data at the R prompt.

(7) Verify that the data are what you expect by typing plot(data) at the R prompt, which should give the same plot as hypsometric2(dem) and hypsometric(dem), without the embellishments and labeling...

Tom

On Fri, Apr 26, 2013 at 8:52 AM, Anup khanal za...@hotmail.com wrote: Dear experts, I have created a hypsometric curve (area-elevation curve) for my watershed by using the simple command

hypsometric(X, main = "Hypsometric Curve", xlab = "Relative Area above Elevation, (a/A)", ylab = "Relative Elevation, (h/H)", col = "blue")

It plots the hypsometric curve in the R graphics window. My question is: how can I export the values used to create this plot? I mean, I want to know the value on the y axis for a certain x value. Thanks in advance!

Anup Khanal, Norwegian University of Science and Technology (NTNU), Trondheim, Norway, Mob: (+47) 45174313
Re: [R] Read big data (> 3G) methods?
Do you really need to load all the data into memory? Usually for a large data set people just read a chunk of it while developing the analysis pipeline, and when that's done, the finished script iterates through the entire data set. For example, the read.table function has 'nrows' and 'skip' parameters to control the reading of data chunks:

read.table(file, nrows = -1, skip = 0, ...)

And another tip here: you can split the large file into smaller ones.

On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote: Hi all scientists, Recently I have been dealing with big data (> 3G, txt or csv format) on my desktop (Windows 7, 64-bit version), but I cannot read it in quickly, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but they are not fast enough.] Could you share your methods for reading big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin
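A minimal sketch of the chunked-reading idea (the file name "bigdata.csv" and chunk size are hypothetical). Reading from an open connection lets read.table resume where the previous chunk ended, so 'skip' isn't even needed; note that read.table signals an error at end of file, which the tryCatch turns into a loop exit.

```r
chunk <- 100000
con <- file("bigdata.csv", open = "r")

# read the header line once
header <- readLines(con, n = 1)

repeat {
  d <- tryCatch(read.table(con, nrows = chunk, sep = ",",
                           stringsAsFactors = FALSE),
                error = function(e) NULL)   # EOF raises an error
  if (is.null(d) || nrow(d) == 0) break
  # ... process this chunk in 'd' here ...
}

close(con)
```

Setting colClasses explicitly in the read.table call, as you already do, speeds up each chunk as well.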
Re: [R] [newbie] how to find and combine geographic maps with particular features?
If someone else hasn't suggested it already, you will probably get more/better help on the R-sig-geo mailing list. (If you decide to repost there, just mention up front that it's a repost and why.)

-Don

-- Don MacQueen, Lawrence Livermore National Laboratory, 7000 East Ave., L-627, Livermore, CA 94550, 925-423-1062

On 4/25/13 6:38 PM, Tom Roche tom_ro...@pobox.com wrote:

SUMMARY:

Specific problem: I'm regridding biomass-burning emissions from a global/unprojected inventory to a regional projection (LCC over North America). I need to have boundaries for Canada, Mexico, and the US (including US states), but also Caribbean and Atlantic nations (notably the Bahamas). I would also like to add Canadian provinces and Mexican states. How to put these together?

General problem: are there references regarding
* sources for different geographical and political features?
* combining maps for the different R graphics packages?

DETAILS:

(Apologies if this is a FAQ, but googling has not helped me with this.)

I'd appreciate help with a specific problem, as well as guidance (e.g., pointers to docs) regarding the larger topic of combining geographical maps (especially projected ones, i.e., not just lon-lat) on plots of regional data (i.e., data that is multinational but not global). My specific problem is

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na/downloads/GFED-3.1_2008_N2O_monthly_emissions_regrid_20130404_1344.pdf

which plots N2O concentrations from a global inventory of fire emissions (GFED) regridded to a North American projection. (See https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na for details.)
The plot currently includes boundaries for Canada, Mexico, and the US (including US states, since this is being done for a US agency), which are being gotten by calling code from package=M3 http://cran.r-project.org/web/packages/M3/ like

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na/src/95484c5d63502ab146402cedc3612dcdaf629bd7/vis_regrid_vis.r?at=master

## get projected North American map
NorAm.shp <- project.NorAm.boundaries.for.CMAQ(
  units='m',
  extents.fp=template_input_fp,
  extents=template.extents,
  LCC.parallels=c(33,45),
  CRS=out.crs)

https://bitbucket.org/tlroche/gfed-3.1_global_to_aqmeii-na/src/95484c5d63502ab146402cedc3612dcdaf629bd7/visualization.r?at=master

# database: Geographical database to use. Choices include "state"
# (default), "world", "worldHires", "canusamex", etc. Use
# "canusamex" to get the national boundaries of Canada, the
# USA, and Mexico, along with the boundaries of the states.
# The other choices ("state", "world", etc.) are the names of
# databases included with the 'maps' and 'mapdata' packages.
project.M3.boundaries.for.CMAQ <- function(
  database='state',       # see `?M3::get.map.lines.M3.proj`
  units='m',              # or 'km': see `?M3::get.map.lines.M3.proj`
  extents.fp,             # path to extents file
  extents,                # raster::extent object
  LCC.parallels=c(33,45), # LCC standard parallels: see https://github.com/TomRoche/cornbeltN2O/wiki/AQMEII-North-American-domain#wiki-EPA
  CRS                     # see `sp::CRS`
) {
  library(M3)
  ## Will replace raw LCC map's coordinates with:
  metadata.coords.IOAPI.list <- M3::get.grid.info.M3(extents.fp)
  metadata.coords.IOAPI.x.orig <- metadata.coords.IOAPI.list$x.orig
  metadata.coords.IOAPI.y.orig <- metadata.coords.IOAPI.list$y.orig
  metadata.coords.IOAPI.x.cell.width <- metadata.coords.IOAPI.list$x.cell.width
  metadata.coords.IOAPI.y.cell.width <- metadata.coords.IOAPI.list$y.cell.width
  library(maps)
  map.lines <- M3::get.map.lines.M3.proj(
    file=extents.fp, database=database, units="m")
  # dimensions are in meters, not cells.
  # TODO: take argument
  map.lines.coords.IOAPI.x <- (map.lines$coords[,1] - metadata.coords.IOAPI.x.orig)
  map.lines.coords.IOAPI.y <- (map.lines$coords[,2] - metadata.coords.IOAPI.y.orig)
  map.lines.coords.IOAPI <- cbind(map.lines.coords.IOAPI.x, map.lines.coords.IOAPI.y)
  # # start debugging
  # class(map.lines.coords.IOAPI)
  # # [1] "matrix"
  # summary(map.lines.coords.IOAPI)
  # # map.lines.coords.IOAPI.x map.lines.coords.IOAPI.y
  # # Min.   : 283762          Min.   : 160844
  # # 1st Qu.:2650244          1st Qu.:1054047
  # # Median :3469204          Median :1701052
  # # Mean   :3245997          Mean   :1643356
  # # 3rd Qu.:4300969          3rd Qu.:2252531
  # # Max.   :4878260          Max.   :2993778
  # # NA's   :168              NA's   :168
  # # end debugging
  # Note above is not zero-centered, like our extents:
  # extent : -2556000, 2952000, -1728000, 186 (xmin, xmax, ymin, ymax)
  # So gotta add (xmin, ymin) below.
  ## Get LCC state map
  # see http://stackoverflow.com/questions/14865507/how-to-display-a-projected-map-on-an-rlatticelayerplot
  map.IOAPI <- maps::map(
Re: [R] speed of a vector operation question
I think the sum way is the best.

On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin mike...@gmail.com wrote: Hello, I am dealing with numeric vectors 10^5 to 10^6 elements long. The values in the vector (v) are sorted (with duplicates). I am obtaining the length of vectors such as v[v < c] or v[v > c1 & v < c2], where c, c1, c2 are some scalar variables. What is the most efficient way to do this? I am using sum(v < c), since TRUE's are 1's and FALSE's are 0's. This seems to me more efficient than length(which(v < c)), but please correct me if I'm wrong. So, is there anything faster than what I already use? I'm running R 2.14.2 on Linux kernel 3.4.34. I appreciate your time, Mikhail
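A quick way to settle this empirically on toy data of the size described: time both idioms with system.time(). The loop count and cutoffs are arbitrary; profile on your own data before committing to either.

```r
set.seed(1)
v  <- sort(runif(1e6))   # sorted vector, 10^6 elements
c1 <- 0.25; c2 <- 0.75

# idiom 1: sum over the logical vector
system.time(for (i in 1:100) n1 <- sum(v > c1 & v < c2))

# idiom 2: length of the index vector
system.time(for (i in 1:100) n2 <- length(which(v > c1 & v < c2)))

n1 == n2   # both count the same elements
```

Since the vector is sorted, findInterval(c(c1, c2), v) is another option worth timing: it locates both cutoffs by binary search, so the count is the difference of the two positions without scanning the whole vector.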
[R] Regression coefficients
Hi all, I have run a ridge regression as follows:

reg = lm.ridge(final$l ~ final$lag1 + final$lag2 + final$g + final$u, lambda = seq(0, 10, 0.01))

Then I enter:

select(reg)

and it returns:

modified HKB estimator is 19.3409
modified L-W estimator is 36.18617
smallest value of GCV at 10

I think this means it is advisable to use the results of the regression corresponding to lambda = 10, so the next thing I do is:

reg = lm.ridge(final$l ~ final$lag1 + final$lag2 + final$g + final$u, lambda = 10)

which yields:

                 final$lag1    final$lag2       final$g       final$u
 3.147255e-04  1.802505e-01 -4.461005e-02 -1.728046e-09 -5.154932e-04

If I am to use these coefficient values later in my analysis, how do I call them? Clearly reg$final$lag1 does not work.

(1) Is there any way I can access these values?

(2) The main issue is that I want to access these coefficient values automatically, i.e. R should run the regression and automatically provide me these values after taking into consideration the lambda which minimizes the GCV.

Kindly advise me how I can proceed.

Thanks and regards, Preetam

-- Preetam Pal (+91)-9432212774 M-Stat 2nd Year, Room No. N-114, Statistics Division, C.V. Raman Hall, Indian Statistical Institute, B.H.O.S. Kolkata.
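A hedged sketch of one way to do both things at once with MASS::lm.ridge, using toy data in place of `final`: fit over the whole lambda grid, pick the lambda minimizing the stored GCV values (the `GCV` component of the fit), and index coef() at that row. Component names are as documented for ridgelm objects; verify with str(reg) on your own fit.

```r
library(MASS)

# hypothetical data standing in for 'final'
set.seed(1)
d <- data.frame(y = rnorm(50), lag1 = rnorm(50), lag2 = rnorm(50),
                g = rnorm(50), u = rnorm(50))

reg  <- lm.ridge(y ~ lag1 + lag2 + g + u, data = d,
                 lambda = seq(0, 10, 0.01))

best <- which.min(reg$GCV)   # index of the lambda minimizing GCV
reg$lambda[best]             # the chosen lambda itself
b <- coef(reg)[best, ]       # named coefficient vector at that lambda
b["lag1"]                    # access an individual coefficient by name
```

Note that fitting with a data argument and plain variable names (rather than final$lag1 in the formula) also gives the coefficients clean names, which makes the by-name access above possible.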
Re: [R] NMDS in Vegan: problems in stressplot, best solution
On Fri, 2013-04-26 at 12:42 -0500, Kumar Mainali wrote: Hello, I can draw a basic stress plot for NMDS with the following code in package vegan. stressplot(parth.mds, parth.dis) When I try to specify the line and point types, it gives me an error message. stressplot(parth.mds, parth.dis, pch = 1, p.col = "gray", lwd = 2, l.col = "red") Error in plot.xy(xy, type, ...) : invalid plot type In the above code, if I remove the line settings, it does give me the plot, but only with points of my chosen type. stressplot(parth.mds, parth.dis, pch = 1, p.col = "gray") Why can't I define both line and point at the same time? You can. What you can't do is use argument `dis` with a metaMDS object. If you use: stressplot(parth.mds, pch = 1, p.col = "gray", lwd = 2, l.col = "red") you'll see it works just fine. We'll see about providing a better error message if you do what the documentation asks you not to. If I have 100 iterations for metaMDS, then when I plot the result, does it give me the result from the best solution? The best solution it encountered in the 100 random starts, yes. How do I know that? It is implied in point 4. of the Details section of ?metaMDS. Can you plot the Stress by Iteration number? Not in a graphical plot. The stress for each iteration is printed to the console at each iteration. Note these iterations are random starts, each of which has iterations of the algorithm. HTH G parth.mds <- metaMDS(WorldPRSenv, distance = "bray", k = 2, trymax = 100, engine = c("monoMDS", "isoMDS"), autotransform = TRUE, wascores = TRUE, expand = TRUE, trace = 2) plot(parth.mds, type = "p") Thanks in advance, Kumar -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. 
[w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
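For reference, the corrected call can be reproduced self-contained with vegan's built-in `dune` data, since the poster's `WorldPRSenv` data set is not available (argument names as in the version of vegan discussed here; treat them as assumptions for other versions):

```r
library(vegan)

data(dune)
mds <- metaMDS(dune, distance = "bray", k = 2, trymax = 20, trace = 0)

# Pass only the metaMDS object -- no dissimilarity matrix --
# and point and line settings are then accepted together
stressplot(mds, pch = 1, p.col = "gray", lwd = 2, l.col = "red")
```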
Re: [R] Read big data (3G ) methods ?
On 04/26/2013 08:09 AM, Kevin Hao wrote: Hi all scientists, Recently, I am dealing with big data (>3G, txt or csv format) on my desktop (Windows 7, 64-bit version), but I can not read it faster, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but it is not so fast.] you mention limma; if this is sequence or microarray data then asking on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ (no subscription necessary) may be more appropriate, but you need to provide more information about what you want to do, e.g., a code chunk illustrating the problem. Martin Could you share your methods to read big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin [[alternative HTML version deleted]] -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
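Two of the most commonly recommended speed-ups for this situation can be sketched as follows (the toy file written to `tempfile()` stands in for the poster's 3 GB csv; the data.table package is assumed to be installed):

```r
library(data.table)

# toy stand-in for a big csv file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:1000, b = rnorm(1000)), tmp, row.names = FALSE)

# Option 1: data.table::fread, typically much faster than read.csv
dt <- fread(tmp)

# Option 2: base read.csv with colClasses pre-specified from a small
# sample, so column types aren't re-guessed over the whole file
classes <- sapply(read.csv(tmp, nrows = 100), class)
df <- read.csv(tmp, colClasses = classes)
```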
Re: [R] speed of a vector operation question
I think the sum way is the best.

On my Linux machine running R-3.0.0 the sum way is slightly faster:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) sum(x > .3 & x < .5))
   user  system elapsed
  4.664   0.340   5.018
system.time(for(i in 1:100) length(which(x > .3 & x < .5)))
   user  system elapsed
  5.017   0.160   5.186

If you are doing many of these counts on the same dataset you can save time by using functions like cut(), table(), ecdf(), and findInterval(). E.g.,

system.time(r1 <- vapply(seq(0, 1, by = 1/128)[-1], function(i) sum(x > (i - 1/128) & x <= i), FUN.VALUE = 0L))
   user  system elapsed
  5.332   0.568   5.909
system.time(r2 <- table(cut(x, seq(0, 1, by = 1/128))))
   user  system elapsed
  0.500   0.008   0.511
all.equal(as.vector(r1), as.vector(r2))
[1] TRUE

You should do the timings yourself, as the relative speeds will depend on the version or dialect of the R interpreter and how it was compiled. E.g., with the current development version of 'TIBCO Enterprise Runtime for R' (aka 'TERR') on this same 8-core Linux box the sum way is considerably faster than the length(which()) way:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) sum(x > .3 & x < .5))
   user  system elapsed
   1.87    0.03    0.48
system.time(for(i in 1:100) length(which(x > .3 & x < .5)))
   user  system elapsed
   3.21    0.04    0.83
system.time(r1 <- vapply(seq(0, 1, by = 1/128)[-1], function(i) sum(x > (i - 1/128) & x <= i), FUN.VALUE = 0L))
   user  system elapsed
   2.19    0.04    0.56
system.time(r2 <- table(cut(x, seq(0, 1, by = 1/128))))
   user  system elapsed
   0.27    0.01    0.13
all.equal(as.vector(r1), as.vector(r2))
[1] TRUE

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of lcn Sent: Friday, April 26, 2013 12:09 PM To: Mikhail Umorin Cc: r-help@r-project.org Subject: Re: [R] speed of a vector operation question I think the sum way is the best. On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin mike...@gmail.com wrote: Hello, I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are sorted (with duplicates) in the vector (v). 
I am obtaining the length of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar variables. What is the most efficient way to do this? I am using sum(v < c) since TRUEs are 1s and FALSEs are 0s. This seems to me more efficient than length(which(v < c)), but, please, correct me if I'm wrong. So, is there anything faster than what I already use? I'm running R 2.14.2 on Linux kernel 3.4.34. I appreciate your time, Mikhail [[alternative HTML version deleted]]
Re: [R] speed of a vector operation question
A very similar question was asked on StackOverflow (by Mikhail? and then I guess the answers there were somehow not satisfactory...) http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match where it turns out that a binary search (implemented in R) on the sorted vector is much faster than sum, etc. I guess because it's log N without copying. The more complicated condition x > .3 & x < .5 could be satisfied with multiple calls to the search. Martin

On 04/26/2013 01:20 PM, William Dunlap wrote: I think the sum way is the best. On my Linux machine running R-3.0.0 the sum way is slightly faster:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) sum(x > .3 & x < .5))
   user  system elapsed
  4.664   0.340   5.018
system.time(for(i in 1:100) length(which(x > .3 & x < .5)))
   user  system elapsed
  5.017   0.160   5.186

If you are doing many of these counts on the same dataset you can save time by using functions like cut(), table(), ecdf(), and findInterval(). E.g.,

system.time(r1 <- vapply(seq(0, 1, by = 1/128)[-1], function(i) sum(x > (i - 1/128) & x <= i), FUN.VALUE = 0L))
   user  system elapsed
  5.332   0.568   5.909
system.time(r2 <- table(cut(x, seq(0, 1, by = 1/128))))
   user  system elapsed
  0.500   0.008   0.511
all.equal(as.vector(r1), as.vector(r2))
[1] TRUE

You should do the timings yourself, as the relative speeds will depend on the version or dialect of the R interpreter and how it was compiled. 
E.g., with the current development version of 'TIBCO Enterprise Runtime for R' (aka 'TERR') on this same 8-core Linux box the sum way is considerably faster than the length(which()) way:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) sum(x > .3 & x < .5))
   user  system elapsed
   1.87    0.03    0.48
system.time(for(i in 1:100) length(which(x > .3 & x < .5)))
   user  system elapsed
   3.21    0.04    0.83
system.time(r1 <- vapply(seq(0, 1, by = 1/128)[-1], function(i) sum(x > (i - 1/128) & x <= i), FUN.VALUE = 0L))
   user  system elapsed
   2.19    0.04    0.56
system.time(r2 <- table(cut(x, seq(0, 1, by = 1/128))))
   user  system elapsed
   0.27    0.01    0.13
all.equal(as.vector(r1), as.vector(r2))
[1] TRUE

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of lcn Sent: Friday, April 26, 2013 12:09 PM To: Mikhail Umorin Cc: r-help@r-project.org Subject: Re: [R] speed of a vector operation question I think the sum way is the best. On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin mike...@gmail.com wrote: Hello, I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are sorted (with duplicates) in the vector (v). I am obtaining the length of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar variables. What is the most efficient way to do this? I am using sum(v < c) since TRUEs are 1s and FALSEs are 0s. This seems to me more efficient than length(which(v < c)), but, please, correct me if I'm wrong. So, is there anything faster than what I already use? I'm running R 2.14.2 on Linux kernel 3.4.34. 
-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
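Martin's binary-search suggestion can be sketched in a few lines of base R. The helper below counts the elements of a sorted vector that are <= a cutoff by bisection, so a two-sided count takes two O(log n) searches instead of a full O(n) scan (the function name and interface are illustrative, not from the thread):

```r
# count elements of the sorted vector v that are <= cutoff, by bisection
count_le <- function(v, cutoff) {
  lo <- 0L
  hi <- length(v)
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L              # upper mid so the loop terminates
    if (v[mid] <= cutoff) lo <- mid else hi <- mid - 1L
  }
  lo                                          # number of elements <= cutoff
}

v <- sort(rexp(1e6, 2))
n_between <- count_le(v, .5) - count_le(v, .3)  # count of v in (.3, .5]
```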
Re: [R] Read big data (3G ) methods ?
I can not think of something better. Maybe try reading only the part of the data that you want to analyze; basically, break the large data set into pieces. On Fri, Apr 26, 2013 at 10:58 AM, Ye Lin ye...@lbl.gov wrote: Have you thought of building a database and then letting R read it through that DB instead of from your desktop? On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote: Hi all scientists, Recently, I am dealing with big data (>3G, txt or csv format) on my desktop (Windows 7, 64-bit version), but I can not read it faster, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but it is not so fast.] Could you share your methods to read big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin [[alternative HTML version deleted]]
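The database suggestion can be done without leaving R, e.g. with SQLite via the DBI/RSQLite packages (assumed installed): load the data once, then pull only the rows a query selects into memory. A minimal sketch with toy data standing in for the big file:

```r
library(RSQLite)

# create an on-disk SQLite database and load the data into it once
con <- dbConnect(SQLite(), file.path(tempdir(), "big.db"))
dbWriteTable(con, "big", data.frame(x = 1:1000, y = rnorm(1000)),
             overwrite = TRUE)

# later sessions query only the subset they need
sub <- dbGetQuery(con, "SELECT * FROM big WHERE x > 900")
dbDisconnect(con)
```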
Re: [R] speed of a vector operation question
R's findInterval can also take advantage of a sorted x vector. E.g., in R-3.0.0 on the same 8-core Linux box:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) tabulate(findInterval(x, c(-Inf, .3, .5, Inf)))[2])
   user  system elapsed
  2.444   0.000   2.446
xs <- sort(x)
system.time(for(i in 1:100) tabulate(findInterval(xs, c(-Inf, .3, .5, Inf)))[2])
   user  system elapsed
  1.472   0.000   1.475
tabulate(findInterval(xs, c(-Inf, .3, .5, Inf)))[2]
[1] 180636
sum(xs > .3 & xs <= .5)
[1] 180636

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: Martin Morgan [mailto:mtmor...@fhcrc.org] Sent: Friday, April 26, 2013 1:33 PM To: William Dunlap Cc: lcn; Mikhail Umorin; r-help@r-project.org Subject: Re: [R] speed of a vector operation question A very similar question was asked on StackOverflow (by Mikhail? and then I guess the answers there were somehow not satisfactory...) http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match where it turns out that a binary search (implemented in R) on the sorted vector is much faster than sum, etc. I guess because it's log N without copying. The more complicated condition x > .3 & x < .5 could be satisfied with multiple calls to the search. Martin On 04/26/2013 01:20 PM, William Dunlap wrote: I think the sum way is the best. On my Linux machine running R-3.0.0 the sum way is slightly faster:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) sum(x > .3 & x < .5))
   user  system elapsed
  4.664   0.340   5.018
system.time(for(i in 1:100) length(which(x > .3 & x < .5)))
   user  system elapsed
  5.017   0.160   5.186

If you are doing many of these counts on the same dataset you can save time by using functions like cut(), table(), ecdf(), and findInterval(). 
E.g.,

system.time(r1 <- vapply(seq(0, 1, by = 1/128)[-1], function(i) sum(x > (i - 1/128) & x <= i), FUN.VALUE = 0L))
   user  system elapsed
  5.332   0.568   5.909
system.time(r2 <- table(cut(x, seq(0, 1, by = 1/128))))
   user  system elapsed
  0.500   0.008   0.511
all.equal(as.vector(r1), as.vector(r2))
[1] TRUE

You should do the timings yourself, as the relative speeds will depend on the version or dialect of the R interpreter and how it was compiled. E.g., with the current development version of 'TIBCO Enterprise Runtime for R' (aka 'TERR') on this same 8-core Linux box the sum way is considerably faster than the length(which()) way:

x <- rexp(1e6, 2)
system.time(for(i in 1:100) sum(x > .3 & x < .5))
   user  system elapsed
   1.87    0.03    0.48
system.time(for(i in 1:100) length(which(x > .3 & x < .5)))
   user  system elapsed
   3.21    0.04    0.83
system.time(r1 <- vapply(seq(0, 1, by = 1/128)[-1], function(i) sum(x > (i - 1/128) & x <= i), FUN.VALUE = 0L))
   user  system elapsed
   2.19    0.04    0.56
system.time(r2 <- table(cut(x, seq(0, 1, by = 1/128))))
   user  system elapsed
   0.27    0.01    0.13
all.equal(as.vector(r1), as.vector(r2))
[1] TRUE

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of lcn Sent: Friday, April 26, 2013 12:09 PM To: Mikhail Umorin Cc: r-help@r-project.org Subject: Re: [R] speed of a vector operation question I think the sum way is the best. On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin mike...@gmail.com wrote: Hello, I am dealing with numeric vectors 10^5 to 10^6 elements long. The values are sorted (with duplicates) in the vector (v). I am obtaining the length of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some scalar variables. What is the most efficient way to do this? I am using sum(v < c) since TRUEs are 1s and FALSEs are 0s. This seems to me more efficient than length(which(v < c)), but, please, correct me if I'm wrong. So, is there anything faster than what I already use? I'm running R 2.14.2 on Linux kernel 3.4.34. 
I appreciate your time, Mikhail [[alternative HTML version deleted]]
Re: [R] Help with merge function
Hello, Thank you for your help. However the dataframes I gave you were only examples; the actual dataframes are very big. Does this mean I have to write every range of data for each variable?? On Fri, Apr 26, 2013 at 2:25 PM, Rui Barradas ruipbarra...@sapo.pt wrote: Hello, The following seems to do the trick.

x1 <- structure(list(State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"),
  Shape_name = c("Annapolis", "Antigonish", "Gly"),
  bob2009 = c(0L, 0L, NA), bob2010 = c(0L, 0L, NA), bob2011 = c(1L, 0L, NA)),
  .Names = c("State_prov", "Shape_name", "bob2009", "bob2010", "bob2011"),
  class = "data.frame", row.names = c(NA, -3L))
x2 <- structure(list(FID = 0:2,
  State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"),
  Shape_name = c("Annapolis", "Antigonish", "Gly"),
  bob2009 = c(0L, 0L, 0L), bob2010 = c(0L, 0L, 0L), coy2009 = c(10L, 1L, 1L)),
  .Names = c("FID", "State_prov", "Shape_name", "bob2009", "bob2010", "coy2009"),
  class = "data.frame", row.names = c(NA, -3L))
x3 <- merge(x1, x2, all.y = TRUE)

Note also that since by = intersect(names(x1), names(x2)) is the default behavior, you really don't need it. Hope this helps, Rui Barradas Em 26-04-2013 18:10, Catarina Ferreira escreveu: Dear all, I'm trying to merge 2 dataframes, but I'm not being entirely successful and I can't understand why.

Dataframe x1
State_prov   Shape_name  bob2009  bob2010  bob2011
Nova Scotia  Annapolis   0        0        1
Nova Scotia  Antigonish  0        0        0
Nova Scotia  Gly         NA       NA       NA

Dataframe x2 - has 2 rows and 193 variables, contains one important field, FID, that is a link to a shapefile (this is not in x1), and shares common columns with x1, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009
0    Nova Scotia  Annapolis   0        0        10
1    Nova Scotia  Antigonish  0        0        1
2    Nova Scotia  Gly         0        0        1

So when I do x3 <- merge(x1, x2, by = intersect(names(x1), names(x2)), all = TRUE) it should do the trick. The thing is that it works for the columns (it adds all the new columns not common to both dataframes), but it also adds the rows. 
This is what I get (x3):

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       NA
NA   Nova Scotia  Annapolis   NA       NA       NA       1
1    Nova Scotia  Antigonish  0        0        1        NA
NA   Nova Scotia  Antigonish  NA       NA       NA       0
2    Nova Scotia  Gly         0        0        1        NA
NA   Nova Scotia  Gly         NA       NA       NA       NA

What I want to get is a true merge, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       1
1    Nova Scotia  Antigonish  0        0        1        0
2    Nova Scotia  Gly         0        0        1        NA

Can anybody please help me to understand what I'm doing wrong? Any help will be much appreciated!! -- Catarina C. Ferreira, PhD Post-doctoral Research Fellow Department of Biology Trent University Peterborough, ON Canada URL: http://www.researcherid.com/rid/A-3898-2011 [[alternative HTML version deleted]]
Re: [R] Help with merge function
Hi, The format is a bit messed up, so I am not sure this is what you wanted.

x1 <- read.table(text = "State_prov,Shape_name,bob2009,bob2010,bob2011
Nova Scotia,Annapolis,0,0,1
Nova Scotia,Antigonish,0,0,0
Nova Scotia,Gly,NA,NA,NA
", sep = ",", header = TRUE, stringsAsFactors = FALSE)
x2 <- read.table(text = "FID,State_prov,Shape_name,bob2009,bob2010,coy2009
0,Nova Scotia,Annapolis,0,0,10
1,Nova Scotia,Antigonish,0,0,1
2,Nova Scotia,Gly,0,0,1
", sep = ",", header = TRUE, stringsAsFactors = FALSE)
merge(x1, x2, all = TRUE)
#    State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
# 1 Nova Scotia  Annapolis       0       0       1   0      10
# 2 Nova Scotia Antigonish       0       0       0   1       1
# 3 Nova Scotia        Gly       0       0      NA   2       1
# 4 Nova Scotia        Gly      NA      NA      NA  NA      NA

- Original Message - From: Catarina Ferreira catferre...@gmail.com To: r-help@r-project.org Cc: Sent: Friday, April 26, 2013 1:10 PM Subject: [R] Help with merge function Dear all, I'm trying to merge 2 dataframes, but I'm not being entirely successful and I can't understand why.

Dataframe x1
State_prov   Shape_name  bob2009  bob2010  bob2011
Nova Scotia  Annapolis   0        0        1
Nova Scotia  Antigonish  0        0        0
Nova Scotia  Gly         NA       NA       NA

Dataframe x2 - has 2 rows and 193 variables, contains one important field, FID, that is a link to a shapefile (this is not in x1), and shares common columns with x1, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009
0    Nova Scotia  Annapolis   0        0        10
1    Nova Scotia  Antigonish  0        0        1
2    Nova Scotia  Gly         0        0        1

So when I do x3 <- merge(x1, x2, by = intersect(names(x1), names(x2)), all = TRUE) it should do the trick. The thing is that it works for the columns (it adds all the new columns not common to both dataframes), but it also adds the rows. 
This is what I get (x3):

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       NA
NA   Nova Scotia  Annapolis   NA       NA       NA       1
1    Nova Scotia  Antigonish  0        0        1        NA
NA   Nova Scotia  Antigonish  NA       NA       NA       0
2    Nova Scotia  Gly         0        0        1        NA
NA   Nova Scotia  Gly         NA       NA       NA       NA

What I want to get is a true merge, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       1
1    Nova Scotia  Antigonish  0        0        1        0
2    Nova Scotia  Gly         0        0        1        NA

Can anybody please help me to understand what I'm doing wrong? Any help will be much appreciated!! -- Catarina C. Ferreira, PhD [[alternative HTML version deleted]]
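The underlying issue in this thread is that merge()'s default `by` is every shared column name, so rows whose shared columns contain NA (or differing values) never match and come back as duplicated half-empty rows; merging on the true key column alone behaves as intended. A minimal sketch with hypothetical stand-in frames:

```r
x1 <- data.frame(Shape_name = c("Annapolis", "Antigonish", "Gly"),
                 bob2009 = c(0, 0, NA), bob2011 = c(1, 0, NA))
x2 <- data.frame(FID = 0:2,
                 Shape_name = c("Annapolis", "Antigonish", "Gly"),
                 bob2009 = c(0, 0, 0), coy2009 = c(10, 1, 1))

# default by = all shared columns: the NA in x1$bob2009 prevents the
# "Gly" rows from matching, so they come back as two half-empty rows
bad <- merge(x1, x2, all = TRUE)

# merge on the actual key only; other shared columns get .x/.y suffixes
good <- merge(x1, x2, by = "Shape_name", all = TRUE)
```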
[R] example
Dear Sir, My name is Iut Tri Utami. I am a beginning user. I have a problem generating data in R. It consists of one disk generated by a Gaussian N(0, 0.167) and one ring generated by a Gaussian N(R, 0.1). The mean R was generated from its polar coordinates. The angle was drawn from a uniform distribution on the interval (0, 2*pi), and the radius from a Gaussian N(1.5, 0.1). The class sizes are 500 and 2000. Thank you very much for your attention, and I hope that you will help me. Best wishes, Iut Tri Utami [[alternative HTML version deleted]]
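No answer to this post appears in this chunk, but one plausible reading of the description can be sketched as follows (assumptions: the second parameter of each N(·,·) is a standard deviation, the disk is the class of 500, and the ring the class of 2000):

```r
set.seed(1)
n_disk <- 500
n_ring <- 2000

# class 1: disk, a 2-D Gaussian centred at the origin
disk <- cbind(rnorm(n_disk, 0, 0.167),
              rnorm(n_disk, 0, 0.167))

# class 2: ring; each point's mean R is drawn in polar coordinates
theta  <- runif(n_ring, 0, 2 * pi)   # angle ~ U(0, 2*pi)
radius <- rnorm(n_ring, 1.5, 0.1)    # radius ~ N(1.5, 0.1)
ring <- cbind(rnorm(n_ring, radius * cos(theta), 0.1),
              rnorm(n_ring, radius * sin(theta), 0.1))

plot(ring, col = "blue", asp = 1, xlab = "x", ylab = "y")
points(disk, col = "red")
```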
Re: [R] Read big data (3G ) methods ?
Thanks lcn, I will try to read the data in chunks. Best, Kevin On Fri, Apr 26, 2013 at 3:05 PM, lcn lcn...@gmail.com wrote: Do you really need to load all the data into memory? Mostly, for a large data set, people would just read a chunk of it for developing the analysis pipeline, and when that's done, the finished script would iterate through the entire data set. For example, the read.table function has 'nrows' and 'skip' parameters to control the reading of data chunks. read.table(file, nrows = -1, skip = 0, ...) Another tip here is that you can split the large file into smaller ones. On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote: Hi all scientists, Recently, I am dealing with big data (>3G, txt or csv format) on my desktop (Windows 7, 64-bit version), but I can not read it faster, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but it is not so fast.] Could you share your methods to read big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin [[alternative HTML version deleted]]
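lcn's nrows/skip advice can be turned into a loop over an open connection, which avoids re-scanning the skipped rows on every pass; successive read.csv() calls on the same connection continue where the previous one stopped. A sketch (the chunk size and the running sum standing in for "analysis" are illustrative; the toy file replaces the real 3 GB one):

```r
# toy stand-in for the big file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:10000, b = rnorm(10000)), tmp, row.names = FALSE)

con <- file(tmp, open = "r")
first <- read.csv(con, nrows = 2500)   # header + first chunk
hdr   <- names(first)
total <- sum(first$a)                  # "analysis" on the first chunk

repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = 2500, header = FALSE, col.names = hdr),
    error = function(e) NULL)          # read.csv errors when input is exhausted
  if (is.null(chunk)) break
  total <- total + sum(chunk$a)        # "analysis" on each later chunk
}
close(con)
```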
Re: [R] Help with merge function
Hello, I don't understand the question; what range? I just changed the 'all' argument to 'all.y', without doing anything special to the variables. Can you explain what you mean? Rui Barradas Em 26-04-2013 19:30, Catarina Ferreira escreveu: Hello, Thank you for your help. However the dataframes I gave you were only examples; the actual dataframes are very big. Does this mean I have to write every range of data for each variable?? On Fri, Apr 26, 2013 at 2:25 PM, Rui Barradas ruipbarra...@sapo.pt wrote: Hello, The following seems to do the trick.

x1 <- structure(list(State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"),
  Shape_name = c("Annapolis", "Antigonish", "Gly"),
  bob2009 = c(0L, 0L, NA), bob2010 = c(0L, 0L, NA), bob2011 = c(1L, 0L, NA)),
  .Names = c("State_prov", "Shape_name", "bob2009", "bob2010", "bob2011"),
  class = "data.frame", row.names = c(NA, -3L))
x2 <- structure(list(FID = 0:2,
  State_prov = c("Nova Scotia", "Nova Scotia", "Nova Scotia"),
  Shape_name = c("Annapolis", "Antigonish", "Gly"),
  bob2009 = c(0L, 0L, 0L), bob2010 = c(0L, 0L, 0L), coy2009 = c(10L, 1L, 1L)),
  .Names = c("FID", "State_prov", "Shape_name", "bob2009", "bob2010", "coy2009"),
  class = "data.frame", row.names = c(NA, -3L))
x3 <- merge(x1, x2, all.y = TRUE)

Note also that since by = intersect(names(x1), names(x2)) is the default behavior, you really don't need it. Hope this helps, Rui Barradas Em 26-04-2013 18:10, Catarina Ferreira escreveu: Dear all, I'm trying to merge 2 dataframes, but I'm not being entirely successful and I can't understand why. 
Dataframe x1
State_prov   Shape_name  bob2009  bob2010  bob2011
Nova Scotia  Annapolis   0        0        1
Nova Scotia  Antigonish  0        0        0
Nova Scotia  Gly         NA       NA       NA

Dataframe x2 - has 2 rows and 193 variables, contains one important field, FID, that is a link to a shapefile (this is not in x1), and shares common columns with x1, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009
0    Nova Scotia  Annapolis   0        0        10
1    Nova Scotia  Antigonish  0        0        1
2    Nova Scotia  Gly         0        0        1

So when I do x3 <- merge(x1, x2, by = intersect(names(x1), names(x2)), all = TRUE) it should do the trick. The thing is that it works for the columns (it adds all the new columns not common to both dataframes), but it also adds the rows.

This is what I get (x3):

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       NA
NA   Nova Scotia  Annapolis   NA       NA       NA       1
1    Nova Scotia  Antigonish  0        0        1        NA
NA   Nova Scotia  Antigonish  NA       NA       NA       0
2    Nova Scotia  Gly         0        0        1        NA
NA   Nova Scotia  Gly         NA       NA       NA       NA

What I want to get is a true merge, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis   0        0        10       1
1    Nova Scotia  Antigonish  0        0        1        0
2    Nova Scotia  Gly         0        0        1        NA

Can anybody please help me to understand what I'm doing wrong? Any help will be much appreciated!!
Re: [R] converting character matrix to POSIXct matrix
time.m <- as.matrix(read.table(text = '
08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799 08:00:20.799
08:00:21.996 08:00:22.071 08:00:23.821 08:00:24.370 08:00:25.573
08:00:29.200 08:00:29.200 08:00:29.591 08:00:30.368 08:00:30.536
08:00:31.073 08:00:31.372 08:00:31.384 08:00:31.403 08:00:31.867
08:00:31.867 08:00:31.867 08:00:31.971 08:00:34.571 08:00:34.571
', sep = "", header = FALSE, stringsAsFactors = FALSE))
colnames(time.m) <- NULL
op <- options(digits.secs = 3)
res <- data.frame(lapply(seq_len(ncol(time.m)), function(i) strptime(time.m[, i], "%H:%M:%OS")))
colnames(res) <- paste0("X", 1:5)
str(res)
# 'data.frame': 5 obs. of 5 variables:
#  $ X1: POSIXct, format: 2013-04-26 08:00:20.799 2013-04-26 08:00:21.996 ...
#  $ X2: POSIXct, format: 2013-04-26 08:00:20.799 2013-04-26 08:00:22.071 ...
#  $ X3: POSIXct, format: 2013-04-26 08:00:20.799 2013-04-26 08:00:23.821 ...
#  $ X4: POSIXct, format: 2013-04-26 08:00:20.799 2013-04-26 08:00:24.369 ...
#  $ X5: POSIXct, format: 2013-04-26 08:00:20.799 2013-04-26 08:00:25.572 ...
options(op)

A.K.

- Original Message - From: hh wt horace...@gmail.com To: r-help@r-project.org Cc: Sent: Friday, April 26, 2013 1:51 PM Subject: [R] converting character matrix to POSIXct matrix I thought this was a common question, but rseek/google searches don't yield any relevant hit. I have a matrix of character strings, which are time stamps:

time.m[1:5, 1:5]
     [,1]           [,2]           [,3]           [,4]           [,5]
[1,] 08:00:20.799   08:00:20.799   08:00:20.799   08:00:20.799   08:00:20.799
[2,] 08:00:21.996   08:00:22.071   08:00:23.821   08:00:24.370   08:00:25.573
[3,] 08:00:29.200   08:00:29.200   08:00:29.591   08:00:30.368   08:00:30.536
[4,] 08:00:31.073   08:00:31.372   08:00:31.384   08:00:31.403   08:00:31.867
[5,] 08:00:31.867   08:00:31.867   08:00:31.971   08:00:34.571   08:00:34.571

And I would like to convert it to a POSIXct matrix. I tried this: time1 = lapply(time.m, function(tt) strptime(tt, "%H:%M:%OS")) but it yields a list. Any tip is appreciated. 
Horace [[alternative HTML version deleted]]
Re: [R] Read big data (3G ) methods ?
Thanks. I will try breaking it into pieces for the analysis. Kevin On Fri, Apr 26, 2013 at 4:38 PM, Ye Lin ye...@lbl.gov wrote: I can not think of something better. Maybe try reading only the part of the data that you want to analyze; basically, break the large data set into pieces. On Fri, Apr 26, 2013 at 10:58 AM, Ye Lin ye...@lbl.gov wrote: Have you thought of building a database and then letting R read it through that DB instead of from your desktop? On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote: Hi all scientists, Recently, I am dealing with big data (>3G, txt or csv format) on my desktop (Windows 7, 64-bit version), but I can not read it faster, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but it is not so fast.] Could you share your methods to read big data into R faster? Though this is an odd question, we really need it. Any suggestion is appreciated. Thank you very much. kevin [[alternative HTML version deleted]]
Re: [R] Vectorized code for generating the Kac (Clement) matrix
Thank you, Berend and Enrico, for looking into this. I did not think of Enrico's clever use of cbind() to form the subsetting indices.
Best,
Ravi

From: Berend Hasselman [b...@xs4all.nl]
Sent: Friday, April 26, 2013 10:08 AM
To: Enrico Schumann
Cc: Ravi Varadhan; 'r-help@r-project.org'
Subject: Re: [R] Vectorized code for generating the Kac (Clement) matrix

On 26-04-2013, at 14:42, Enrico Schumann e...@enricoschumann.net wrote:

On Thu, 25 Apr 2013, Ravi Varadhan ravi.varad...@jhu.edu writes:

Hi, I am generating large Kac matrices (also known as the Clement matrix). This is a tridiagonal matrix. I was wondering whether there is a vectorized solution that avoids the `for' loops in the following code:

n <- 1000
Kacmat <- matrix(0, n+1, n+1)
for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1

The above code is fast, but I am curious about vectorized ways to do this. Thanks in advance.
Best,
Ravi

This may be a bit faster; but as Berend and you said, the original function already seems fast.

n <- 5000
f1 <- function(n) {
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 2:(n+1)) Kacmat[i, i-1] <- i - 1
  Kacmat
}
f3 <- function(n) {
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  bw <- n:1L  ## bw = backward, fw = forward
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- bw
  res[cbind(fw + 1L, fw)] <- fw
  res
}
system.time(K1 <- f1(n))
##    user  system elapsed
##   0.132   0.028   0.161
system.time(K3 <- f3(n))
##    user  system elapsed
##   0.024   0.048   0.071
identical(K3, K1)

Using some of your code in my function I was able to speed up my function f2.
Complete code:

f1 <- function(n) { # Ravi
  Kacmat <- matrix(0, n+1, n+1)
  for (i in 1:n) Kacmat[i, i+1] <- n - i + 1
  for (i in 1:n) Kacmat[i+1, i] <- i
  Kacmat
}
f2 <- function(n) { # Berend 1, modified to use 1L
  Kacmat <- matrix(0, n+1, n+1)
  Kacmat[row(Kacmat) == col(Kacmat) - 1L] <- n:1L
  Kacmat[row(Kacmat) == col(Kacmat) + 1L] <- 1L:n
  Kacmat
}
f3 <- function(n) { # Enrico
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  bw <- n:1L  ## bw = backward, fw = forward
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- bw
  res[cbind(fw + 1L, fw)] <- fw
  res
}
f4 <- function(n) { # Berend 2, using which() with arr.ind=TRUE
  Kacmat <- matrix(0, n+1, n+1)
  k1 <- which(row(Kacmat) == col(Kacmat) - 1L, arr.ind=TRUE)
  k2 <- which(row(Kacmat) == col(Kacmat) + 1L, arr.ind=TRUE)
  Kacmat[k1] <- n:1L
  Kacmat[k2] <- 1L:n
  Kacmat
}

library(compiler)
f1.c <- cmpfun(f1)
f2.c <- cmpfun(f2)
f3.c <- cmpfun(f3)
f4.c <- cmpfun(f4)

f1(n)
f2(n)

n <- 5000
system.time(K1 <- f1(n))
system.time(K2 <- f2(n))
system.time(K3 <- f3(n))
system.time(K4 <- f4(n))
system.time(K1c <- f1.c(n))
system.time(K2c <- f2.c(n))
system.time(K3c <- f3.c(n))
system.time(K4c <- f4.c(n))
identical(K2, K1)
identical(K3, K1)
identical(K4, K1)
identical(K1c, K1)
identical(K2c, K2)
identical(K3c, K3)
identical(K4c, K4)

Result:

# system.time(K1 <- f1(n))
#    user  system elapsed
#   0.387   0.120   0.511
# system.time(K2 <- f2(n))
#    user  system elapsed
#   3.541   0.702   4.250
# system.time(K3 <- f3(n))
#    user  system elapsed
#   0.108   0.089   0.199
# system.time(K4 <- f4(n))
#    user  system elapsed
#   1.975   0.355   2.336
#
# system.time(K1c <- f1.c(n))
#    user  system elapsed
#   0.323   0.120   0.445
# system.time(K2c <- f2.c(n))
#    user  system elapsed
#   3.374   0.422   3.807
# system.time(K3c <- f3.c(n))
#    user  system elapsed
#   0.107   0.098   0.205
# system.time(K4c <- f4.c(n))
#    user  system elapsed
#   1.816   0.384   2.203
# identical(K2,K1)
# [1] TRUE
# identical(K3,K1)
# [1] TRUE
# identical(K4,K1)
# [1] TRUE
# identical(K1c,K1)
# [1] TRUE
# identical(K2c,K2)
# [1] TRUE
# identical(K3c,K3)
# [1] TRUE
# identical(K4c,K4)
# [1] TRUE

So Ravi's original and Enrico's versions are the quickest. Using which() with arr.ind made my version run a lot quicker. All in all an interesting exercise.

Berend
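As a side check not raised in the thread: the Kac-Clement matrix of order n+1 is known to have the integer eigenvalues -n, -n+2, ..., n-2, n, which gives a quick sanity test for any of the constructions above. A rough sketch, reusing Enrico's f3() from the thread:

```r
## Numerically verify the known integer spectrum of the Kac-Clement matrix.
f3 <- function(n) {          # Enrico's vectorized construction
  n1 <- n + 1L
  res <- numeric(n1 * n1)
  dim(res) <- c(n1, n1)
  fw <- seq_len(n)
  res[cbind(fw, fw + 1L)] <- n:1L   # superdiagonal: n, n-1, ..., 1
  res[cbind(fw + 1L, fw)] <- fw     # subdiagonal:   1, 2, ..., n
  res
}
n <- 10
ev <- sort(Re(eigen(f3(n), only.values = TRUE)$values))
round(ev, 6)        # should be close to seq(-n, n, by = 2)
```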
Re: [R] Help with merge function
Hi,
From the output you wanted, it looks like:

library(plyr)
join(x1, x2, type="right")
#Joining by: State_prov, Shape_name, bob2009, bob2010
#   State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
#1 Nova Scotia  Annapolis       0       0       1   0      10
#2 Nova Scotia Antigonish       0       0       0   1       1
#3 Nova Scotia        Gly       0       0      NA   2       1

merge(x1, x2, all.y=TRUE)
#   State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
#1 Nova Scotia  Annapolis       0       0       1   0      10
#2 Nova Scotia Antigonish       0       0       0   1       1
#3 Nova Scotia        Gly       0       0      NA   2       1

A.K.

From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, April 26, 2013 2:23 PM
Subject: Re: [R] Help with merge function

Hello, I didn't realize that the format had been changed after I sent the email. I'm sending you the original mail attached as a Word document with the correct format, since I don't think your answer is the one I'm looking for, likely due to the mangled format. Thank you again for your help.

On Fri, Apr 26, 2013 at 2:11 PM, arun smartpink...@yahoo.com wrote:

Hi, The format is a bit messed up, so I am not sure this is what you wanted.

x1 <- read.table(text="State_prov,Shape_name,bob2009,bob2010,bob2011
Nova Scotia,Annapolis,0,0,1
Nova Scotia,Antigonish,0,0,0
Nova Scotia,Gly,NA,NA,NA
", sep=",", header=TRUE, stringsAsFactors=FALSE)

x2 <- read.table(text="FID,State_prov,Shape_name,bob2009,bob2010,coy2009
0,Nova Scotia,Annapolis,0,0,10
1,Nova Scotia,Antigonish,0,0,1
2,Nova Scotia,Gly,0,0,1
", sep=",", header=TRUE, stringsAsFactors=FALSE)

merge(x1, x2, all=TRUE)
#   State_prov Shape_name bob2009 bob2010 bob2011 FID coy2009
#1 Nova Scotia  Annapolis       0       0       1   0      10
#2 Nova Scotia Antigonish       0       0       0   1       1
#3 Nova Scotia        Gly       0       0      NA   2       1
#4 Nova Scotia        Gly      NA      NA      NA  NA      NA

- Original Message -
From: Catarina Ferreira catferre...@gmail.com
To: r-help@r-project.org
Sent: Friday, April 26, 2013 1:10 PM
Subject: [R] Help with merge function

Dear all,

I'm trying to merge 2 dataframes, but I'm not being entirely successful and I can't understand why.

Dataframe x1:

State_prov   Shape_name  bob2009  bob2010  bob2011
Nova Scotia  Annapolis         0        0        1
Nova Scotia  Antigonish        0        0        0
Nova Scotia  Gly              NA       NA       NA

Dataframe x2 has 2 rows and 193 variables. It contains one important field, FID, which is a link to a shapefile (this is not in x1), and shares common columns with x1, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009
0    Nova Scotia  Annapolis         0        0       10
1    Nova Scotia  Antigonish        0        0        1
2    Nova Scotia  Gly               0        0        1

So when I do

x3 <- merge(x1, x2, by=intersect(names(x1), names(x2)), all=TRUE)

it should do the trick. The thing is that it works for the columns (it adds all the new columns not common to both dataframes), but it also adds rows. This is what I get (x3):

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis         0        0       10       NA
NA   Nova Scotia  Annapolis        NA       NA       NA        1
1    Nova Scotia  Antigonish        0        0        1       NA
NA   Nova Scotia  Antigonish       NA       NA       NA        0
2    Nova Scotia  Gly               0        0        1       NA
NA   Nova Scotia  Gly              NA       NA       NA       NA

What I want to get is a true merge, like this:

FID  State_prov   Shape_name  bob2009  bob2010  coy2009  bob2011
0    Nova Scotia  Annapolis         0        0       10        1
1    Nova Scotia  Antigonish        0        0        1        0
2    Nova Scotia  Gly               0        0        1       NA

Can anybody please help me understand what I'm doing wrong? Any help will be much appreciated!

--
Catarina C. Ferreira, PhD
Post-doctoral Research Fellow
Department of Biology
Trent University
Peterborough, ON
Canada
URL: http://www.researcherid.com/rid/A-3898-2011
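The duplicated rows above can be reproduced with a tiny example (toy data, not the poster's files): merge() matches on *all* shared columns by default, so a row where x1 has NA but x2 has 0 in a shared column can never match, and with all=TRUE both unmatched versions are kept.

```r
## Toy reproduction of the duplicate-row problem with merge().
x1 <- data.frame(Shape_name = c("Annapolis", "Gly"),
                 bob2009 = c(0, NA), bob2011 = c(1, NA))
x2 <- data.frame(FID = 0:1, Shape_name = c("Annapolis", "Gly"),
                 FID2009 = NULL, bob2009 = c(0, 0), coy2009 = c(10, 1))

## "Gly" appears twice: NA in x1$bob2009 cannot match 0 in x2$bob2009.
merge(x1, x2, all = TRUE)

## Matching only on the true ID column avoids the spurious non-matches;
## the disagreeing shared column is kept as bob2009.x / bob2009.y.
merge(x1, x2, by = "Shape_name")
```

This is why the answers in the thread use merge(x1, x2, all.y=TRUE) or join(..., type="right"): they keep exactly the rows of the larger table.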
Re: [R] Read big data (3G ) methods ?
Hi Ye,

Thanks. That is a good method. Are there any other methods besides using a database?

kevin

On Fri, Apr 26, 2013 at 1:58 PM, Ye Lin ye...@lbl.gov wrote:

Have you thought of building a database and then letting R read the data through that db, instead of reading the file directly on your desktop?

On Fri, Apr 26, 2013 at 8:09 AM, Kevin Hao rfans4ch...@gmail.com wrote:

Hi all scientists,

Recently I have been dealing with big data (3 GB txt or csv files) on my desktop (Windows 7, 64-bit version), but I cannot read them quickly, though I have searched the internet. [I have defined colClasses for read.table and used the colbycol and limma packages, but it is still not fast.]

Could you share your methods for reading big data into R faster? This may be an odd question, but we really need it. Any suggestion is appreciated.

Thank you very much.

kevin
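Two alternatives to a full database server that were commonly suggested for this problem. A sketch only: package availability, the file name "big.csv", and the column name bob2009 in the SQL filter are assumptions, not details from the thread.

```r
## 1) data.table::fread is typically much faster than read.csv
##    for large delimited files.
library(data.table)
dt <- fread("big.csv")

## 2) sqldf can filter rows with SQL *while* reading (it loads the file
##    into a temporary SQLite database), so only the subset you need
##    ever reaches R's memory.
library(sqldf)
sub <- read.csv.sql("big.csv",
                    sql = "select * from file where bob2009 > 0")
```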
Re: [R] Help with merge function
Hi,

Check whether this works:

Lines1 <- readLines("NS_update.txt")
x1 <- read.table(text=gsub('"', '', Lines1), sep=",", header=TRUE, stringsAsFactors=FALSE)
x2 <- read.table("data.txt", sep="", header=TRUE, stringsAsFactors=FALSE, fill=TRUE)
dim(x2)
#[1] 34577   189
library(plyr)
res <- join(x1, x2, type="right")
#Joining by: State_Prov, Shape_name, bob2009, bob2010, red2009, red2010, coy2009, coy2010, lyn2009, lyn2010
dim(res)
#[1] 34577   193
res2 <- merge(x1, x2, all.y=TRUE)
dim(res2)
#[1] 34577   193

A.K.

From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, April 26, 2013 4:20 PM
Subject: Re: [R] Help with merge function

Here they are. As you can see, NS_update contains data for only 1 province, and I want to add this data to the bigger file (data), merging the common columns and adding the new columns. But what it is doing is duplicating the rows in the bigger file that correspond to NS_update, as well as creating the new columns (that part is ok).

On Fri, Apr 26, 2013 at 4:16 PM, arun smartpink...@yahoo.com wrote:

You can send the files.

From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, April 26, 2013 4:15 PM
Subject: Re: [R] Help with merge function

Is it ok if I send you the files? It's probably better for you to understand me. It didn't work on my files.

On Fri, Apr 26, 2013 at 4:12 PM, arun smartpink...@yahoo.com wrote:

Hi, I am not sure what the problem is. I used the datasets you provided, x1 and x2, and I got the result shown in your desired output. Are you saying that this didn't work on your original dataset, or on the one you provided? In that case, could you dput(dataset,20)?

From: Catarina Ferreira catferre...@gmail.com
To: arun smartpink...@yahoo.com
Sent: Friday, April 26, 2013 4:01 PM
Subject: Re: [R] Help with merge function

Thank you. It still isn't working. Thank you in any case.