Re: [R] == returns wrong result? Why
Trafim Vanishek wrote:
> Dear all,
>
> Does anybody know the probable reason why == gives FALSE when it should give TRUE?
> These two variables are of the same type, and everything works in the loop, but then it stops when they are equal.
>
> This is the output:
>
>     Rk[47] == RB[21]
>     [1] FALSE
>     Rk[47]
>     [1] 0.002842007
>     RB[21]
>     [1] 0.002842007
>
> Thanks a lot.

What makes you think that Rk[47] and RB[21] are equal? You're only showing 9-decimal print versions.

-Peter Ehlers

--
Peter Ehlers
University of Calgary
403.202.3921

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] == returns wrong result? Why
Hi Trafim,

take a look at R FAQ 7.31.

HTH
Stephan

Trafim Vanishek wrote:
> Dear all,
>
> Does anybody know the probable reason why == gives FALSE when it should give TRUE?
> These two variables are of the same type, and everything works in the loop, but then it stops when they are equal.
>
> This is the output:
>
>     Rk[47] == RB[21]
>     [1] FALSE
>     Rk[47]
>     [1] 0.002842007
>     RB[21]
>     [1] 0.002842007
>
> Thanks a lot.
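The point made in both replies can be illustrated with a small sketch. The values below are made up (the actual contents of Rk and RB are not shown in the thread); they are only assumed to differ far beyond the digits that R prints by default.

```r
# Hypothetical values: two doubles that print identically at the default
# 7 significant digits but are not exactly equal.
a <- 0.002842007
b <- a + 1e-12

print(a)                       # default printing hides the difference
print(b)
a == b                         # exact comparison of doubles

print(c(a, b), digits = 17)    # more digits reveal the difference

# Comparison with a tolerance, which is what FAQ 7.31 recommends
isTRUE(all.equal(a, b))        # relative tolerance of about 1.5e-8 by default
abs(a - b) < 1e-9              # or an explicit absolute tolerance
```

Inside a loop condition, isTRUE(all.equal(...)) is usually the safer form, because all.equal() returns a character description rather than FALSE when the values differ.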
[R] Dynamic data.frame headers
I would like to create a data.frame with dynamically created headers. I will later fill it with percentiles. My percentiles vector is:

    percentiles = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00)

From this vector I would like to have headers like: p5, p10, p20, ..., p95, p100

Is it possible to create headers in such a way, something like "p" + 100*c?
Re: [R] Dynamic data.frame headers
Hi Mattias,

Try this:

    percentiles <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00)
    test <- data.frame(matrix(NA, 0, 12))
    names(test) <- paste("p", percentiles*100, sep="")
    test
    [1] p5 p10 p20 p30 p40 p50 p60 p70 p80 p90 p95 p100

Cheers,
Hans

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mattias Nyström
Sent: Wednesday, January 13, 2010 10:11
To: r-help@r-project.org
Subject: [R] Dynamic data.frame headers

> I would like to create a data.frame with dynamically created headers. I will later fill it with percentiles. My percentiles vector is:
>
>     percentiles = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00)
>
> From this vector I would like to have headers like: p5, p10, p20, ..., p95, p100
>
> Is it possible to create headers in such a way, something like "p" + 100*c?
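Since the data frame is meant to be filled with percentiles later, here is a minimal sketch of doing both steps at once. The two groups `a` and `b` and their values are made up for illustration, and building the table with quantile() is an assumption about the intended use, not something stated in the thread.

```r
percentiles <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.50,
                 0.60, 0.70, 0.80, 0.90, 0.95, 1.00)

# Headers p5, p10, ..., p100; round() guards against floating-point noise
hdr <- paste0("p", round(percentiles * 100))

# Made-up data: two hypothetical groups of observations
set.seed(1)
values <- list(a = rnorm(100), b = runif(100))

# One row per group, one column per percentile
res <- as.data.frame(t(sapply(values, quantile,
                              probs = percentiles, names = FALSE)))
names(res) <- hdr
res
```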
[R] convert factor data to numeric
hello

could you give me a hint on how to convert data of factor type to numeric (float)?

regards
Re: [R] convert factor data to numeric
check the following:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

Best,
Dimitris

Ahmet Temiz wrote:
> hello
>
> could you give me a hint on how to convert data of factor type to numeric (float)?
>
> regards

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Re: [R] convert factor data to numeric
On 01/13/2010 10:47 AM, Ahmet Temiz wrote:
> hello
>
> could you give me a hint on how to convert data of factor type to numeric (float)?
>
> regards

you could try as.numeric, but without more details it is difficult to see if this will work. How did you end up with a factor (e.g. through import)?

Stephan
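A short sketch of why as.numeric() alone may not give what is wanted, and of the two conversions described in the FAQ entry linked earlier in this thread. The factor `f` is a made-up example.

```r
# Made-up factor whose levels look like numbers (e.g. after a messy import)
f <- factor(c("3.5", "10", "2.25", "10"))

# as.numeric() on a factor returns the internal integer codes, not the values
codes <- as.numeric(f)

# Convert via the labels instead
v1 <- as.numeric(as.character(f))

# Equivalent, and slightly more efficient for long factors
v2 <- as.numeric(levels(f))[f]

codes
v1
v2
```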
Re: [R] getting p values
>>> Duncan Murdoch <murd...@stats.uwo.ca> 12/01/2010 18:07:46 >>>
>> I need to get the p values for a table with 15000 entries of t values.
> ...
> Put the t values into a vector, then use pt() in an appropriate way

... and don't forget any necessary correction for multiple comparisons; see ?p.adjust

Steve E
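A minimal sketch of the two suggestions combined. The t values are simulated, and both the degrees of freedom (20) and the choice of a two-sided test with the "BH" adjustment are assumptions; the thread specifies none of them.

```r
set.seed(42)
df    <- 20                      # assumed degrees of freedom
tvals <- rt(15000, df = df)      # made-up t values standing in for the table

# Two-sided p values from the t distribution
p_raw <- 2 * pt(-abs(tvals), df = df)

# Correction for multiple comparisons; see ?p.adjust for other methods
p_adj <- p.adjust(p_raw, method = "BH")

summary(p_raw)
summary(p_adj)
```

If the t values sit in a matrix or data frame rather than a vector, pt() is vectorised, so the same expression can be applied to the whole object after as.matrix().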
Re: [R] expand.grid game
It did take me a good night's sleep to understand it. I was stuck with the exact same question, but I see now how the remaining balls are shared among all 8 urns (therefore cases with 11, 12, 13, ... 17 balls are also dealt with).

Thanks again,

baptiste

2010/1/12 Rolf Turner <r.tur...@auckland.ac.nz>:
> On 13/01/2010, at 9:19 AM, Greg Snow wrote:
>> How trivial is probably subjective; I don't think it is much above trivial. I would not have been surprised to see this question on an exam in my undergraduate (300 or junior level) probability course (the hard part was remembering the details from that class from over 20 years ago).
>>
>> My favorite test question of all time came from that course: You have a deck of poker cards with the 3's removed (and jokers), you deal yourself 5 cards at random, what is the probability of getting a straight (not including straight flushes)?
>>
>> This problem is simpler. Just think of the 8 places in the number as urns, and the 17 1's as balls to be put into the urns. One ball has to go in the first urn, so you have 16 left; there are choose(16+8-1, 8-1) ways to distribute 16 indistinguishable balls among 8 distinguishable urns. But that includes some solutions with more than 9 balls in an urn, which violates the digits restriction, so subtract off the illegal counts. If we place 10 balls in the first urn, then we have 7 remaining balls to distribute between the 8 urns, or choose(7+8-1, 7). If we place 1 ball in the first urn and 10 balls in one of the 7 other urns (so multiply by 7), then there are choose(6+8-1, 7) ways to distribute the remaining 6 balls in the 8 urns.
>>
>> Not too complicated once you remember (or look up) the formula for urns and balls.
>
> Sorry to be a thicko --- but doesn't the foregoing solution *leave in* the possibility of putting all 17 balls in the first urn? Or 3 balls in the first urn, 12 in the second, and the remaining 2 in any of the other six urns? Etc. I.e. don't more terms have to be subtracted?
>
> cheers,
>
> Rolf Turner
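The counting argument can be checked numerically. The sketch below assumes the underlying question is "how many 8-digit numbers (first digit at least 1) have digits summing to 17", which is how the urn description reads; the helper name `count_digit_sums` is hypothetical. It compares the formula from the thread with a direct count by dynamic programming over digit positions.

```r
# Count from the thread: all placements minus the illegal ones
by_formula <- choose(16 + 8 - 1, 8 - 1) -      # 16 free balls in 8 urns
              choose(7 + 8 - 1, 7) -           # first urn holds 10 or more
              7 * choose(6 + 8 - 1, 7)         # one of the other 7 urns holds 10 or more

# Independent check: ways[s + 1] is the number of digit prefixes with sum s
count_digit_sums <- function(n_digits, total) {
  ways <- numeric(total + 1)
  for (d in 1:min(9, total)) ways[d + 1] <- 1    # first digit is 1..9
  for (pos in seq_len(n_digits - 1)) {           # remaining digits are 0..9
    new <- numeric(total + 1)
    for (d in 0:min(9, total)) {
      idx <- seq_len(total + 1 - d)              # prefix sums 0..(total - d)
      new[idx + d] <- new[idx + d] + ways[idx]
    }
    ways <- new
  }
  ways[total + 1]
}

by_dp <- count_digit_sums(8, 17)
c(formula = by_formula, dp = by_dp)
```

The two counts agree because no placement of 16 free balls can put 10 or more balls in two different urns at once, so the subtracted cases never overlap and no further terms are needed.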
Re: [R] Recommended visualization for hierarchical data
On 01/13/2010 02:46 PM, Rex C. Eastbourne wrote:
> On Tue, Jan 12, 2010 at 5:26 PM, Rex C. Eastbourne <rex.eastbou...@gmail.com> wrote:
>> Let's say I have data in the following schema that describes the number of purchases a company has received from each County in the US:
>>
>> State | County | Purchases
>> ---
>> NJ    | Mercer | 550
>> CA    | Orange | 23
>>
>> I would like to visualize what states contribute the most to the overall total, and furthermore within those states, what Counties contribute the most. What are some recommended R visualizations for this type of data?
>>
>> I created a treemap using map.market from the portfolio library, like the following:
>>
>> http://zoonek2.free.fr/UNIX/48_R/g126.png
>>
>> Although this is an attractive visual, I want something that makes it easier to compare the relative sizes of components at a glance (hard with a treemap because rectangles have different aspect ratios). Does anyone have a recommended alternate visualization?
>>
>> Thanks!
>
> Just to clarify: I made up the above example for simplicity's sake to illustrate what I meant by hierarchical data. My actual data is not related to maps or geography, so a map-based visualization wouldn't work.

Hi Rex,

Have a look at the hierobarp function in the plotrix package. It produces nested bars that begin with the overall value.

Jim
Re: [R] Calculate the percentages of the numbers in every column.
Hi

r-help-boun...@r-project.org wrote on 13.01.2010 01:36:31:

>     tmp <- scan()
>     0 2 1 0
>     1 0 2 1
>     2 3 0 0
>     0 0 1 0
>     0 2 3 1
>
>     dat <- matrix(tmp, byrow=TRUE, ncol=4)
>     apply(dat, 2, function(x, min.val, max.val) {
>         tmp <- table(x)/length(x)
>         res <- rep(0, max.val - min.val + 1)
>         res[as.numeric(names(tmp)) - min.val + 1] <- tmp
>         res
>     }, 0, 3)
>
> Should do it (but I bet there is a more elegant way).

I am not sure if it is more elegant or efficient, but

    dat.m <- melt(as.data.frame(dat))    # melt() is from the reshape package
    xtabs(~value+variable, dat.m)/nrow(dat)

gives you a similar result.

Regards
Petr

> Regards,
> Simon Knapp
>
> On Wed, Jan 13, 2010 at 5:25 AM, Kelvin <6kelv...@gmail.com> wrote:
>> Dear friends,
>>
>> I have a table like this. The columns are levels A B C D ..., the first column you see is just the index, and there are different numbers in the table.
>>
>>       A B C D ...
>>     1 0 2 1 0
>>     2 1 0 2 1
>>     3 2 3 0 0
>>     4 0 0 1 0
>>     5 0 2 3 1
>>     ...
>>
>> I want to calculate the frequencies or the percentages of the numbers in every column. How do I get a table like this, where the first column holds the levels of the numbers and the entries are the percentages? All the percentages should add up to 1 in every column.
>>
>>       A   B   C   D ...
>>     0 0.2 0.3 0.1 0.1
>>     1 0.1 0.1 0.2 0.1
>>     2 0.1 0.2 0.2 0.2
>>     3 0.2 0.1 0.1 0
>>     ...
>>
>> Thanks for your help!
>>
>> Kelvin
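For comparison, a base-R sketch that needs no extra package. It uses the five example rows from the thread; the column names A to D are taken from Kelvin's description, and fixing the levels with factor() is my own choice so that values absent from a column show up as 0.

```r
# The five example rows from the thread, columns named as in the question
tmp <- c(0, 2, 1, 0,
         1, 0, 2, 1,
         2, 3, 0, 0,
         0, 0, 1, 0,
         0, 2, 3, 1)
dat <- matrix(tmp, byrow = TRUE, ncol = 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))

# Common set of values, so every column is tabulated over the same levels
lev <- min(dat):max(dat)

# Proportion of each value within each column
props <- sapply(colnames(dat), function(j) {
  as.vector(table(factor(dat[, j], levels = lev))) / nrow(dat)
})
rownames(props) <- lev

props
colSums(props)    # each column sums to 1
```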
Re: [R] R for windows 64 bit
Thanks to all of you. So, if I understood correctly, I could also use the command-line option (say, --max-mem-size=8G) from outside R to increase the memory further, even if the machine has 6 GB. I had an idea of that before, but I was not so sure! Many thanks again for all your advice.

Best
alessia

2010/1/12 Uwe Ligges <lig...@statistik.tu-dortmund.de>:
> On 12.01.2010 21:07, Alexander Shenkin wrote:
>> Hi Alessia,
>>
>> Note that, while your physical limit might be 6 GB, Windows memory management allows more memory than that to be allocated (aka Virtual Memory, or at least that's what they called it in XP). Windows swaps out memory from RAM to the hard disk and back when necessary (please excuse the explanation if you already know all this). For processing large vectors, this swapping might bring your system to a standstill. Regardless, the maximum memory for a Windows process is larger than the physical RAM you have available.
>>
>> allie
>
> In this case 6Gb was the default (the physical maximum in this particular machine), and there was a bug in the *experimental* version of R that did not allow increasing the memory size from within R using memory.limit(). This has already been fixed, thanks to Brian Ripley.
>
> Uwe Ligges

On 1/12/2010 6:27 AM, alessia matano wrote:
> Fine, it worked. I will try it this way. Just one last question and I won't bother you further today. My machine right now has just 6 GB of RAM (it will be increased to 16 in a few days), and I see that with this experimental version memory.limit is 6135. What is the command to increase memory usage to the maximum I can (5 GB?). If I write memory.limit(5000) it still gives me the error "don't be silly! Your machine has a 4Gb address limit", which is quite odd.
>
> Many thanks
> Best
> A.

2010/1/12 alessia matano <alexis@gmail.com>:
> ok, perfect! I will try with it... many many thanks. Do you also have the quantreg package there, which actually has the same problem as SparseM (32-bit version)?
>
> best
> alessia

2010/1/12 Uwe Ligges <lig...@statistik.tu-dortmund.de>:
> On 12.01.2010 12:09, alessia matano wrote:
>> I am sorry, I know it is an experimental version, and I was misleading in calling it a new version. Therefore, I will wait until the packages are available officially, since it is just a few days.
>
> Or just use my private repository today, which I indicated in the other mail.
>
> Uwe Ligges
>
>> However, I also tried to go to the CRAN pages, download them and insert them into the library. For quantreg it worked; for SparseM it did not, probably because it's a win32 version, as you said.

2010/1/12 Prof Brian Ripley <rip...@stats.ox.ac.uk>:
> On Tue, 12 Jan 2010, alessia matano wrote:
>> Dear all,
>>
>> I just downloaded and set up this new version of R. I am now trying to download the packages I need, which are SparseM and quantreg. I downloaded the quantreg package and inserted it into the library folder, and it seems to work. However, when I try to do the same with SparseM I get the following error message:
>>
>>     Loading required package: SparseM
>>     Error in inDL(x, as.logical(local), as.logical(now), ...) :
>>       unable to load shared library 'C:/PROGRA~1/R/R-211~1.0DE/library/SparseM/libs/SparseM.dll':
>>       LoadLibrary failure: %1 is not a valid Win32 application.
>>
>> Any help for it?
>
> Please do refer to the posting referred to in that thread (and Henrique, please do not post just the URL without the explanations).
>
> https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html
>
> You cannot mix 32-bit Windows binary packages with this experimental port (it is not a 'new version'): you need to install from the package sources. If that is too difficult for you, please do not try to use unsupported experimental builds (and Uwe Ligges may have some binary packages available for test in a few days).
>
>> Thanks a lot
>> alessia

2010/1/11 Henrique Dallazuanna <www...@gmail.com>:
> Try this version (beta of development version):
>
> http://www.stats.ox.ac.uk/pub/RWin/Win64/R-2.11.0dev-win64.exe
>
> On Mon, Jan 11, 2010 at 2:29 PM, alessia matano <alexis@gmail.com> wrote:
>> Dear all,
>>
>> do you know if there is any particular version of R to use with 64-bit Windows, so as to increase the amount of memory it can use? How should I increase the memory and, more importantly, set a higher max vector size? It still stops me, saying "Could not allocate vector of size 145".
>>
>> thanks to all
>> alessia
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
[R] selection of multiple subscripts
Readers,

For a data set 'x':

    1 a
    2 b
    3 c
    4 d
    5 e
    6 f
    7 g
    8 h
    9 i

How to select multiple subscripts to plot? For example, to plot values 1:3 and 9:10:

    plot(x[1:3,1],x[,2])

and

    plot(x[9:10,1],x[,2])

into one plot?

Yours,

rhelpatconference.jabber.org
r251
Re: [R] Illustrating kernel distribution in wheat ears
Hi,

Thanks a lot for your suggestions and the very detailed instructions, I needed them...

Everything worked fine, also with the full dataset, up until the last suggestion (the box plots). Here I also got an error message, but a different one from what you got. And no output...

Here are the last two command lines and the error message:

    q <- ggplot(spikes.long, aes(side, value))
    q + geom_boxplot() + facet_grid(~ cultivar)
    Error in `[.data.frame`(plot$data, , setdiff(cond, names(df)), drop = FALSE) :
      undefined columns selected

I used the same variable names and have done the steps suggested up to this point, but with a much bigger dataset than in the question sample. Sorry to say, I don't understand the error message. But the first two variants of plots worked nicely and I can use them.

All the best
/CG

From: Dennis Murphy [djmu...@gmail.com]
Sent: 11 January 2010 15:03
To: Carl-Göran CG. Pettersson
Cc: r-help@r-project.org
Subject: Re: [R] Illustrating kernel distribution in wheat ears

> Hi:
>
> It wasn't clear to me precisely what you wanted, but here are a couple of ideas in the hope that it will help. I used ggplot2 for the graphics, so it requires some manipulation of your dataset from 'wide' format to 'long'. I also add an indicator for side of the ear (odd is side one (L?), even is side 2) and a variable I call 'loc' to indicate the value associated with the splxx variable. I read the data into a data frame called spikelets.
>
> The first step is to remove the rows of missing responses:
>
>     naind <- apply(spikelets[, -1], 1, function(x) all(is.na(x)))
>     spikelets2 <- spikelets[!naind, ]
>
> Next, I use the melt() function (from the reshape package, which ggplot2 attaches along with plyr) to convert the data frame from 'wide' to 'long' form:
>
>     library(ggplot2)   # attaches the reshape and plyr packages in the loading process
>     spikes.long <- melt(spikelets2, id = 'cn')
>
> The variable 'variable' contains the variable names as a vector (spl01, spl02, ..., spl14). Next, I create a variable called loc, which represents the numeric part of the spl variables, and then create a variable side to distinguish one side of the awn from the other. 'variable' is then removed...
>
>     spikes.long$loc <- as.numeric(substring(spikes.long$variable, 4))
>     spikes.long$side <- factor(2 - spikes.long$loc %% 2)
>     spikes.long$variable <- NULL
>
> Now we're in a position to plot. The first is a scatterplot of the response by location, stratified by cultivar; it contains color to distinguish sides.
>
>     # With color:
>     p <- qplot(loc, value, data = spikes.long, group = cn, colour = side)
>     p + facet_grid(cn ~ .)
>
> The color is not terribly informative, so to get rid of it, remove the colour = side argument.
>
> One could also merge the plots together and fit smooths to the different cultivars.
>
>     ggplot(spikes.long, aes(loc, value, colour = cn)) + geom_point() + geom_smooth(se = FALSE)
>
> I also came up with boxplot pairs by side for each cultivar, which is shown below:
>
>     q <- ggplot(spikes.long, aes(side, value))
>     q + geom_boxplot() + facet_grid(~ cultivar)
>
> For some reason, I kept getting these messages from every ggplot2 call:
>
>     Error in recordGraphics(drawGTree(x), list(x = x), getNamespace("grid")) :
>       invalid graphics state
>
> but all of the plots rendered as expected.
>
> HTH,
> Dennis
>
> 2010/1/10 Carl-Göran CG. Pettersson <cg.petters...@vpe.slu.se>
>> Dear all
>>
>> R2.10 WinXP
>>
>> I have a dataset dealing with the way different wheat cultivars build their yield. Wheat ears are organised in spikelets, where the spikelets can be numbered from the bottom, with even numbers on one side and odd on the other. I know how many kernels there were in each spikelet after some months spent counting them...
>>
>> Now I want to illustrate the differences between the cultivars in how the kernels are distributed in the ears. In the best of all possible worlds it would be possible to place histograms or boxplots on adjacent sides of vertical lines representing different cultivars. I have done some experimenting using boxplot() but I am stuck and out of ideas right now.
>>
>> All ideas are welcome!
>> /CG
>>
>> Here is a sample dataset with the counts of kernels for the first 14 spikelets:
>>
>>     cn       spl01 spl02 spl03 spl04 spl05 spl06 spl07 spl08 spl09 spl10 spl11 spl12 spl13 spl14
>>     Lans     1.8   3.1   3.5   3.8   3.8   4.1   4.2   4.3   4.4   4.5   4.2   4.1   3.9   3.8
>>     Kranich  0.6   2.4   3.4   4.2   4.5   4.7   4.9   4.9   4.8   4.7   4.4   4.1   4.1   3.9
>>     Loyal    1.1   2.7   3.6   3.7   4.1   4.4   4.4   4.6   4.3   4.5   4.3   4.1   3.8   3.7
>>     Boomer   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
>>     Oakley   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
>>     Hereford 0.6   2.3   3.3   3.6   3.9
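A possible reading of the "undefined columns selected" error is that the boxplot call facets on a column named cultivar, while the long data frame built in the quoted steps only has cn, value, loc and side; that is a guess from the code shown, not something confirmed in the thread. The base-R sketch below rebuilds a small version of the long data (first four spikelets of three cultivars, values taken from the sample table) so the column names can be checked, and draws side-by-side boxplots grouped by cn.

```r
# First four spikelets of three cultivars, copied from the sample data
spikelets <- data.frame(
  cn    = c("Lans", "Kranich", "Loyal"),
  spl01 = c(1.8, 0.6, 1.1),
  spl02 = c(3.1, 2.4, 2.7),
  spl03 = c(3.5, 3.4, 3.6),
  spl04 = c(3.8, 4.2, 3.7),
  stringsAsFactors = FALSE
)

# Wide to long without extra packages: columns are stacked one after another
spikes.long <- data.frame(
  cn       = rep(spikelets$cn, times = ncol(spikelets) - 1),
  variable = rep(names(spikelets)[-1], each = nrow(spikelets)),
  value    = unlist(spikelets[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)
spikes.long$loc  <- as.numeric(substring(spikes.long$variable, 4))
spikes.long$side <- factor(2 - spikes.long$loc %% 2)   # odd = side 1, even = side 2

names(spikes.long)                   # note: there is no column called 'cultivar'
"cultivar" %in% names(spikes.long)

# Boxplot pairs by side within each cultivar, using the column that exists
boxplot(value ~ side + cn, data = spikes.long, las = 2)
```

If this reading is right, replacing cultivar with cn in the facet_grid() call would be the corresponding change in the ggplot2 version.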
[R] Odp: selection of multiple subscripts
Hi

see ?points or ?lines, which you would surely have found if you had bothered to look at the ?plot help page.

Regards
Petr

r-help-boun...@r-project.org wrote on 13.01.2010 13:36:57:

> Readers,
>
> For a data set 'x':
>
>     1 a
>     2 b
>     3 c
>     4 d
>     5 e
>     6 f
>     7 g
>     8 h
>     9 i
>
> How to select multiple subscripts to plot? For example, to plot values 1:3 and 9:10:
>
>     plot(x[1:3,1],x[,2])
>
> and
>
>     plot(x[9:10,1],x[,2])
>
> into one plot?
>
> Yours,
>
> rhelpatconference.jabber.org
> r251
Re: [R] selection of multiple subscripts
On 13/01/2010 7:36 AM, e-letter wrote:
> Readers,
>
> For a data set 'x':
>
>     1 a
>     2 b
>     3 c
>     4 d
>     5 e
>     6 f
>     7 g
>     8 h
>     9 i
>
> How to select multiple subscripts to plot? For example, to plot values 1:3 and 9:10:
>
>     plot(x[1:3,1],x[,2])
>
> and
>
>     plot(x[9:10,1],x[,2])
>
> into one plot?

Neither of those will work, because your x[,2] vector is longer than the other vector. What you want is something like this:

    plot(col2 ~ col1, data=x[c(1:3, 9:10),])

where col1 and col2 are the names of those two columns.

Duncan Murdoch
[R] How to do FMOLS and DOLS?
Hi,

Can R do FMOLS (Fully Modified OLS) and DOLS (Dynamic OLS)? I cannot find anything useful in the existing packages.

Thanks in advance!
Re: [R] selection of multiple subscripts
On 13/01/2010, Duncan Murdoch <murd...@stats.uwo.ca> wrote:
> On 13/01/2010 7:36 AM, e-letter wrote:
>> Readers,
>>
>> For a data set 'x':
>>
>>     1 a
>>     2 b
>>     3 c
>>     4 d
>>     5 e
>>     6 f
>>     7 g
>>     8 h
>>     9 i
>>
>> How to select multiple subscripts to plot? For example, to plot values 1:3 and 9:10:
>>
>>     plot(x[1:3,1],x[,2])
>>
>> and
>>
>>     plot(x[9:10,1],x[,2])
>>
>> into one plot?
>
> Neither of those will work, because your x[,2] vector is longer than the other vector. What you want is something like this:
>
>     plot(col2 ~ col1, data=x[c(1:3, 9:10),])

Thanks, I now understand the concatenate function would help, but I had forgotten the syntax. Anyway, I've just realised that the search database for R yields no result for '?concatenate', which is surprising.
Re: [R] selection of multiple subscripts
On 13/01/2010, e-letter <inp...@gmail.com> wrote:
> On 13/01/2010, Duncan Murdoch <murd...@stats.uwo.ca> wrote:
>> On 13/01/2010 7:36 AM, e-letter wrote:
>>> Readers,
>>>
>>> For a data set 'x':
>>>
>>>     1 a
>>>     2 b
>>>     3 c
>>>     4 d
>>>     5 e
>>>     6 f
>>>     7 g
>>>     8 h
>>>     9 i
>>>
>>> How to select multiple subscripts to plot? For example, to plot values 1:3 and 9:10:
>>>
>>>     plot(x[1:3,1],x[,2])
>>>
>>> and
>>>
>>>     plot(x[9:10,1],x[,2])
>>>
>>> into one plot?
>>
>> Neither of those will work, because your x[,2] vector is longer than the other vector. What you want is something like this:
>>
>>     plot(col2 ~ col1, data=x[c(1:3, 9:10),])
>
> Thanks, I now understand the concatenate function would help, but I had forgotten the syntax. Anyway, I've just realised that the search database for R yields no result for '?concatenate', which is surprising.

For the benefit of other novices: for the data set, the subscripts should have read:

    1:3 and 8:9

Alternatively, the data set should have included:

    10 j

:)
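A small sketch of the two approaches mentioned in this thread, using the corrected subscripts 1:3 and 8:9. The column names col1 and col2 follow Duncan Murdoch's reply; the numeric values in col2 are made up, since the letters in the original data set cannot be plotted directly.

```r
# Made-up numeric stand-in for the 9-row data set
x <- data.frame(col1 = 1:9,
                col2 = c(2, 4, 3, 6, 5, 8, 7, 9, 10))

# One call: combine the wanted row numbers with c() and subset both columns
sel <- c(1:3, 8:9)
sub <- x[sel, ]
plot(col2 ~ col1, data = sub)

# Two steps: draw the first group, then add the second with points()
plot(x$col1[1:3], x$col2[1:3],
     xlim = range(x$col1), ylim = range(x$col2),
     xlab = "col1", ylab = "col2")
points(x$col1[8:9], x$col2[8:9], pch = 19)

# Asking for row 10 of a 9-row data frame yields a row of NAs
x[c(1:3, 9:10), ]
```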
[R] column width in .dbf files using write.dbf ... to be continued
Dear UseRs,

I did not receive an answer to my previous message ("Is there a way to define column widths manually when using the write.dbf function from the foreign package?"), so I tried to modify the write.dbf function to do what I want. Here is my modified version:

    write.dbfMODIF <- function (dataframe, file, factor2char = TRUE,
                                max_nchar = 254, width = "d")
    {
        allowed_classes <- c("logical", "integer", "numeric", "character",
                             "factor", "Date")
        if (!is.data.frame(dataframe))
            dataframe <- as.data.frame(dataframe)
        if (any(sapply(dataframe, function(x) !is.null(dim(x)))))
            stop("cannot handle matrix/array columns")
        cl <- sapply(dataframe, function(x) class(x[1L]))
        asis <- cl == "AsIs"
        cl[asis & sapply(dataframe, mode) == "character"] <- "character"
        if (length(cl0 <- setdiff(cl, allowed_classes)))
            stop("data frame contains columns of unsupported class(es) ",
                 paste(cl0, collapse = ","))
        m <- ncol(dataframe)
        DataTypes <- c(logical = "L", integer = "N", numeric = "F",
                       character = "C", factor = if (factor2char) "C" else "N",
                       Date = "D")[cl]
        for (i in seq_len(m)) {
            x <- dataframe[[i]]
            if (is.factor(x))
                dataframe[[i]] <- if (factor2char) as.character(x) else as.integer(x)
            else if (inherits(x, "Date"))
                dataframe[[i]] <- format(x, "%Y%m%d")
        }
        precision <- integer(m)
        scale <- integer(m)
        dfnames <- names(dataframe)
        for (i in seq_len(m)) {
            nlen <- nchar(dfnames[i], "b")
            x <- dataframe[, i]
            if (is.logical(x)) {
                precision[i] <- 1L
                scale[i] <- 0L
            }
            else if (is.integer(x)) {
                rx <- range(x, na.rm = TRUE)
                rx[!is.finite(rx)] <- 0
                if (any(rx == 0))
                    rx <- rx + 1
                mrx <- as.integer(max(ceiling(log10(abs(rx)))) + 3L)
                precision[i] <- min(max(nlen, mrx), 19L)
                scale[i] <- 0L
            }
            else if (is.double(x)) {
                precision[i] <- 19L
                rx <- range(x, na.rm = TRUE)
                rx[!is.finite(rx)] <- 0
                mrx <- max(ceiling(log10(abs(rx))))
                scale[i] <- min(precision[i] - ifelse(mrx > 0L, mrx + 3L, 3L), 15L)
            }
            else if (is.character(x)) {
                if (width == "d") {
                    # default: width derived from the data, as in write.dbf
                    mf <- max(nchar(x[!is.na(x)], "b"))
                    p <- max(nlen, mf)
                    if (p > max_nchar)
                        warning(gettextf("character column %d will be truncated to %d bytes",
                                         i, max_nchar), domain = NA)
                    precision[i] <- min(p, max_nchar)
                    scale[i] <- 0L
                }
                else {
                    # user-supplied width
                    if (width > max_nchar)
                        warning(gettextf("character column %d will be truncated to %d bytes",
                                         i, max_nchar), domain = NA)
                    precision[i] <- min(width, max_nchar)
                }
            }
            else stop("unknown column type in data frame")
        }
        if (any(is.na(precision)))
            stop("NA in precision")
        if (any(is.na(scale)))
            stop("NA in scale")
        invisible(.Call(DoWritedbf, as.character(file), dataframe,
                        as.integer(precision), as.integer(scale),
                        as.character(DataTypes)))
    }

However, when I try to use this function, it does not find the DoWritedbf function that is called in the last lines (a function written in C). Is there a way to temporarily replace the original write.dbf function with this one in the foreign package?

Thanks,
Arnaud

R version 2.10.0 (2009-10-26)
i386-pc-mingw32
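One way this is sometimes handled, sketched here under assumptions: a function defined at the prompt looks up names from the global environment, so objects that are internal to a package are not visible to it. Setting the function's environment to the package namespace changes where the lookup starts. The sketch assumes DoWritedbf is an object inside foreign's namespace (the thread does not confirm this), and `my_write_dbf` is only a stand-in copy of the original function, not the modified version above.

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  # Stand-in for a modified copy of write.dbf
  my_write_dbf <- foreign::write.dbf

  # Evaluate the copy inside foreign's namespace, so that names used in
  # its body (such as internal, non-exported objects) are looked up there
  environment(my_write_dbf) <- asNamespace("foreign")

  # Round trip on a small made-up data frame
  tmp <- tempfile(fileext = ".dbf")
  my_write_dbf(data.frame(id = 1:3, name = c("a", "b", "c"),
                          stringsAsFactors = FALSE), tmp)
  back <- foreign::read.dbf(tmp)
  print(back)
}
```

For the function in the message, the corresponding line would be environment(write.dbfMODIF) <- asNamespace("foreign"). utils::assignInNamespace() is another route if the original really has to be replaced for the session, though it is intended mainly for debugging.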
Re: [R] selection of multiple subscripts
On 13/01/2010 8:09 AM, e-letter wrote:
> On 13/01/2010, Duncan Murdoch <murd...@stats.uwo.ca> wrote:
>> On 13/01/2010 7:36 AM, e-letter wrote:
>>> Readers,
>>>
>>> For a data set 'x':
>>>
>>>     1 a
>>>     2 b
>>>     3 c
>>>     4 d
>>>     5 e
>>>     6 f
>>>     7 g
>>>     8 h
>>>     9 i
>>>
>>> How to select multiple subscripts to plot? For example, to plot values 1:3 and 9:10:
>>>
>>>     plot(x[1:3,1],x[,2])
>>>
>>> and
>>>
>>>     plot(x[9:10,1],x[,2])
>>>
>>> into one plot?
>>
>> Neither of those will work, because your x[,2] vector is longer than the other vector. What you want is something like this:
>>
>>     plot(col2 ~ col1, data=x[c(1:3, 9:10),])
>
> Thanks, I now understand the concatenate function would help, but I had forgotten the syntax. Anyway, I've just realised that the search database for R yields no result for '?concatenate', which is surprising.

That's because there's no concatenate function in base R. If you want to search for the word concatenate, use ??concatenate. You won't find the c() function, because it is called combine, but you'll find several other ways to concatenate.

Duncan Murdoch
[R] reading fifo with read.table hangs
To R-helpers,

Running

    R version 2.10.0 (2009-10-26)
    Linux ... 2.6.25.20-0.5-default #1 SMP 2009-08-14 01:48:11 +0200 x86_64 x86_64 x86_64 GNU/Linux
    openSUSE 11.0 (X86-64)

and having difficulties reading a fifo from within R. A short example that simply hangs is shown as 'SHORT SCRIPT' below. I expected R to print a data set read from the fifo with the numbers 0,1,...,7 and then exit gracefully. Any ideas why not?

A longer script that actually does the job in its 2nd clause is shown in 'LONG SCRIPT' below ... I'm confused that the open call is needed. Any comments on this?

Regards
MJ

--- SHORT SCRIPT BEGIN
#!/bin/bash
mkfifo chops
gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &
R --slave --no-save <<EOF
print ("Hello from R")
con.data <- read.table ("chops")
con.data
EOF
unlink chops
--- SHORT SCRIPT END

--- LONG SCRIPT BEGIN
#!/bin/bash
DO_1st=no
DO_2nd=yes
DO_3rd=yes

# 1 Hoped for this to work but fails
if [[ $DO_1st =~ [yY][eE][sS] ]] ; then
echo With R 1
mkfifo chops
gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &
R --slave --no-save <<EOF
print ("Hello from R 1")
con.data <- read.table ("chops")
con.data
EOF
unlink chops
fi

# 2 Works but with an unexpected open call
if [[ $DO_2nd =~ [yY][eE][sS] ]] ; then
echo With R 2
mkfifo chops
gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &
R --slave --no-save <<EOF
print ("Hello from R 2")
theFifo <- fifo(description="chops", open="read")
open(theFifo) # without this read.table raises error of no lines available
con.data <- read.table (theFifo)
close(theFifo)
con.data
EOF
unlink chops
fi

# 3 Works - just for reference
if [[ $DO_3rd =~ [yY][eE][sS] ]] ; then
echo With cat
mkfifo chops
gawk 'BEGIN {for (i=0;i<8;i++){print i}}' > chops &
cat chops
unlink chops
fi
--- LONG SCRIPT END
[R] Problem fitting a non-linear regression model with nls
Hi, I'm trying to make a regression of the form:

formula <- y ~ Asym_inf + Asym_sup * ( (1 / (1 + (n1 * (exp( (tmid1-x) / scal1) )^(1/n1) ) ) ) - (1 / (1 + (n2 * (exp( (tmid2-x) / scal2) )^(1/n2) ) ) ) )

which is a sum of the generalized logistic model proposed by Richards, with data such as these:

x <- c(88,113,128,143,157,172,184,198,210,226,240,249,263,284,302,340)
y <- c(0.04,0.16,1.09,2.65,2.46,2.43,1.88,2.42,1.51,1.70,1.92,1.35,0.89,0.34,0.13,0.10)

I use the nls function to fit my data to the model:

nls(formule, data=cbind.data.frame(x,y), start=list(Asym_inf =min(y),Asym_inf =max(y)-min(y), n1=1,n2=1,tmid1=120,tmid2=250,scal1=11,scal2=30))

and it always finishes with one of these messages (even if I change the initial values):

- Error in nls(formule, data = cbind.data.frame(x, y), start = list(Asym_inf =min(y), : step factor 0.000488281 reduced below 'minFactor' of 0.000976562
- Error in nls(formule, data = cbind.data.frame(x, y), start = list(miny = min(y), : singular gradient
- Error in numericDeriv(form[[3]], names(ind), env) : Missing value or an infinity produced when evaluating the model
- Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates

So it seems that I reach a local extremum each time. I know that most of the problem comes from the choice of the initial values of the parameters Asym_inf, Asym_inf, n1, n2, tmid1, tmid2, scal1 and scal2. My question is how could I estimate those initial values so that the nls fitting works.

Thanks in advance

-- Nathalie YAUSCHEW-RAGUENES Ph.D Student Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement (EPHYSE) INRA, Centre de Bordeaux - Aquitaine 71 Av Edouard Bourlaux 33883 Villenave d'Ornon Cedex France
Re: [R] plotting moving range control chart
I have been having the same problem as poster Hodgess, below. It appears that her question was never answered, so I would like to share a solution with the community.

The problem is the (apparent?) inability to produce moving range process behavior (a.k.a. control) charts with individuals data in the package qcc (v. 2.0). I have also struggled with the same limitation in package IQCC (v. 1.0). The package qAnalyst (v. 0.6.0) provides an option to produce a moving range chart with individuals data. The example given in the qAnalyst manual for function spc yields an individuals chart:

#i-chart, moving range to estimate st. dev. is equal to 2 points with testType=1,
data(rawWeight)
ichart=spc(x=rawWeight$rawWeight, sg=2, type="i", name="weight", testType=1)
plot(ichart)
summary(ichart)

Changing type = 'i' to type = 'mr' yields the moving range chart:

mrchart = spc(x = rawWeight$rawWeight, sg = 2, type = "mr", name = "weight", testType = 1)
plot(mrchart)
summary(mrchart)

In separate tests, I have confirmed that qAnalyst correctly computes natural process limits (a.k.a. control limits) for X-bar and R charts, using the average of the subgroup means. I have not yet checked the calculations for the ImR or other charts. An additional difference between these packages is that qAnalyst uses the lattice library to generate output, while the other two packages appear to use the (traditional) graphics library.

Regards, Tom

On Tue, 10 Nov 2009 23:39:23 -0600, Erin Hodgess erinm.hodgess_at_gmail.com wrote:

Dear R People: I am using qcc for a quality control class. I have used qcc with type xbar.one for individuals but cannot determine how to plot a moving range control chart. Has anyone done that, please?

Thanks, Erin

-- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodgess_at_gmail.com
[R] How can I store the results
Dear R users, I am running R code which gives me 10 columns and 160 rows. I need to run the code 100 times, and each time I need to store the results in a single file. I do not know how I can store them in a single file without overwriting the earlier results.

Thanks, Alex
Re: [R] Problem fitting a non-linear regression model with nls
You could try the brute force of the nls2 package; however, note that you have 8 parameters and only 16 points, so you might look for a more parsimonious model. Plotting it, it seems somewhat gaussian in shape so:

mod <- nls(y ~ a * dnorm(x, b, c), start = c(a = mean(y)/dnorm(0, 0, sd(x)), b = mean(x), c = sd(x)))
matplot(x, cbind(y, fitted(mod)), type = c("p", "l"), pch = 20)

On Wed, Jan 13, 2010 at 9:02 AM, Nathalie Yauschew-Raguenes nathalie.yauschew-rague...@bordeaux.inra.fr wrote:

My question is how could I estimate those initial values so that the nls fitting works.
[R] wrong with using subset
Hello, is there something wrong with this expression:

subset(dfpr2_r,(as.numeric(as.character(dfpr2_r$pr2))) 0.2 (dfpr2_r$landa 10))

It returns nothing.

Regards
Re: [R] = returns wrong result? Why
Yupp, FAQ 7.31 is definitely your friend here. You might also want to take a look at these two very recent threads on this help list:

Strange behaviour of as.integer() http://tolstoy.newcastle.edu.au/R/e9/help/10/01/index.html#547
Newbie question on precision http://tolstoy.newcastle.edu.au/R/e9/help/10/01/index.html#718

Best, Magnus

On 1/13/2010 3:25 AM, Stephan Kolassa wrote: take a look at FAQ 7.31.

Trafim Vanishek wrote: Does anybody know the probable reason why = gives false when it should give true?
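The point of FAQ 7.31 can be seen in a few lines. This is only a sketch with the classic 0.1 + 0.2 example, not the poster's Rk and RB vectors (which are not available); the variable names a and b are chosen for the illustration.

```r
# Two doubles that print identically at the default 7 significant digits.
a <- 0.1 + 0.2
b <- 0.3
print(a)   # prints 0.3
print(b)   # prints 0.3

# Exact comparison fails because the stored binary values differ.
exact_equal <- a == b

# Printing more digits reveals the difference.
sprintf("%.17g", a)   # "0.30000000000000004"
sprintf("%.17g", b)   # "0.29999999999999999"

# Compare with a tolerance instead; all.equal() uses about 1.5e-8 by default.
near_equal <- isTRUE(all.equal(a, b))
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description, not FALSE, when the values differ.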
Re: [R] Dynamic file / url name with read.csv
A few packages have support for basic download from Yahoo Finance. If that's what you are trying to achieve, you may want to try quantmod (getSymbols function) or tseries (get.hist.quote function). If you want to do something not supported yet, first take a look at their source code.

Regards, Ivan

From: k...@csusb.edu
Date: Tue, 12 Jan 2010 22:25:17 -0800
To: r-help@r-project.org
Subject: Re: [R] Dynamic file / url name with read.csv

A few suggestions:
Don't mix ' and "
Use paste()
Don't include an extraneous ;

SymA <- "SPY"
Sym1 <- paste("http://ichart.finance.yahoo.com/table.csv?s=",SymA,"&ignore=.csv",sep="")
Symbol <- read.csv(Sym1, stringsAsFactors=F)

On Jan 12, 2010, at 10:03 PM, B S wrote:

Hi- I would like to be able to change the value of SymA below and download a file from the corresponding URL. Hardcoded, this line works fine:

Symbol <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=SPY&ignore=.csv;", stringsAsFactors=F)

However, when I incorporate using a variable for the ticker, it no longer works.

SymA <- "SPY"
Sym1 <- cat('http://ichart.finance.yahoo.com/table.csv?s=,SymA,ignore=.csv,sep=;;)
Symbol <- read.csv(Sym1, stringsAsFactors=F)

I know that the problem lies in the concatenation, but I've tried different variations of cat() and toString() (and others) with SymA and Sym1 but cannot seem to get a string together that will work. Would appreciate any suggestions for this simple problem. Thank you.
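The difference between cat() and paste() that this thread turns on can be sketched as follows. The helper name build_url is hypothetical, and the "&ignore=.csv" query suffix is an assumption, since the separator characters appear to have been lost from the archived text. Nothing is downloaded here; the sketch only builds the string.

```r
# Hypothetical helper: build the download URL for a ticker symbol.
build_url <- function(symbol) {
  paste("http://ichart.finance.yahoo.com/table.csv?s=", symbol,
        "&ignore=.csv", sep = "")
}

url_spy <- build_url("SPY")

# cat() only prints its arguments and returns NULL, so assigning its
# result never gives a usable string.
printed <- cat("")
```

The resulting string can then be passed to read.csv() in place of the hardcoded URL, assuming the service still answers at that address.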
Re: [R] Problem fitting a non-linear regression model with nls
My question is how could I estimate those initial values so that the nls fitting works.

You can't. Your parameters are almost certainly nonidentifiable (which is what Gabor told you more gracefully). Just because you believe in a complex (often mechanistic) nonlinear model and have some data does not assure that the model parameters can be estimated. If you do not understand why this is so, consider fitting even a simple 4 parameter logistic when the data do not level off at the top and/or bottom end. There are then infinitely many solutions in which the parameters trade off with one another to give essentially identical fits. That is what the singular gradient message is trying to tell you.

Bert Gunter Genentech Nonclinical Statistics
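A deliberately simpler toy case than the 4 parameter logistic mentioned above may make the trade-off concrete. In the sketch below (simulated data, hypothetical names), only the product a * b affects the fitted values, so infinitely many (a, b) pairs give the same fit and nls() cannot separate them.

```r
# Toy model y = a * b * x: a and b are not separately identifiable.
set.seed(42)
x <- 1:10
y <- 6 * x + rnorm(10, sd = 0.1)

# Two different parameter pairs with the same product give identical predictions.
pred <- function(a, b) a * b * x
same_fit <- isTRUE(all.equal(pred(2, 3), pred(3, 2)))

# The gradient columns for a and b are collinear, so nls() stops with an error.
fit <- tryCatch(
  nls(y ~ a * b * x, start = list(a = 1, b = 1)),
  error = function(e) e
)
nls_failed <- inherits(fit, "error")
if (nls_failed) message(conditionMessage(fit))
```

Better starting values do not help in such a case; the usual remedies are fixing one parameter or reparameterising (here, estimating the single product).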
Re: [R] wrong with using subset
I would suggest that first you look at the results of

(as.numeric(as.character(dfpr2_r$pr2))) 0.2 (dfpr2_r$landa 10)

by itself. Does it give all FALSE?

Then look at each of the parts separately. What are the results of

(as.numeric(as.character(dfpr2_r$pr2))) 0.2

and

dfpr2_r$landa 10

Are there any TRUE among the results? Does as.numeric(as.character(dfpr2_r$pr2)) give what you expect?

-Don

-- Don MacQueen Environmental Protection Department Lawrence Livermore National Laboratory Livermore, CA, USA 925-423-1062
Re: [R] wrong with using subset
On 13/01/2010 10:45 AM, Don MacQueen wrote:

I would suggest that first you look at the results of (as.numeric(as.character(dfpr2_r$pr2))) 0.2 (dfpr2_r$landa 10) by itself. Does it give all FALSE ?

I'd guess the problem is using && instead of &.

Duncan Murdoch
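The operators in the posted expression were lost in the archive, so the following is only a sketch under assumptions: the data frame is made up, the comparison directions (> 0.2 and < 10) are guesses, and the point illustrated is the presumable one, that row filtering needs the element-wise & rather than the scalar &&.

```r
# Hypothetical stand-in for dfpr2_r: pr2 stored as a factor, landa numeric.
df <- data.frame(pr2 = factor(c("0.1", "0.3", "0.5", "0.25")),
                 landa = c(5, 12, 8, 3))

# Convert the factor labels (not the level codes) to numbers.
pr2_num <- as.numeric(as.character(df$pr2))

# & works element by element and yields one TRUE/FALSE per row.
keep <- pr2_num > 0.2 & df$landa < 10
res <- subset(df, keep)

# && is meant for single values: older R versions silently used only the
# first element of each side, recent versions raise an error for longer vectors.
```

Checking the intermediate vector keep on its own, as suggested in the thread, shows immediately whether any row satisfies both conditions.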
Re: [R] Ask about large data set
On 1/12/2010 8:29 PM, Yi Du wrote:

Hi, Is that okay to let R to read data set more than 1 rows and use it to do some kernel density estimation? Thanks. Yi

Why don't you just try it and see? Nothing bad will happen - the absolute worst case scenario is that R will hang. But I can tell you that reading 1 rows should be a piece of cake on any decent computer. Different estimation techniques are different in terms of computational intensity. Trying it is the best approach. If you run into problems, you could come back with specific questions of optimization.

Best, Magnus
Re: [R] Problem fitting a non-linear regression model with nls
Actually, the data that I used are measurements of plant growth during an entire year. It is usual to model the growth with logistic models. I have already tried the simple logistic model (which works). But the problem is that with this model the inflexion point occurs half-way up or down the logistic curve. That's why, despite the small number of measurements, I wanted to try the generalized logistic model proposed by Richards. So I will still try the nls2 package, just in case. And if it doesn't work, I'll use a more parsimonious model as you two have suggested.

Thank you for your answers

-- Nathalie YAUSCHEW-RAGUENES Ph.D Student Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement
Re: [R] How can I store the results
You could put all of your results into a single list, then just save the list. Or, functions like write.table and write have an append argument, set that to true and the information will be appended to the file rather than overwriting it.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111
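The append route could look roughly like the sketch below. The simulated 160 x 10 results, the run identifier column, the temporary file name and the three iterations (instead of 100) are all assumptions made to keep the example small.

```r
# Write every run to the same CSV file, appending after the first run.
out_file <- tempfile(fileext = ".csv")

for (i in 1:3) {
  # Stand-in for the real computation: 160 rows, 10 columns.
  res <- as.data.frame(matrix(rnorm(160 * 10), nrow = 160, ncol = 10))

  # Tag each block with its run number so the runs can be separated later.
  write.table(cbind(run = i, res), file = out_file, sep = ",",
              append = i > 1,       # overwrite on the first run only
              col.names = i == 1,   # write the header once
              row.names = FALSE)
}

all_runs <- read.csv(out_file)
dim(all_runs)
```

Writing the header only on the first pass avoids the warning write.table() gives when column names are appended to an existing file.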
Re: [R] How can I store the results
Collect the results in a list (one entry for each matrix) and then 'save' the list. When you 'load' it back in, you can easily reference each element for further processing.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
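The list-then-save approach described above can be sketched like this. The random matrices stand in for the real results and the temporary file name is an assumption.

```r
# One list entry per run; each entry is a 160 x 10 matrix.
results <- vector(mode = "list", length = 100)
for (i in seq_along(results)) {
  results[[i]] <- matrix(rnorm(160 * 10), nrow = 160, ncol = 10)
}

# Store the whole list in a single file.
rdata_file <- tempfile(fileext = ".RData")
save(results, file = rdata_file)

# Later (or in another session): load() restores the object under its old name.
rm(results)
load(rdata_file)
length(results)
dim(results[[1]])
```

Individual runs are then available as results[[1]], results[[2]] and so on for further processing.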
[R] R package dependencies
Hi there, My question relates to getting information about R packages. In particular I would like to be able to find from within R:

what are a package's dependencies
what are a package's reverse dependencies
does a package contain a dll

The reason I ask is: the organisation that I work for is introducing a secure intranet operating on Windows PCs and laptops, and this requires that all software / executables / dlls are validated before they are combined to produce a generic PC build. I would like to maximise the packages available to our staff, and so for the packages that we have listed as business needs, I would like to include all reverse dependencies of this collection that do not have dlls.

I hope this makes sense (the question, not the reason).

Kind regards, Colin.
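One possible way to approach these three questions for locally installed packages is sketched below. It assumes a reasonably recent R in which tools::package_dependencies() and dir.exists() are available, uses the installed-package matrix as the database (so reverse dependencies only cover what is installed), and treats the presence of a libs directory as a sign of compiled code; has_compiled_code is a hypothetical helper name.

```r
# Database of locally installed packages (no network access needed).
db <- installed.packages()
pkg <- "stats"

# Forward dependencies: what 'stats' itself needs.
deps <- tools::package_dependencies(pkg, db = db,
                                    which = c("Depends", "Imports"))

# Reverse dependencies: installed packages that need 'stats'.
rev_deps <- tools::package_dependencies(pkg, db = db,
                                        which = c("Depends", "Imports"),
                                        reverse = TRUE)

# Hypothetical helper: an installed package with compiled code
# ships a 'libs' directory holding its shared library or DLL.
has_compiled_code <- function(p) {
  dir.exists(file.path(find.package(p), "libs"))
}

has_compiled_code("stats")
has_compiled_code("datasets")
```

For a CRAN-wide view the same calls could presumably be pointed at the matrix returned by available.packages(), which requires network access.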
Re: [R] convert factor data to numeric
S Devriese wrote:
On 01/13/2010 10:47 AM, Ahmet Temiz wrote:

hello could you give me a hint to convert data in factor type to numeric (float) ? regards

you could try as.numeric but without more details it is difficult to see if this will work. How did you end up with a factor (e.g. through import)?

No, don't use as.numeric(). Do follow Dimitris' advice. But the question of how you got the factor data is good; you can usually avoid getting factors to begin with.

-Peter Ehlers

Stephan

-- Peter Ehlers University of Calgary 403.202.3921
[R] exporting data frame - write foreign inconsistencies
Hello List, I have a data frame object (wa2) that I am exporting for use in another statistics package. Using

library(foreign)
write.foreign(wa2, choose.files(), choose.files(), package='SPSS')

I noticed that there were several differences between the data sets as seen within R (View(wa2)) and what was produced in SPSS. Examining the data file produced by write.foreign (before running the generated SPSS syntax), I noticed the same inconsistencies. I then used:

write.table(wa2, choose.files(), sep=",", col.names=TRUE, row.names=FALSE, quote=TRUE, na="NA")

and the file generated using this method matched what was in the R object. I'm trying to send this dataset to a colleague who will only use SPSS. Any ideas why the two methods produce different data files?

-- sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] Rcmdr_1.5-4 car_1.2-16 relimp_1.0-1 foreign_0.8-39

loaded via a namespace (and not attached):
[1] tools_2.10.1
--

Thanks in advance. Sincerely; John Cullen, M.Sc. caninesinmotion.com
[R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?
Hello, I am learning randomForest. I want to draw boxplots of mse against mtry using 20 repetitions of 5-fold cross-validation (using the median value), but I have not found a good way to do it. The randomForest package itself does not contain a cross-validation method. The caret package does, but how can I get all the values of mtry together with the corresponding mse?
[R] Method for reduction of independent variables
Hello, I am currently investigating software code metrics for a variety of software projects of a company to determine the worst parts of software products according to specified quality characteristics. As the gathering of metrics correlates with effort, I would like to find a subset of the metrics preserving significant predictive power for the problem value while using the least amount of code metrics.

I have the results of 25 metrics for 6 software projects for a combined 9355 individuals, i.e. software parts with metrics. However, as many metrics only measure metric values above a predefined limit, 58% of the responses for independent variables are 0.

Which method can I use to determine a reduced set of independent variables with significant predictive power? As I do not have a statistics background, I would also appreciate a simple explanation of the chosen method and sensible choices for parameters, so that I will be able to infer the reduced set of software metrics to keep.

Thank you in advance! Johannes
Re: [R] convert factor data to numeric
Hello, I found a way to convert data in factor type to numeric:

data_numeric <- as.numeric(as.character(data_factor))

It's tricky but it works.

-- Nathalie YAUSCHEW-RAGUENES Ph.D Student Unité de Recherches Ecologie Fonctionnelle et Physique de l'Environnement (EPHYSE)
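A small example shows why the extra as.character() step matters. The factor f below is made up for the illustration.

```r
# A factor whose labels look like numbers.
f <- factor(c("10.5", "3.2", "10.5", "7"))

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(f)

# Going through the labels recovers the intended values.
vals <- as.numeric(as.character(f))

# Equivalent alternative: convert the levels once, then index by the factor.
vals2 <- as.numeric(levels(f))[f]
```

Here the levels sort as the strings "10.5", "3.2", "7", so the codes are 1, 2, 1, 3 while the recovered values are 10.5, 3.2, 10.5, 7.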
[R] Applying function to parts of a matrix based on a factor
R 2.9 Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous variable Age. I want to get mean age by sex. I know I can do this with two statements,

mean(Data["Age",Data[,"Sex"]=="Male"]) and mean(Data["Age",Data[,"Sex"]=="Female"])

I know this can be done in a single command, but I can't remember how. There is a function that allows another function to work within factors, something like magicfunction(Data,Factor=Sex). n.b. I know the function I am looking for is not in the lapply, sapply etc. family. Please put me out of my misery (and senior moment) and remind me what function I should be using.

John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)
Re: [R] Applying function to parts of a matrix based on a factor
with(yourdataframe, tapply(age,sex,mean))
Re: [R] Applying function to parts of a matrix based on a factor
try this:

with(Data, tapply(Age, Sex, mean))

I hope it helps.

Best, Dimitris

-- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
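A self-contained run of the tapply() suggestion, using a small made-up data frame in place of the poster's Data:

```r
# Hypothetical data: a factor Sex and a numeric Age.
Data <- data.frame(Sex = factor(c("Male", "Female", "Female", "Male", "Female")),
                   Age = c(40, 35, 45, 60, 25))

# tapply() splits Age by the levels of Sex and applies mean() to each group.
means <- with(Data, tapply(Age, Sex, mean))
means

# aggregate() gives the same numbers as a data frame.
agg <- aggregate(Age ~ Sex, data = Data, FUN = mean)
```

With these values the group means are 35 for Female and 50 for Male.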
Re: [R] How can I store the results
On Wed, 2010-01-13 at 15:59 +0100, Alex Roy wrote:

Dear R users, I am running a R code which gives me 10 columns and 160 rows. I need to run the code for 100 times and each time I need to store the results in a single file. I do not know how can I store them in a single file without over writting the results?

In a list?

results <- vector(mode = "list", length = 100)
for(i in seq_along(results)) {
    ## do something
    ##
    ## store result for iteration i
    results[[i]] <- something
}

results will now contain 100 matrices of dim 160x10.

HTH G

--
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
[R] Simulation numbers from a probability table
Dear friends,

If I have a table like this, where the first row A B C D ... gives the levels of the variable, the first column 0 1 2 4 ... gives the levels of the numbers, and the entries are the probabilities of each number occurring:

        A     B     C     D    ...
    0   0.2   0.3   0.1   0.05
    1   0.1   0.1   0.2   0.2
    2   0.02  0.20  0.1
    4   0.3   0.01  0.01  0.4
    ...

How can I use R to do the simulation and get a table like this, where the first row A B C D ... gives the levels of the variable and the entries are the numbers simulated from the probabilities table above?

    A  B  C  D  ...
    0  4  2  0
    2  2  0  1
    0  1  4  1
    2  2  0  0
    ...

Thanks for help!
Kelvin
Re: [R] optimization challenge
WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim). Can you describe a little what the algorithm is doing?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center, Intermountain Healthcare
greg.s...@imail.org

-----Original Message-----
From: Albyn Jones [mailto:jo...@reed.edu]
Sent: Tuesday, January 12, 2010 5:31 PM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] optimization challenge

Greg

Nice problem: I wasted my whole day on it :-)

I was explaining my plan for a solution to a colleague who is a computer scientist; he pointed out that I was trying to re-invent the wheel known as dynamic programming. Here is my code; apparently it is called bottom-up dynamic programming. It runs pretty quickly, and returns (what I hope is :-) the optimal sum of squares and the cut-points.

    function(X=bom3$Verses,days=128){
    # find optimal BOM reading schedule for Greg Snow
    # minimize variance of quantity to read per day over 128 days
    #
      N = length(X)
      Nm1 = N-1
      SSQ <- matrix(NA,nrow=days,ncol=N)
      Cuts <- list()
    #
    # SSQ[i,j]: the ssqs about the overall mean for the optimal partition
    #           for i days on the chapters 1 to j
    #
      M = sum(X)/days
      CS = cumsum(X)
      SSQ[1,] = (CS-M)^2
      Cuts[[1]] = as.list(1:N)
    #
      for(m in 2:days){
        Cuts[[m]] = list()
        #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
        for(n in m:N){
          CS = cumsum(X[n:1])[n:1]
          SSQ1 = (CS-M)^2
          j = (m-1):(n-1)
          TS = SSQ[m-1,j]+(SSQ1[j+1])
          SSQ[m,n] = min(TS)
          k = min(which((min(TS)== TS)))+m-1
          Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
        }
      }
      list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
    }

    $SSQ
    [1] 11241.05

    $Cuts
      [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
     [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
     [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
     [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
     [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
     [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
    [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
    [127] 238 239

On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:

  I have a challenge that I want to share with the group. This is not homework (but I may assign it as such if I teach the appropriate class again) and I have found one solution, so don't need anything urgent. This is more for fun to see if others can find a better solution than I did.

  The challenge: I want to read a book in a given number of days. I want to read an integer number of chapters each day (there are more chapters than days), no stopping part way through a chapter, and at least 1 chapter each day. The chapters are very non-uniform in length (some very short, a few very long, many in between) so I would like to come up with a reading schedule that minimizes the variance of the length of each day's reading (read multiple short chapters on the same day; long chapters are the only one read that day). I also want to read through the book in order (no skipping ahead to combine short chapters that are not naturally next to each other).

  My thought was that the optim function with method="SANN" would be an appropriate approach, but my first couple of tries did not give very good results. I have since come up with an optim with SANN solution that gives what I consider good results (but I accept that better is possible).

  Below is a data frame with the lengths of the chapters for the book that originally sparked the challenge for me (but the general idea should work for any book). Each row represents a chapter (in order) with 3 different measures of the length of the chapter. For this challenge I want to read the book in 128 days (there are 239 chapters).

  I will post my solutions in a few days, but I want to wait so that my direction does not discourage people from trying other approaches (if there is something better than optim, that is fine).

  Good luck to anyone interested in the challenge.

  The data frame:

    bom3 <- structure(list(Chapter = structure(1:239, .Label = c("1 Nephi 1", "1 Nephi 2", "1 Nephi 3", "1 Nephi 4", "1 Nephi 5", "1 Nephi 6", "1 Nephi 7", "1 Nephi 8", "1 Nephi 9", "1 Nephi 10", "1 Nephi 11", "1 Nephi 12", "1 Nephi 13", "1 Nephi 14", "1 Nephi 15", "1 Nephi 16", "1 Nephi 17", "1 Nephi 18", "1 Nephi 19", "1 Nephi 20", "1 Nephi 21", "1 Nephi 22", "2 Nephi 1", "2 Nephi 2", "2 Nephi 3", "2 Nephi 4", "2 Nephi 5", "2 Nephi 6", "2 Nephi 7", "2 Nephi 8", "2 Nephi 9", "2 Nephi 10", "2 Nephi 11", "2 Nephi 12", "2 Nephi 13", "2 Nephi 14", "2 Nephi 15", "2 Nephi 16", "2 Nephi 17", "2 Nephi 18", "2 Nephi 19", "2 Nephi 20", "2 Nephi 21", "2 Nephi 22", "2 Nephi 23", "2 Nephi 24", 2
Re: [R] Applying function to parts of a matrix based on a factor
If your matrix were a data.frame, it could work like this:

    df <- data.frame(age=1:100, sex=rep(1:2, 50))
    with(df, by(age, sex, mean))

without the lapply, sapply etc. family.

h

At 18:16 13.01.2010, Doran, Harold wrote:

    with(yourdataframe, tapply(age,sex,mean))
Re: [R] Simulation numbers from a probability table
If the trials are not connected then I would consider melting the table using melt() from the reshape package, and then using lapply() with the function

    random.function <- function(my.prob, number.of.observations = 10) {
        sum(rbinom(number.of.observations, 1, my.prob))
    }

In case the trials are connected, by column, then you could use apply(the.data.table, 2, a.function) on it, where a.function will draw from the multinomial distribution (for which I don't remember the function at the moment, but it can be searched).

Best,
Tal

www.r-statistics.com/ (English)
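For the "connected by column" case, base R's rmultinom() draws multinomial counts, so the apply() idea might look like the sketch below. The probability table is hypothetical (each column is made to sum to 1, which the table in the question does not obviously do), and n_trials is an assumed number of draws per column.

```r
set.seed(7)  # only to make the illustration reproducible

# Hypothetical probability table: rows are the values 0, 1, 2, 4 and
# each column (one per variable) sums to 1.
probs <- cbind(A = c(0.2, 0.1, 0.4, 0.3),
               B = c(0.3, 0.1, 0.5, 0.1),
               C = c(0.1, 0.2, 0.3, 0.4))
rownames(probs) <- c("0", "1", "2", "4")

n_trials <- 10

# For each column, count how often each value occurs in n_trials draws.
counts <- apply(probs, 2, function(p) {
  as.vector(rmultinom(1, size = n_trials, prob = p))
})
rownames(counts) <- rownames(probs)
print(counts)
```

Each column of counts sums to n_trials; this gives a table of frequencies rather than the individual simulated values.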
[R] Ask for histogram
Hi,

I use a vector of data to draw the histogram, but it is different from the graph by SAS. Can you check it for me please? b is a column vector of 4332 values.

    hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
    rug(b)

When I used rug, I find the records are fewer than 4332. I don't know what I did wrong.

Thanks.
-- 
Yi Du
Re: [R] convert factor data to numeric
On 01/13/2010 05:41 PM, Peter Ehlers wrote:

  S Devriese wrote:

    On 01/13/2010 10:47 AM, Ahmet Temiz wrote:

      hello, could you give me a hint to convert data in factor type to numeric (float)? regards

    you could try as.numeric but without more details it is difficult to see if this will work. How did you end up with a factor (e.g. through import)?

  No, don't use as.numeric(). Do follow Dimitris' advice. But the question of how you got the factor data is good; you can usually avoid getting factors to begin with.

  -Peter Ehlers

I know, slightly sloppy answer (see Dimitris' answer), but I hoped to find out how he got the factor in the first place, because if it is an import issue (and e.g. the decimal character is different from the locale decimal character) the FAQ answer might not work as expected.

Stephan
Re: [R] Ask for histogram
Hi,

On Wed, Jan 13, 2010 at 12:58 PM, Yi Du abraham...@gmail.com wrote:

  Hi, I use a vector of data to draw the histogram, but it is different from the graph by SAS. Can you check it for me please?

How are we supposed to check something without data, pictures, etc? What do you want checked, exactly?

  b is a column vector of 4332 hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1)) rug(b) When I used rug, I find the records are smaller than 4332. I don't know where I did wrong.

What do you mean? Is the histogram that you're getting surprising? Is the result of adding a rug surprising? Are you actually trying to count 4332 tick marks at the bottom of your plot? What records are smaller than 4332?

Try to see if what rug returns, e.g.:

    r <- rug(b)
    length(r)

is as long as your `b` vector.

I'm not sure what you're asking, but hopefully some of the info I threw at you is helpful. Please be a bit more specific with any follow-up if you still find anything confusing.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] optimization challenge
The key idea is that you are building a matrix that contains the solutions to smaller problems which are sub-problems of the big problem. The first row of the matrix SSQ contains the solution for no splits, i.e. SSQ[1,j] is just the sum of squares about the overall mean for reading chapters 1 through j in one day. The iteration then uses row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j chapters in m-1 days) is part of the overall optimal solution, you have already computed it, and so never need to recompute it.

    TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m days) by breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum over choices of j, i.e. SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239 chapters in 128 days. That's just the objective function, so the rest involves constructing the list of optimal cuts, i.e. which chapters are grouped together for each day's reading. That code uses the same idea: constructing a list of lists of cutpoints.

Statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:

  WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim). Can you describe a little what the algorithm is doing?
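The recurrence described in this thread can be sketched compactly as below. This is an illustrative rewrite, not the code posted to the list: the function name schedule_dp, the back-pointer matrix last_cut and the toy chapter lengths are all assumptions, and it presumes there are at least as many chapters as days.

```r
# best[m, n]: minimal sum of squared deviations from the daily target M
# when chapters 1..n are read in m days. last_cut[m, n] records where
# day m - 1 ended in that optimal split.
schedule_dp <- function(lengths, days) {
  N <- length(lengths)
  M <- sum(lengths) / days
  cs <- c(0, cumsum(lengths))            # cs[j + 1] = total of chapters 1..j
  best <- matrix(Inf, nrow = days, ncol = N)
  last_cut <- matrix(NA_integer_, nrow = days, ncol = N)
  best[1, ] <- (cs[-1] - M)^2            # everything so far read in one day
  if (days >= 2) for (m in 2:days) {
    for (n in m:N) {
      j <- (m - 1):(n - 1)               # candidate last chapter of day m - 1
      cost <- best[m - 1, j] + (cs[n + 1] - cs[j + 1] - M)^2
      k <- which.min(cost)
      best[m, n] <- cost[k]
      last_cut[m, n] <- j[k]
    }
  }
  # Walk back through last_cut to recover the last chapter of each day.
  cuts <- integer(days)
  n <- N
  for (m in days:1) {
    cuts[m] <- n
    if (m > 1) n <- last_cut[m, n]
  }
  list(SSQ = best[days, N], Cuts = cuts)
}

toy <- c(3, 2, 1, 4, 2, 3)   # made-up chapter lengths
res <- schedule_dp(toy, days = 3)
print(res)
```

Storing a single back-pointer per cell, rather than a full list of cut-points per cell, keeps memory at two days-by-chapters matrices; the schedule is rebuilt once at the end.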
Re: [R] Simulation numbers from a probability table
Try this:

    dat <- data.frame(x=11:14, pa=1:4/10, pb=4:1/10)
    f <- function(numreps, data){
      pmat <- as.matrix(data[-1])
      x <- data[,1]
      result <- matrix(0, nrow=numreps, ncol=ncol(pmat))
      colnames(result) <- c("A", "B")
      for(i in seq_len(numreps)){
        result[i,] <- apply(pmat, 2, function(p) sample(x, 1, prob=p))
      }
      result
    }
    f(5, dat)

-Peter Ehlers

-- 
Peter Ehlers
University of Calgary
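The same idea can be written without the explicit loop by asking sample() for all draws of a column at once. A sketch, with a hypothetical probability table whose columns sum to 1; the names values, probs and n_draws are assumptions for illustration.

```r
set.seed(42)  # only to make the illustration reproducible

values <- c(0, 1, 2, 4)

# Hypothetical probabilities: one column per variable, rows match `values`.
probs <- cbind(A = c(0.2, 0.1, 0.4, 0.3),
               B = c(0.3, 0.1, 0.5, 0.1),
               C = c(0.1, 0.2, 0.3, 0.4))

n_draws <- 5

# One column of simulated values per variable, n_draws rows.
sim <- apply(probs, 2, function(p) {
  sample(values, n_draws, replace = TRUE, prob = p)
})
print(sim)
```

sample() rescales prob internally, so columns that do not sum exactly to 1 are still accepted, though it is safer to check the table first.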
Re: [R] Method for reduction of independent variables
Hi,

please read the posting guide. You are not likely to get an extensive answer to your question from this list. Your question is a "please solve/explain my statistical problem for me" question. There are two things problematic with that: first, "statistical", and second, "please solve for me".

First, the R-help list is mostly concerned with problems in implementing analyses in R, not with the (choice of the) statistical approach per se (there are a few exceptions). Second, "please solve for me" questions are generally frowned upon, unless you show a specific point at which you are stuck and have to make a choice. That is, the list members want to see that you have done your homework to the extent one can expect you to.

To ask the list to provide an introduction to data reduction methods without having any background knowledge is, frankly, a waste of your and the list members' time. There are books on the topic, which you can buy or borrow, and certainly many online sources to give you a basic background. Or you can start here: http://en.wikipedia.org/wiki/Dimension_reduction.

If you want your statistical questions answered and problems solved without reading yourself into the matter, your question is more suitable for a local statistician at your institution or a paid service than for this list.

Best,
Daniel

- cuncta stricte discussurus -

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of rubystallion
Sent: Wednesday, January 13, 2010 11:57 AM
To: r-help@r-project.org
Subject: [R] Method for reduction of independent variables

Hello

I am currently investigating software code metrics for a variety of software projects of a company to determine the worst parts of software products according to specified quality characteristics. As the gathering of metrics correlates with effort, I would like to find a subset of the metrics preserving significant predictive power for the problem value while using the smallest number of code metrics.

I have the results of 25 metrics for 6 software projects for a combined 9355 individuals, i.e. software parts with metrics. However, as many metrics only measure metric values above a predefined limit, 58% of the responses for independent variables are 0.

Which method can I use to determine a reduced set of independent variables with significant predictive power? As I do not have a statistics background, I would also appreciate a simple explanation of the chosen method and sensible choices for parameters, so that I will be able to infer the reduced set of software metrics to keep.

Thank you in advance!
Johannes

-- 
View this message in context: http://n4.nabble.com/Method-for-reduction-of-independent-variables-tp1013171p1013171.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?
In caret, see ?trainControl. Use returnResamp = "all"

Max

On Wed, Jan 13, 2010 at 9:47 AM, bbslover dlu...@yeah.net wrote:

  Hello, I am learning randomForest, and now I want to boxplot mse against mtry using 20 repeats of 5-fold cross-validation (using the median value), but I have no good method to do it. The randomForest package itself does not contain a cross-validation method, and the caret package does, but how can I get all the values of mtry and, at the same time, the corresponding mse?

  -- 
  View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html
  Sent from the R help mailing list archive at Nabble.com.
[R] Formula for normal distribution with know mean and standard error and n terms
Hello,

I am searching for a method to calculate a normal distribution. For example, this equation is used to calculate the normal curve when the mean and standard deviation are known:

    p(x) = 1/(σ*sqrt(2π)) * exp(-(x-μ)^2 / (2σ^2))

However, some of the literature I'm reading (I'm building an ecological niche model for vegetation along several ecological gradients) reports the standard error and the sample size n instead. Is there an equivalent formula? If so, how can I also normalize the p(x) term to be within the 0-1 range?

Thank you all
Steve

Steve Friedman Ph. D.
Spatial Ecological Analyst, Everglades and Dry Tortugas National Park
steve_fried...@nps.gov
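One possible reading, offered as a sketch rather than a definitive answer: if the reported standard error is the standard error of the mean, then σ = SE * sqrt(n), and dividing the density by its value at the mean rescales the curve so that its peak is 1. The numbers below (mu, se, n) are made up, and whether the literature's SE is in fact a standard error of the mean is an assumption to check against each source.

```r
# Made-up summary statistics, standing in for values from the literature.
mu <- 25     # reported mean
se <- 0.5    # reported standard error (assumed: standard error of the mean)
n  <- 36     # reported sample size

# Recover the standard deviation from the standard error of the mean.
sigma <- se * sqrt(n)

# Evaluate the normal density on a small grid around the mean.
x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 7)
dens <- dnorm(x, mean = mu, sd = sigma)

# Rescale so the curve peaks at 1; this equals exp(-(x - mu)^2 / (2 * sigma^2)).
suit <- dens / dnorm(mu, mean = mu, sd = sigma)
print(data.frame(x = x, density = dens, scaled = suit))
```

The rescaled curve is no longer a probability density (it does not integrate to 1); it is only a 0-1 response curve, which is often what a niche model needs.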
[R] [R-pkgs] New sp release
The sp package provides class definitions for spatial data, and utilities for spatial data handling and manipulation. The release of sp version 0.9-56 introduces changes in the ways in which Polygon, Polygons, and SpatialPolygons objects are created, moving from R code to compiled C code. Because of these changes, it is possible that users will see changed output. The package maintainers have tested as far as possible, and a beta release has been checked by some users, without any problems coming to light. Further details are given in: https://stat.ethz.ch/pipermail/r-sig-geo/2010-January/007377.html Should anyone see problems following this change, please contact me directly with a reproducible example.

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
e-mail: roger.biv...@nhh.no
Re: [R] convert factor data to numeric
On 14/01/2010, at 6:00 AM, Nathalie Yauschew-Raguenes wrote:

  Hello, I find a way to convert data in factor type to numeric:

      data_numeric <- as.numeric(as.character(data_factor))

  It's treaky but works.

Possibly even more ``treaky'' but more efficient is:

    data_numeric <- as.numeric(levels(data_factor)[data_factor])

as has been pointed out quite a few times on this list.

cheers,
Rolf Turner
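A small illustration of why a bare as.numeric() on a factor is discouraged in this thread, using a made-up factor. The last form follows the R FAQ (7.10), which converts the levels once and then indexes them by the factor's codes.

```r
# A made-up factor whose labels look like numbers.
f <- factor(c("10.5", "3.2", "10.5", "7"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(f)
print(codes)

# Going through character recovers the values.
via_char <- as.numeric(as.character(f))
print(via_char)

# Converting the (few) levels once and indexing is equivalent, and touches
# fewer strings when the factor is long.
via_levels <- as.numeric(levels(f))[f]
print(via_levels)
```

If the factor came from an import, it is usually better to fix the import instead (for example stringsAsFactors = FALSE, or the right dec argument in read.table) so that the column arrives as numeric.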
[R] Advantages of using SQLite for data import in comparison to csv files
Hello everybody out there using R,

I'm using R for the analysis of biological data and write the results down using LaTeX, both on a notebook with Linux installed. I've already tried two options for the import of my data:

1. Import from a SQLite database
2. Import from individual csv files edited with sed, awk and sort.

Both methods actually work very well, since I don't need advanced features like multi-user network access to the data. My data sets are tables with up to 20 columns and 1000 rows, containing mostly numerical values and strings. I might also have to handle microarray data, but I'm not so sure about that yet. In addition, I need to organise tags for a collection of photos, but this data is of course not analysed with R.

I'm now beginning to work on a larger project and have to decide whether it is better to use SQLite or csv files for handling my data. I fear it might get difficult to switch between the two systems after having accumulated the data, adapted software for backups and revision control, written makefiles etc.

Could anyone give me a hint on the additional benefits of importing data from a SQLite database into R, compared to the simpler way of organising the data in csv files? Is it for example possible to select values from a column within a certain range from a csv file using awk?

Thanks in advance,
Juliet Jacobson
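For tables of this size, a range selection is also easy once the csv has been read into R, which may reduce the need for either awk or SQL. A minimal sketch with made-up data; the column names sample, weight and species are assumptions.

```r
# Made-up csv content; in practice this would be a file on disk.
csv_text <- "sample,weight,species
s1,12.4,A
s2,18.9,B
s3,15.1,A
s4,22.0,B"

dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep the rows whose weight lies within a given range.
in_range <- subset(dat, weight >= 15 & weight <= 20)
print(in_range)
```

The same filter outside R would be along the lines of awk -F, 'NR > 1 && $2 >= 15 && $2 <= 20' file.csv, or a WHERE weight BETWEEN 15 AND 20 clause in SQLite; with about 1000 rows the choice is more a matter of workflow than of speed.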
[R] Rollapply
Hi

I would like to understand how to extend the function (FUN) I am using in rollapply below.

## With the following simplified data, test1 yields parameters for a rolling regression

    library(zoo)
    data = data.frame(Xvar=c(70.67,70.54,69.87,69.51,70.69,72.66,72.65,73.36),
                      Yvar=c(78.01,77.07,77.35,76.72,77.49,78.70,77.78,79.58))
    data.z = zoo(data)
    test1 = rollapply(data.z, width=3,
                      FUN = function(z) coef(lm(z[,1]~z[,2], data=as.data.frame(z))),
                      by.column = FALSE, align = "right")
    print(test1)

## Rewriting this to call myfn1 gives test2 (and is consistent with test1 above)

    myfn1 = function(mydata){
      dd = as.data.frame(mydata)
      l = lm(dd[,1]~dd[,2], data=dd)
      c = coef(l)
    }
    test2 = rollapply(data.z, width=3, FUN= myfn1, by.column = FALSE, align = "right")
    print(test2)

## I would like to be able to use the predict function to obtain a prediction (and its std error) from the rolling regression I have just calculated. My effort below issues a warning that 'newdata' had 1 row but variable(s) found have 3 rows (if I run this outside of rollapply I don't get this warning). Also, I don't see the predicted value or its se with print(fm2[[1]]). Again, if I run this outside of rollapply I am able to extract the predicted value.

    Xpred=c(70.67)
    myfn2 = function(mydata){
      dd = as.data.frame(mydata)
      l = lm(dd[,1]~dd[,2], data=dd)
      c = coef(l)
      p = predict(l, data.frame(Xvar=Xpred),se=T)
      ret=c(l,c,p)
    }
    fm2 = rollapply(data.z, width=3, FUN= myfn2, by.column = FALSE, align = "right")
    print(fm2[[1]])

Any insights would be gratefully received.

Best regards
Pete

-- 
View this message in context: http://n4.nabble.com/Rollapply-tp1013345p1013345.html
Sent from the R help mailing list archive at Nabble.com.
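A likely cause of the warning is that the formula dd[,1]~dd[,2] contains no variable called Xvar, so predict() cannot match newdata to the model and falls back on the three fitted rows. The sketch below names the variables in the formula and returns a plain numeric vector. It uses a base R loop over windows instead of rollapply so that it runs without zoo, and it assumes the intended model is Yvar regressed on Xvar (the original regresses column 1 on column 2); roll_fit and width are illustrative names.

```r
data <- data.frame(
  Xvar = c(70.67, 70.54, 69.87, 69.51, 70.69, 72.66, 72.65, 73.36),
  Yvar = c(78.01, 77.07, 77.35, 76.72, 77.49, 78.70, 77.78, 79.58)
)
Xpred <- 70.67
width <- 3

roll_fit <- function(window) {
  # Name the variables in the formula so predict() can match newdata.
  fit <- lm(Yvar ~ Xvar, data = window)
  p <- predict(fit, newdata = data.frame(Xvar = Xpred), se.fit = TRUE)
  # Return only numbers: coefficients, prediction and its standard error.
  c(coef(fit), fit = unname(p$fit), se = unname(p$se.fit))
}

# Apply the function to each window of `width` consecutive rows.
starts <- seq_len(nrow(data) - width + 1)
out <- t(sapply(starts, function(s) roll_fit(data[s:(s + width - 1), ])))
print(out)
```

The same roll_fit should also work as FUN in rollapply(..., by.column = FALSE) after converting the window with as.data.frame(), since it returns a fixed-length numeric vector rather than a concatenation of lists.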
Re: [R] Ask for histogram
If I do

    b <- rnorm(4332)
    hist(b,probability=T,breaks=30,col='lightblue',ylim=c(0,1))
    rug(b)

the plot looks entirely reasonable.

As far as being different from SAS, perhaps SAS and R use different breakpoints, that is, different boundaries between the histogram bars.

-Don

-- 
Don MacQueen
Environmental Protection Department, Lawrence Livermore National Laboratory, Livermore, CA, USA
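The breakpoints R actually used can be inspected without drawing anything, which may help when comparing against another package. A sketch with simulated data standing in for the poster's vector b:

```r
set.seed(123)  # only to make the illustration reproducible
b <- rnorm(4332)

# Compute the histogram without plotting it.
h <- hist(b, breaks = 30, plot = FALSE)

# A single number for `breaks` is only a suggestion; R picks "pretty"
# boundaries, so the number of bars need not be exactly 30.
n_bars <- length(h$breaks) - 1
print(n_bars)
print(h$breaks)

# Every observation falls in some bar.
print(sum(h$counts))
```

Passing an explicit vector of boundaries as breaks (taken from the other program's output) is one way to make the two histograms directly comparable.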
Re: [R] Ask for histogram
Thanks all, I fixed it.

-- 
Yi Du
Ph. D student in Economics, University of Missouri
[R] Operating on each row of data frame
Hi All,

I have a data frame with 4 columns.

Column 1: name
Columns 2-4: values

I would like to calculate the mean and standard error of the values in columns 2-4 and store them in columns 5 and 6 respectively. I have done the following, but it doesn't seem to work:

  mean_N_SE <- function(x) {
    name <- x[1]
    vals <- c(x[2:4])
    temp_mean <- mean(vals)
    SE <- sqrt(var(x)/length(x))
  }
  apply(d,1,mean_N_SE)

where d = data frame. Can someone help me with this?

Thanks!
-Abhi
Re: [R] merging issue.........
Try the merge function

  ?merge

  in1 = "id trait1
  1 10.2
  2 11.1
  3 9.7
  6 10.2
  7 8.9
  10 9.7
  11 10.2"
  in2 = "id trait2
  1 9.8
  2 10.8
  4 7.8
  5 9.8
  6 10.1
  12 10.2
  13 10.1"
  data1 = read.table(textConnection(in1), header=T)
  data2 = read.table(textConnection(in2), header=T)
  mymerge = merge(data1,data2,all.x=TRUE)
  print(mymerge)

karena wrote:

hi, I have a question about merging two files. For example, I have two files, the first file is like the following:

  id  trait1
  1   10.2
  2   11.1
  3    9.7
  6   10.2
  7    8.9
  10   9.7
  11  10.2

The second file is like the following:

  id  trait2
  1    9.8
  2   10.8
  4    7.8
  5    9.8
  6   10.1
  12  10.2
  13  10.1

now I want to merge the two files by the variable id, I only want to keep the ids which show up in the first file. Even the id does not show up in the second file, it doesn't matter, I can keep the missing values. So my question is: how can I merge the two files and keep only the rows whose id show up in the first file? I know how to do it is SAS, just use the following code:

  merge data1(in=in1) data2(in=in2);
  by id;
  if in1;

but I really have no idea about how to do it in R.

thank you in advance,
karean
Re: [R] Rollapply
See: http://tolstoy.newcastle.edu.au/R/help/04/03/1446.html

On Wed, Jan 13, 2010 at 3:45 PM, Pete B peter.breckn...@bp.com wrote:

Hi

I would like to understand how to extend the function (FUN) I am using in rollapply below.

  ## With the following simplified data, test1 yields parameters for a rolling regression
  data = data.frame(Xvar=c(70.67,70.54,69.87,69.51,70.69,72.66,72.65,73.36),
                    Yvar =c(78.01,77.07,77.35,76.72,77.49,78.70,77.78,79.58))
  data.z = zoo(d)
  test1 = rollapply(data.z, width=3,
                    FUN = function(z) coef(lm(z[,1]~z[,2], data=as.data.frame(z))),
                    by.column = FALSE, align = "right")
  print(test1)

  ## Rewriting this to call myfn1 gives test2 (and is consistent with test1 above)
  myfn1 = function(mydata){
    dd = as.data.frame(mydata)
    l = lm(dd[,1]~dd[,2], data=dd)
    c = coef(l)
  }
  test2 = rollapply(data.z, width=3, FUN= myfn1, by.column = FALSE, align = "right")
  print(test2)

I would like to be able to use the predict function to obtain a prediction (and its std error) from the rolling regression I have just calculated. My effort below issues a warning that 'newdata' had 1 row but variable(s) found have 3 rows. (if I run this outside of rollapply I don't get this warning) Also, I don't see the predicted value or its se with print(fm2[[1]]). Again, if I run this outside of rollapply I am able to extract the predicted value.

  Xpred=c(70.67)
  myfn2 = function(mydata){
    dd = as.data.frame(mydata)
    l = lm(dd[,1]~dd[,2], data=dd)
    c = coef(l)
    p = predict(l, data.frame(Xvar=Xpred),se=T)
    ret=c(l,c,p)
  }
  fm2 = rollapply(data.z, width=3, FUN= myfn2, by.column = FALSE, align = "right")
  print(fm2[[1]])

Any insights would be gratefully received.

Best regards
Pete
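One likely cause of the "'newdata' had 1 row" warning in Pete's question is the formula `dd[,1] ~ dd[,2]`: it has no variable called `Xvar`, so `predict()` cannot match `newdata` by name and falls back on the 3 rows used for fitting. The sketch below is not the solution from the linked post. It assumes the intended model is `Yvar ~ Xvar` (the original code regresses column 1 on column 2, which is the other way round), uses base R only, and the helper name `fit_window` is made up for illustration.

```r
dat <- data.frame(
  Xvar = c(70.67, 70.54, 69.87, 69.51, 70.69, 72.66, 72.65, 73.36),
  Yvar = c(78.01, 77.07, 77.35, 76.72, 77.49, 78.70, 77.78, 79.58)
)
Xpred <- 70.67
width <- 3

# Fit one window; return coefficients, prediction and its standard error.
# Naming the variables in the formula lets predict() match 'newdata' by name.
fit_window <- function(dd) {
  fit <- lm(Yvar ~ Xvar, data = dd)
  p <- predict(fit, newdata = data.frame(Xvar = Xpred), se.fit = TRUE)
  c(coef(fit), pred = unname(p$fit), se = unname(p$se.fit))
}

# Right-aligned windows: rows (i - width + 1):i for i = width, ..., nrow(dat)
ends <- width:nrow(dat)
res <- t(sapply(ends, function(i) fit_window(dat[(i - width + 1):i, ])))
rownames(res) <- ends
print(res)
```

Returning a plain numeric vector (rather than `c(l, c, p)`, which concatenates lists) gives one row per window. With zoo loaded, the same helper should be usable as `rollapply(data.z, width = 3, FUN = function(z) fit_window(as.data.frame(z)), by.column = FALSE, align = "right")`, though that call is untested here.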
Re: [R] Operating on each row of data frame
Look at the apply function

  ?apply

  x = data.frame(x1=c(1,2,3,4,5),x2=c(2,4,6,8,10),x3=c(1,3,5,7,9))
  x$x5=apply(x[,1:3],1,mean)
  x$x6=apply(x[,1:3],1,sd)
  print(x)

Abhishek Pratap wrote:

Hi All I have a data frame in which there are 4 columns . Column 1 : name Column 2-4 : values I would like to calculate mean/Standard error of values in column 2-4 and store them in column 5,6 respectively. I have done the following but doesn't seem to work

  mean_N_SE <- function(x) {
    name <- x[1]
    vals <- c(x[2:4])
    temp_mean <- mean(vals)
    SE <- sqrt(var(x)/length(x))
  }
  apply(d,1,mean_N_SE)

where d = data frame. Can someone help me with this.

Thanks!
-Abhi
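For the layout in the question (a name column followed by three value columns), a sketch along these lines may be closer to what was asked. The data are invented. Two things go wrong in the posted function: `apply()` on a data frame that includes the name column coerces every row to character, and `var(x)` is computed on the whole row instead of `vals`. Note also that `sd` alone is the standard deviation; the standard error of the mean divides it by the square root of the number of values.

```r
# Invented example: column 1 is a name, columns 2-4 are values
d <- data.frame(name = c("a", "b", "c"),
                v1 = c(1, 2, 3), v2 = c(2, 4, 6), v3 = c(1, 3, 5))

vals <- as.matrix(d[, 2:4])        # numeric columns only, so nothing is coerced to character

d$mean <- rowMeans(vals)                           # column 5: row mean
d$se   <- apply(vals, 1, sd) / sqrt(ncol(vals))    # column 6: standard error of the row mean
print(d)
```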
Re: [R] R package dependencies
See the dep function defined here: http://tolstoy.newcastle.edu.au/R/e6/help/09/03/7159.html

On Wed, Jan 13, 2010 at 11:39 AM, Colin Millar c.mil...@marlab.ac.uk wrote:

Hi there,

My question relates to getting information about R packages. In particular I would like to be able to find from within R:

  what are a package's dependencies
  what are a package's reverse dependencies
  does a package contain a dll

The reason I ask is: the organisation that I work for is introducing a secure intranet operating on Windows PCs and laptops, and this requires that all software / executables / dlls are validated before they are combined to produce a generic PC build. I would like to maximise the packages available to our staff, and so for the packages that we have listed as business needs, I would like to include all reverse dependencies of this collection that do not have dlls.

I hope this makes sense (the question, not the reason).

Kind regards,
Colin.
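The `dep` function in the linked post is not reproduced here. As an alternative sketch, current versions of R ship `tools::package_dependencies()`, which answers the first two questions when given a package database. The example below restricts itself to locally installed packages via `installed.packages()`, and it assumes that compiled code of an installed package sits in a `libs` subdirectory; the package names are placeholders.

```r
pkgs <- c("stats", "datasets")   # placeholder packages; substitute your own list
db   <- installed.packages()     # matrix describing locally installed packages

# Forward and reverse dependencies, looked up among installed packages only
deps    <- tools::package_dependencies(pkgs, db = db,
                                       which = c("Depends", "Imports", "LinkingTo"))
revdeps <- tools::package_dependencies(pkgs, db = db, reverse = TRUE)

# Assumption: compiled code (DLL / shared object) lives in the package's 'libs' directory
has_dll <- sapply(pkgs, function(p) file.exists(file.path(find.package(p), "libs")))

print(deps)
print(revdeps)
print(has_dll)
```

To cover packages that are not yet installed, the `db` argument could instead be the matrix returned by `available.packages()`, which needs network access to a repository.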
[R] merging issue.........
hi, I have a question about merging two files. For example, I have two files, the first file is like the following:

  id  trait1
  1   10.2
  2   11.1
  3    9.7
  6   10.2
  7    8.9
  10   9.7
  11  10.2

The second file is like the following:

  id  trait2
  1    9.8
  2   10.8
  4    7.8
  5    9.8
  6   10.1
  12  10.2
  13  10.1

now I want to merge the two files by the variable id, I only want to keep the ids which show up in the first file. Even the id does not show up in the second file, it doesn't matter, I can keep the missing values. So my question is: how can I merge the two files and keep only the rows whose id show up in the first file? I know how to do it is SAS, just use the following code:

  merge data1(in=in1) data2(in=in2);
  by id;
  if in1;

but I really have no idea about how to do it in R.

thank you in advance,
karean
[R] counting the number of times a string appears
Hi all,

I have a vector of strings and need to count the number of times a string appears in the vector. eg:

   [1] spp6 spp10 spp6 spp6 spp4 spp2 spp9 spp10 spp5 spp2 spp2 spp3
  [13] spp4 spp3 spp6 spp10 spp6 spp4 spp9 spp3 spp6 spp1 spp10 spp8
  [25] spp2 spp10 spp9 spp7 spp1 spp3 spp8 spp6 spp3 spp8 spp6 spp5
  [37] spp5 spp9 spp3 spp1 spp4 spp5 spp9 spp3 spp3 spp5 spp4 spp9
  [49] spp3 spp7 spp7 spp2 spp6 spp5 spp7 spp4 spp8 spp9 spp2 spp6
  [61] spp3 spp3 spp2 spp6 spp3 spp5 spp6 spp6 spp4 spp1 spp1 spp1
  [73] spp10 spp8 spp1 spp6 spp1 spp5 spp8 spp9 spp5 spp6 spp9 spp10
  [85] spp2 spp6 spp10 spp1 spp2 spp3 spp5 spp8 spp2 spp7 spp4 spp7
  [97] spp2 spp6 spp2 spp6

Is it possible to create a vector of counts for each spp1-spp10? Any help or ideas would be appreciated.

Cheers,
Jesse
Re: [R] Operating on each row of data frame
Thanks all for a very quick solution. It is actually good to know different ways to do the same things. It expands my limited understanding of R :).

-A

On Wed, Jan 13, 2010 at 5:12 PM, Stephan Kolassa stephan.kola...@gmx.de wrote:

Hi, does this do what you want?

  d <- cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd))

HTH,
Stephan

Abhishek Pratap wrote:

Hi All I have a data frame in which there are 4 columns . Column 1 : name Column 2-4 : values I would like to calculate mean/Standard error of values in column 2-4 and store them in column 5,6 respectively. I have done the following but doesn't seem to work

  mean_N_SE <- function(x) {
    name <- x[1]
    vals <- c(x[2:4])
    temp_mean <- mean(vals)
    SE <- sqrt(var(x)/length(x))
  }
  apply(d,1,mean_N_SE)

where d = data frame. Can someone help me with this.

Thanks!
-Abhi
Re: [R] counting the number of times a string appears
Jesse,

see ?table and try

  table(stringVector)

Greg

On 1/13/10 2:12 PM, Jesse Sinclair wrote:

Hi all, I have a vector of strings and need to count the number of times a string appears in the vector. eg:

   [1] spp6 spp10 spp6 spp6 spp4 spp2 spp9 spp10 spp5 spp2 spp2 spp3
  [13] spp4 spp3 spp6 spp10 spp6 spp4 spp9 spp3 spp6 spp1 spp10 spp8
  [25] spp2 spp10 spp9 spp7 spp1 spp3 spp8 spp6 spp3 spp8 spp6 spp5
  [37] spp5 spp9 spp3 spp1 spp4 spp5 spp9 spp3 spp3 spp5 spp4 spp9
  [49] spp3 spp7 spp7 spp2 spp6 spp5 spp7 spp4 spp8 spp9 spp2 spp6
  [61] spp3 spp3 spp2 spp6 spp3 spp5 spp6 spp6 spp4 spp1 spp1 spp1
  [73] spp10 spp8 spp1 spp6 spp1 spp5 spp8 spp9 spp5 spp6 spp9 spp10
  [85] spp2 spp6 spp10 spp1 spp2 spp3 spp5 spp8 spp2 spp7 spp4 spp7
  [97] spp2 spp6 spp2 spp6

Is it possible to create a vector of counts for each spp1-spp10? Any help or ideas would be appreciated.

Cheers,
Jesse

--
Greg Hirson
ghir...@ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616
Re: [R] counting the number of times a string appears
?table

On 14/01/2010, at 11:12 AM, Jesse Sinclair wrote:

Hi all, I have a vector of strings and need to count the number of times a string appears in the vector. eg:

   [1] spp6 spp10 spp6 spp6 spp4 spp2 spp9 spp10 spp5 spp2 spp2 spp3
  [13] spp4 spp3 spp6 spp10 spp6 spp4 spp9 spp3 spp6 spp1 spp10 spp8
  [25] spp2 spp10 spp9 spp7 spp1 spp3 spp8 spp6 spp3 spp8 spp6 spp5
  [37] spp5 spp9 spp3 spp1 spp4 spp5 spp9 spp3 spp3 spp5 spp4 spp9
  [49] spp3 spp7 spp7 spp2 spp6 spp5 spp7 spp4 spp8 spp9 spp2 spp6
  [61] spp3 spp3 spp2 spp6 spp3 spp5 spp6 spp6 spp4 spp1 spp1 spp1
  [73] spp10 spp8 spp1 spp6 spp1 spp5 spp8 spp9 spp5 spp6 spp9 spp10
  [85] spp2 spp6 spp10 spp1 spp2 spp3 spp5 spp8 spp2 spp7 spp4 spp7
  [97] spp2 spp6 spp2 spp6

Is it possible to create a vector of counts for each spp1-spp10? Any help or ideas would be appreciated.

Cheers,
Jesse
Re: [R] Advantages of using SQLite for data import in comparison to csv files
You could look at read.csv.sql in sqldf (http://sqldf.googlecode.com) as well.

On Wed, Jan 13, 2010 at 2:00 PM, Juliet Jacobson julietjacob...@aim.com wrote:

Hello everybody out there using R,

I'm using R for the analysis of biological data and write the results down using LaTeX, both on a notebook with linux installed. I've already tried two options for the import of my data:

  1. Import from a SQLite database
  2. Import from individual csv files edited with sed, awk and sort.

Both methods actually work very well, since I don't need advanced features like multi-user network access to the data. My data sets are tables with up to 20 columns and 1000 rows, containing mostly numerical values and strings. Moreover, I might also have to handle microarray data, but I'm not so sure about that yet. Moreover, I need to organise tags for a collection of photos, but this data is of course not analysed with R.

I'm now beginning to work on a larger project and have to decide, whether it is better to use SQLite or csv-files for handling my data. I fear, it might get difficult to switch between the two system after having accumulated the data, adapted software for backups and revision control, written makefiles etc.

Could anyone of you give me a hint on the additional benefits of importing data from a SQLite database to R to the simpler way of organising the data in csv files? Is it for example possible to select values from a column within a certain range from a csv file using awk?

Thanks in advance,
Juliet Jacobson
Re: [R] merging issue.........
Hi Karean,

If your first object is called obj1 and the second called obj2, then:

  merge(obj1, obj2, all.x=TRUE)
    id trait1 trait2
  1  1   10.2    9.8
  2  2   11.1   10.8
  3  3    9.7     NA
  4  6   10.2   10.1
  5  7    8.9     NA
  6 10    9.7     NA
  7 11   10.2     NA

Hope this helps,
Adrian

On Wednesday 13 January 2010, karena wrote:

hi, I have a question about merging two files. For example, I have two files, the first file is like the following:

  id  trait1
  1   10.2
  2   11.1
  3    9.7
  6   10.2
  7    8.9
  10   9.7
  11  10.2

The second file is like the following:

  id  trait2
  1    9.8
  2   10.8
  4    7.8
  5    9.8
  6   10.1
  12  10.2
  13  10.1

now I want to merge the two files by the variable id, I only want to keep the ids which show up in the first file. Even the id does not show up in the second file, it doesn't matter, I can keep the missing values. So my question is: how can I merge the two files and keep only the rows whose id show up in the first file? I know how to do it is SAS, just use the following code:

  merge data1(in=in1) data2(in=in2);
  by id;
  if in1;

but I really have no idea about how to do it in R.

thank you in advance,
karean

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.: +40 21 3126618 \ +40 21 3120210 / int.101
Fax: +40 21 3158391
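For anyone who wants to run the answer above end to end, here is a self-contained sketch that builds the two tables from the question as data frames (the names `data1` and `data2` are borrowed from the SAS snippet) and performs the left join. `all.x = TRUE` plays the role of `if in1;`: every row of the first table is kept, and ids absent from the second table get `NA`.

```r
data1 <- data.frame(id = c(1, 2, 3, 6, 7, 10, 11),
                    trait1 = c(10.2, 11.1, 9.7, 10.2, 8.9, 9.7, 10.2))
data2 <- data.frame(id = c(1, 2, 4, 5, 6, 12, 13),
                    trait2 = c(9.8, 10.8, 7.8, 9.8, 10.1, 10.2, 10.1))

# Keep every id from data1; ids 4, 5, 12, 13 (only in data2) are dropped
merged <- merge(data1, data2, by = "id", all.x = TRUE)
print(merged)
```

Naming the key with `by = "id"` is optional here, since `merge()` defaults to the columns the two data frames have in common, but it makes the intent explicit.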
Re: [R] merging issue.........
Did you consider looking at the help page for merge?

h

At 22:01 13.01.2010, karena wrote:

hi, I have a question about merging two files. For example, I have two files, the first file is like the following:

  id  trait1
  1   10.2
  2   11.1
  3    9.7
  6   10.2
  7    8.9
  10   9.7
  11  10.2

The second file is like the following:

  id  trait2
  1    9.8
  2   10.8
  4    7.8
  5    9.8
  6   10.1
  12  10.2
  13  10.1

now I want to merge the two files by the variable id, I only want to keep the ids which show up in the first file. Even the id does not show up in the second file, it doesn't matter, I can keep the missing values. So my question is: how can I merge the two files and keep only the rows whose id show up in the first file? I know how to do it is SAS, just use the following code:

  merge data1(in=in1) data2(in=in2);
  by id;
  if in1;

but I really have no idea about how to do it in R.

thank you in advance,
karean
Re: [R] counting the number of times a string appears
Hi Jesse,

If your vector is called aa, then how about:

  table(aa)
  aa
   spp1 spp10  spp2  spp3  spp4  spp5  spp6  spp7  spp8  spp9
      7     2    16     8    15     9     9    10     9    15

Hope this helps,
Adrian

On Thursday 14 January 2010, Jesse Sinclair wrote:

Hi all, I have a vector of strings and need to count the number of times a string appears in the vector. eg:

   [1] spp6 spp10 spp6 spp6 spp4 spp2 spp9 spp10 spp5 spp2 spp2 spp3
  [13] spp4 spp3 spp6 spp10 spp6 spp4 spp9 spp3 spp6 spp1 spp10 spp8
  [25] spp2 spp10 spp9 spp7 spp1 spp3 spp8 spp6 spp3 spp8 spp6 spp5
  [37] spp5 spp9 spp3 spp1 spp4 spp5 spp9 spp3 spp3 spp5 spp4 spp9
  [49] spp3 spp7 spp7 spp2 spp6 spp5 spp7 spp4 spp8 spp9 spp2 spp6
  [61] spp3 spp3 spp2 spp6 spp3 spp5 spp6 spp6 spp4 spp1 spp1 spp1
  [73] spp10 spp8 spp1 spp6 spp1 spp5 spp8 spp9 spp5 spp6 spp9 spp10
  [85] spp2 spp6 spp10 spp1 spp2 spp3 spp5 spp8 spp2 spp7 spp4 spp7
  [97] spp2 spp6 spp2 spp6

Is it possible to create a vector of counts for each spp1-spp10? Any help or ideas would be appreciated.

Cheers,
Jesse

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.: +40 21 3126618 \ +40 21 3120210 / int.101
Fax: +40 21 3158391
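A small sketch of the `table()` approach, using only the first eight values of the posted vector to keep it short. Because the labels are strings, `table()` sorts them alphabetically, so `spp10` lands before `spp2`. If the counts are wanted in the order spp1 to spp10, with zeros for labels that do not occur, one option (an addition, not something from the replies) is to convert to a factor with explicit levels first.

```r
# First eight values of the vector from the question
aa <- c("spp6", "spp10", "spp6", "spp6", "spp4", "spp2", "spp9", "spp10")

counts <- table(aa)      # names sort as strings, so "spp10" precedes "spp2"
print(counts)

# Counts for spp1..spp10 in numeric order, including zeros for absent labels
lev <- paste0("spp", 1:10)
counts_all <- table(factor(aa, levels = lev))
print(counts_all)

as.vector(counts_all)    # plain integer vector of counts
```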
Re: [R] Operating on each row of data frame
Hi, does this do what you want?

  d <- cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd))

HTH,
Stephan

Abhishek Pratap wrote:

Hi All I have a data frame in which there are 4 columns . Column 1 : name Column 2-4 : values I would like to calculate mean/Standard error of values in column 2-4 and store them in column 5,6 respectively. I have done the following but doesn't seem to work

  mean_N_SE <- function(x) {
    name <- x[1]
    vals <- c(x[2:4])
    temp_mean <- mean(vals)
    SE <- sqrt(var(x)/length(x))
  }
  apply(d,1,mean_N_SE)

where d = data frame. Can someone help me with this.

Thanks!
-Abhi
[R] a question about deleting rows
I have a file like this:

  id  n1  n2  n3  n4  n5  n6
  1    3   4   7   8  10   2
  2    4   1   2   4   3  10
  3    7   0   0   0   0   8
  4   10   1   0   0   2   3
  5   11   1   0   0   0   5

what I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete the row. how can I do that?

thank you,
karena
Re: [R] counting the number of times a string appears
This is great all. It works perfectly. Thank-you.

Cheers,
Jesse

On Wed, Jan 13, 2010 at 14:27, Adrian Dusa dusa.adr...@gmail.com wrote:

Hi Jesse,

If your vector is called aa, then how about:

  table(aa)
  aa
   spp1 spp10  spp2  spp3  spp4  spp5  spp6  spp7  spp8  spp9
      7     2    16     8    15     9     9    10     9    15

Hope this helps,
Adrian

On Thursday 14 January 2010, Jesse Sinclair wrote:

Hi all, I have a vector of strings and need to count the number of times a string appears in the vector. eg:

   [1] spp6 spp10 spp6 spp6 spp4 spp2 spp9 spp10 spp5 spp2 spp2 spp3
  [13] spp4 spp3 spp6 spp10 spp6 spp4 spp9 spp3 spp6 spp1 spp10 spp8
  [25] spp2 spp10 spp9 spp7 spp1 spp3 spp8 spp6 spp3 spp8 spp6 spp5
  [37] spp5 spp9 spp3 spp1 spp4 spp5 spp9 spp3 spp3 spp5 spp4 spp9
  [49] spp3 spp7 spp7 spp2 spp6 spp5 spp7 spp4 spp8 spp9 spp2 spp6
  [61] spp3 spp3 spp2 spp6 spp3 spp5 spp6 spp6 spp4 spp1 spp1 spp1
  [73] spp10 spp8 spp1 spp6 spp1 spp5 spp8 spp9 spp5 spp6 spp9 spp10
  [85] spp2 spp6 spp10 spp1 spp2 spp3 spp5 spp8 spp2 spp7 spp4 spp7
  [97] spp2 spp6 spp2 spp6

Is it possible to create a vector of counts for each spp1-spp10? Any help or ideas would be appreciated.

Cheers,
Jesse

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.: +40 21 3126618 \ +40 21 3120210 / int.101
Fax: +40 21 3158391
Re: [R] merging issue.........
thank you very much!
Re: [R] optimization challenge
Greg - thanks for posting this interesting problem.

Albyn - thanks for posting a solution. Now, I have some questions: (1) is the algorithm guaranteed to find a best solution? (2) can there be multiple solutions (it seems like there can be more than 1 solution depending on the data)?, and (3) is there a good reference for this and similar algorithms?

Thanks
Best,
Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvarad...@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Albyn Jones
Sent: Wednesday, January 13, 2010 1:19 PM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] optimization challenge

The key idea is that you are building a matrix that contains the solutions to smaller problems which are sub-problems of the big problem. The first row of the matrix SSQ contains the solution for no splits, ie SSQ[1,j] is just the sum of squares about the overall mean for reading chapters 1 through j in one day.

The iteration then uses row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j chapters in m-1 days) is part of the overall optimal solution, you have already computed it, and so don't ever need to recompute it.

  TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m days) breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239 chapters in 128 days. That's just the objective function, so the rest involves constructing the list of optimal cuts, ie which chapters are grouped together for each day's reading. That code uses the same idea... constructing a list of lists of cutpoints.

statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:

WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim). Can you describe a little what the algorithm is doing?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: Albyn Jones [mailto:jo...@reed.edu]
Sent: Tuesday, January 12, 2010 5:31 PM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] optimization challenge

Greg

Nice problem: I wasted my whole day on it :-)

I was explaining my plan for a solution to a colleague who is a computer scientist, he pointed out that I was trying to re-invent the wheel known as dynamic programming. here is my code, apparently it is called bottom up dynamic programming. It runs pretty quickly, and returns (what I hope is :-) the optimal sum of squares and the cut-points.

  function(X=bom3$Verses,days=128){
  # find optimal BOM reading schedule for Greg Snow
  # minimize variance of quantity to read per day over 128 days
  #
    N = length(X)
    Nm1 = N-1
    SSQ <- matrix(NA,nrow=days,ncol=N)
    Cuts <- list()
  #
  # SSQ[i,j]: the ssqs about the overall mean for the optimal partition
  # for i days on the chapters 1 to j
  #
    M = sum(X)/days
    CS = cumsum(X)
    SSQ[1,]= (CS-M)^2
    Cuts[[1]]= as.list(1:N)
  #
    for(m in 2:days){
      Cuts[[m]]=list()
      #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
      for(n in m:N){
        CS = cumsum(X[n:1])[n:1]
        SSQ1 = (CS-M)^2
        j = (m-1):(n-1)
        TS = SSQ[m-1,j]+(SSQ1[j+1])
        SSQ[m,n] = min(TS)
        k = min(which((min(TS)== TS)))+m-1
        Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
      }
    }
    list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
  }

  $SSQ
  [1] 11241.05

  $Cuts
    [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
   [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
   [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
   [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
   [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
   [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
  [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
  [127] 238 239

On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:

I have a challenge
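To make the recursion in Albyn's explanation easier to follow, here is a compact restatement on a toy input. This is a sketch, not Albyn's code: the function name `partition_dp`, the `prev` back-pointer matrix (used instead of a list of lists of cut-points) and the eight-element input are all illustrative assumptions. The objective is the same as described above: the sum over days of (that day's total minus `sum(x)/days`) squared.

```r
# Split x (e.g. verses per chapter) into 'days' contiguous groups so that the
# sum over groups of (group total - mean total)^2 is as small as possible.
partition_dp <- function(x, days) {
  n <- length(x)
  stopifnot(days >= 1, days <= n)
  M  <- sum(x) / days                   # target amount per day
  cs <- c(0, cumsum(x))                 # cs[j + 1] = sum(x[1:j])
  cost <- function(i, j) (cs[j + 1] - cs[i] - M)^2   # one day reading chapters i..j

  best <- matrix(Inf, days, n)          # best[m, j]: optimum for chapters 1..j in m days
  prev <- matrix(NA_integer_, days, n)  # last chapter of day m-1 in that optimum
  best[1, ] <- cost(1, 1:n)
  if (days > 1) for (m in 2:days) {
    for (j in m:n) {
      k    <- (m - 1):(j - 1)           # candidate end points for day m-1
      cand <- best[m - 1, k] + cost(k + 1, j)
      best[m, j] <- min(cand)
      prev[m, j] <- k[which.min(cand)]
    }
  }

  # Walk back through 'prev' to recover the last chapter read on each day
  cuts <- integer(days)
  j <- n
  for (m in days:1) {
    cuts[m] <- j
    j <- prev[m, j]
  }
  list(SSQ = best[days, n], Cuts = cuts)
}

res <- partition_dp(c(3, 1, 4, 1, 5, 9, 2, 6), days = 3)
print(res)
```

On Ravi's first two questions: because every row is built from exact optima of the previous row, the recursion does find a minimum of this objective, but the minimiser need not be unique. In the toy input, ending the first two days after chapters 4 and 6, or after chapters 5 and 6, gives daily totals of 9, 14, 8 and 14, 9, 8 respectively, so both schedules have the same sum of squares and the function simply returns one of them.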
[R] Updated comparison table for SAS-SPSS Add-ons and R Functions
Hi All,

I have substantially expanded the table that compares SAS and SPSS add-on modules to somewhat equivalent R packages. This new version is at: http://r4stats.com/add-on-modules and I would very much appreciate any feedback you might have on it.

The site http://r4stats.com is the replacement to http://RforSASandSPSSusers.com and includes the support files for both R for SAS and SPSS Users and the new R for Stata Users, due out in March from Springer. I'll phase the older site out eventually and change the URL to point to the new one.

Thanks,
Bob

Bob Muenchen (pronounced Min'-chen), Manager
Research Computing Support
Voice: (865) 974-5230
Email: muenc...@utk.edu
Web: http://oit.utk.edu/research
News: http://oit.utk.edu/research/news.php
Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions
On Wed, Jan 13, 2010 at 11:53 PM, Muenchen, Robert A (Bob) muenc...@utk.edu wrote:

Hi All,

I have substantially expanded the table that compares SAS and SPSS add-on modules to somewhat equivalent R packages. This new version is at: http://r4stats.com/add-on-modules and I would very much appreciate any feedback you might have on it.

The site http://r4stats.com is the replacement to http://RforSASandSPSSusers.com and includes the support files for both R for SAS and SPSS Users and the new R for Stata Users, due out in March from Springer. I'll phase the older site out eventually and change the URL to point to the new one.

Maybe the first thing you should do is a global search and replace of 'SPSS' with 'PASW'

http://www.spss.com/software/product-name-guide/

Barry
Re: [R] a question about deleting rows
  yourdataframe = subset(yourdataframe, !(n2==0 & n3==0 & n4==0 & n5==0))

From: karena dr.jz...@gmail.com
To: r-help@r-project.org
Date: 14/Jan/2010 12:24 p.m.
Subject: [R] a question about deleting rows

I have a file like this:

  id  n1  n2  n3  n4  n5  n6
  1    3   4   7   8  10   2
  2    4   1   2   4   3  10
  3    7   0   0   0   0   8
  4   10   1   0   0   2   3
  5   11   1   0   0   0   5

what I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete the row. how can I do that?

thank you,
karena
Re: [R] a question about deleting rows
Try this:

> x
  id n1 n2 n3 n4 n5 n6
1  1  3  4  7  8 10  2
2  2  4  1  2  4  3 10
3  3  7  0  0  0  0  8
4  4 10  1  0  0  2  3
5  5 11  1  0  0  0  5
> delete <- with(x, n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0)
> delete
[1] FALSE FALSE  TRUE FALSE FALSE
> x[!delete,]
  id n1 n2 n3 n4 n5 n6
1  1  3  4  7  8 10  2
2  2  4  1  2  4  3 10
4  4 10  1  0  0  2  3
5  5 11  1  0  0  0  5

On Wed, Jan 13, 2010 at 5:15 PM, karena dr.jz...@gmail.com wrote: I have a file like this:

id n1 n2 n3 n4 n5 n6
 1  3  4  7  8 10  2
 2  4  1  2  4  3 10
 3  7  0  0  0  0  8
 4 10  1  0  0  2  3
 5 11  1  0  0  0  5

What I want to do is: only if n2=0 and n3=0 and n4=0 and n5=0 then delete the row. How can I do that? Thank you, karena

-- View this message in context: http://n4.nabble.com/a-question-about-deleting-rows-tp1013403p1013403.html Sent from the R help mailing list archive at Nabble.com.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
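For anyone who wants to run the two answers in this thread side by side, here is a minimal self-contained sketch. The data frame is rebuilt from the printed example; the rowSums() variant at the end is an extra alternative that was not suggested in the thread, shown only as one more way to express the same condition.

```r
# Rebuild the example data frame from the thread.
x <- data.frame(
  id = 1:5,
  n1 = c(3, 4, 7, 10, 11),
  n2 = c(4, 1, 0, 1, 1),
  n3 = c(7, 2, 0, 0, 0),
  n4 = c(8, 4, 0, 0, 0),
  n5 = c(10, 3, 0, 2, 0),
  n6 = c(2, 10, 8, 3, 5)
)

# Logical index of the rows to drop: all of n2..n5 are zero.
delete <- with(x, n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0)
kept_index <- x[!delete, ]

# The subset() form from the first reply.
kept_subset <- subset(x, !(n2 == 0 & n3 == 0 & n4 == 0 & n5 == 0))

# Alternative (not from the thread): keep rows with at least one
# non-zero value among n2..n5.
kept_rowsums <- x[rowSums(x[, c("n2", "n3", "n4", "n5")] != 0) > 0, ]
```

All three objects should hold the rows with id 1, 2, 4 and 5; only row 3 has zeros in all four columns.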
Re: [R] optimization challenge
FYI, in bioinformatics, we use dynamic programming algorithms in similar ways to solve similar problems of finding guaranteed-optimal partitions in streams of data (usually DNA or protein sequence, but sometimes numerical data from chip-arrays). These path optimization algorithms are often called Viterbi algorithms, a web search for which should provide multiple references. The solutions are not necessarily unique (there may be multiple paths/partitions with identical integer maxima in some systems) and there is much research on whether the optimal solution is actually the one you want to work with (for example, there may be a fair amount of probability mass within an area/ensemble of suboptimal solutions that overall have greater posterior probabilities than does the optimal solution singleton). See Chip Lawrence's PNAS paper for more erudite discussion, and references therein: www.pnas.org/content/105/9/3209.abstract -Aaron P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at Reed back in 1993, which started me down a somewhat windy road to statistical genomics! -- Aaron J. Mackey, PhD Assistant Professor Center for Public Health Genomics University of Virginia amac...@virginia.edu On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan rvarad...@jhmi.edu wrote: Greg - thanks for posting this interesting problem. Albyn - thanks for posting a solution. Now, I have some questions: (1) is the algorithm guaranteed to find a best solution? (2) can there be multiple solutions (it seems like there can be more than 1 solution depending on the data)?, and (3) is there a good reference for this and similar algorithms? Thanks Best, Ravi. --- Ravi Varadhan, Ph.D. 
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Albyn Jones Sent: Wednesday, January 13, 2010 1:19 PM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] optimization challenge

The key idea is that you are building a matrix that contains the solutions to smaller problems which are sub-problems of the big problem. The first row of the matrix SSQ contains the solution for no splits, ie SSQ[1,j] is just the sum of squares about the overall mean for reading chapters 1 through j in one day. The iteration then uses row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j chapters in m-1 days) is part of the overall optimal solution, you have already computed it, and so don't ever need to recompute it.

TS = SSQ[m-1,j]+(SSQ1[j+1]) computes the vector of possible solutions for SSQ[m,n] (n chapters in m days), breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239 chapters in 128 days. That's just the objective function, so the rest involves constructing the list of optimal cuts, ie which chapters are grouped together for each day's reading. That code uses the same idea... constructing a list of lists of cutpoints.

Statisticians should study a bit of data structures and algorithms! albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote: WOW, your results give about half the variance of my best optim run (possibly due to my suboptimal use of optim). 
Can you describe a little what the algorithm is doing? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Albyn Jones [mailto:jo...@reed.edu] Sent: Tuesday, January 12, 2010 5:31 PM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] optimization challenge Greg Nice problem: I wasted my whole day on it :-) I was explaining my plan for a solution to a colleague who is a computer scientist, he pointed out that I was trying to re-invent the wheel known as dynamic programming. here is my code, apparently it is called bottom up dynamic programming. It runs pretty quickly, and returns (what I hope is :-) the optimal sum of squares and the cut-points. function(X=bom3$Verses,days=128){ # find optimal BOM reading schedule for Greg Snow # minimize variance of quantity to read per day over 128 days # N = length(X) Nm1 = N-1 SSQ-
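Albyn's function is cut off in this archive, so what follows is not his code. It is a minimal sketch of the bottom-up dynamic programming idea described in the thread, under the assumption that the objective is the sum of squared deviations of each day's total from the mean daily total. The names optimal_schedule, seg, cost and back are made up for this illustration.

```r
# Hypothetical helper (not the original function from the thread):
# split X into `days` contiguous groups so that the sum of squared
# deviations of the group totals from the mean group total is minimal.
optimal_schedule <- function(X, days) {
  N <- length(X)
  stopifnot(days >= 1, days <= N)
  target <- sum(X) / days
  cs <- c(0, cumsum(X))                 # cs[j + 1] = sum(X[1:j])

  # Squared deviation from target when items i..j are read in one day.
  seg <- function(i, j) (cs[j + 1] - cs[i] - target)^2

  cost <- matrix(Inf, nrow = days, ncol = N)          # cost[m, j]: best value for items 1..j in m days
  back <- matrix(NA_integer_, nrow = days, ncol = N)  # last item of day m - 1 in that best solution
  cost[1, ] <- seg(1, seq_len(N))

  if (days > 1) for (m in 2:days) {
    for (j in m:N) {
      i <- (m - 1):(j - 1)              # candidate end points of day m - 1
      ts <- cost[m - 1, i] + seg(i + 1, j)
      k <- which.min(ts)
      cost[m, j] <- ts[k]
      back[m, j] <- i[k]
    }
  }

  # Walk the back-pointers to recover the last item read on each day.
  ends <- integer(days)
  ends[days] <- N
  if (days > 1) for (m in days:2) ends[m - 1] <- back[m, ends[m]]

  list(ssq = cost[days, N], last_item = ends)
}

# Toy example: six "chapters" over three days.
schedule <- optimal_schedule(c(1, 2, 3, 4, 5, 6), days = 3)
```

For the toy input the daily totals come out as 6, 9 and 6 (items 1-3, 4-5 and 6), giving a sum of squares of 6 around the target of 7. Each row of cost reuses the previous row, which is the "never recompute a sub-problem" point made in the explanation above.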
Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions
From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com] On Behalf Of Barry Rowlingson Sent: Wednesday, January 13, 2010 7:03 PM To: Muenchen, Robert A (Bob) Cc: r-help@r-project.org Subject: Re: [R] Updated comparison table for SAS-SPSS Add-ons and R Functions

Maybe the first thing you should do is a global search and replace of 'SPSS' with 'PASW' http://www.spss.com/software/product-name-guide/ Barry

One of the things I updated was to *remove* the now-obsolete PASW! Since IBM bought the company, they did away with that and renamed things IBM SPSS. See the list at: http://spss.com/software/statistics/ They still have some old web pages to clean up, as you point out. Cheers, Bob
[R] Error: object of type 'closure' is not subsettable
Hi everyone, Would somebody please explain (or point me to a reference that explains) the following error:

Error: object of type 'closure' is not subsettable

I was trying to use rep() to replicate a function:

> example_function <- function() { return(TRUE) }
> rep(example_function, 3)
Error: object of type 'closure' is not subsettable

But I just cannot understand this error. I can combine functions using c() without any problems:

> c(example_function, example_function)
[[1]]
function () { return(TRUE) }

[[2]]
function () { return(TRUE) }

What am I doing wrong when I use rep()? Thanks in advance, Matthew Walker
[R] package spam for R64-devel
Dear Uwe and all, First of all, I want to congratulate you on your dedication in providing and maintaining R for 64-bit operating systems. I tried the 64-bit version of R under a Windows Server 2003 system. It seems to work properly, but I am concerned since I need to use the package fields, which depends on the package spam, which seems to have a check error. I know 64-bit versions of R and its packages are just starting to roll out, but I wonder if there's a possibility of making the spam package work on 64-bit R. From what I saw in the log file ( http://www.statistik.tu-dortmund.de/~ligges/CRAN/bin/windows64/contrib/r-devel/check/spam-check.log) it seems to be a problem with tests. Is it possible to run R CMD check for the spam package with the --no-tests flag? By the way, the fields package was built using the --no-tests flag. Many thanks for any help you might be able to provide, Julian Ramirez Research Assistant International Centre for Tropical Agriculture, CIAT Colombia
Re: [R] Error: object of type 'closure' is not subsettable
See ?rep where it says that the argument must be a vector. Try rep(list(sin), 3)

On Wed, Jan 13, 2010 at 8:11 PM, Matthew Walker matthew.walke...@ulaval.ca wrote: Hi everyone, Would somebody please explain (or point me to a reference that explains) the following error:

Error: object of type 'closure' is not subsettable

I was trying to use rep() to replicate a function:

> example_function <- function() { return(TRUE) }
> rep(example_function, 3)
Error: object of type 'closure' is not subsettable

But I just cannot understand this error. I can combine functions using c() without any problems:

> c(example_function, example_function)
[[1]]
function () { return(TRUE) }

[[2]]
function () { return(TRUE) }

What am I doing wrong when I use rep()? Thanks in advance, Matthew Walker
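To make the suggestion concrete, here is a small sketch applying it to the function from the question. The replicate() line is an additional option that was not mentioned in the thread; treat it as one possible alternative, not as the recommended way.

```r
example_function <- function() {
  return(TRUE)
}

# rep() expects a vector. A list is a vector; a bare function (closure) is not,
# so wrap the function in a list first.
three_copies <- rep(list(example_function), 3)

# Alternative not from the thread: replicate() with simplify = FALSE
# also returns a list of functions.
also_three <- replicate(3, example_function, simplify = FALSE)

# Call each stored function and collect the logical results.
results <- vapply(three_copies, function(f) f(), logical(1))
```

This also explains why c(example_function, example_function) worked in the question: c() returns a list when given functions, and a list can hold closures.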
Re: [R] optimization challenge
Hi Aaron! It's always nice to see a former student doing well. Thanks for the notes and references, too! albyn
Re: [R] Formula for normal distribution with know mean and standard error and n terms
steve_fried...@nps.gov wrote: I am searching for a method to calculate a normal distribution. For example, this equation is used to calculate the normal curve when the mean and standard deviation are known:

p(x) = 1/(σ*sqrt(2π)) * exp(-(x-μ)^2 / (2σ^2))

However, some of the literature I'm reading (I'm building an ecological niche model for vegetation along several ecological gradients) reports the standard error and the sample size n instead. Is there an equivalent formula? If so, how can I also normalize the p(x) term to be within the 0-1 range?

What you have there (p) is a density rather than the distribution. Note that p(x) is NOT a probability, so it need not lie between 0 and 1 (integrals of p(x) dx are probabilities and do lie between 0 and 1). The function to compute p is dnorm. Try ?dnorm in R. If you're given the standard error of a mean (which I'll call se) and n, then sigma = sqrt(n)*se (because se = sigma/sqrt(n)). If it's the standard error of something other than the mean, you'll need to give more details.

-- View this message in context: http://n4.nabble.com/Formula-for-normal-distribution-with-know-mean-and-standard-error-and-n-terms-tp1013280p1013552.html Sent from the R help mailing list archive at Nabble.com.
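A short sketch of the reply above, for readers who want to try it. The values of mu, se and n are invented for illustration and stand in for whatever the literature reports; the check against the hand-written formula is only there to show that dnorm() computes the same density.

```r
# Hypothetical inputs: a reported mean, its standard error, and the sample size.
mu <- 10
se <- 0.5
n  <- 16

# Recover the standard deviation from the standard error of the mean.
sigma <- sqrt(n) * se                    # because se = sigma / sqrt(n)

# Density via dnorm() and via the formula written out by hand.
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 201)
dens_dnorm  <- dnorm(x, mean = mu, sd = sigma)
dens_manual <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

# A density value can exceed 1 when sigma is small ...
peak_narrow <- dnorm(0, mean = 0, sd = 0.1)

# ... whereas probabilities (areas under the density) stay within [0, 1].
prob_within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
```

If a quantity between 0 and 1 is needed, pnorm() differences such as prob_within_1sd are probabilities in the sense of the reply; the raw density values are not.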
Re: [R] FW: Problems connecting with MySQL using odbcDriverConnect (RODBC package) on Linux
Thanks for solving this and sharing the fix with us. But why don't you use RMySQL, which connects to MySQL without the need for ODBC? Caveman

On Wed, Jan 13, 2010 at 1:48 AM, Marcus, Jeffrey jeffrey.mar...@nuance.com wrote: I think I figured this out. I should not have put the Driver name in braces. Changing it from {MySQL} to MySQL seems to work.

-Original Message- From: Marcus, Jeffrey Sent: Tuesday, January 12, 2010 6:09 PM To: 'r-help@r-project.org' Subject: Problems connecting with MySQL using odbcDriverConnect (RODBC package) on Linux

I am sure I'm doing something wrong here but not sure what. Our system administrator recently installed UnixODBC and the MyODBC driver on a Linux box running Linux version 2.6 x86_64. I have an .odbc.ini file in my home directory with the following lines:

[mydb]
Description = MySQL server on my-server
Driver=/usr/lib64/libmyodbc3.so
SERVER=my-server

I can successfully do the following:

library(RODBC)
channel <- odbcConnect("mydb")
sqlQuery(channel, "show databases")

And in general, I have no problems using odbcConnect to connect to the mydb DSN. However, for various reasons I want to make a DSN-less connection using odbcDriverConnect, and everything I've tried generates a "data source not found" message (see below for details). After reading through various documents, I tried doing the following.

(1) Put an odbcinst.ini file in my home directory with the following lines:

[MySQL]
Description = ODBC for MySQL
Driver=/usr/lib64/libmyodbc3.so
Setup = /usr/lib/libodbcmyS.so
FileUsage = 1

(2) Install it with odbcinst -i -f. This seems to work, as when I type odbcinst -j I get

DRIVERS: /home/jmarcus/odbcinst.ini
SYSTEM DATA SOURCES: /home/jmarcus/odbc.ini
USER DATA SOURCES..: /home/jmarcus/.odbc.ini

(3) Set the environment variable to point to this file:

bash-3.2$ ODBCSYSINI=/home/jmarcus
bash-3.2$ export ODBCSYSINI

(4) Start R. Note that R has inherited the environment variable:

> Sys.getenv("ODBCSYSINI")
     ODBCSYSINI
"/home/jmarcus"

(5) Try to connect to the MySQL server:

conn <- odbcDriverConnect(connection="Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password")

This generates the following:

Warning messages:
1: In odbcDriverConnect(connection = "Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password") : [RODBC] ERROR: state IM002, code 0, message [unixODBC][Driver Manager]Data source name not found, and no default driver specified
2: In odbcDriverConnect(connection = "Driver={MySQL};Server=my-server;Database=my_database;Uid=my_username;Pwd=my_password") : ODBC connection failed

Can anyone see what I'm doing wrong? Thanks. Jeff

-- OpenSource Software Consultant CENFOSS (www.cenfoss.co.mz) SP Tech (www.sptech.co.mz) email: orvaq...@cenfoss.co.mz cell: +258828810980
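As a sketch of the fix Jeff reports (driver name without braces), the snippet below only assembles the DSN-less connection string; build_conn_string is a made-up helper for this illustration, and the server, database and credentials are the placeholders from the post. The actual odbcDriverConnect() call is left commented out because it needs RODBC, unixODBC and a reachable MySQL server, none of which can be assumed here.

```r
# Hypothetical helper: assemble an ODBC connection string from its parts.
build_conn_string <- function(driver, server, database, uid, pwd) {
  paste0("Driver=", driver,
         ";Server=", server,
         ";Database=", database,
         ";Uid=", uid,
         ";Pwd=", pwd)
}

# Driver name written as MySQL, not {MySQL}, as reported in the thread.
conn_string <- build_conn_string(
  driver   = "MySQL",
  server   = "my-server",
  database = "my_database",
  uid      = "my_username",
  pwd      = "my_password"
)

# With RODBC installed and the [MySQL] driver registered in odbcinst.ini,
# the connection attempt would then look like this:
# library(RODBC)
# conn <- odbcDriverConnect(connection = conn_string)
# sqlQuery(conn, "show databases")
# odbcClose(conn)
```

Whether the braces matter may depend on the driver manager and its version; the thread only establishes that dropping them worked on this particular unixODBC setup.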
[R] installing RCurl when libcurl is in non-standard location
Hi, I'm struggling to install RCurl for 32-bit linux and am hoping for some suggestions. I obtained RCurl_1.3-1.tar.gz from CRAN today, and am using a very recent version of R: R version 2.10.1 Patched (2010-01-12 r50970). I'm not the sysadmin for this system (disclaimer: my sysadmin skills are not very good, I'm afraid). curl is available centrally on the system but it's a little old (7.12.3 - looks from some older r-help posts like this is too old for RCurl). Therefore I installed libcurl 7.19.7 in a non-standard location (because I'm not the sysadmin), and I think I'm pointing R towards this new libcurl OK, but I'm not 100% sure about that. The output of locate (see below) makes me a little suspicious, but the output of the R CMD INSTALL makes it seem like the new libcurl I installed IS being used. I've included various output below that I hope will help in figuring this out. Is there anything else that would be useful to know? I can also ask our sysadmin for help if that makes more sense than asking you all via r-help. Thanks very much in advance for any ideas, Janet Young --- [2] zork20:/home/jayoung uname -a Linux zork20 2.6.12-1.1381_FC3smp #1 SMP Fri Oct 21 04:03:26 EDT 2005 i686 athlon i386 GNU/Linux [3] zork20:/home/jayoung setenv MAKE gmake [4] zork20:/home/jayoung which gmake /usr/bin/gmake [5] zork20:/home/jayoung gmake -version GNU Make 3.80 Copyright (C) 2002 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
[6] zork20:/home/jayoung which curl-config /home/jayoung/traskdata/bin_linux/curl-config [7] zork20:/home/jayoung curl-config --version libcurl 7.19.7 [8] zork20:/home/jayoung locate curl-config /usr/bin/curl-config /usr/share/man/man1/curl-config.1.gz [16] zork20:/home/jayoung /usr/bin/curl-config --version libcurl 7.12.3 [9] zork20:/home/jayoung locate libcurl /usr/lib/libcurl.so.3 /usr/lib/libcurl.so /usr/lib/libcurl.a /usr/lib/libcurl.so.3.0.0 /usr/share/man/man3/libcurl-multi.3.gz /usr/share/man/man3/libcurl-easy.3.gz /usr/share/man/man3/libcurl-errors.3.gz /usr/share/man/man3/libcurl-share.3.gz /usr/share/man/man3/libcurl-tutorial.3.gz /usr/share/man/man3/libcurl.3.gz [10] zork20:/home/jayoung ls ~/traskdata/lib_linux/libcu* /home/jayoung/traskdata/lib_linux/libcurl.a /home/jayoung/traskdata/lib_linux/libcurl.la* /home/jayoung/traskdata/lib_linux/libcurl.so@ /home/jayoung/traskdata/lib_linux/libcurl.so.3@ /home/jayoung/traskdata/lib_linux/libcurl.so.3.0.0* /home/jayoung/traskdata/lib_linux/libcurl.so.4@ /home/jayoung/traskdata/lib_linux/libcurl.so.4.0.0* /home/jayoung/traskdata/lib_linux/libcurl.so.4.1.1* [11] zork20:/home/jayoung printenv LD_LIBRARY_PATH /home/btrask/traskdata/lib_linux:/home/jayoung/traskdata/bin_linux/qt/ lib:/home/btrask/traskdata/lib_linux/R/library/RSPerl/libs:/home/ btrask/traskdata/lib_linux/R/lib [14] zork20:/home/jayoung/source_codes/R/other_packages R CMD INSTALL RCurl_1.3-1.tar.gz --configure-args='--libdir=/home/btrask/traskdata/ lib_linux --includedir=/home/btrask/traskdata/include' * installing to library ‘/home/btrask/traskdata/lib_linux/R/library’ * installing *source* package ‘RCurl’ ... checking for curl-config... /home/jayoung/traskdata/bin_linux/curl- config checking for gcc... gcc checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... 
o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ANSI C... none needed checking how to run the C preprocessor... gcc -E Version has a libidn field Version has CURLOPT_URL Version has CURLINFO_EFFECTIVE_URL Version has CURLINFO_RESPONSE_CODE Version has CURLINFO_TOTAL_TIME Version has CURLINFO_NAMELOOKUP_TIME Version has CURLINFO_CONNECT_TIME Version has CURLINFO_PRETRANSFER_TIME Version has CURLINFO_SIZE_UPLOAD Version has CURLINFO_SIZE_DOWNLOAD Version has CURLINFO_SPEED_DOWNLOAD Version has CURLINFO_SPEED_UPLOAD Version has CURLINFO_HEADER_SIZE Version has CURLINFO_REQUEST_SIZE Version has CURLINFO_SSL_VERIFYRESULT Version has CURLINFO_FILETIME Version has CURLINFO_CONTENT_LENGTH_DOWNLOAD Version has CURLINFO_CONTENT_LENGTH_UPLOAD Version has CURLINFO_STARTTRANSFER_TIME Version has CURLINFO_CONTENT_TYPE Version has CURLINFO_REDIRECT_TIME Version has CURLINFO_REDIRECT_COUNT Version has CURLINFO_PRIVATE Version has CURLINFO_HTTP_CONNECTCODE Version has CURLINFO_HTTPAUTH_AVAIL Version has CURLINFO_PROXYAUTH_AVAIL Version has CURLINFO_OS_ERRNO Version has CURLINFO_NUM_CONNECTS Version has CURLINFO_SSL_ENGINES No CURLINFO_COOKIELIST enumeration value. No CURLINFO_LASTSOCKET enumeration value. No CURLINFO_FTP_ENTRY_PATH enumeration value. No CURLINFO_REDIRECT_URL enumeration
Re: [R] apply a function down each column
Thank you very much! It works perfectly now. I even extended it to be able to apply it to the whole dataset:

data <- read.delim("mhc_data.txt", stringsAsFactors=FALSE)
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))), as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}
output <- matrix(ncol=(ncol(data)-1), nrow=nrow(data)/2)
sim <- rep(0, nrow(data)/2)
for (y in 2:(ncol(data))) {
  for (x in 1:(nrow(data)/2)) {
    a <- data[(2*x-1),y]  # odd rows
    b <- data[(2*x),y]    # even rows
    sim[x] <- (lettermatch(a,b))
  }
  output[,y-1] <- sim
}
colnames(output) <- c(names(data[2:length(names(data))]))
rownames(output) <- c(1:(nrow(data)/2))
output

Laetitia

On 12.01.2010 at 18:31, Peter Ehlers wrote: Laetitia, I was just responding to your comment that R complains about a syntax error. But I realize now that 2x would probably cause an "unexpected symbol" error. Here's what I get when I run your loop; what do you get?

> for (x in 1:(nrow(dat)-1)) {
+ a <- as.character(dat[(2x-1),1])
Error: unexpected symbol in:
"for (x in 1:(nrow(dat)-1)) {
a <- as.character(dat[(2x"
> b <- as.character(dat[(2x),1])
Error: unexpected symbol in "b <- as.character(dat[(2x"
> lettermatch(a,b)
Error in strsplit(a, "") : object 'a' not found
> }
Error: unexpected '}' in "}"

and here's what I get when I fix the obvious syntax error:

> for (x in 1:(nrow(dat)-1)) {
+ a <- as.character(dat[(2*x-1),1])
+ b <- as.character(dat[(2*x),1])
+ lettermatch(a,b)
+ }
Error in fix.by(by.x, x) : 'by' must specify valid column(s)

That leaves two problems: 1) you're looking at the wrong column in dat[,1]; that should be dat[,2], etc. 2) that error message indicates that your index variable (x) gets to invalid values. Try this:

for (x in 1:(nrow(dat)/2)) {
  a <- dat[(2*x-1),2]  # odd rows
  b <- dat[(2*x),2]    # even rows
  print(lettermatch(a,b))
}

You don't need the as.character() if you have character data. Always do a str(dat) before you do any analysis. -Peter Ehlers

Laetitia Schmid wrote: Dear Peter, thank you for the suggestion. 
Unfortunately the star did not help. Did it work for you? For me it seems incomplete somehow. Laetitia From: Peter Ehlers [ehl...@ucalgary.ca] Sent: Tuesday, January 12, 2010 09:54 AM To: Laetitia Schmid Cc: Steve Lianoglou; r-help@r-project.org Subject: Re: [R] apply a function down each column See inline below. Laetitia Schmid wrote: Dear Steve, my solution looks like it would work, but it does not. I attached a text file with an extract of my data. Maybe you can try it yourself. I want to compare C1 with M1, C2 with M2, C3 with M3,,, for each column. I do not really know what the problem is. R complains about a syntax error. The function I am applying counts the common strings between the two. Greg Hirson helped me to write it. lettermatch - function(a, b) { tb - merge(as.data.frame(table(strsplit(a, ))), as.data.frame(table(strsplit(b, ))), by=Var1) sum(apply(tb[-1], 1, min)) } For example for the second column I tried: for (x in 1:(nrow(dat)-1)) { a - as.character(dat[(2x-1),1]) Shouldn't that be 2*x-1?? -Peter Ehlers b - as.character(dat[(2x),1]) lettermatch(a,b) } or a - as.character(dat[seq(1, nrow(dat), by=2),2]) b - as.character(dat[seq(2, nrow(dat), by=2), 2]) all.results - lettermatch(a,b) With dat-read.delim(data_lgs.txt,stringsAsFactors=FALSE) I can leave the as.character away in the formula above. Laetitia IndividualsSeq1Seq2Seq3Seq4 C1AATTCCGGCTTT M1 C2AATTCCGGCTTT M2AGGGAACTCCGGCGTT C3AGGGAACTCCGGCGTT M3AGGGAACTCCGGCGTT C4AATTCCGGCCTT M4AAATCGGGCTTT C5AGGGACTTCCCGCTTT M5AGGGCTTTCCTT C6AGGGCTTTCCTT M6AAAGCCTTCTTT C7AAAGACCCCCCGGTTT M7AAGGAACCCCGG C8AATTCCGGCCTT M8AATTCCGGCCTT C9 M9 C11AGGGAAACCGGGGGTT M11AATTCCGGCCTT Am 11.01.2010 um 15:18 schrieb Steve Lianoglou: Hi, On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid laeti...@gmt.su.se wrote: Hello World, I have a function that makes pairwise comparisons between two strings. 
I would like to apply this function to my data (which consists of columns with different strings) in the way that it compares the first with the second entry, and then the third with the fourth, and then the fifth with the sixth, and so on down each column... So (2x-1) and (2x) would be the different entries to be compared! dat= my data: for the first column: compare dat[(2x-1),1] with
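For readers following the thread, here is a compact, runnable variant of the odd-row/even-row comparison. It is a sketch, not the code from the thread: lettermatch is rewritten with table() and pmin() instead of merge(), and dat is a tiny invented data frame in the same layout (one identifier column, then sequence columns), not the poster's data.

```r
# Count the letters two strings have in common (with multiplicity).
# Variant of the thread's lettermatch(), written without merge().
lettermatch <- function(a, b) {
  ta <- table(strsplit(a, "")[[1]])
  tb <- table(strsplit(b, "")[[1]])
  shared <- intersect(names(ta), names(tb))
  sum(pmin(as.vector(ta[shared]), as.vector(tb[shared])))
}

# Invented example data: rows alternate between the two members of a pair.
dat <- data.frame(
  Individuals = c("C2", "M2", "C3", "M3"),
  Seq1 = c("AATT", "AGGG", "AGGG", "AGGG"),
  Seq2 = c("CCGG", "AACT", "AACT", "AACT"),
  stringsAsFactors = FALSE
)

odd  <- seq(1, nrow(dat), by = 2)   # rows 1, 3, ...
even <- seq(2, nrow(dat), by = 2)   # rows 2, 4, ...

# For every sequence column, compare each odd row with the even row below it.
output <- sapply(dat[-1], function(col) {
  mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE)
})
```

The result is a matrix with one row per pair and one column per sequence column, which matches the shape of the output matrix built by the nested for loops in the final version posted in the thread.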
Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?
Thanks, Max. You are so responsible; every time, you give me a lot of help. On my learning road you are my guide, though we do not know each other. Best wishes, kevin

On 2010-01-14, Max Kuhn [via R] ml-node+1013265-480375...@n4.nabble.com wrote:

-Original Message- From: Max Kuhn [via R] ml-node+1013265-480375...@n4.nabble.com Sent: Thursday, January 14, 2010 To: bbslover dlu...@yeah.net Subject: Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold cross-validation?

In caret, see ?trainControl. Use returnResamp = "all" Max

On Wed, Jan 13, 2010 at 9:47 AM, bbslover [hidden email] wrote: Hello, I am learning randomForest. Now I want to boxplot mse against mtry using 20 runs of 5-fold cross-validation (using the median value), but I have no good way to do it. The randomForest package itself does not contain a cross-validation method. The caret package does, but how can I get all the values of mtry together with the corresponding mse?

-- View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html Sent from the R help mailing list archive at Nabble.com.

-- Max

-- View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013515.html Sent from the R help mailing list archive at Nabble.com.
[R] Bootstrap for correlation coefficient
I have the following code:

## to check correlation between the simulated uniform data
x2 <- uni[,1] ; x2[1:10]
y2 <- uni[,2] ; y2[1:10]
result2 <- boot(cbind(x2,y2), f, 20)
# get 95% confidence interval
boot.ci(result2, type="bca")
cor.test(x2,y2, method="pearson", conf.level=0.95)

Part of my data:

> x2 <- uni[,1] ; x2[1:10]
 [1] 0.63933145 0.71677785 0.02181925 0.15913391 0.61021930 0.72878176 0.22237891 0.28178186 0.75503612 0.54928692
> y2 <- uni[,2] ; y2[1:10]
 [1] 0.65754240 0.49263876 0.01352257 0.19195681 0.65759797 0.89813660 0.24582441 0.12900017 0.78982501 0.68676534

## Result

> result2 <- boot(cbind(x2,y2), f, 20)
> result2

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = cbind(x2, y2), statistic = f, R = 20)

Bootstrap Statistics :
    original        bias    std. error
t1* 0.891797 -0.005272889  0.01198383

Not sure about this:

> boot.ci(result2, type="bca")
Error in bca.ci(boot.out, conf, index[1], L = L, t = t.o, t0 = t0.o, h = h, : estimated adjustment 'a' is NA

> cor.test(x2,y2, method="pearson", conf.level=0.95)

Pearson's product-moment correlation

data: x2 and y2
t = 51.7391, df = 689, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8754420 0.9061121
sample estimates:
     cor
0.891797

My question is: when I want to find the confidence interval, why does it give me this message? And how do I get the p-value from the bootstrap? Thank you so much
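No reply to this post is included here, so the following is only a hedged sketch. A common cause of the "estimated adjustment 'a' is NA" error is asking for a BCa interval with fewer bootstrap replicates than observations; the post uses R = 20, while df = 689 in the cor.test output implies 691 data pairs. The example below uses simulated data (the poster's uni object is not available), and the statistic function f is an assumption, since the original f was not shown.

```r
library(boot)

set.seed(1)

# Simulated stand-in for the poster's data: two correlated uniform-like columns.
n  <- 200
x2 <- runif(n)
y2 <- 0.8 * x2 + 0.2 * runif(n)

# Assumed statistic: Pearson correlation of the resampled rows.
# boot() passes the data and a vector of row indices.
f <- function(d, i) cor(d[i, 1], d[i, 2])

# Use many more replicates than observations so the BCa interval can be computed.
result <- boot(cbind(x2, y2), statistic = f, R = 2000)
ci <- boot.ci(result, type = "bca")

lower <- ci$bca[4]   # lower end of the 95% BCa interval
upper <- ci$bca[5]   # upper end of the 95% BCa interval
```

boot() itself does not report a p-value. One informal reading is that a 95% interval excluding 0 corresponds to rejecting a zero correlation at the 5% level; a permutation test would be a more direct route to a resampling-based p-value.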