Re: [R] Package wavelets
Hi,

When I run a decomposition with dwt, the output contains levels 0 to 15. I would like to know how to display only the levels that matter most to me, for example levels 7 to 14. In other words, I would like to be able to choose which levels are shown. Is this possible with dwt?

Marize Simões

--
View this message in context: http://r.789695.n4.nabble.com/Package-wavelets-tp2526023p2526505.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Query regarding Windows based statistical software development using R as programming language
Hi,

I am a beginner in R and have the following query: is it possible to develop user-friendly, Windows-based statistical software like SPSS using R as the programming language? Otherwise, is it possible to use R code directly (without command-line execution) from a Windows-based programming language such as Visual Basic?

Please help me, if possible, with some links to study materials on this topic.

--
Thanks and regards,
Soumen Pal
Re: [R] How to generate integers from uniform distribution with fixed mean
Sorry, I forgot to mention the range. As an example, the range (17, 23) works. In your code, the mean is not exactly 20 and the samples are not integers. What I want is integers with a mean of exactly 20. Any tips?

Thanks

On Thu, Sep 2, 2010 at 12:16 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
> On Thu, Sep 2, 2010 at 7:17 AM, Yi liuyi.fe...@gmail.com wrote:
>> Hi, folks,
>> runif(n, min, max) is the typical code for generating random variables from a uniform distribution. But what if we need to fix the mean at 20, and we want the values to be integers only?
>
> It's not clear what you want. Uniformly random integers with expected mean 20 - but what range? Any range centred on 20 will work; for example, you could use sample() with replacement. To see the distribution, tabulate the output of sample():
>
>     table(sample(17:23, 10000, TRUE))
>
> which gives a uniform distribution of integers from 17 to 23, so the mean is 20.0057 for 10000 samples.
>
> Is that what you want?
>
> Barry
Re: [R] How to generate integers from uniform distribution with fixed mean
On Sat, Sep 4, 2010 at 8:07 AM, Yi liuyi.fe...@gmail.com wrote:
> Sorry I forgot to talk about the range. But as an example, range (17,23) works. In your codes, mean is not exactly 20 and the samples are not integer.

The samples *are* integers. sample(17:23, 10000, TRUE) returns integers.

> However, what I want is integers with mean 20 exactly. Any tips?

Well, something will have to go. You can't have a random uniform sample of integers within a given range and have an exact mean every time.

Suppose your range was -1 to 1, so the possible values are -1, 0, 1, and you want an integer mean of 0. The only way to do that is to have equal numbers of -1s and +1s in your sample, and the number of zeros is irrelevant: you could have 5000 zeroes and the mean would still be 0 if you had 25 -1s and 25 +1s. That's clearly not a uniform distribution, and you'll have to impose certain conditions if that's what you want.

By extension, as long as you have an odd number of integers in your range and you want the mean to be the median value (so in the 17:23 example, a mean of 20), it is sufficient to generate the same number of 17s as 23s, the same number of 18s as 22s, the same number of 19s as 21s, and as many 20s as you like.

I'm not exactly sure of the maths for non-median means; you'd have to pick fewer values on one side to cancel out the extra weight on the other.

But given that this 'distribution' is going to be weird in many ways, perhaps you should answer the question: Why?

Barry
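The pairing idea above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the range 17:23 with centre 20, and a made-up helper name exact_mean_sample with made-up arguments n_pairs and n_centre. Every value drawn below the centre is matched with its mirror image above it, so the mean is exactly the centre, but the result is not an independent uniform sample.

```r
# Hypothetical helper: builds an integer sample whose mean is exactly the
# centre of lo:hi by pairing each low value with its mirror image.
exact_mean_sample <- function(n_pairs, n_centre, lo = 17L, hi = 23L) {
  centre <- (lo + hi) / 2
  stopifnot(centre == round(centre))        # needs an odd number of integers in lo:hi
  centre <- as.integer(centre)
  below <- seq.int(lo, centre - 1L)         # values strictly below the centre
  left  <- below[sample.int(length(below), n_pairs, replace = TRUE)]
  right <- 2L * centre - left               # mirror images above the centre
  x <- c(left, right, rep(centre, n_centre))
  x[sample.int(length(x))]                  # shuffle the order
}

set.seed(1)
x <- exact_mean_sample(n_pairs = 8, n_centre = 4)
mean(x)
```

The number of centre values (n_centre) is a free choice here, which is one reason the resulting "distribution" is not the uniform one.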
Re: [R] How to generate integers from uniform distribution with
There is still ambiguity (and I think some misunderstanding) in your query!

First, Barry's code does yield integers as the values in the sample. As a smaller illustrative example:

    x <- sample(17:23, 20, TRUE)

will give results like

    x
    # [1] 21 17 23 21 17 17 19 18 17 17 17 22 20 23 20 20 18 20 19 20

which are all integers.

Secondly, in general, the mean of the sampled numbers will not be 20 exactly, even though their *expected* mean is 20:

    mean(x)
    # [1] 19.3

Barry gave an example of a sample size so large that the mean would very probably be extremely close to 20 (20.0057 when he did it). This will of course vary from sample to sample:

    mean(sample(17:23, 10000, TRUE))
    # [1] 19.9991
    mean(sample(17:23, 10000, TRUE))
    # [1] 20.031
    mean(sample(17:23, 10000, TRUE))
    # [1] 20.0207
    mean(sample(17:23, 10000, TRUE))
    # [1] 19.9819

You say: "However, what I want is integers with mean 20 exactly."

This is ambiguous. On the one hand, Barry's procedure samples integers from (17,18,19,20,21,22,23) with equal probability, a distribution which has mean exactly 20 *as the distribution which is being sampled from*, although the mean of the values in any particular sample will very probably not be exactly 20. So, in that sense, Barry's procedure does give you a *method* of sampling integers which has mean 20 exactly.

On the other hand, a possible interpretation of what you say is that you want every sample to be such that, after you have obtained the sample (say 'x'), then mean(x) = 20 exactly (as opposed to what you will get from Barry's code, where the mean will be close to, but almost never equal to, 20).

If that is what you want, then it is more tricky to achieve. You are then effectively sampling from the conditional distribution:

    X1, X2, ... , Xn uniformly distributed on (17:23)
    conditional on X1 + X2 + ... + Xn = 20*n.

This can be done, but before working out how to do it one would need to be assured that this really is what you mean!

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-10  Time: 08:56:41
-- XFMail --
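One simple way to sample from the conditional distribution described above is rejection sampling: draw uniform samples and keep the first one whose sum is exactly 20*n. The sketch below is an assumption-laden illustration, not a method proposed in the thread; the function name sample_given_mean and the max_tries limit are made up. It is only practical for small n, because the chance of hitting the exact sum shrinks as n grows.

```r
# Hypothetical helper: rejection sampling from "uniform on values,
# conditional on the sample mean being exactly target".
sample_given_mean <- function(n, values = 17:23, target = 20, max_tries = 1e5) {
  for (k in seq_len(max_tries)) {
    x <- sample(values, n, replace = TRUE)   # unconditional uniform draw
    if (sum(x) == target * n) return(x)      # accept only an exact hit
  }
  stop("no sample with the requested mean found in max_tries attempts")
}

set.seed(42)
y <- sample_given_mean(10)
y
mean(y)
```

Accepted samples follow the conditional distribution exactly, since every uniform draw with the right sum is equally likely to be the one returned.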
[R] Function try and Results of a program
Dear users,

I have a function f to simulate data from a model (the example below is used only to show my problem):

    f <- function(n, mean1){
      a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
      b <- matrix(runif(n), ncol = 5)
      data <- rbind(a, b)
      out <- data
      out
    }

I want to simulate 1000 datasets (here only 5), so I use

    S <- list()
    for (i in 1:5){
      S[[i]] <- f(n = 10, mean1 = 0)
    }

I have a very complicated function for estimating a model, which I want to apply to each one of the simulated datasets above:

    fun <- function(data){
      data <- as.matrix(data)
      sink("Example.txt", append = TRUE)
      cat("\n***\nEstimation \n\nDataset Sim : ", i)
      d <- data %*% t(data)
      s <- solve(d)
      print(s)
      out <- list(s, d)
      out
    }

    results <- list()
    for (i in 1:5){
      tmp <- try(fun(data = S[[i]]))
      results[[i]] <- ifelse(is(tmp, "try-error"), NA, tmp)
    }

My problem is that results holds only the 1st element of the list returned by fun (i.e. only s), although tmp gives me both s and d.

Thanks
Evgenia
Re: [R] R program google search
Hi there

One way to use Google's search service from R is

    library(RCurl)
    library(RJSONIO) # or library(rjson)
    val = getForm("http://ajax.googleapis.com/ajax/services/search/web", q = "Google search AJAX", v = "1.0")
    results = fromJSON(val)

Google requests that you provide your Google API key

    val = getForm("http://ajax.googleapis.com/ajax/services/search/web", q = "Google search AJAX", v = "1.0", k = "my google api key")

Similarly, you should provide header information to identify your application, e.g.

    xx = getForm("http://ajax.googleapis.com/ajax/services/search/web", q = "Google search AJAX", v = "1.0", .opts = list(useragent = "RGoogleSearch", verbose = TRUE))

D.

On 9/3/10 10:33 PM, Waverley @ Palo Alto wrote:
> My question is how to use R to program google search. I found this information: "The SOAP Search API was created for developers and researchers interested in using Google Search as a resource in their applications." Unfortunately google no longer supports that. They are supporting the AJAX Search API. What about R?
>
> Thanks.
>
> On Fri, Sep 3, 2010 at 2:23 PM, Waverley @ Palo Alto waverley.paloa...@gmail.com wrote:
>> Hi,
>> Can someone help as how to use R to program google search in the R code? I know that other languages can allow or have the google search API. If someone can give me some links or sample code I would greatly appreciate.
>>
>> Thanks.
>> --
>> Waverley @ Palo Alto
[R] return from .Call()
Hi,

I have a .Call in my R function, inside a loop that repeats a certain number of times. Each time, the .Call returns a list. So, when I say something like

    y <- func()

would y be a list of lists (as many as the number of loop iterations)?
[R] limit on read.socket?
Hi,

I have the following piece of code:

    repeat {
      ss <- read.socket(sockfd)
      if (ss == "") break
      output <- paste(output, ss)
    }

but somehow, output is not receiving all the data that is coming through the socket. My suspicion is on the if statement. What happens if white space occurs in the middle of the string arriving over the socket?
Re: [R] how to free memory? (gc() doesn't work for me)
Seems to work for me:

    > x <- matrix(0, 10000, 10000)
    > object.size(x)
    800000112 bytes
    > gc()
                used  (Mb) gc trigger  (Mb)  max used  (Mb)
    Ncells    174104   4.7     741108  19.8    741108  19.8
    Vcells 101761938 776.4  113632405 867.0 102762450 784.1
    > rm(x)
    > gc()
              used (Mb) gc trigger  (Mb)  max used  (Mb)
    Ncells  174202  4.7     741108  19.8    741108  19.8
    Vcells 1761954 13.5   90905923 693.6 102762450 784.1

On Sat, Sep 4, 2010 at 12:46 AM, Hyunchul Kim hyunchul.kim@gmail.com wrote:
> Hi, all
>
> I have a huge object that uses almost all of the available memory.
>
>     R> rm(a_huge_object)
>     R> gc()
>
> doesn't free memory, and ?gc doesn't show anything.
>
> Are there any suggestions?
>
> Thanks in advance,
> Regards,
> Hyunchul

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
[R] Levels in returned data.frame after subset
Dear List,

When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?

Thanks
Ulrik

    > m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt = c(91, 99, 74))
    > dim(m)
    [1] 3 3
    > levels(m$gender)
    [1] "F" "M"
    > s <- subset(m, m$gender == "M")
    > dim(s)
    [1] 2 3
    > levels(s$gender)
    [1] "F" "M"
    > cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
    > dim(s)
    [1] 2 3
    > levels(s$gender)
    [1] "M"
Re: [R] return from .Call()
On Sat, Sep 4, 2010 at 10:17 AM, raje...@cse.iitm.ac.in raje...@cse.iitm.ac.in wrote:
> Hi, I have a .Call in my R function in a loop that repeats a certain number of times. Each time, the .Call returns a list. So, when I say something like, y <- func() would y be a list of lists? (as many as the number of loops?)

No, it'll be the last thing evaluated or the result of a return() call. Why haven't you tried this? Try a simple example:

    func = function(){
      for(i in 1:10){
        z = list(a = 1, b = 2)
      }
    }

and see what comes back. My suspicion is it's not going to be a list of lists.

If you want a list of lists then you'll have to put the list together yourself from the returns of the .Call, something like (not tested, but looks okay):

    func = function(){
      ret = list()
      for(i in 1:10){
        ret[[i]] = list(i, i*2, i*3)
      }
      return(ret)
    }

Barry
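For comparison, the same list of lists can be collected with lapply instead of an explicit loop. This is a minimal sketch, not the poster's code: fake_call is a made-up stand-in for the real .Call(), which is not shown in the thread.

```r
# fake_call is hypothetical; it stands in for a .Call() that returns a list.
fake_call <- function(i) list(id = i, double = i * 2)

# lapply collects one returned list per iteration into an outer list.
func <- function(n) lapply(seq_len(n), fake_call)

y <- func(3)
str(y)
```

Here y[[2]] is itself a list, so y[[2]]$double gives the second call's value.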
[R] tail.matrix returns matrix, while tail.mts return vector
Hi,

I have a few problems with tail/head when applied to multiple time series. I'm not sure whether I misunderstood the function or whether this is unexpected behavior.

When head(a, n) is applied to a data.frame or matrix, it returns a data.frame or matrix with the first n observations of *each* variable. When applied to an mts object, it returns the first n observations of the *first* variable only, not of all. The same holds for tail().

See:

    head(freeny)
    ### mts object
    head(EuStockMarkets)
    # is equivalent to:
    head(EuStockMarkets[,1])

I guess this comes from the absence of a head method for mts. Does it seem reasonable to also have a head.mts, or did I misunderstand something?

Thanks
Re: [R] Levels in returned data.frame after subset
Hi Ulrik

On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo ulrik.ster...@gmail.com wrote:
> Dear List,
>
> When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?

Only that this issue has come up many times before, and that this list is archived and searchable. Try

    RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))

-Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
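For readers landing on this thread, a short sketch of two ways to drop the unused levels. It assumes an R version that provides droplevels() (to the best of my knowledge, 2.12.0 and later) and sets stringsAsFactors = TRUE explicitly, since newer R versions no longer convert strings to factors by default.

```r
m <- data.frame(gender = c("M", "M", "F"),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74),
                stringsAsFactors = TRUE)

s <- subset(m, gender == "M")
levels(s$gender)             # the unused level "F" is still present

s2 <- droplevels(s)          # drops unused levels from every factor column
levels(s2$gender)

s3 <- s
s3$gender <- factor(s3$gender)  # re-creating one factor has the same effect
levels(s3$gender)
```

Subsetting keeps the full level set on purpose, so that tables and models built from different subsets stay comparable; dropping levels is an explicit extra step.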
Re: [R] tail.matrix returns matrix, while tail.mts return vector
Hi Mat,

You might be able to use the matrix method to get what you want.

    head.matrix(EuStockMarkets)

-Ista

On Sat, Sep 4, 2010 at 1:15 PM, mat matthieu.stig...@gmail.com wrote:
> When head(a,n) is applied on data.frame or matrix, it returns a data-frame or matrix with first n obs of *each* variable. When applied to a mts object, it returns first n obs of *first* variable only, not of all... The same for tail().
>
> I guess it comes from absence of a head method for mts. Does it seem reasonable to have also a head.mts or did I misunderstand something?

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
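A small sketch of the suggestion above, using the EuStockMarkets data that ships with R. Calling head.matrix directly (or indexing rows with an empty column index) returns the first rows of every series as a plain matrix; note that the time-series attributes are not kept, and that newer R versions may treat head() on time series differently from what is described in this thread, so treat the details as assumptions to check locally.

```r
# First three observations of all four series, as a plain matrix.
h <- head.matrix(EuStockMarkets, 3)
dim(h)
colnames(h)

# Plain row indexing gives the same values.
h2 <- EuStockMarkets[1:3, ]
all.equal(unname(as.matrix(h)), unname(as.matrix(h2)))
```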
[R] Luis Miguel Delgado Gomez/BBK is out of the office.
I will be out of the office from 03/09/2010 and will not return until 11/10/2010. I will reply to your message when I return.
[R] A basic question in model/formula specification
Hi,

I am currently trying to fit a multinomial logit model to my data. I searched for examples, and this is the one that I followed and that worked:

http://www.ats.ucla.edu/stat/r/dae/mlogit.htm

However, I am having difficulty working out the meaning of the model specified in the following line:

    mlogit.model <- mlogit(brand ~ 1 | female + age, data = mldata, reflevel = 1)

The main issue is the |. I found out that it indicates a multi-part formula, but I have no idea what it means mathematically in this particular case.

Can anyone enlighten me? Many thanks
Re: [R] Function try and Results of a program
On Sep 4, 2010, at 6:10 AM, Evgenia wrote:
> results <- list()
> for(i in 1:5){
>   tmp <- try(fun(data=S[[i]]))
>   results[[i]] <- ifelse(is(tmp,"try-error"),NA,tmp)
> }
>
> My problem is that results have only the 1st element of the result lists of fun, although tmp gives me both s and d.

Two problems:

One is the misguided use of unmatched sink calls, resulting in an accumulation of diversions of the R output. If you run that at the console you need to type sink() five times to get any response back from the console.

Two is the misguided use of ifelse when you should be using if () {} else {} to test a single condition and execute a conditional assignment. ifelse is for working with vectors, not with lists.

Suggestions: use the append = TRUE parameter to sink, and unsink at the end of that function.

I'm not sure about how you are using the test for error, but since you did not construct any errors I cannot really be too sure. If it is working for you then use this instead:

    if (is(tmp, "try-error")) { results[[i]] <- NA } else { results[[i]] <- tmp }

--
David.

David Winsemius, MD
West Hartford, CT
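The difference between ifelse() and if/else on a list result can be seen in a few lines. This is an illustrative sketch with made-up objects (tmp, bad, good, failed, res); inherits() is used for the class test here, which is one common way to check for "try-error".

```r
# A stand-in for what fun() returns: a list with two matrices.
tmp <- list(s = diag(2), d = matrix(1:4, 2))

# ifelse() is vectorised over its length-1 test, so only the first
# element of tmp survives.
bad <- ifelse(inherits(tmp, "try-error"), NA, tmp)
length(bad)

# if/else returns the whole object untouched.
good <- if (inherits(tmp, "try-error")) NA else tmp
length(good)

# A call that really fails: solving a singular matrix.
failed <- try(solve(matrix(0, 2, 2)), silent = TRUE)
res <- if (inherits(failed, "try-error")) NA else failed
res
```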
[R] Decision Tree in Python or C++?
Has anybody used a decision tree in Python or C++ (or written their own decision tree implementation in Python or C++)?

My goal is to fit a decision tree on 8 million observations as the training set and score 7 million observations in the test set. I am testing the 'rpart' package in a 64-bit Linux + 64-bit R environment, but it seems that rpart is either not stable or runs out of memory very quickly. (Is it because R passes everything by copy instead of by object reference?)

Any ideas would be greatly appreciated! Have a nice weekend!
Re: [R] Function try and Results of a program
David,

Your suggestion about try works perfectly for me. I still have a problem with sink. Could you explain your suggestion in more detail?

Thanks a lot
Evgenia
Re: [R] Function try and Results of a program
On Sep 4, 2010, at 12:41 PM, Evgenia wrote:
> David, your suggestion about try works perfect for me. I still have a problem with sink. Could you explain me better your suggestion?

When you sink to a file, you will continue sending console output to that file until you issue sink(). And every time you do it, it creates an extra layer of redirection (the help page calls these "diversions") that will need to be undone to get back to regular console behavior.

    ?sink   # yes, one needs to R~all~TM

If you wanted a record of what that function was doing you would need to:

a) initialize the file with append=FALSE outside the loop (not sure if you need to do that, but it does help to get rid of earlier failed efforts as well),
b) open the sink file with append=TRUE inside the function,
c) cat() the two matrices separately, since lists cannot be cat()-ted, and
d) unsink with sink() at the end of the function.

    sink("example.txt", append=FALSE); cat("\n"); sink()   # blank line to initialize

    fun <- function(data){
      data <- as.matrix(data)
      sink("example.txt", append=TRUE)
      cat("\nEstimate : ", i, "\n")
      d <- data %*% t(data); cat("d= \n", d, "\n")
      s <- solve(d); cat("s= \n", s, "\n")
      out <- list(s=s, d=d)
      sink()
      return(out)
    }

--
David Winsemius, MD
West Hartford, CT

# An unfortunate effect of Nabble use is that it leads one to believe that the entire world sees your earlier postings.
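One more variation on the steps above, offered as a sketch rather than as what was posted: registering the closing sink() with on.exit() means the diversion is removed even when solve() fails inside try(). The file name comes from tempfile(), and the arguments i and file are made-up additions so the function does not depend on global variables.

```r
log_file <- tempfile(fileext = ".txt")   # illustrative log location

fun <- function(data, i, file) {
  sink(file, append = TRUE)
  on.exit(sink())                        # always undo the diversion, even on error
  data <- as.matrix(data)
  cat("\nEstimate : ", i, "\n")
  d <- data %*% t(data)
  s <- solve(d)                          # may fail for singular d
  print(s)
  list(s = s, d = d)
}

before <- sink.number()
set.seed(1)
out <- fun(matrix(rnorm(4), 2), i = 1, file = log_file)
bad <- try(fun(matrix(0, 2, 2), i = 2, file = log_file), silent = TRUE)
after <- sink.number()                   # same as before: no stacked diversions
```

sink.number() reports how many diversions are active, which is a quick way to check whether a session has accumulated unmatched sink() calls.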
Re: [R] What solve() does?
On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:
> Hello!
> Can anyone explain me what solve() function does: Gaussian elimination or iterative, numeric solve? In addition, I would need both the Gaussian elimination and iterative solution for the course. Are the two built in R?
>
> Thanks!
>
> PM

Hello, Petar:

I think you are assuming that solve uses an elementary linear algebra "paper and pencil" procedure, but I don't think it does. In a digital computer, those things are not precise, and I think the folks here will even say you shouldn't use solve to get an inverse, but I can't remember all of the details.

To see how solve works, let me show you a trick I just learned. Read

    ?solve

and notice it is a generic method, meaning it does not actually do the calculations for you. Rather, there are specific implementations for different types of cases. To find the implementations, run

    methods(solve)

I get:

    > methods(solve)
    [1] solve.default solve.qr

Then if you want to read HOW solve does what it does (which I think was your question), run this:

    solve.default

or

    solve.qr

In that code, you will see the chosen procedure depends on the linear algebra libraries you make available. I'm no expert on the details, but it appears QR decomposition is the preferred method. You can read about that online or in numerical algebra books.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
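A small worked example may help here. It solves the same 2x2 system three ways: with solve(A, b), with qr.solve(A, b), and by forming the inverse first. As far as I can tell from ?solve, the default method for an ordinary matrix goes through a LAPACK routine based on LU decomposition (Gaussian elimination with pivoting), while qr.solve and solve.qr use a QR decomposition; please treat that as an assumption to verify against the help page of your own R version. The matrix and right-hand side below are made up for illustration.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(1, 2)

x_direct <- solve(A, b)        # solves A x = b without forming the inverse
x_qr     <- qr.solve(A, b)     # same system via a QR decomposition
x_inv    <- solve(A) %*% b     # explicit inverse: usually slower and less accurate

x_direct
all.equal(x_direct, x_qr)
all.equal(x_direct, as.vector(x_inv))
```

By hand, 2x + y = 1 and x + 3y = 2 give x = 0.2 and y = 0.6, which is what all three computations return up to rounding.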
[R] Please explain do.call in this context, or critique to stack this list faster
I've been doing some consulting with students who seem to come to R from SAS. They are usually pre-occupied with do loops and it is tough to persuade them to trust R lists rather than keeping 100s of named matrices floating around.

Often it happens that there is a list with lots of matrices or data frames in it and we need to "stack those together". I thought it would be a simple thing, but it turns out there are several ways to get it done, and in this case, the most "elegant" way using do.call is not the fastest, but it does appear to be the least prone to programmer error.

I have been staring at ?do.call for quite a while and I have to admit that I just need some more explanations in order to interpret it. I can't really get why this does work

    do.call( rbind, mylist)

but it does not work to do

    sapply ( mylist, rbind).

Anyway, here's the self contained working example that compares the speed of various approaches. If you send yet more ways to do this, I will add them on and then post the result to my Working Example collection.

    ## stackMerge.R
    ## Paul Johnson pauljohn at ku.edu
    ## 2010-09-02

    ## rbind is neat, but how to do it to a lot of
    ## data frames?

    ## Here is a test case
    df1 <- data.frame(x=rnorm(100),y=rnorm(100))
    df2 <- data.frame(x=rnorm(100),y=rnorm(100))
    df3 <- data.frame(x=rnorm(100),y=rnorm(100))
    df4 <- data.frame(x=rnorm(100),y=rnorm(100))

    mylist <- list(df1, df2, df3, df4)

    ## Usually we have done a stupid
    ## loop to get this done
    resultDF <- mylist[[1]]
    for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

    ## My intuition was that this should work:
    ## lapply( mylist, rbind )
    ## but no! It just makes a new list

    ## This obliterates the columns
    ## unlist( mylist )

    ## I got this idea from code in the
    ## "complete" function in the "mice" package
    ## It uses brute force to allocate a big matrix of 0's and
    ## then it places the individual data frames into that matrix.
    m <- 4
    nr <- nrow(df1)
    nc <- ncol(df1)
    dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
    for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]

    ## I searched a long time for an answer that looked better.
    ## This website is helpful:
    ## http://stackoverflow.com/questions/tagged/r
    ## I started to type in the question and 3 plausible answers
    ## popped up before I could finish.

    ## The terse answer is:
    shortAnswer <- do.call(rbind, mylist)

    ## That's the right answer, see:
    shortAnswer == dataComplete

    ## But I don't understand why it works.
    ## More importantly, I don't know if it is fastest, or best.
    ## It is certainly less error prone than "dataComplete"

    ## First, make a bigger test case and use system.time to evaluate
    phony <- function(i){
      data.frame(w=rnorm(1000), x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
    }
    mylist <- lapply(1:1000, phony)

    ### First, try the terse way
    system.time( shortAnswer <- do.call(rbind, mylist) )

    ### Second, try the complete way:
    m <- 1000
    nr <- nrow(mylist[[1]])
    nc <- ncol(mylist[[1]])
    system.time(
      dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
    )
    system.time(
      for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
    )

    ## On my Thinkpad T62 dual core, the "shortAnswer" approach takes about
    ## three times as long:

    ## > system.time( bestAnswer <- do.call(rbind, mylist) )
    ##    user  system elapsed
    ##  14.270   1.170  15.433

    ## > system.time(
    ## +   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
    ## + )
    ##    user  system elapsed
    ##   0.000   0.000   0.006

    ## > system.time(
    ## +   for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
    ## + )
    ##    user  system elapsed
    ##   4.940   0.050   4.989

    ## That makes the do.call way look slow, and I said "hey,
    ## our stupid for loop at the beginning may not be so bad."
    ## Wrong. It is a disaster. Check this out:

    ## > resultDF <- phony(1)
    ## > system.time(
    ## +   for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
    ## + )
    ##    user  system elapsed
    ## 159.740   4.150 163.996

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
Re: [R] Please explain do.call in this context, or critique to stack this list faster
On 09/04/2010 01:37 PM, Paul Johnson wrote:
> I have been staring at ?do.call for quite a while and I have to admit
> that I just need some more explanations in order to interpret it. I
> can't really get why this does work
>
> do.call( rbind, mylist)

do.call is *constructing* a function call from the list of arguments, mylist. It is shorthand for

    rbind(mylist[[1]], mylist[[2]], mylist[[3]])

assuming mylist has 3 elements.

> but it does not work to do
>
> sapply ( mylist, rbind).

That's because sapply is calling rbind once for each item in mylist, which is not what you want to do to accomplish your goal.

It might help to use a debugging technique to watch when rbind gets called, and see how many times it gets called and with what arguments under those two approaches.
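A minimal sketch of the point about do.call versus lapply/sapply, using a made-up three-element list. The data and the counting_rbind wrapper are illustrative assumptions and not part of the original thread; the wrapper only counts how often rbind is reached under each approach.

```r
set.seed(42)
mylist <- list(
  data.frame(x = rnorm(3), y = rnorm(3)),
  data.frame(x = rnorm(3), y = rnorm(3)),
  data.frame(x = rnorm(3), y = rnorm(3))
)

# do.call() builds ONE call: rbind(mylist[[1]], mylist[[2]], mylist[[3]])
stacked  <- do.call(rbind, mylist)
explicit <- rbind(mylist[[1]], mylist[[2]], mylist[[3]])

# lapply() makes THREE calls, each with a single argument: rbind(mylist[[i]])
# so the result is still a list of three separate data frames
separate <- lapply(mylist, rbind)

# Hypothetical helper: a wrapper that counts how often rbind() is reached
ncalls <- 0
counting_rbind <- function(...) {
  ncalls <<- ncalls + 1
  rbind(...)
}

invisible(do.call(counting_rbind, mylist))
calls_do_call <- ncalls

ncalls <- 0
invisible(lapply(mylist, counting_rbind))
calls_lapply <- ncalls

c(do.call = calls_do_call, lapply = calls_lapply)
```

The counts make the difference visible: one call that receives all three data frames, against three calls that each receive one.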
Re: [R] Save data as .pdf or .JPG
On Wed, Sep 1, 2010 at 7:56 AM, khush bioinfo.kh...@gmail.com wrote:
> Hi all,
>
> I have the following script to plot some data.
>
> plot(c(1,1100), c(0,15), type='n', xlab='', ylab='', ylim=c(0.1,25), las=2)
> axis(1, at = seq(0,1100,50), las=2)
> axis(2, at = seq(0,25,1), las=2)
>
> When I source(script.R), I get the image on the interface, but I do not
> want to use the screenshot option to save the image. How can I save the
> output in .pdf or .jpg format?
>
> Thank you
> Khushwant

Hi! This is one of the things that is difficult for newcomers. I've written down a pretty thorough answer:

http://pj.freefaculty.org/R/Rtips.html#5.2

pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
Re: [R] Save data as .pdf or .JPG
On Sat, 2010-09-04 at 13:57 -0500, Paul Johnson wrote:
> Hi! This is one of the things that is difficult for newcomers. I've
> written down a pretty thorough answer:
>
> http://pj.freefaculty.org/R/Rtips.html#5.2

Very nice text, indeed. As my 2 cents, I strongly recommend using the .png format for graphs; .jpg is primarily designed for photographs.

Have a nice time
Tomas
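For readers who want the short version of the answer: a plot is saved by opening a file device, drawing, and closing the device. The sketch below wraps the plotting commands from the question in a PDF device; the file name and the use of tempdir() are assumptions made for illustration.

```r
# Write to a temporary directory so the example does not clutter the
# working directory (the file name is arbitrary)
pdf_file <- file.path(tempdir(), "myplot.pdf")

pdf(pdf_file, width = 8, height = 6)   # open a PDF device, size in inches
plot(c(1, 1100), c(0, 15), type = "n", xlab = "", ylab = "",
     ylim = c(0.1, 25), las = 2)
axis(1, at = seq(0, 1100, 50), las = 2)
axis(2, at = seq(0, 25, 1), las = 2)
dev.off()                              # close the device; this writes the file

# The same pattern works with the other grDevices file devices, e.g.
#   png("myplot.png", width = 800, height = 600)    # size in pixels
#   jpeg("myplot.jpg", width = 800, height = 600)
```

Nothing appears on screen while the file device is open; the file is complete only after dev.off().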
Re: [R] What solve() does?
On Sep 4, 2010, at 2:29 PM, Petar Milin wrote:
> Thank you so much! This is very useful! Any thoughts about how to run
> Gaussian elimination?

Do some searching?

    RSiteSearch("gaussian elimination",
                restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"))

returns (among other things) a link to a John Fox post from 2005:

http://finzi.psych.upenn.edu/R/Rhelp02/archive/49950.html

--
David.

> Best,
> PM
>
> On 04/09/10 20:23, Paul Johnson wrote:
>> On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:
>>> Hello!
>>> Can anyone explain to me what the solve() function does: Gaussian
>>> elimination or an iterative, numeric solve? In addition, I would need
>>> both the Gaussian elimination and the iterative solution for the
>>> course. Are the two built into R?
>>>
>>> Thanks!
>>> PM
>>
>> Hello, Petar:
>>
>> I think you are assuming that solve uses an elementary linear algebra
>> paper-and-pencil procedure, but I don't think it does. In a digital
>> computer, those things are not precise, and I think the folks here
>> will even say you shouldn't use solve to get an inverse, but I can't
>> remember all of the details.
>>
>> To see how solve works, let me show you a trick I just learned. Read
>>
>> ?solve
>>
>> and notice it is a generic method, meaning it does not actually do the
>> calculations for you. Rather, there are specific implementations for
>> different types of cases. To find the implementations, run
>>
>> methods(solve)
>>
>> I get:
>>
>> > methods(solve)
>> [1] solve.default solve.qr
>>
>> Then if you want to read HOW solve does what it does (which I think
>> was your question), run this:
>>
>> solve.default
>>
>> or
>>
>> solve.qr
>>
>> In that code, you will see the chosen procedure depends on the linear
>> algebra libraries you make available. I'm no expert on the details,
>> but it appears QR decomposition is the preferred method. You can read
>> about that online or in numerical algebra books.

David Winsemius, MD
West Hartford, CT
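A small sketch for comparing the routes discussed in this thread. It rests on one assumption about current R rather than on anything in the thread: the help page ?solve describes the default method as an interface to the LAPACK routine DGESV, which is an LU factorization with partial pivoting (in effect Gaussian elimination), while qr.solve() goes through a QR decomposition. The matrix A and vector b are made up.

```r
A <- matrix(c(2, 1, 1,
              1, 3, 2,
              1, 0, 0), nrow = 3, byrow = TRUE)
b <- c(4, 5, 6)

x_lu  <- solve(A, b)      # default method (LU-based via LAPACK, per ?solve)
x_qr  <- qr.solve(A, b)   # same system solved through a QR decomposition
x_inv <- solve(A) %*% b   # explicit inverse: works, but usually discouraged

# All three agree up to floating-point error, and A %*% x reproduces b
cbind(lu = x_lu, qr = x_qr, inv = as.vector(x_inv))
as.vector(A %*% x_lu)
```

Solving with solve(A, b) directly is generally preferred over forming solve(A) first, which is the caution Paul alludes to above.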
[R] non-zero exit status error when install GenomeGraphs
Hi,

I am trying to install the GenomeGraphs package from Bioconductor, but it failed with a non-zero exit status error. From the error message, it seems that there is a shared library problem. Any suggestion on fixing it? Thanks so much.

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=en_US.iso885915       LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1

> source("http://bioconductor.org/biocLite.R")
> biocLite("GenomeGraphs")
Warning messages:
1: In safeSource() : Redefining 'biocinstall'
2: In safeSource() : Redefining 'biocinstallPkgGroups'
3: In safeSource() : Redefining 'biocinstallRepos'
> biocLite("GenomeGraphs")
Using R version 2.10.1, biocinstall version 2.5.11.
Installing Bioconductor version 2.5 packages:
[1] "GenomeGraphs"
Please wait...

Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
  argument 'lib' is missing: using '/cchome/cchen1/R/x86_64-unknown-linux-gnu-library/2.10'
trying URL 'http://www.bioconductor.org/packages/2.5/bioc/src/contrib/GenomeGraphs_1.6.0.tar.gz'
Content type 'application/x-gzip' length 585078 bytes (571 Kb)
opened URL
==
downloaded 571 Kb

* installing *source* package 'GenomeGraphs' ...
** R
** data
** inst
** preparing package for lazy loading
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library '/apps/rhel5/x86_64/R/R-2.10.1//lib64/R/library/XML/libs/XML.so':
  libxmlsec1.so.1: cannot open shared object file: No such file or directory
Error : .onLoad failed in 'loadNamespace' for 'XML'
Error : package 'biomaRt' could not be loaded
ERROR: lazy loading failed for package 'GenomeGraphs'
* removing '/userhom2/3/cchen1/R/x86_64-unknown-linux-gnu-library/2.10/GenomeGraphs'

The downloaded packages are in
  '/tmp/Rtmp3wsJxw/downloaded_packages'
Warning message:
In install.packages(pkgs = pkgs, repos = repos, ...) :
  installation of package 'GenomeGraphs' had non-zero exit status

--
Chen, Chao
Psychiatry
University of Chicago
924 E 57th St, Chicago, IL 60637
U. S. A.

MOE Key Laboratory of Contemporary Anthropology and Center for Evolutionary Biology,
School of Life Sciences and Institutes of Biomedical Sciences,
Fudan University
220# Handan Road, Shanghai (200433)
P.R.China
Re: [R] non-zero exit status error when install GenomeGraphs
(Caveat: I am not a bioc user.)

The error messages suggest that you are missing dependencies. I looked at the documentation for GenomeGraphs and it does not list any dependencies, but I have no way of knowing how careful or knowledgeable the authors may or may not have been when they composed that document. The fact that you are posting to the wrong mailing list and are not including what version of Linux you are running (although there is a hint it may be RedHat5) suggests you could be fairly new at this.

Is biocLite the correct function for installing a bioc package? It appears it may be, but I'm wondering if there is an argument for dependencies, as there is in install.packages(), that you need to set to TRUE? If biocLite has a "..." in its argument list (and the error message suggests that it does), then you may get better results with the same call with the addition of dependencies=TRUE. Or you could first install the packages that are reported missing, XML and biomaRt, and then try again as you did before.

Links to the bioc mailing lists can be found here:

http://www.bioconductor.org/help/index.html

--
David.

On Sep 4, 2010, at 4:07 PM, chen chao wrote:
> Hi,
>
> I am trying to install GenomeGraphs package from bioconductor, but
> failed by a non-zero exit error. From the error message, it seems that
> there is a shared library problem. Any suggestion on fixing it? Thanks
> so much.

David Winsemius, MD
West Hartford, CT
Re: [R] Please explain do.call in this context, or critique to stack this list faster
To echo what Erik said, the second argument of do.call(), args, takes a list of arguments that it passes to the specified function. Since rbind() can bind any number of data frames, all the data frames in mylist are rbind()ed at once.

These two calls should take about the same time (except for time saved typing):

    rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]]) # 1
    do.call(rbind, mylist) # 2

On my system using:

    set.seed(1)
    dat <- rnorm(10^6)
    df1 <- data.frame(x=dat, y=dat)
    mylist <- list(df1, df1, df1, df1)

they do take about the same time (I started two instances of R and ran both calls but switched the order, because R has a way of being faster the second time you do the same thing).

    [1] "Order: 1, 2"
       user  system elapsed
       0.60    0.14    0.75
       user  system elapsed
       0.41    0.14    0.54

    [1] "Order: 2, 1"
       user  system elapsed
       0.56    0.21    0.76
       user  system elapsed
       0.41    0.14    0.55

Using the for loop is much slower in your later example because rbind() is getting called over and over, plus you are incrementally increasing the size of the object containing your results.

> Often it happens that there is a list with lots of matrices or data
> frames in it and we need to stack those together

For my own curiosity, are you reading in a bunch of separate data files or are these the results of various operations that you eventually want to combine?

Cheers,

Josh
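The cost of growing a result inside a loop, compared with a single do.call, can be seen on a small made-up list. This is only a sketch: the list size and the object names are assumptions, and the timings will differ by machine and R version.

```r
set.seed(1)
pieces <- lapply(1:200, function(i) data.frame(x = rnorm(50), y = rnorm(50)))

# Growing: rbind() is called 199 times and the result is re-created each time
t_grow <- system.time({
  grown <- pieces[[1]]
  for (i in 2:length(pieces)) grown <- rbind(grown, pieces[[i]])
})

# One call: rbind() receives all 200 data frames at once
t_once <- system.time(stacked <- do.call(rbind, pieces))

# Elapsed seconds for each approach
c(grow = t_grow[["elapsed"]], once = t_once[["elapsed"]])
```

Both approaches give the same stacked rows; only the amount of repeated copying differs.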
Re: [R] How to generate integers from uniform distribution with
On 04-Sep-10 19:27:54, Yi wrote:
> Enh, I see. It totally makes sense. Thank you for your perfect
> explanation.
> Enjoy the long weekend~
> Yi

You're welcome!

Earlier I tried an experiment with rejection sampling, which seems to work well for the case where you want the mean of the sampled values to be exactly the mean of the range being sampled from. The number of tries, even for a large sample, was lower than I had anticipated.

Example (sample size = 20000, sampled range is {-3,-2,-1,0,1,2,3}, mean = 0, therefore require that sum of sample = 0):

    S <- (-10) ; n <- 0
    P <- ((-3):3)
    while((S != 0)){
      x <- sample(P,20000,replace=TRUE,prob=c(1,1,1,1,1,1,1))
      S <- sum(x) ; n <- (n+1)
    }
    n
    hist(x)

To get your case of sampling the integers from (17:23) with sample mean always exactly 20, simply add 20 to the result x of the above loop.

I found that I got values of n like:

    126, 43, 403, 811, 385, 568, 590, 1758, 317, 456, 643, ...

with every run being completed well within 2 seconds.

Conditioning the value of the sum to be equal to the central value (0) of the range is also conditioning the value to be equal to the most probable value of the sum, so the runs will on average be shortest. Conditioning on a different mean (say mean = +1 for a sample of size 20000, so sum = 20000) would take much, much longer (see below).

One can use the Normal approximation to the distribution of the sum to estimate how long it might take. One value sampled from ((-3):3) has mean 0 and variance 4.66.., so the sum of 20000 has mean 0 and variance 93333.33, hence the probability that the sum will be 0 is approximated by

    pnorm(0.5,0,sqrt(93333.33)) - pnorm(-0.5,0,sqrt(93333.33))
    = 0.001305845

so the probability of success with one sample of 20000 is about 1/766 (which is consistent with the above results for n).

On the other hand, conditioning on the mean of x being 1, i.e. on the sum being 20000, the chance of success is

    pnorm(20000.5,0,sqrt(93333.33)) - pnorm(19999.5,0,sqrt(93333.33))

which R computes as zero! Hence you have practically no chance of achieving this within any reasonable time. However, of course, the SE of the mean is

    sqrt((sum(P^2)/6)/20000) = 0.01527525

so you are aiming at a point which is about 60 SEs from the mean.

The numbers are more reasonable if, instead of conditioning on the mean, you condition on the sum (not too far from 0), e.g. with sample size 20000 as before:

1. Sum must be 50, Prob(success) =
   pnorm(50.5,0,sqrt(93333.33)) - pnorm(49.5,0,sqrt(93333.33))
   = 0.001288472 ~= 1/776

2. Sum must be 100, Prob(success) =
   pnorm(100.5,0,sqrt(93333.33)) - pnorm(99.5,0,sqrt(93333.33))
   = 0.001237729 ~= 1/808

3. Sum must be 200, Prob(success) =
   pnorm(200.5,0,sqrt(93333.33)) - pnorm(199.5,0,sqrt(93333.33))
   = 0.001053971 ~= 1/949

4. Sum must be 500, Prob(success) =
   pnorm(500.5,0,sqrt(93333.33)) - pnorm(499.5,0,sqrt(93333.33))
   = 0.0003421745 ~= 1/2922

and so on. So even aiming at 500 it would on average only take about 3000 tries to hit it. After that it rapidly becomes less likely.

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-10 Time: 21:53:58
-- XFMail --
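The rejection idea above can be wrapped in a small function for the 17:23 case from the original question. This is a sketch under stated assumptions: the function name sample_fixed_mean, its arguments and the max_tries cap are hypothetical, and the required mean is taken to be the centre of the range, so n * (lo + hi) / 2 must be a whole number for a hit to be possible.

```r
# Draw n integers uniformly from lo:hi, rejecting whole samples until the
# sample mean equals the centre of the range exactly.
sample_fixed_mean <- function(n, lo = 17, hi = 23, max_tries = 1e5) {
  target <- n * (lo + hi) / 2            # the sum the sample must have
  for (tries in seq_len(max_tries)) {
    x <- sample(lo:hi, n, replace = TRUE)
    if (sum(x) == target) {
      return(list(x = x, tries = tries)) # accepted sample and tries needed
    }
  }
  stop("no sample with the required sum found in ", max_tries, " tries")
}

set.seed(123)
res <- sample_fixed_mean(n = 200)
mean(res$x)    # exactly 20 by construction
table(res$x)
res$tries
```

A smaller n than in the experiment above keeps the run short; the number of tries grows roughly with the square root of n, in line with the Normal approximation in the message.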
Re: [R] Decision Tree in Python or C++?
For Python, please check
http://onlamp.com/pub/a/python/2006/02/09/ai_decision_trees.html

On Sat, Sep 4, 2010 at 11:21 AM, noclue_ tim@netzero.net wrote:
> Has anybody used a decision tree in Python or C++? (Or written their
> own decision tree implementation in Python or C++?) My goal is to run a
> decision tree on 8 million obs as the training set and score 7 million
> in the test set.
>
> I am testing the 'rpart' package in a 64-bit Linux + 64-bit R
> environment. But it seems that rpart is either not stable or running
> out of memory very quickly. (Is it because R is passing everything as a
> copy instead of as an object reference?)
>
> Any idea would be greatly appreciated!
>
> Have a nice weekend!

--
==
WenSui Liu
wens...@paypal.com
statcompute.spaces.live.com
==
Re: [R] Please explain do.call in this context, or critique to stack this list faster
Paul;

There is another group of functions that are similar to do.call in their action of serial application of a function to a list or vector. They are somewhat more tolerant in that dyadic operators can be used as the function argument, whereas do.call is really just expanding the second argument. The one that is _most_ similar is Reduce():

?Reduce

A somewhat smaller example than yours...

> df1 <- data.frame(x=rnorm(5),y=rnorm(5))
> df2 <- data.frame(x=rnorm(5),y=rnorm(5))
> df3 <- data.frame(x=rnorm(5),y=rnorm(5))
> df4 <- data.frame(x=rnorm(5),y=rnorm(5))
>
> mylist <- list(df1, df2, df3, df4)
>
> Reduce(rbind, mylist)
            x           y
1 -0.40175483 -0.96187409
2  0.76629538  0.92201312
3  2.44535842  0.90634825
4  0.57784258 -2.12756145
5 -1.62083235 -0.96310011
6  0.02625574  1.17684408
7  1.52412427 -0.26432372
[snipped remaining rows]

> do.call("+", list(1:3))
[1] 1 2 3
> do.call("+", list(a=1:3, b=3:5))
[1] 4 6 8
> do.call("+", list(a=1:3, b=3:5, cc=7:9))
Error in `+`(a = 1:3, b = 3:5, cc = 7:9) :
  operator needs one or two arguments

> Reduce("+", list(a=1:3, b=3:5, cc=7:9))
[1] 11 14 17

Reduce has the capability of accumulating its intermediate results:

> Reduce("+", 1:10)
[1] 55
> Reduce("+", 1:10, accumulate=TRUE)
 [1]  1  3  6 10 15 21 28 36 45 55
Re: [R] Please explain do.call in this context, or critique to stack this list faster
On Sat, Sep 4, 2010 at 2:37 PM, Paul Johnson pauljoh...@gmail.com wrote:
> I've been doing some consulting with students who seem to come to R
> from SAS. They are usually pre-occupied with do loops and it is tough
> to persuade them to trust R lists rather than keeping 100s of named
> matrices floating around.
>
> Often it happens that there is a list with lots of matrices or data
> frames in it and we need to stack those together. I thought it

This has nothing specifically to do with do.call, but note that R is faster at handling matrices than data frames. Below we see that rbind-ing 4 data frames takes over 100 times as long as rbind-ing matrices with the same data:

> mylist <- list(iris[-5], iris[-5], iris[-5], iris[-5])
> L <- lapply(mylist, as.matrix)
>
> library(rbenchmark)
> benchmark(
+   df = do.call("rbind", mylist),
+   mat = do.call("rbind", L),
+   order = "relative", replications = 250
+ )
  test replications elapsed relative user.self sys.self user.child sys.child
2  mat          250    0.01        1      0.02     0.00         NA        NA
1   df          250    1.06      106      1.03     0.01         NA        NA

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
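For readers without the rbenchmark package, roughly the same comparison can be sketched with base R only. The loop of 250 repetitions mirrors the replications argument above; the object names are assumptions, and the ratio reported on a current R installation may differ a good deal from the 2010 figure.

```r
mylist <- list(iris[-5], iris[-5], iris[-5], iris[-5])
L <- lapply(mylist, as.matrix)

# Repeat each stacking 250 times and time the whole loop
t_df  <- system.time(for (i in 1:250) df_res  <- do.call(rbind, mylist))
t_mat <- system.time(for (i in 1:250) mat_res <- do.call(rbind, L))

# Elapsed seconds: data frames versus matrices
c(df = t_df[["elapsed"]], mat = t_mat[["elapsed"]])

# Both results hold the same 600 x 4 block of numbers
dim(df_res)
dim(mat_res)
```

Converting to matrices only makes sense when all columns share one type, as the four numeric iris columns do here.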
Re: [R] Please explain do.call in this context, or critique to stack this list faster
Hi:

Here's my test:

  l <- vector('list', 1000)
  for(i in seq_along(l)) l[[i]] <- data.frame(x=rnorm(100),y=rnorm(100))
  system.time(u1 <- do.call(rbind, l))
     user  system elapsed
     0.49    0.06    0.60
  resultDF <- data.frame()
  system.time(for (i in 1:1000) resultDF <- rbind(resultDF, l[[i]]))
     user  system elapsed
    10.34    0.06   10.53
  identical(u1, resultDF)
  [1] TRUE

The problem with the second approach, which is really kind of an FAQ by now, is that repeated application of rbind as a standalone function results in 'Spaceballs: the search for more memory!' The base object gets bigger as the iterations proceed, something new is being added, so more memory is needed to hold both the old and new objects. This is an inefficient time killer because as the loop proceeds, increasingly more time is invested in finding new memory.

Interestingly, this doesn't scale linearly: if we make a list of 10000 100 x 2 data frames, I get the following:

  l <- vector('list', 10000)
  for(i in seq_along(l)) l[[i]] <- data.frame(x=rnorm(100),y=rnorm(100))
  system.time(u1 <- do.call(rbind, l))
     user  system elapsed
    55.56   30.62   88.11
  dim(u1)
  [1] 1000000       2
  str(u1)
  'data.frame': 1000000 obs. of 2 variables:
   $ x: num -0.9516 -0.6948 0.0523 2.5798 -0.0862 ...
   $ y: num 1.466 0.165 1.375 0.571 -1.099 ...
  rm(u1)
  rm(resultDF)
  resultDF <- data.frame()
  # go take a shower and come back
  system.time(for (i in 1:10000) resultDF <- rbind(resultDF, l[[i]]))
     user  system elapsed
   977.33  121.41 1130.26
  dim(resultDF)
  [1] 1000000       2

This time, neither do.call nor iterative rbind did very well. One common way around this is to pre-allocate memory and then to populate the object using a loop, but a somewhat easier solution here turns out to be ldply() in the plyr package. The following is the same idea as do.call(rbind, l), only faster:

  system.time(u3 <- ldply(l, rbind))
     user  system elapsed
     6.07    0.01    6.09
  dim(u3)
  [1] 1000000       2
  str(u3)
  'data.frame': 1000000 obs. of 2 variables:
   $ x: num -0.9516 -0.6948 0.0523 2.5798 -0.0862 ...
   $ y: num 1.466 0.165 1.375 0.571 -1.099 ...
HTH,
Dennis

On Sat, Sep 4, 2010 at 11:37 AM, Paul Johnson pauljoh...@gmail.com wrote:
> I've been doing some consulting with students who seem to come to R from SAS. They are usually pre-occupied with do loops and it is tough to persuade them to trust R lists rather than keeping 100s of named matrices floating around.
>
> Often it happens that there is a list with lots of matrices or data frames in it and we need to stack those together. I thought it would be a simple thing, but it turns out there are several ways to get it done, and in this case, the most elegant way using do.call is not the fastest, but it does appear to be the least prone to programmer error.
>
> I have been staring at ?do.call for quite a while and I have to admit that I just need some more explanations in order to interpret it. I can't really get why this does work
>
>   do.call( rbind, mylist)
>
> but it does not work to do
>
>   sapply ( mylist, rbind).
>
> Anyway, here's the self contained working example that compares the speed of various approaches. If you send yet more ways to do this, I will add them on and then post the result to my Working Example collection.
>
> ## stackMerge.R
> ## Paul Johnson pauljohn at ku.edu
> ## 2010-09-02
>
> ## rbind is neat, but how to do it to a lot of
> ## data frames?
>
> ## Here is a test case
> df1 <- data.frame(x=rnorm(100),y=rnorm(100))
> df2 <- data.frame(x=rnorm(100),y=rnorm(100))
> df3 <- data.frame(x=rnorm(100),y=rnorm(100))
> df4 <- data.frame(x=rnorm(100),y=rnorm(100))
> mylist <- list(df1, df2, df3, df4)
>
> ## Usually we have done a stupid
> ## loop to get this done
> resultDF <- mylist[[1]]
> for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])
>
> ## My intuition was that this should work:
> ## lapply( mylist, rbind )
> ## but no! It just makes a new list
>
> ## This obliterates the columns
> ## unlist( mylist )
>
> ## I got this idea from code in the
> ## complete function in the mice package
> ## It uses brute force to allocate a big matrix of 0's and
> ## then it places the individual data frames into that matrix.
> m <- 4
> nr <- nrow(df1)
> nc <- ncol(df1)
> dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
> for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
>
> ## I searched a long time for an answer that looked better.
> ## This website is helpful:
> ## http://stackoverflow.com/questions/tagged/r
> ## I started to type in the question and 3 plausible answers
> ## popped up before I could finish.
>
> ## The terse answer is:
> shortAnswer <- do.call(rbind,mylist)
>
> ## That's the right answer, see:
> shortAnswer == dataComplete
> ## But I don't understand why it works.
>
> ## More importantly, I don't know if it is fastest, or best.
> ## It is certainly less error prone than dataComplete
>
> ## First, make a bigger test case and use system.time to evaluate
> phony <- function(i){
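[Editor's sketch] On the question of why do.call(rbind, mylist) works while lapply or sapply over the list does not: do.call() hands the list elements to rbind() as separate arguments in a single call, whereas lapply() calls rbind() once per element with a single argument each time. A minimal base-R illustration follows; the toy data and the names a, b, per_element and folded are made up for this example.

```r
# Four small data frames in a list (toy data).
set.seed(1)
mylist <- lapply(1:4, function(i) data.frame(x = rnorm(3), y = rnorm(3)))

# do.call() passes the list elements as separate arguments, so this ...
a <- do.call(rbind, mylist)
# ... is the same call as writing the arguments out by hand.
b <- rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]])
all.equal(a, b)

# lapply() calls rbind() once per element with one argument each time,
# so nothing gets stacked: the result is still a list of 4 data frames.
per_element <- lapply(mylist, rbind)
length(per_element)

# Reduce() folds rbind pairwise, which is the explicit loop in disguise.
folded <- Reduce(rbind, mylist)
all.equal(a, folded)
```

Reduce() gives the same result as do.call() here but grows the object one piece at a time, so it would be expected to share the slow behaviour of the explicit loop on long lists.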
Re: [R] non-zero exit status error when install GenomeGraphs
On 09/04/2010 01:38 PM, David Winsemius wrote:
> (Caveat: I am not a bioc user.) The error messages suggest that you are missing dependencies. I looked at the documentation for GenomeGraphs and it does not list any dependencies, but I have no way of knowing how careful

  packageDescription("GenomeGraphs")$Depends
  [1] "methods, biomaRt, grid"

and likewise for biomaRt. Also at

  http://bioconductor.org/packages/release/bioc/html/GenomeGraphs.html

for the current (release) version, or replacing 'release' with '2.5' (from the original poster's biocLite invocation) for the version relevant to R-2.10

  http://bioconductor.org/packages/release/bioc/html/GenomeGraphs.html

> or knowledgeable the authors may or may not have been when they composed that document. The fact that you are

Not sure what 'that document' means, but if it's the package vignette or reference manual then that's not the appropriate place for stating package dependencies -- like all R packages, this information belongs in the package DESCRIPTION file, with dependencies enforced at installation time. Also the Bioconductor build system only makes available packages that do not produce errors on R CMD build and R CMD check, i.e., packages that have fully specified their dependencies. And the build system is versioned, so the original poster is getting (Bioconductor) packages that are appropriate for their system (though CRAN packages come from CRAN and so are not versioned in sync with R).

> posting to the wrong mailing list and are not including what version of linux (although there is a hint it may be RedHat5) you are running suggests you could be fairly new at this. Is biocLite the correct function for installing a bioc package? It appears it may be, but I'm wondering if there is an

Yes it is. It does install dependencies, to the same extent that install.packages() does; biocLite is a wrapper around install.packages that inserts the appropriate (for the R version) Bioconductor repositories in front of CRAN repositories.

Likely the original poster has an installed version of the XML package, but it is not installed correctly or the installation has become compromised in some way, e.g., by removing or updating libxml in the operating system. Your advice -- install (or otherwise troubleshoot) XML -- is likely part of the right solution, but since XML comes from the R repository and is therefore only known to build with the current version of R, it makes sense for the original poster to update their version of R first.

Martin

> argument for dependencies as there is in install.packages() that you need to set to TRUE? If biocLite has a ,... in its argument list (and the error message suggests that it does) then you may get better results with the same call with an addition of dependencies=TRUE. Or you could first install the packages that are reported missing: XML and biomaRt, and then try again as you did before.
>
> Links to the bioc mailing lists can be found here: http://www.bioconductor.org/help/index.html
[R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
Hi

I know asking which test to use is frowned upon on this list... so please do read on for at least a couple of sentences...

I have some multivariate data split as follows:

  Tumour Site (one of 5 categories)    #
  Chemo Schedule (one of 3 cats)       ##
  Cycle (one of 3 cats*)               ##
  Dose (one of 3 cats*)                #

*These are actually integers but for all our other analysis so far we have grouped them into logical bands of categories.

The dependent variable is Reaction or No Reaction.

I have individually analysed each of the independent variables against Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ## produced p values less than 0.05, and those marked # produced p values close to 0.05. We believe that Cycle is the crucial piece of data - the others just appear to be different because there are more early cycles in certain groups than others.

SO - I believe what I need to do is a Linear Logistic Regression on the 4 independent variables. And I'm expecting it to show that the tumour site, schedule and dose don't matter, only the cycle matters. Done a lot of reading and I'm clueless!!

I think I want to do something like:

  glm (reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)

I am then expecting to see some very long output with lots of numbers...

...my question is TWO fold - 1. is glm the right thing to use before I waste my time and 2. how do I interpret the result! (I kind of expect a lecture here as I'm really looking for a nice snappy 'p<0.05 means this variable is the one having the influence' type answer and I suspect I'm going to be told that's not possible...!)

To be clear the example given in the docs is:

  library(MASS)
  data(anorexia)
  anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)

The output of anorex.1 is:

  Call:  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)

  Coefficients:
  (Intercept)        Prewt    TreatCont      TreatFT
      49.7711      -0.5655      -4.0971       4.5631

  Degrees of Freedom: 71 Total (i.e. Null);  68 Residual
  Null Deviance:     4525
  Residual Deviance: 3311     AIC: 490

and the output of summary(anorex.1) is:

  Call:
  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, data = anorexia)

  Deviance Residuals:
       Min        1Q    Median        3Q       Max
  -14.1083   -4.2773   -0.5484    5.4838   15.2922

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  49.7711    13.3910   3.717 0.000410 ***
  Prewt        -0.5655     0.1612  -3.509 0.000803 ***
  TreatCont    -4.0971     1.8935  -2.164 0.033999 *
  TreatFT       4.5631     2.1333   2.139 0.036035 *
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  (Dispersion parameter for gaussian family taken to be 48.69504)

      Null deviance: 4525.4  on 71  degrees of freedom
  Residual deviance: 3311.3  on 68  degrees of freedom
  AIC: 489.97

  Number of Fisher Scoring iterations: 2

---

Either can someone point me to a decent place that would explain what this means or provide me some pointers? i.e. which of the variables has the influence on the outcome in the anorexia data? Please don't shout!! Happy to be pointed to a reference but would prefer one in common English not some stats mumbo jumbo!

Calum
Re: [R] Query regarding Windows based statistical software development using R as programming language
I am not sure how best to answer your question since the phrases "user-friendly" and "like SPSS" do not belong in the same sentence in my mind (unless separated by a word along the lines of "unlike"). And "Windows Based Programming Language" feels a bit like an oxymoron.

But since the R Commander package exists (there are other tools as well, JGR, R-PLUS, others) and provides a menu/dialog interface to R, and since the RExcel project integrates this into MS Excel so that the user can use the power of R without ever leaving Excel and realizing that they are using a superior tool, I expect the answer is "Yes".

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Soumen Pal
> Sent: Friday, September 03, 2010 10:21 PM
> To: r-help@r-project.org
> Subject: [R] Query regarding Windows based statistical software development using R as programming language
>
> Hi,
> I am a beginner in R. I have a query as below:
> Is it possible to develop a Windows based statistical software (user-friendly) like SPSS using R as a programming language? Otherwise, is it possible to use R code directly (no command-line execution) in Windows based programming language such as Visual Basic?
> Please help me, if possible, with some link to study materials related to such topic.
> --
> Thanks Regards,
> Soumen Pal
Re: [R] Levels in returned data.frame after subset
The advantage of computers is that they do exactly what they are told. The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a combination from the original programmers and from you. Who should make important decisions about the structure of your data? A group of (admittedly brilliant) programmers who have never seen your data nor know what questions you are trying to answer, or you (who hopefully knows more about your data and questions)?

I don't claim to be more intelligent/knowledgeable than the programmers of R, but I am grateful that they have/had sufficient humility to allow for the possibility that I may actually know something about my data and questions that they don't (or maybe they are just too lazy to do my job for me, but that is also appropriate).

In your example below, why do you care what the levels of gender are after the subset? Why waste time/effort dropping the levels for a column that by definition only has one value?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ulrik Stervbo
> Sent: Saturday, September 04, 2010 6:53 AM
> To: r-help@r-project.org
> Subject: [R] Levels in returned data.frame after subset
>
> Dear List,
>
> When I subset a data.frame, the levels are not re-adjusted (see example). Why is this? Am I missing out on some basic stuff here?
>
> Thanks
> Ulrik
>
>   m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt = c(91, 99, 74))
>   dim(m)
>   [1] 3 3
>   levels(m$gender)
>   [1] "F" "M"
>   s <- subset(m, m$gender == "M")
>   dim(s)
>   [1] 2 3
>   levels(s$gender)
>   [1] "F" "M"
>   cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
>   dim(s)
>   [1] 2 3
>   levels(s$gender)
>   [1] "M"
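[Editor's sketch] For readers who do want the unused levels gone after subsetting, here are two base-R ways of doing it. Note two assumptions: droplevels() was added to base R in version 2.12.0, shortly after this thread, so it was not available to the posters; and the column is created with an explicit factor() call because recent R versions no longer convert strings to factors by default.

```r
m <- data.frame(gender = factor(c("M", "M", "F")),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74))

s <- subset(m, gender == "M")
levels(s$gender)              # the unused level "F" is still recorded

s1 <- droplevels(s)           # drops unused levels in every factor column
levels(s1$gender)

s2 <- s
s2$gender <- factor(s2$gender)  # same effect, one column at a time
levels(s2$gender)
```

Keeping the unused level is often what you want (for example so that table() still reports a zero count for "F"), which is the design choice defended in the reply above.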
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
On Sep 4, 2010, at 6:53 PM, st...@wittongilbert.free-online.co.uk wrote:
> Hi
>
> I know asking which test to use is frowned upon on this list... so please do read on for at least a couple of sentences...
>
> I have some multivariate data split as follows:
>
>   Tumour Site (one of 5 categories)    #
>   Chemo Schedule (one of 3 cats)       ##
>   Cycle (one of 3 cats*)               ##
>   Dose (one of 3 cats*)                #
>
> *These are actually integers but for all our other analysis so far we have grouped them into logical bands of categories.
>
> The dependent variable is Reaction or No Reaction.
>
> I have individually analysed each of the independent variables against Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ## produced p values less than 0.05, and those marked # produced p values close to 0.05. We believe that Cycle is the crucial piece of data - the others just appear to be different because there are more early cycles in certain groups than others.
>
> SO - I believe what I need to do is a Linear Logistic Regression on the 4 independent variables. And I'm expecting it to show that the tumour site, schedule and dose don't matter, only the cycle matters. Done a lot of reading and I'm clueless!!
>
> I think I want to do something like:
>
>   glm (reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)
>
> I am then expecting to see some very long output with lots of numbers...
>
> ...my question is TWO fold - 1. is glm the right thing to use before I waste my time

Yes, but if your outcome variable is binomial then the family argument should be binomial. (And if you thought it should be poisson, then why below did you use gaussian???)

> and 2. how do I interpret the result!

Result? What result? I do not see any description of your data, nor any code.

> (I kind of expect a lecture here as I'm really looking for a nice snappy 'p<0.05 means this variable is the one having the influence' type answer and I suspect I'm going to be told that's not possible...!)

I think you need to consult a statistician or someone who has taken the time to read that statistical mumbo jumbo you don't want to learn. This mailing list is not set up to be a tutorial site. (Re your request below: Some years ago I saw one of those programmed learning texts by Kleinbaum on logistic regression. Maybe you could read it and see if it makes your consulting sessions go more smoothly.)

http://www.bookfinder.com/search/?author=kleinbaum&title=logistic+regression&lang=en&isbn=&submit=Begin+search&new_used=*&destination=us&currency=USD&mode=basic&st=sr&ac=qr

I have a couple of Kleinbaum's (et al) other texts and find them to be well written and reasoned, so I suspect the citation above would be as accessible as any.

> To be clear the example given in the docs is:
>
>   library(MASS)

[snipped an example that was not relevant to logistic regression]

> ---
>
> Either can someone point me to a decent place that would explain what this means or provide me some pointers? i.e. which of the variables has the influence on the outcome in the anorexia data? Please don't shout!! Happy to be pointed to a reference but would prefer one in common English not some stats mumbo jumbo!
>
> Calum

--
David Winsemius, MD
West Hartford, CT
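[Editor's sketch] To make the "family should be binomial" advice concrete, here is a minimal sketch of a logistic regression with four categorical predictors. Everything in it is an assumption for illustration: the data are simulated (no data were posted in the thread), the level names are invented, and the simulation is rigged so that only cycle affects the outcome. It shows the mechanics, not an analysis of the poster's study.

```r
set.seed(42)
n <- 300

# Simulated predictors; variable names follow the formula in the question,
# the level names are hypothetical.
mydata <- data.frame(
  site  = factor(sample(paste0("site", 1:5), n, replace = TRUE)),
  sched = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  cycle = factor(sample(c("early", "mid", "late"), n, replace = TRUE)),
  dose  = factor(sample(c("low", "medium", "high"), n, replace = TRUE))
)

# Simulated 0/1 outcome: only cycle changes the chance of a reaction.
p <- ifelse(mydata$cycle == "early", 0.5, 0.15)
mydata$reaction <- rbinom(n, size = 1, prob = p)

# Logistic regression: binary outcome, so family = binomial.
fit <- glm(reaction ~ site + sched + cycle + dose,
           data = mydata, family = binomial)

summary(fit)                        # Wald z test for each coefficient
lrt <- drop1(fit, test = "Chisq")   # one likelihood-ratio test per variable
lrt
exp(coef(fit))                      # odds ratios versus the reference levels
```

The drop1() table gives one p-value per variable, adjusted for the others, which is the closest thing to the "which variable matters" summary asked for; whether a small p-value can be read as "this variable has the influence" still depends on the study design, as the reply above suggests.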
[R] How to use prediction
Hi,

I have a question regarding the usage of prediction in R: I have an input data set X and an output data set Y. I can build up the correlation between them using kcca() of kernlab, but after I have that correlation, how can I predict the output Y1 of a new input X1? I read about gausspr() but I don't know how to bring the result of kcca() to use as parameters for gausspr().

Any replies are appreciated!

Thanks a lot,
James.

--
View this message in context: http://r.789695.n4.nabble.com/How-to-use-prediction-tp2527030p2527030.html
Sent from the R help mailing list archive at Nabble.com.
[R] How can I fixe convergence=1 in optim
Hi R users,

I am using the optim function to maximize a log likelihood function. My code is as follows:

  p <- optim(c(-0.2392925, 0.4653128, -0.8332286, 0.0657, -0.0031, -0.00245, 3.366, 0.5885, -0.8,
               0.0786, -0.00292, -0.00081, 3.266, -0.3632, -0.49, 0.1856, 0.00394, -0.00193,
               -0.889, 0.5379, -0.63, 0.213, 0.00338, -0.00026, -0.8912, -0.3023, -0.56),
             f, method = "BFGS", hessian = TRUE, y = y, X = X, W = W)

After I ran the code, I got the following results:

  p
  $par
   [1]  2.235834e-02  1.282826e-01 -3.786014e-01  7.422526e-02  3.037931e-02 -2.570156e-03  3.365872e+00  2.618893e-01 -1.987859e-06
  [10]  7.970083e-02  2.878574e-03 -1.391019e-03  3.265966e+00 -4.153697e-01 -3.185684e-03  1.833200e-01 -7.247683e-03 -3.156813e-03
  [19] -8.889219e-01  6.208612e-01  2.678643e-04  2.183787e-01  2.715062e-02  2.943905e-04 -8.913260e-01 -5.100482e-01 -3.477559e-04

  $value
  [1] -932.1423

  $counts
  function gradient
      1439      100

  $convergence
  [1] 1

  $message
  NULL

  $hessian
  (I omitted the approximation results for the hessian here to save space)

The error code 1 for convergence shown above means that the iteration limit maxit had been reached. How can I fix this problem and achieve convergence for my optimization problem? Can I increase maxit so that convergence might occur?

Thanks for your help. If more information is needed, please let me know.

Maomao
Re: [R] Function Gini or Ineq
Hi Marcio

You might like to look at some equivalents from the field of ecology, for which there are existing functions. Have a look at the function diversity in the package vegan. This provides the Simpson diversity index, which is the complement of the Gini coefficient (Gini = 1 - Simpson). See attached paper by Stirling (2007).

I'm not sure what you want to do with your weightings, but you could have a look at Rao's quadratic entropy index: this is a weighted diversity index (in ecology usually weighted by the abundance of the species, which are the objects for which diversity is measured). You can get this from the function divc in the package ade4. There are also some other weighted diversity indices in the package FD (functional diversity).

HTH
Karen

On Fri 03Sep10, Mestat wrote:
> Hi listers,
> Is it necessary to install any package in order to use the GINI or INEQ functions? If I use the following command R tells me that it didn't find the GINI function.
>
>   x <- c(541, 1463, 2445, 3438, 4437, 5401, 6392, 8304, 11904, 22261)
>   G <- gini(x)
>
> Thanks in advance,
> Marcio
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Function-Gini-or-Ineq-tp2525852p2525852.html
> Sent from the R help mailing list archive at Nabble.com.
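[Editor's sketch] On the original question: no function called gini ships with base R, and R is case-sensitive, so gini(x) fails unless a package or the user defines it. The Gini inequality coefficient is simple enough to compute directly. The helper below, gini_coef, is a hypothetical name written for this note and is not taken from any package; it uses the standard formula G = sum((2i - n - 1) * x_i) / (n * sum(x)) on the sorted values, without any small-sample correction.

```r
# Gini inequality coefficient of a non-negative numeric vector
# (hypothetical helper, uncorrected version of the formula).
gini_coef <- function(x) {
  x <- sort(x)                 # the formula assumes ascending order
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

x <- c(541, 1463, 2445, 3438, 4437, 5401, 6392, 8304, 11904, 22261)
G <- gini_coef(x)
G                              # roughly 0.462 for these data

gini_coef(rep(5, 4))           # perfectly equal values give 0
```

For packaged implementations, the CRAN package ineq is, as far as I know, the usual place to look (its function is spelled Gini() with a capital G); that is not confirmed anywhere in this thread, so check the package documentation before relying on it.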
Re: [R] Query regarding Windows based statistical software development using R as programming language
Have a look at deducer.

http://www.deducer.org/manual.html

On Sat, Sep 4, 2010 at 12:20 PM, Soumen Pal soumen.4...@gmail.com wrote:
> Hi,
> I am a beginner in R. I have a query as below:
> Is it possible to develop a Windows based statistical software (user-friendly) like SPSS using R as a programming language? Otherwise, is it possible to use R code directly (no command-line execution) in Windows based programming language such as Visual Basic?
> Please help me, if possible, with some link to study materials related to such topic.
> --
> Thanks Regards,
> Soumen Pal

--
CH Chan
Re: [R] How can I fixe convergence=1 in optim
On Sep 4, 2010, at 4:18 PM, Sally Luo wrote:
> Hi R users,
>
> I am using the optim function to maximize a log likelihood function. My code is as follows:
>
>   p <- optim(c(-0.2392925, 0.4653128, -0.8332286, 0.0657, -0.0031, -0.00245, 3.366, 0.5885, -0.8,
>                0.0786, -0.00292, -0.00081, 3.266, -0.3632, -0.49, 0.1856, 0.00394, -0.00193,
>                -0.889, 0.5379, -0.63, 0.213, 0.00338, -0.00026, -0.8912, -0.3023, -0.56),
>              f, method = "BFGS", hessian = TRUE, y = y, X = X, W = W)
>
> After I ran the code, I got the following results:
>
>   p
>   $par
>    [1]  2.235834e-02  1.282826e-01 -3.786014e-01  7.422526e-02  3.037931e-02 -2.570156e-03  3.365872e+00  2.618893e-01 -1.987859e-06
>   [10]  7.970083e-02  2.878574e-03 -1.391019e-03  3.265966e+00 -4.153697e-01 -3.185684e-03  1.833200e-01 -7.247683e-03 -3.156813e-03
>   [19] -8.889219e-01  6.208612e-01  2.678643e-04  2.183787e-01  2.715062e-02  2.943905e-04 -8.913260e-01 -5.100482e-01 -3.477559e-04
>
>   $value
>   [1] -932.1423
>
>   $counts
>   function gradient
>       1439      100
>
>   $convergence
>   [1] 1
>
>   $message
>   NULL
>
>   $hessian
>   (I omitted the approximation results for the hessian here to save space)
>
> The error code 1 for convergence shown above means that the iteration limit maxit had been reached. How can I fix this problem and achieve convergence for my optimization problem? Can I increase maxit so that convergence might occur?

I am wondering how you expect us to guess at the answer? You are the one who knows what f is and you are the one who has the option of increasing maxit. If the question is how to increase maxit, then the answer is perhaps as easy as:

  ?optim

--
David Winsemius, MD
West Hartford, CT
Re: [R] how to free memory? (gc() doesn't work for me)
Hi, all

Thank you for your comments. I think that I misunderstood what gc() does, because gc() is working as you posted. I posted my question because gc() doesn't reduce the memory in use as reported by a few system memory monitoring tools that I tested.

Regards,
Hyunchul

On Sat, Sep 4, 2010 at 8:50 PM, jim holtman jholt...@gmail.com wrote:
> Seems to work for me:
>
>   x <- matrix(0, 10000, 10000)
>   object.size(x)
>   800000112 bytes
>   gc()
>               used  (Mb) gc trigger  (Mb)  max used  (Mb)
>   Ncells    174104   4.7     741108  19.8    741108  19.8
>   Vcells 101761938 776.4  113632405 867.0 102762450 784.1
>   rm(x)
>   gc()
>             used (Mb) gc trigger  (Mb)  max used  (Mb)
>   Ncells  174202  4.7     741108  19.8    741108  19.8
>   Vcells 1761954 13.5   90905923 693.6 102762450 784.1
>
> On Sat, Sep 4, 2010 at 12:46 AM, Hyunchul Kim hyunchul.kim@gmail.com wrote:
>> Hi, all
>>
>> I have a huge object that uses almost all of the available memory.
>>
>>   R> rm(a_huge_object)
>>   R> gc()
>>
>> doesn't free memory and ?gc doesn't show anything.
>>
>> Are there any suggestions?
>>
>> Thanks in advance,
>>
>> Regards,
>> Hyunchul
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
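[Editor's sketch] The point of the exchange above is that gc() reports, and frees, memory inside R's own heap; whether the operating system's monitor shows the process shrinking is a separate matter that depends on the platform's allocator. A small base-R check of the R-side accounting follows; the object size is arbitrary and the names before and after are made up for this note.

```r
x <- matrix(0, 2000, 2000)           # 4e6 doubles, about 32 MB

before <- gc()["Vcells", "used"]     # vector cells in use while x exists
rm(x)
after <- gc()["Vcells", "used"]      # vector cells in use after collection

before - after                       # cells released inside R, about 4e6
```

If the "used" column drops like this, R has let go of the object even when an external tool still shows a large process size.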
Re: [R] Please explain do.call in this context, or critique to stack this list faster
> One common way around this is to pre-allocate memory and then to populate the object using a loop, but a somewhat easier solution here turns out to be ldply() in the plyr package. The following is the same idea as do.call(rbind, l), only faster:
>
>   system.time(u3 <- ldply(l, rbind))
>      user  system elapsed
>      6.07    0.01    6.09

I think all you want here is rbind.fill:

  system.time(a <- rbind.fill(l))
     user  system elapsed
    1.426   0.044   1.471

  system.time(b <- do.call(rbind, l))
     user  system elapsed
       98      60     162

  all.equal(a, b)
  [1] TRUE

This is considerably faster than do.call + rbind because I spend a lot of time working out how to do this most efficiently. You can see the underlying code at http://github.com/hadley/plyr/blob/master/R/rbind.r - it's relatively straightforward except for ensuring the output columns are the same type as the input columns. This is a good example where optimised R code is much faster than C code.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
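[Editor's sketch] The "pre-allocate, then fill" approach mentioned in the quoted text can be written in base R as below. This is only an illustration of the idea under simple assumptions (every data frame has the same two numeric columns, x and y); it is not the code inside plyr's rbind.fill, and the names nr, last, first and out are made up for this note.

```r
set.seed(2)
l <- lapply(1:200, function(i) data.frame(x = rnorm(100), y = rnorm(100)))

# Work out where each piece will land in the final object.
nr    <- vapply(l, nrow, integer(1))
last  <- cumsum(nr)
first <- last - nr + 1

# Allocate the full-size result once, then fill it block by block.
out <- data.frame(x = numeric(sum(nr)), y = numeric(sum(nr)))
for (i in seq_along(l)) {
  out[first[i]:last[i], ] <- l[[i]]
}

# Same content as the one-line answer.
all.equal(out, do.call(rbind, l), check.attributes = FALSE)
```

The bookkeeping with first and last is where the "programmer error" risk mentioned earlier in the thread comes from, which is one reason the do.call(rbind, ...) or rbind.fill() one-liners are attractive.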
Re: [R] How can I fixe convergence=1 in optim
To change the default maximum number of iterations (mxit = 100 for derivative based algorithms), add mxit = whatever number you want.

In most cases, you need a very good initial value! This is a real challenge in using optim(). Quite often, if the initial values are not well selected, optim() can give you nonsense estimates even if the algorithm converges after a number of iterations.

--
View this message in context: http://r.789695.n4.nabble.com/How-can-I-fixe-convergence-1-in-optim-tp2527034p2527087.html
Sent from the R help mailing list archive at Nabble.com.
[R] simple ts() object question
Dear Community,

say, I have an annual ts() object sampled from 1960 to 1969 like:

  ta <- ts(1:10, start=1960, frequency=1)

How can I extract the value from the year 1965? I mean, not by:

  ta[6]

but by something like:

  ta[1965]

where I'm directly referring to the year of the observation?

Thank you in advance!

--
View this message in context: http://r.789695.n4.nabble.com/simple-ts-object-question-tp2527085p2527085.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] simple ts() object question
On Sun, Sep 5, 2010 at 12:08 AM, StatWM wmus...@gmx.de wrote:
> Dear Community,
>
> say, I have an annual ts() object sampled from 1960 to 1969 like:
>
>   ta <- ts(1:10, start=1960, frequency=1)
>
> How can I extract the value from the year 1965? I mean, not by:
>
>   ta[6]
>
> but by something like:
>
>   ta[1965]
>
> where I'm directly referring to the year of the observation?
>
> Thank you in advance!

Use window.ts:

  ta <- ts(1:10, start = 1960)
  window(ta, start = 1965, end = 1965)
  Time Series:
  Start = 1965
  End = 1965
  Frequency = 1
  [1] 6

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] simple ts() object question
Hi,

There is probably an easier way, but this will work:

  ta[time(ta)==1965]

With your data, I get:

  ta[time(ta)==1965]
  [1] 6

HTH,
Josh

On Sat, Sep 4, 2010 at 9:08 PM, StatWM wmus...@gmx.de wrote:
> Dear Community,
>
> say, I have an annual ts() object sampled from 1960 to 1969 like:
>
>   ta <- ts(1:10, start=1960, frequency=1)
>
> How can I extract the value from the year 1965? I mean, not by:
>
>   ta[6]
>
> but by something like:
>
>   ta[1965]
>
> where I'm directly referring to the year of the observation?
>
> Thank you in advance!

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
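[Editor's sketch] The two answers in this thread side by side, plus a small wrapper. The wrapper name at_year is hypothetical and written for this note; both underlying calls, window() and time(), are base R.

```r
ta <- ts(1:10, start = 1960, frequency = 1)

# Answer 1: window() keeps the result as a (length-one) time series.
window(ta, start = 1965, end = 1965)

# Answer 2: logical indexing on the time stamps returns a plain value.
ta[time(ta) == 1965]

# Hypothetical convenience wrapper: value of an annual series at one year.
at_year <- function(series, year) {
  as.numeric(window(series, start = year, end = year))
}
at_year(ta, 1965)
```

The exact comparison time(ta) == 1965 is safe for annual data, where the time stamps are whole numbers; for monthly or quarterly series the time stamps are fractions of a year, and window() is likely the more robust choice.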
Re: [R] How can I fixe convergence=1 in optim
Peng, C cpeng.usm at gmail.com writes:
> To change the default maximum number of iterations (mxit = 100 for derivative based algorithms), add mxit = whatever number you want.

That's "maxit", i.e.

  optim(..., control=list(maxit=...))
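[Editor's sketch] A self-contained illustration of the control = list(maxit = ...) correction above. The objective is the Rosenbrock test function with its analytic gradient, not the poster's likelihood f, which was never shown; the object names short and long are made up for this note.

```r
# Rosenbrock function and its gradient (a standard test problem).
fr <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
grr <- function(p) c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),
                      200 * (p[2] - p[1]^2))

# Deliberately too few iterations: BFGS stops at the limit.
short <- optim(c(-1.2, 1), fr, grr, method = "BFGS",
               control = list(maxit = 5))
short$convergence     # 1 means the iteration limit was reached

# A larger limit lets the same run finish.
long <- optim(c(-1.2, 1), fr, grr, method = "BFGS",
              control = list(maxit = 500))
long$convergence      # 0 means successful completion
long$par              # close to the true minimum at (1, 1)
```

Two caveats for the original problem: optim() minimizes by default, so a log likelihood has to be negated or run with control = list(fnscale = -1); and raising maxit only helps if the run was genuinely making progress, so it is worth comparing $value and $par between runs, or restarting from the previous $par, rather than trusting convergence code 0 on its own.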