Re: [R] time series processing - count of datestamp delta's, per group
Hi Martin, it sounds like you want the difference between the first and the last observation per user, not, e.g., all the date differences between successive observations of each separate user. Correct me if I'm wrong. That said, let's build some toy data: set.seed(1) dataset - data.frame(User=sample(LETTERS[1:5],100,replace=TRUE), Date=sample(as.Date(2014-01-01)+0:364,100,replace=TRUE)) Now we can calculate these differences and plot a histogram or tabulate: foo - with(dataset,by(Date,User,function(xx)diff(range(xx hist(foo) table(foo) The key here is really the by() function, which calculates a function (here an anonymous function function(xx)diff(range(xx))) applied to some data (here dataset$Date) separately for each level of a grouping factor (here dataset$User). HTH, Stephan On 23.03.2014 01:32, Martin Tomko wrote: Apologies if the question is a but naïve, I am a novice in time series data handling in R I have the following type of data, in a long format ( as called by the spacetime vignette – the table contains also space, not noted here): User | Date | Otherdata | A | 01/01/2014 | aa A | 01/01/2014 | bb A | 01/01/2014 | cc B | 01/01/2014 | aa B | 05/01/2014 | cc A | 07/01/2014 | aa C | 05/02/2014 | xx C | 20/02/2014 | yy Etc [A,B,C,…] are user Ids (some strings). Date is converted into a Date format (2013-10-15) The table is sorted by User and then by Date, and is over 800K records long. There are about 20K users. User | Date | Otherdata | A | 2014-01-01 | aa A | 2014-01-01 | bb A | 2014-01-01 | cc A | 2014-01-07 | aa B | 2014-01-01 | aa B | 2014-01-05 | cc C | 2014-02-05 | xx C | 2014-02-20 | yy I want to: Get a frequency table ( and ultimately plot) of the count of differences (in days) between records of a user. Meaning, I would first get the unique days recorded: A | 2014-01-01 A | 2014-01-07 B | 2014-01-01 B | 2014-01-05 C | 2014-02-05 C | 2014-02-20 And then want to run the differences between timestamps within a group defined by the user, in days: A| 6 B| 4 C|15 Imagining that I have tens of thousands of records, I then want the table with the counts of differences ( across all users) ( in our case it would be 6, 4 and 15, all counte = 1) IN the larger sample, something like this: DeltaDays | Count 1 | 150 2 | 320 … N | X I know there are all sorts of packages for time analysis, but I could not find a simple function like this (incl searching here http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf ). I assume that something working on a simple data frame would be sufficient, but I am happy ( prefer?) to use TS. I would appreciate any hints. The ultimate analysis involves also space, so hints in the direction of space-time are welcome. Ultimately, I would like to separate records for each user into a dataset that can be handled separately, but splitting it into a large number of files does not seem wise. Any hint also appreciated. Thanks, Martin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Citing Package Contributing Authors
Hi, it usually is a good idea to look at the output of citation() (which, however, also often is auto-generated) or at the authors listed in package vignettes. And thanks for citing R package authors. When I review papers, I often have to remind authors of this... Best Stephan On 26.08.2013 21:56, Charles Determan Jr wrote: Greetings, I am familiar with the function cite('packageName') which provides the output generated from the DESCRIPTION file. In most cases this is sufficient but I was wondering if there are contributing authors (in addition to the primary) also listed on the CRAN page. Is there a proper way to account for them or are they generally not listed? Regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Saving model and other objects from caret
Have you looked at ?save and ?load? As I already wrote here: http://stackoverflow.com/questions/14761496/saving-and-loading-a-model-in-r Best, Stephan On 07.02.2013 22:33, James Jong wrote: Say I train a model in caret, e.g.: RFmodel - train(X,Y,method='rf',trControl=myCtrl,tuneLength=1) How can I save this to disk and load it later in R? How about an object of the class resamples? resamps - resamples( list( RF = RFmodel, SVM = SVMmodel, KNN = KNNmodel, NN = NNmodel )) Thanks, James [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems with stepAIC
Hi Katja, try fitting the original model using ML (not REML) with the parameter method = ML: PModell1 -lme(sqrt(Earthwormsm.2)~ Treatment+Pflanzenfrischmasse+aBodenfeuchte+bBodenfeuchte+Gfrischmasse+Ltrockenmasseanteil+KCN+I+Eindringtiefe, random=~1|Block/Treatment/Cluster/Patch, data=Test1,na.action = na.omit, method = ML) Good luck! Stephan On 29.09.2012 09:36, Katja wrote: Dear help community, I'm a R-beginner and use it for my master thesis. I've got a mixed model and want to analyse it with lme. There are a lot Cofactors that coult be relevant. To extract the important ones I want to do the stepAIC, but always get an error warning. Structure of my data: data.frame': 72 obs. of 54 variables: $ Block : Factor w/ 3 levels A,B,C: 1 1 1 1 1 1 1 1 1 1 ... $ Treatment : Factor w/ 3 levels 1,2,3: 1 1 1 1 1 1 1 1 2 2 ... $ Paddock: Factor w/ 9 levels A1,A2,A3,..: 1 1 1 1 1 1 1 1 2 2 ... $ Cluster: Factor w/ 4 levels 1,2,3,4: 1 1 2 2 3 3 4 4 1 1 ... $ Growth : Factor w/ 2 levels s,t: 1 2 1 2 1 2 1 2 1 2 ... $ Patch : Factor w/ 72 levels A1_1s,A1_1t,..: 1 2 3 4 5 6 7 8 9 10 ... $ Lufttemperatur : num 17.3 17.3 17.3 17.3 17.3 ... $ NSmm : num 0 0 0 0 0 0 0 0 0 0 ... $ Luftfeuchte: num 79.2 79.2 79.2 79.2 79.2 ... $ Windstärke : num 0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96 1.27 1.27 ... $ Bodentemperatur: num 18.1 18.1 18.1 18.1 18.1 ... $ Earthwormsm.2 : num 0 0 0 20 24 60 12 20 0 4 ... $ Biomassgm.2: num 0 0 0 24.5 17.8 ... $ meanBiomassg : num 0 0 0 1.227 0.743 ... $ Species: num 0 0 0 1 2 2 0 0 0 1 ... $ Juvenilem.2: num 0 0 0 4 4 1 2 3 0 0 ... $ JuvenileRate : num NA NA NA 80 80 ... $ AdultsRate : num NA NA NA 20 20 ... $ Pflanzenfrischmasse: num 139 803 189 739 261 ... $ Pflanzentrockenmasseanteil : num 27.8 29.3 28.7 40 27.7 ... $ NAnteilGesamt : num 2.38 3.35 2.47 2.21 2.55 4.45 2.33 2.23 2.3 1.68 ... $ CAnteilGesamt : num 43.9 45 45 43.6 43.8 ... $ aBodenfeuchte : num 24.4 21.8 14.7 18.2 23.9 ... $ bBodenfeuchte : num 17.3 15.6 NA NA 19.1 ... $ cBodenfeuchte : num NA NA NA NA NA ... $ Gfrischmasse : num 84.6 739.3 130.8 566.8 94 ... $ Gtrockenmasse : num 25.6 215.4 41.7 239.4 30.5 ... $ Gtrockenmasseanteil: num 30.2 29.1 31.9 42.2 32.5 ... $ GtrockenmasseanteilanGesamt: num 78.8 98.3 86 96 76.5 ... $ GN : num 2.1 3.35 2.31 2.17 2.28 ... $ GC : num 43.8 45 44.9 43.6 43.5 ... $ GCN: num 20.9 13.4 19.4 20.1 19.1 ... $ Lfrischmasse : num 12.1 0 0 0 19 ... $ Ltrockenmasse : num 2.56 0 0 0 3.92 1.84 0 0 2.2 0 ... $ Ltrockenmasseanteil: num 21.2 0 0 0 20.7 ... $ LtrockenmasseanteilanGesamt: num 7.89 0 0 0 9.83 1.06 0 0 3.73 0 ... $ LN : num 3.96 NA NA NA 4.09 ... $ LC : num 43.1 NA NA NA 45.6 ... $ LCN: num 10.9 NA NA NA 11.1 ... $ Kfrischmasse : num 20.16 8.96 38.48 56.24 31.2 ... $ Ktrockenmasse : num 4.32 3.64 6.8 10 5.44 ... $ Ktrockenmasseanteil: num 21.4 40.6 17.7 17.8 17.4 ... $ KTrockenmasseanteilanGesamt: num 13.32 1.66 14.01 4.01 13.64 ... $ KN : num 3.09 3.11 3.43 3.37 2.95 ... $ KC : num 45 48.3 46 44 44.3 ... $ KCN: num 14.6 15.5 13.4 13.1 15 ... $ I : num 2.12 1.4 0.92 0.49 1.03 0.53 1.36 0.51 1.1 2.31 ... $ II : num 2.61 3.97 2.82 NA 1.94 1.98 2.84 1.66 2.38 1.99 ... $ III: num 2.24 3.51 3.17 NA 2.95 3.12 2.16 3.65 3.14 2.83 ... $ IV : num 2.52 2.91 3.11 NA 1.97 4.39 3.22 4.61 2.26 2.77 ... $ V : num 2.38 2.91 2.61 NA 3.34 3.32 3.21 NA 2.44 2.26 ... $ VI : num 2.01 2.61 2.61 NA 3.61 3.57 NA NA 3.06 2.18 ... $ VII: num NA 3.73 2.61 NA 3.25 0.27 NA NA NA 3.04 ... $ Eindringtiefe : int 64 80 80 8 80 61 55 32 73 80 ... model fo the first parameter I want to check: PModell1 -lme(sqrt(Earthwormsm.2)~ Treatment+Pflanzenfrischmasse+aBodenfeuchte+bBodenfeuchte+Gfrischmasse+Ltrockenmasseanteil+KCN+I+Eindringtiefe, random=~1|Block/Treatment/Cluster/Patch, data=Test1,na.action = na.omit) Everytime I try stepAIC with different settings: Modell2- stepAIC(PModell1, method=lme) Fehler in extractAIC.lme(fit, scale, k = k, ...) : AIC für REML-Näherung undefiniert
Re: [R] percentile of a given value: is there a reverse quantile function?
?ecdf Best, Stephan On 03.03.2012 13:37, drflxms wrote: Dear all, I am familiar with obtaining the value corresponding to a chosen probability via the quantile function. Now I am facing the opposite problem I have a value an want to know it's corresponding percentile in the distribution. So is there a function for this as well? Thank you for your support in advance, Felix __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plot many x and y
Hi Alex, could you be a little more specific as to what exactly you mean by plotting many x's and y's with one legend per plot? Please note what appears at the bottom of every R-help mail: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Following this piece of advice usually increases your chances for a helpful answer. Best, Stephan Am 06.06.2011 21:46, schrieb Alaios: Dear all could you please plot many x's and y's with one legend per plot? I would like to thank you in advance for your help Best Regards Alex. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Force the for loop to stop
Hi, you could set a dummy variable to FALSE outside the outermost loop. If the break condition is met in the inner loop, set the dummy variable to TRUE before breaking and test its truth status in the outer loop. HTH Stephan Am 01.06.2011 21:25, schrieb Salih Tuna: Hi, I am looking for a command in R that would force the for loop to stop after it finds what it is looking for. As an example for(i in 1:5){ for(j in 3:6){ if(i==j) # do something... break; } } And i don't want the loop to execute once i = 3 and stop. Is there a way to do this? best, salih [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Force the for loop to stop
Hi Salih, here you go: dummy - FALSE for ( ii in 1:5 ) { for ( jj in 3:6 ) { cat(ii=,ii,; jj=,jj,\n,sep=) if ( ii == jj ) { dummy - TRUE break } } if ( dummy ) break } ### Note that I am using ii and jj as loop indices, not i and j. This makes it a lot easier to search for the loop counter in more complex scripts - if you just search for i, most of your hits will be something else than the loop counter. HTH, Stephan Am 01.06.2011 22:06, schrieb Salih Tuna: Hi Stephan, Thanks a lot. But i am not very good at R yet so i dont know how to set the dummy variable to FALSE. Can you please help me with that as well? On Wed, Jun 1, 2011 at 8:34 PM, Stephan Kolassastephan.kola...@gmx.dewrote: Hi, you could set a dummy variable to FALSE outside the outermost loop. If the break condition is met in the inner loop, set the dummy variable to TRUE before breaking and test its truth status in the outer loop. HTH Stephan Am 01.06.2011 21:25, schrieb Salih Tuna: Hi, I am looking for a command in R that would force the for loop to stop after it finds what it is looking for. As an example for(i in 1:5){ for(j in 3:6){ if(i==j) # do something... break; } } And i don't want the loop to execute once i = 3 and stop. Is there a way to do this? best, salih [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems Dating....
Hi Nat, I guess something like as.Date(as.character(3/4/2007),format=%d/%m/%Y) should work - as.character() coerces the factors to characters, which the as.Date() function can work with, given the right format argument. HTH Stephan Am 01.06.2011 22:59, schrieb Struckmeier, Nathanael: I'm trying to convert a column in a data frame with dates from a Factor type to a Date Object but I am encountering and error. (I am having trouble plotting an x,y scatter and I suspect it's something with my data format). I have a table with two columns and 8,000 rows. dsort=read.delim(C:\\Documents and Settings\\E066582\\My Documents\\R\\R-2.13.0\\bin\\dsort.txt) dsort#name of data.frame colnames(dsort)[1] #name of column 1 [1] Date colnames(dsort)[2] #name of column 2 [1] Qty class(dsort$Date) #checked data type of column Date and it came back as a factor [1] factor Date2=as.Date(dsort$Date)#attempt at changing the data type from a factor to a date object (see error below). Error in charToDate(x) : character string is not in a standard unambiguous format Dates in my table are listed in 3/4/2007 format. StatBat2 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Applying a function to a subset
Or just include is.na=TRUE in the definition of kurtosis(): kurtosis-function(x) { m4-sum((x-mean(x,na.rm=TRUE))^4,na.rm=TRUE)/length(x) s4-var(x,na.rm=TRUE)^2 m4/s4 - 3 } HTH Stephan Am 29.05.2011 11:34, schrieb Jim Holtman: kurtosis(fem[!is.na(fem)]) Sent from my iPad On May 29, 2011, at 4:54, gaiarridogaiarr...@usal.es wrote: Here´s my problem, i have developed the function kurtosis (using R-book as a guide) with this commands: kurtosis-function(x) { m4-sum((x-mean(x))^4)/length(x) s4-var(x)^2 m4/s4 - 3 } Then create the object fem, which is the difference between the count of a trait in the left side of the body minus the right side. It´s a numeric variable. But i´ve got no data for some individuals, so in the variable fem i got numbers and smoe NA. When i try to apply kurtosis to fem, the NA, don´t allow me to do that, that`s what i get: kurtosis(fem) [1] NA I tried with simple tricks, but i´ve got no results: kurtosis(fem,na.rm=T) Error en kurtosis(fem, na.rm = T) : el argumento(s) no fue utilizado(s) (na.rm = T) What can i do? the values of fem vary from -4 to 9. Thanks very much kurtosis(fem[fem!=NA]) [1] NA - Mario Garrido Escudero PhD student Dpto. de Biología Animal, Ecología, Parasitología, Edafología y Qca. Agrícola Universidad de Salamanca -- View this message in context: http://r.789695.n4.nabble.com/Applying-a-function-to-a-subset-tp3558567p3558567.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset
Dear all, may I suggest the acronym IOTT for the inter-ocular trauma test? Now we just need someone to implement iot.test(). I assume it will appear on CRAN within the next 24 hours. Looking forward to yet another base package, Stephan Am 25.05.2011 23:36, schrieb Greg Snow: How can anyone overlook the intra-ocular trauma test (or sometimes called the inter-ocular concussion test). But the i-o trauma test needs either a small data set or an appropriate graph of the data (or can you look at a dataset of a hundred columns and a million rows and do an intra-ocular trauma test?). We were not told the size of the dataset or enough information to know what type of graph to make. You do make a good point though that with minimal additional information the intra-ocular trauma test can be useful (well if it is significant, there are many datasets that fail the intra-ocular trauma test, but still yield interesting results after careful study). And for any dataset that has a significant intra-ocular trauma test result, that should trump the results of SnowsCorrectlySizedButOtherwiseUselessTestOfAnything. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] minimum distance between line segments
Hi, this sounds like a standard problem in Computational Geometry - I guess game developers have to deal with something like this all the time. You may want to look at a textbook or two. An article with the promising title On fast computation of distance between line segments can be found here: http://dx.doi.org/10.1016/0020-0190(85)90032-8 I found that one via Wikipedia: http://en.wikipedia.org/wiki/Proximity_problems Good hunting! Stephan Am 08.03.2011 22:55, schrieb Darcy Webber: Dear R helpers, I think that this may be a bit of a math question as the more I consider it, the harder it seems. I am trying to come up with a way to work out the minimum distance between line segments. For instance, consider 20 random line segments: x1- runif(20) y1- runif(20) x2- runif(20) y2- runif(20) plot(x1, y1, type = n) segments(x1, y1, x2, y2) Inititally I thought the solution to this problem was to work out the distance between midpoints (it quickly became apparent that this is totally wrong when looking at the plot). So, I thought that perhaps finding the minimum distance between each of the lines endpoints AND their midpoints would be a good proxy for this, so I set up a loop that uses pythagoras to work out these 9 distances and find the minimum. But, this solution is obviously flawed as well (sometimes lines actually intersect, sometimes the minimum distances are less etc). Any help/dection on this one would be much appreciated. Thanks in advance, Darcy. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Anyone know a forum for stats advice?
Hi Clare, you want to go here: http://stats.stackexchange.com/questions HTH Stephan Am 04.03.2011 12:08, schrieb Clare Embling: Hi, I know this forum is for R-related issues, but the question I have is a statistical question I was wondering if anyone could recommend a good statistics forum where I can ask the question? My question is relating to bootstrapping of binary data (ecology data) - I can give more detail, but wasn't sure I could address the question here as it is more statistical based than R based (though all the analysis is done in R). Thanks in advance Clare __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Comparing decimal numbers
Hi, this is R FAQ 7.31. http://cran.r-project.org/doc/FAQ/R-FAQ.html HTH, Stephan Am 01.02.2011 14:49, schrieb mlancee: Hi, I have a seemingly easy question that has been keeping be busy for quite a while. The problem is the following: 0.1 + 0.1 + 0.1 == 0.3 [1] FALSE Why is this false? Another example is 0.2 + 0.1 == 0.3 [1] FALSE or 0.25 + 0.05 == 0.20 + 0.10 [1] FALSE However, I do get TRUE if I use integers, or for example the following 0.1 + 0.1 == 0.2 [1] TRUE It is probably something very basic, but I did not manage to find the answer. Thanks, Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Uniform Distribution
Hi Alex, help.search(uniform) HTH, Stephan Am 08.09.2010 15:36, schrieb Alaios: Hello, I would like to uniformly distribute values from 0 to 200. Can someone help me find the appropriate uniform distribution generator? I would like to thank you in advance for your help. Best Regards Alex [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] repeated measurements ANOVA
Hi Alex, I'm slightly unclear as to why you would want to restructure your nice six-column data.frame (why six? One column for the data and four for the factors should make five, shouldn't it? I guess you have a subject ID in one column?) into some monstrosity which I assume you would fill with lots of indicator variables. R does all this for you, just do something like library(nlme) lme(response~factor1+factor2+factor3+factor4,random=~1|ID,data=dataset) assuming that your data.frame is called dataset with column names response, factor1, ..., factor4 and ID (and that the above is the model you want). Take a look at the help page for lme() and the Orthodont data set, which is used as an example in the lme() help page. And next time, send along a snippet of your data.frame, that would help us help you. HTH Stephan Am 07.09.2010 20:19, schrieb Walther, Alexander: Dear list, i am setting up a GLM for a repeated measurement ANOVA using the lm and ANOVA function. my design contains four factors with 5, 5, 2 and 2 (= 14) levels, respectively. the data are stored in a data.frame with six columns, one for the data themselves and the remainings for the factors where strings indicate the factor levels in each row. now i would like to restructure this data.frame using cbind which yields a 100 x 14 array. so far i only included two subjects in the analysis and the 100 rows emerge because each subject contributes 50 values. for the ANOVA however, it seems to me that i should create a multi-dimensional array where each dimension accounts for one specific factor and its levels. is it possible to do this in R? if so, does the lm or ANOVA function necessitates this type of array or is there yet another way to continue? Best Alex __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R time series analysis
Hi, basically, you know 5 periods later. If you use a good error measure, that is. I am a big believer in AIC for model selection. I believe that arima() also gives you the AIC of a fitted model, or try AIC(arima1). Other ideas include keeping a holdout sample or some such. I'd recommend looking at a time series textbook. HTH, Stephan Am 05.09.2010 22:37, schrieb lord12: How do you evaluate the predictive models? For example if I have: arima1 = arima(training, order = c(1,1,1)) arima2 = arima(training, order = c(0,0,0)) x.fore = predict(arima1, n.ahead=5) x.fore1 = predict(arima2, n.ahead = 5) How do I know which arima model is better for prediction? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] predict.lm, matrix in formula and newdata
Dear all, I am stumped at what should be a painfully easy task: predicting from an lm object. A toy example would be this: XX - matrix(runif(8),ncol=2) yy - runif(4) model - lm(yy~XX) XX.pred - data.frame(matrix(runif(6),ncol=2)) colnames(XX.pred) - c(XX1,XX2) predict(model,newdata=XX.pred) I would have expected the last line to give me the predictions from the model based on the new data given in XX.pred... but all I get are in-sample fits along with a warning 'newdata' had 3 rows but variable(s) found have 4 rows. Why would predict.lm worry about the number of rows in the model matrix? Unfortunately, ?predict.lm does not seem to be helpful, and neither RSiteSearch nor rseek.org have been useful. I'm sure that I am making an elementary error somewhere (am I misunderstanding the lm(yy~XX) part?) and would appreciate a gentle nudge in the right direction. Thank you, Stephan -- GMX DSL SOMMER-SPECIAL: Surf Phone Flat 16.000 für nur 19,99 ¿/mtl.!* __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems with normality req. for ANOVA
Hi, simulating would still require you to operationalize the lack of normality. Are the tails too heavy? Is the distribution skewed? Does it have multiple peaks? I suspect that the specific choices you would make here would *strongly* influence the result. My condolences on the client you are facing. Good luck, Stephan (ex-BAH) Bert Gunter wrote: You could try sensitivity analyses via simulation, though. On Mon, Aug 2, 2010 at 11:31 AM, wwreith reith_will...@bah.com wrote: The problem I have is that I am expected, by my client, to find a similiar formula that states which way the p-value would be pushed by a lack of normality. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with a problem
Hi all, zoo::rollmean() is a nice idea. But if I understand Mike correctly, he wants 5 out of any 7 consecutive logicals to be TRUE, where these 5 do not necessarily need to be consecutive themselves. (remaining open question: could, e.g., the condition on c1 be TRUE for rows 1,2,3,4,5 and on c2 for rows 3,4,5,6,7, or would it need to be TRUE for the same rows?). Then something like this would make sense: any(rollmean(dat$c1=100,7)=5/7-.01 rollmean(dat$c2=8,7)=5/7-.01)) or any(rollmean(dat$c1=100,7)=5/7-.01 dat$c2=8,7)=5/7-.01)) depending on the open question above. The -.01 above may be necessary in light of FAQ 7.31. HTH, Stephan Joshua Wiley schrieb: Hi Michael, The days in your example do not look continuous (at least from my thinking), so you may have extra requirements in mind, but take a look at this code. My general thought was first to turn each column into a logical vector (c1 = 100 and c2 = 8). Taking advantage of the fact that R treats TRUE as 1 and FALSE as 0, compute a rolling mean. If (and only if) 5 consecutive values are TRUE, the mean will be 1. Next I added the rolling means for each column, and then tested whether any were 2 (i.e., 1 + 1). Cheers, Josh ### #Load required package library(zoo) #Your data with ds converted to Date #from dput() dat - structure(list(ds = structure(c(14702, 14729, 14730, 14731, 14732, 14733, 14734, 14735, 14736, 14737, 14738, 14739, 14740, 14741, 14742, 14743, 14744), class = Date), c1 = c(100L, 11141L, 3L, 7615L, 6910L, 5035L, 3007L, 4L, 8335L, 2897L, 6377L, 3177L, 7946L, 8705L, 9030L, 8682L, 8440L), c2 = c(0L, 15L, 16L, 14L, 17L, 3L, 15L, 14L, 17L, 13L, 17L, 17L, 15L, 0L, 16L, 16L, 1L)), .Names = c(ds, c1, c2), row.names = c(NA, -17L), class = data.frame) #Order by ds dat - dat[order(dat$ds), ] yourvar - 0 #Test that 5 consecutive values from c1 AND c2 meet requirements if(any( c(rollmean(dat$c1 = 100, 5) + rollmean(dat$c2 = 8, 5)) == 2) ) {yourvar - 1} ### On Sat, Jul 17, 2010 at 2:38 PM, Michael Hess mlh...@med.umich.edu wrote: Sorry for not being clear. In the dataset there are around 100 or so days of data (in the case also rows of data) I need to make sure that the person meets that c1 is at least 100 AND c2 is at least 8 for 5 of 7 continuous days. I will play with what I have and see if I can find out how to do this. Thanks for the help! Michael Stephan Kolassa 07/17/10 4:50 PM Mike, I am slightly unclear on what you want to do. Do you want to check rows 1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all* rows you are looking at, and same for c2? You can sort your data like this: data - data[order(data$ds),] Type ?order for help. But also do this for added enlightenment...: library(fortunes) fortune(dog) Next, your analysis on the sorted data frame. As I said, I am not entirely clear on what you are looking at, but the following may solve your problem with choices 1 to 7 and any one above. foo - 0 for ( ii in 1:(nrow(data)-8) ) { if (any(data$c1[ii+seq(0,6)]=100) any(data$c2[ii+seq(0,6)]=8)) { foo - 1 break } } The variable foo should contain what you want it to. Look at ?any (and, if this does not do what you want it to, at ?all) for further info. No doubt this could be vectorized, but I think the loop is clear enough. Good luck! Stephan Michael Hess schrieb: Hello R users, I am a researcher at the University of Michigan looking for a solution to an R problem. I have loaded my data in from a mysql database and it looks like this data ds c1 c2 1 2010-04-03100 0 2 2010-04-30 11141 15 3 2010-05-01 3 16 4 2010-05-02 7615 14 5 2010-05-03 6910 17 6 2010-05-04 5035 3 7 2010-05-05 3007 15 8 2010-05-06 4 14 9 2010-05-07 8335 17 10 2010-05-08 2897 13 11 2010-05-09 6377 17 12 2010-05-10 3177 17 13 2010-05-11 7946 15 14 2010-05-12 8705 0 15 2010-05-13 9030 16 16 2010-05-14 8682 16 17 2010-05-15 8440 15 What I am trying to do is sort by ds, and take rows 1,7, see if c1 is at least 100 AND c2 is at least 8. If it is not, start with check rows 2,8 and if not there 3,9until it loops over the entire file. If it finds a set that matches, set a new variable equal to 1, if never finds a match, set it equal to 0. I have done this in stata but on this project we are trying to use R. Is this something that can be done in R, if so, could someone point me in the correct direction. Thanks, Michael Hess University of Michigan Health System ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
Re: [R] Help with a problem
Mike, I am slightly unclear on what you want to do. Do you want to check rows 1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all* rows you are looking at, and same for c2? You can sort your data like this: data - data[order(data$ds),] Type ?order for help. But also do this for added enlightenment...: library(fortunes) fortune(dog) Next, your analysis on the sorted data frame. As I said, I am not entirely clear on what you are looking at, but the following may solve your problem with choices 1 to 7 and any one above. foo - 0 for ( ii in 1:(nrow(data)-8) ) { if (any(data$c1[ii+seq(0,6)]=100) any(data$c2[ii+seq(0,6)]=8)) { foo - 1 break } } The variable foo should contain what you want it to. Look at ?any (and, if this does not do what you want it to, at ?all) for further info. No doubt this could be vectorized, but I think the loop is clear enough. Good luck! Stephan Michael Hess schrieb: Hello R users, I am a researcher at the University of Michigan looking for a solution to an R problem. I have loaded my data in from a mysql database and it looks like this data ds c1 c2 1 2010-04-03100 0 2 2010-04-30 11141 15 3 2010-05-01 3 16 4 2010-05-02 7615 14 5 2010-05-03 6910 17 6 2010-05-04 5035 3 7 2010-05-05 3007 15 8 2010-05-06 4 14 9 2010-05-07 8335 17 10 2010-05-08 2897 13 11 2010-05-09 6377 17 12 2010-05-10 3177 17 13 2010-05-11 7946 15 14 2010-05-12 8705 0 15 2010-05-13 9030 16 16 2010-05-14 8682 16 17 2010-05-15 8440 15 What I am trying to do is sort by ds, and take rows 1,7, see if c1 is at least 100 AND c2 is at least 8. If it is not, start with check rows 2,8 and if not there 3,9until it loops over the entire file. If it finds a set that matches, set a new variable equal to 1, if never finds a match, set it equal to 0. I have done this in stata but on this project we are trying to use R. Is this something that can be done in R, if so, could someone point me in the correct direction. Thanks, Michael Hess University of Michigan Health System ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can anybody help me understand AIC and BIC and devise a new metric?
Hi, one comment: Claeskens and Hjort define AIC as 2*log L - 2*p for a model with likelihood L and p parameters; consequently, they look for models with *maximum* AIC in model selection and averaging. This differs from the vast majority of authors (and R), who define AIC as -2*log L + 2*p and search for the model with *minimum* AIC. Their definition of BIC is similarly the negative of normal BIC. I would compare this to defining \pi as the base of the natural logarithm and e as the ratio of a circle's circumference to its diameter: of course, you can do perfectly valid mathematics with your own definitions, but it is a recipe for confusion. Anyone who only reads Claeskens and Hjort, fires up R and selects the model with the maximum AIC from the candidate models is in for some *nasty* surprises. Worse, as far as I see, Claeskens and Hjort nowhere mention that they are using a definition that is diametrically opposed to what is (overwhelmingly) common, and they do not comment on this. However, Claeskens and Hjort managed to publish a book, which I have yet to do, so it is quite possible that there is a major flaw in my thinking. If so, I haven't found it yet, and I would be very grateful if somebody pointed out what I misunderstand. Otherwise, I would be *very* careful indeed about basing my analysis strategy on their book, although the rest of the content is very helpful indeed - you only need to remember where to switch signs and change maximize to minimize etc. For AIC and BIC novices, I would recommend going with Burnham Anderson, which Kjetil cited below. Best, Stephan Kjetil Halvorsen schrieb: You should have a look at: Model Selection and Model Averaging Gerda Claeskens K.U. Leuven Nils Lid Hjort University of Oslo Among other this will explain that AIC and BIC really aims at different goals. On Mon, Jul 5, 2010 at 4:20 PM, Dennis Murphy djmu...@gmail.com wrote: Hi: On Mon, Jul 5, 2010 at 7:35 AM, LosemindL comtech@gmail.com wrote: Hi all, Could anybody please help me understand AIC and BIC and especially why do they make sense? Any good text that discusses model selection in detail will have some discussion of AIC and BIC. Frank Harrell's book 'Regression Modeling Strategies' comes immediately to mind, along with Hastie, Tibshirani and Friedman (Elements of Statistical Learning) and Burnham and Anderson's book (Model Selection and Multi-Model Inference), but there are many other worthy texts that cover the topic. The gist is that AIC and BIC penalize the log likelihood of a model by subtracting different functions of its number of parameters. David's suggestion of Wikipedia is also on target. Furthermore, I am trying to devise a new metric related to the model selection in the financial asset management industry. As you know the industry uses Sharpe Ratio as the main performance benchmark, which is the annualized mean of returns divided by the annualized standard deviation of returns. I didn't know, but thank you for the information. Isn't this simply a signal-to-noise ratio quantified on an annual basis? In model selection, we would like to choose a model that yields the highest Sharpe Ratio. However, the more parameters you use, the higher Sharpe Ratio you might potentially get, and the higher risk that your model is overfitted. I am trying to think of a AIC or BIC version of the Sharpe Ratio that facilitates the model selection... You might be able to make some progress if you can express the (penalized) log likelihood as a function of the Sharpe ratio. But if you have several years of data in your model and the ratio is computed annually, then isn't it a random variable rather than a parameter? If so, it changes the nature of the problem, no? (Being unfamiliar with the Sharpe ratio, I fully recognize that I may be completely off-base in this suggestion, but I'll put it out there anyway :) BTW, you might find the R-sig-finance list to be a more productive resource in this problem than R-help due to the specialized nature of the question. HTH, Dennis Anybody could you please give me some pointers? Thanks a lot! -- View this message in context: http://r.789695.n4.nabble.com/Can-anybody-help-me-understand-AIC-and-BIC-and-devise-a-new-metric-tp2278448p2278448.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __
Re: [R] how to define a function in R
Hi, I recommend that you look at the following help pages and experiment a little (maybe create a toy directory with only three or four files with a few lines each): ?files ?dir ?grep ?strsplit Good luck! Stephan jd6688 schrieb: Here are what i am going to accomplish: I have 400 files named as xxx.txt. the content of the file looks like the following: namecount 1. aaa 100 2. bbb2000 3. ccc300 4. ddd 3000 more that 1000 rows in each files. these are the areas i need help: 1. how can i only read in the files with the string patterns ggg or fff as part of the file names? for instance, I only need the file names with the ggg or fff in it x_ggg_y_1.txt _fff__xxx.txt i don't need to read in the files, such as _aaa_.txt 2.how cam rename the files: for instance: x_ggg_y_1.txt==changed to ggg1a.txt 3.after the files read in, how can i only keep the rows with the aaa and bbb, everything elses show be removed from the files, but the files still remain the same file name? for instance, in the x_ggg_y_1.txt file, it shouls looks like: namecount 1. aaa100 2. bbb2000 3. aaa300 4. bbb400 Thanks so lot, I am very new to R, I am looking forward to any helps from you. On Tue, Jul 6, 2010 at 7:19 PM, Nordlund, Dan (DSHS/RDA) [via R] ml-node+2280308-38709657-312...@n4.nabble.comml-node%2b2280308-38709657-312...@n4.nabble.com wrote: -Original Message- From: [hidden email]http://user/SendEmail.jtp?type=nodenode=2280308i=0[mailto: r-help-boun...@r- project.org] On Behalf Of jd6688 Sent: Tuesday, July 06, 2010 3:49 PM To: [hidden email]http://user/SendEmail.jtp?type=nodenode=2280308i=1 Subject: [R] how to define a function in R 1. how to write a R script? 2.How to write a SAS like macro/generic process to process multiple files by using the same funstion in R? Thanks in advance Don't thank me too soon. :-) Your question is equivalent to me going to SAS-L and asking someone to teach me SAS macro language. You need to provide more information about what your task actually is. Where are these file names that you want to process coming from? What do you want to do with them? The basic approach would be to put the file names in a list and then pass the list items one at a time to your function. But how to do that really depends on what you are trying to do. You might also read the posting guide listed at the bottom of every posting. Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 __ [hidden email] http://user/SendEmail.jtp?type=nodenode=2280308i=2mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View message @ http://r.789695.n4.nabble.com/how-to-define-a-function-in-R-tp2280290p2280308.html To unsubscribe from how to define a function in R, click here (link removed) =. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] process of stepwise selection
Hi Elaine, in general, stepwise selection is a very bad idea: Whittingham, M. J.; Stephens, P. A.; Bradbury, R. B. Freckleton, R. P. Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 2006, 75, 1182-1189 HTH Stephan elaine kuo schrieb: Dear list, I wanna select the significant variables relative to bird distribution, using stepwise method. However, the result is always the best-fit model. Please kindly suggest if it is possible to show the selection process. Thank you Elaine [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding Lower Envelope of Points on a Plot?
Hi, one possibility would be to calculate the convex hull using chull(). I believe that the hull points are returned by chull() in a clockwise order (?), so the points between the rightmost and the leftmost point in the chull() result are the lower half of the convex hull. Remove these points from the original dataset (a variant of peeling convex hulls) and iterate until you have removed your prespecified percentage of points - these all lie outside of the final lower hull (though the percentage will, of course, only be approximated, but you should be able to modify this to taste). HTH Stephan David Winsemius schrieb: On Jun 30, 2010, at 2:05 PM, Asha Sharma wrote: Hi, I am looking for a way to find the lower envelope of points on a plot, preferably specifying what percentage of points should be allowed to lie outside the envelope. There must be a straightforward way to do this, but I do not seem to be able to find it. I would greatly appreciate any help. You probably want something like the lower half of the convex hull. You should find quite a bit of code with your favorite r search engine on the topic of convex hull. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding Lower Envelope of Points on a Plot?
Hi, To be honest, what you mean by a (linear) lower envelope becomes less and less clear to me. Take any cloud of 2-d points. The convex hull consists of a number of these points - or alternatively, of a number of line segments between successive ones of these points. Thus, every one of these segments defined by a pair of points on the bottom half of the convex hull would, if prolonged to infinity, define a line with all your points in the half-space above the line, which sounds a lot like what you seem to look for - in which case, the lower envelope would not be well-defined. Or does your cloud of points have two points A and B such that all points are to the right of A and to the left of B and above the line AB? In this case, you could call the segment AB the lower envelope in your sense, but finding such A and B (if they exist) is trivial: look for the leftmost point, if there are multiple ones of these, take the bottom one, that's A, and the same for B. Life would be easier for us if you provided a small reproducible example and told us what you expect the result to be. HTH Stephan Asha Sharma schrieb: Hi, Thanks to both of you for taking the time to answer my question. I was maybe not very clear in the way I framed my question. By plot, I meant an x-y plot with a cloud of points which should have a linear lower envelope. Is there a way to both plot as well as get the parameters of the lower envelope (intercept, slope, etc.) and to also set the percentage of points outside it? Thanks! Asha Stephan Kolassa wrote: Hi, one possibility would be to calculate the convex hull using chull(). I believe that the hull points are returned by chull() in a clockwise order (?), so the points between the rightmost and the leftmost point in the chull() result are the lower half of the convex hull. Remove these points from the original dataset (a variant of peeling convex hulls) and iterate until you have removed your prespecified percentage of points - these all lie outside of the final lower hull (though the percentage will, of course, only be approximated, but you should be able to modify this to taste). HTH Stephan David Winsemius schrieb: On Jun 30, 2010, at 2:05 PM, Asha Sharma wrote: Hi, I am looking for a way to find the lower envelope of points on a plot, preferably specifying what percentage of points should be allowed to lie outside the envelope. There must be a straightforward way to do this, but I do not seem to be able to find it. I would greatly appreciate any help. You probably want something like the lower half of the convex hull. You should find quite a bit of code with your favorite r search engine on the topic of convex hull. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What is wrong with this code?
Hi, your problem is called string matching. Search for that term on rseek.org, there are a couple of functions and packages. And Wikipedia can tell you everything you ever wanted to know about string matching (and more). HTH, Stephan Cable, Samuel B Civ USAF AFMC AFRL/RVBXI schrieb: Anyway, I still wouldn't mind some advice on character matching. Thanks. If so we need a reproducible example of what you are doing. OK, let's say I have three strings. Str1=abc. Str2=abcd. Str3=efgh. I want to compare Str1 and Str2 in such a way that R detects that Str2 indeed contains Str1. I want something that returns a TRUE value. Then I want to compare Str1 and Str3 so that R detects that Str3 does not contain Str1. I don't want it to give me an error because it can't find Str1. I just want to get a FALSE value. Is there a way to do this that's less messy than what I came up with? Thanks again. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exponential Smoothing: Forecast package
Hi Phani, something like this looks promising: # library(forecast) library(Mcomp) MAPE.for.Holt - function (x,series,bignum=10e6) { foo - try(ets(series$x,model=AAN,damped=FALSE,alpha=x[1],beta=x[2],restrict=FALSE),silent=TRUE) if ( class(foo) == try-error ) { bignum } else { mean(abs(fitted(foo)-series$x)/series$x,na.rm=TRUE) } } bar - optim(par=c(.1,.1),fn=MAPE.for.Holt,series=M3[[1]]) # At least it converges. However, I have had problems with parameters leaving the allowed space (that's what the try() and the bignum is for), and even with convergence, some unrealistically big smoothing constants resulted, which in turn were not very stable for varying starting parameters... HTH, Stephan phani kishan schrieb: Hey, Thanks for the tip Stephan. But you could tell me how to pass the series to the function calling ets? Initially I planned to do it this way: wrapper-function(x) { alpha-x[1] beta-x[2] ph-x[3] series-x[4] foofit-ets(series,model=AZZ,alpha=alpha,beta=beta,phi=phi,additive.only=T,opt.crit=c(mse)) accuracy(foofit)[5] ##for MAPE } I then planned to use the optim using optim(c(alpha,beta,phi,series),wrapper) What I hoped to do is also select MAPE as a criteria for selection of my alpha and beta. However I shouldn't pass my series in this form right? As it would be optimized too in the process? Could you suggest a way around this. And I did find a way around could this allow me to set MAPE as a criteria? Phani On Tue, Jun 29, 2010 at 12:47 AM, Stephan Kolassa stephan.kola...@gmx.dewrote: Hi Phani, to get the best Holt's model, I would simply wrap a suitable function calling ets() within optim() and optimize for alpha and beta - the values given by ets() without constraints would probably be good starting values, but you had better start the optimization with a variety of starting values to make sure you don't end up in a local minimum. I know of no comparison just between Holt and Brown - but you could use the above methods and the M3 Competition dataset (in Mcomp) to look how the two methods compare on a (more or less) benchmark dataset. HTH Stephan phani kishan schrieb: Hey, I am using the ets() function in the forecast package to find out the best fit parameters for my time-series. I have about 50 sets of time series data. I'm currently using the function as follows: ets(x,model=AZZ,opt.crit=mse) As to my observation about 5-10 of them have been identified by ets to have a trend and an alpha, beta values have been thrown up - which have been same in all these cases. When I read up online it came up as a Brown's double exponential smoothing as opposed to Holt's exponential smoothing (where alpha and beta differ). I am guessing this is happening as AIC/AICc/BIC select a model based on accuracy as well as a weight on number of parameters (1 in case of brown's, 2 in case of holt's). Now if I want to see results of the best parameters from the Holt's method, how should I go about it? And is there any study comparing the accuracy of brown's double exponential model versus holt's exponential model? Thanks in advance, Phani __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exponential Smoothing: Forecast package
Hi Phani, to get the best Holt's model, I would simply wrap a suitable function calling ets() within optim() and optimize for alpha and beta - the values given by ets() without constraints would probably be good starting values, but you had better start the optimization with a variety of starting values to make sure you don't end up in a local minimum. I know of no comparison just between Holt and Brown - but you could use the above methods and the M3 Competition dataset (in Mcomp) to look how the two methods compare on a (more or less) benchmark dataset. HTH Stephan phani kishan schrieb: Hey, I am using the ets() function in the forecast package to find out the best fit parameters for my time-series. I have about 50 sets of time series data. I'm currently using the function as follows: ets(x,model=AZZ,opt.crit=mse) As to my observation about 5-10 of them have been identified by ets to have a trend and an alpha, beta values have been thrown up - which have been same in all these cases. When I read up online it came up as a Brown's double exponential smoothing as opposed to Holt's exponential smoothing (where alpha and beta differ). I am guessing this is happening as AIC/AICc/BIC select a model based on accuracy as well as a weight on number of parameters (1 in case of brown's, 2 in case of holt's). Now if I want to see results of the best parameters from the Holt's method, how should I go about it? And is there any study comparing the accuracy of brown's double exponential model versus holt's exponential model? Thanks in advance, Phani __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading file
Have you set the correct working directory? ?setwd ?getwd HTH Stephan Robert Tsutakawa schrieb: I am trying to read a source program into a mac pro laptop, which uses Snow Leopard. R is unable to find the file containing my source program. I'm using the function source( file name). I need some examples or detailed instructions. I have no problem reading the file using PC. Bob __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create new variable: percentile value of variable in data frame
Hi Jon, does the empirical cumulative distribution function do what you want? dat$q.score - ecdf(dat$score)(dat$score) ?ecdf HTH Stephan Jonathan Beard schrieb: Hello all, Thanks in advance for you attention. I would like to generate a third value that represents the quantile value of a variable in a data frame. # generating data x - as.matrix(seq(1:30)) y - as.matrix(rnorm(30, 20, 7)) tmp1 - cbind(x,y) dat - as.data.frame(tmp1) colnames(dat) - c(id, score) dat # finding percentiles of score qs - as.matrix(quantile(dat$score, type=3, probs = seq(0,1,.1))) colnames(qs) - c( score) qs # is there a way to put the quantile value for a value of 'score' into a new variable, # such that the new data frame would have three variables: id, score and q.score? ## running R version 2.8.1 (2008-12-22) on Vista Thanks so much! -Jon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] shapiro.test
Hi, David Winsemius schrieb: snip This would imply that ozon is a list or dataframe. snip And you tried to give the whole list to a function that only wants a vector. And whenever you suspect that your data types clash, try str() to find out just what kind of thing your data is. Here: str(ozon) HTH, Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding column to data frame conditionally
Hi Kristina, Thierry's solution is certainly the correct one in terms of keeping within R's philosophy... but I personally find a series of conditional assignments easier to understand - see below for an example. HTH, Stephan # # Example freqg - data.frame(mat=c(1,1,2,2),flank=c(1,2,1,2)) freqg$condition[freqg$mat==1 freqg$flank==1] - 1 freqg$condition[freqg$mat==2 freqg$flank==1] - 2 freqg mat flank condition 1 1 1 1 2 1 2NA 3 2 1 2 4 2 2NA # ONKELINX, Thierry schrieb: Dear Kristina, Use ifelse(). Note that you must use '==' to test for equality. '=' is an assignment. Freqg$condition - with(freqg, ifelse(mat==1 flank==1, 1, ifelse(mat == 2 flank == 1, 2, NA))) HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Kristina Schmitz Verzonden: donderdag 27 mei 2010 11:33 Aan: r-help@r-project.org Onderwerp: [R] adding column to data frame conditionally Dear all and thanks in advance for helping me with a rather stupid question: I imported a data set (freqg) into R consisting of 14 variables. Now a want to compute a variable and add it in an additional column to my data frame. The value of this new variable (condition) depends on the values of two other variables (mat and flank) already included in the data frame. For example: if mat=1 and flank=1 - condition=1 if mat=2 and flank=1 - condition=2 ... What I got is this code, which doesn't really work (it results in a new column called condition which takes only the value TRUE): freqg-transform(freqg,condition=(mat=1)(flank=1)) Thanks in advance and kind regards! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Druk dit bericht a.u.b. niet onnodig af. Please do not print this message unnecessarily. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.diag
Hi Helga, did you load the boot library, which contains glm.diag(), by calling library(boot)? HTH Stephan Helga margrete holmestad schrieb: I have made a poisson regression by using glm. Then I want to check if I have over-dispersion in my model. I tried to use glm.diag(fit.1), but then R told me it couldn't find the function glm.diag. Does anybody know why? Or is there an other methode to check if there is over-dispersion in my model _ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] From date to week day
Or: weekdays(as.Date(2010-05-24)) HTH, Stephan Wu Gong schrieb: ?strptime will helps. d - as.Date(01/05/2007,%m/%d/%Y) format(d, %A, %b %d, %Y) [1] Friday, Jan 05, 2007 - A R learner. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function rep
Hi, this is FAQ 7.31: pb and pr are floating-point numbers that are coerced to integer for rep(), and this does not always work the way you want. HTH Stephan Covelli Paolo schrieb: Hi, I've got the following code: p - 0.34 pb - p*100 pr - (1-p)*100 A - rep(0,pb) # a vector with 34 zeros B - rep(1,pr) # a vector with 66 ones Now if I type length(A), R answer correctly 34 but if I type length(B), R answer 65 instead of 66. I don't understand why it happens. Can anyone help me? Thanks in advance. Paolo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help in histogram
Hi, first get the densities without plotting the histogram: foo - hist(x, plot=FALSE) then plot the histogram and feed the rounded densities, converted to character, to the labels argument (instead of just labels=TRUE): hist(x, freq=F, xlab='',ylab=Percent of Total, col=skyblue, labels=as.character(round(foo$density,2)), right=FALSE,main=Position of Hypothetical Protein) HTH, Stephan Changbin Du schrieb: x- sample(1:14, 319, rep=T) hist(x, freq=F, xlab='',ylab=Percent of Total, col=skyblue, labels=TRUE, right=FALSE,main=Position of Hypothetical Protein) Is there is way to round the labels to 2 decimal digits, for example, 0.088 is changed to 0.09. Thanks! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] interpretation of p values for highly correlated logistic analysis
Hi Claus, welcome to the wonderful world of collinearity (or multicollinearity, as some call it)! You have a near linear relationship between some of your predictors, which can (and in your case does) lead to extreme parameter estimates, which in some cases almost cancel out (a coefficient of +/-40 on a categorical variable in logistic regression is a lot, and the intercept and two of the roman parameter estimates almost cancel out) but which are rather unstable (hence your high p-values). Belsley, Kuh and Welsch did some work on condition indices and variance decomposition proportions, and variance inflation factors are quite popular for diagnosing multicollinearity - google these terms for a bit, and enlightenment will surely follow. What can you do? You should definitely think long and hard about your data. Should you be doing separate regressions for some factor levels? Should you drop a factor from the analysis? Should you do a categorical analogue of Principal Components Analysis on your data before the regression? I personally have never done this, but correspondence analysis has been recommended as a discrete alternative to PCA on this list, see a couple of books by M. J. Greenacre. Best of luck! Stephan claus orourke schrieb: Dear list, I want to perform a logistic regression analysis with multiple categorical predictors (i.e., a logit) on some data where there is a very definite relationship between one predicator and the response/independent variable. The problem I have is that in such a case the p value goes very high (while I as a naive newbie would expect it to crash towards 0). I'll illustrate my problem with some toy data. Say I have the following data as an input frame: roman animal colour 1 alphadog black 2 betacat white 3 alphadog black 4 alphacat black 5 betadog white 6 alphacat black 7 gammadog white 8 alphacat black 9 gammadog white 10 betacat white 11 alphadog black 12 alphacat black 13 gammadog white 14 alphacat black 15 betadog white 16 betacat black 17 alphacat black 18 betadog white In this toy data you can see that roman:alpha and roman:beta are pretty good predictors of colour Let's say I perform logistic analysis directly on the raw data with colour as a response variable: options(contrasts=c(contr.treatment,contr.poly)) anal1 - glm(data$colour~data$roman+data$animal,family=binomial) then I find that my P values for each individual level coefficient approach 1: Coefficients: Estimate Std. Error z value Pr(|z|) (Intercept) -41.65 19609.49 -0.0020.998 data$romanbeta 42.35 19609.49 0.0020.998 data$romangamma43.74 31089.48 0.0010.999 data$animaldog 20.48 13866.00 0.0010.999 while I expect the p value for roman:beta to be quite low because it is a good predictor of colour:white On the other hand, if I then run an anova with a Chi-sq test on the result model, I find as I would expect that 'roman' is a good predictor of colour. anova(anal1,test=Chisq) Analysis of Deviance Table Model: binomial, link: logit Response: data$colour Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(|Chi|) NULL 1724.7306 data$roman 2 19.323915 5.4067 6.366e-05 *** data$animal 1 1.587614 3.81910.2077 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Can anyone please explain why my p value is so high for the individual levels? Sorry for what is likely a stupid question. Claus p.s., when I run logistic analysis on data that is more 'randomised' everything comes out as I expect. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reshaping data
Hi Kim, look at the reshape() command with direction=wide. Or at the reshape package. HTH, Stephan Kim Jung Hwa schrieb: Hi All, Can someone help me reshape following data: Var1 Var2 Val A X 1 A Y 2 A Z 3 B X 4 B Y 5 B Z 6 to some kind of matrix/tabular format (preferably as a matrix), may be like Var1 X Y Z A 1 2 3 B 4 5 6 Any help would be greatly appreciated, Kim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ODD and EVEN numbers
(foo %% 2) == 0 See ?%% HTH Stephan tj schrieb: Hi, anyone here who knows how to determine if an integer is odd or even in R? Thanks. tj __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Summing data based on certain conditions
?by may also be helpful. Stephan Steve Murray schrieb: Dear all, I have a dataset of 1073 rows, the first 15 which look as follows: data[1:15,] date year month day rammday thmmday 1 3/8/1988 1988 3 81.430.94 2 3/15/1988 1988 3 152.860.66 3 3/22/1988 1988 3 225.063.43 4 3/29/1988 1988 3 29 18.76 10.93 5 4/5/1988 1988 4 54.492.70 6 4/12/1988 1988 4 128.574.59 7 4/16/1988 1988 4 16 31.18 22.18 8 4/19/1988 1988 4 19 19.67 12.33 9 4/26/1988 1988 4 263.141.79 10 5/3/1988 1988 5 3 11.516.33 11 5/10/1988 1988 5 105.642.89 12 5/17/1988 1988 5 17 37.46 20.89 13 5/24/1988 1988 5 249.869.81 14 5/31/1988 1988 5 31 13.008.63 15 6/7/1988 1988 6 70.430.00 I am looking for a way by which I can create monthly totals of rammday (rainfall in mm/day; column 5) by doing the following: For each case where the month value and the year are the same (e.g. 3 and 1988, in the first four rows), find the mean of the the corresponding rammday values and then times by the number of days in that month (i.e. 31 in this case). Note however that the number of month values in each case isn't always the same (e.g. in this subset of data, there are 4 values for month 3, 5 for month 4 and 5 for month 5). Also the months will of course recycle for the following years, so it's not simply a case of finding a monthly total for *all* the 3s in the whole dataset, just those associated with each year in turn. How would I go about doing this in R? Any help will be gratefully received. Many thanks, Steve _ We want to hear all your funny, exciting and crazy Hotmail stories. Tell us now __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding common an unique elements in character vectors
Hi Thomas, %in% does the trick: vector_1 - c(Belgium, Spain, Greece, Ireland, Luxembourg, Netherlands,Portugal) vector_2 - c(Denmark, Luxembourg) vector_1[!(vector_1 %in% vector_2)] HTH, Stephan Thomas Jensen schrieb: Dear R-list, I have a problem which I think is quite basic, but so far google has not helped me. I have two vectors like this: vector_1 - c(Belgium, Spain, Greece, Ireland, Luxembourg, Netherlands, Portugal) vector_2 - c(Denmark, Luxembourg) I would like to find the elements in vector_1 that are not in vector_2 so that i get a vector with these countries: Belgium, Spain, Greece, Ireland, Netherlands, Portugal. Thanks a lot, Thomas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Creating dataframe of all possible variable combinations
Hi Mike, the following works for me: SITE - ordered(c(101,102,103,104)) WDAY - ordered(c(MON,TUE,WED,THR,FRI),levels=c(MON,TUE,WED,THR,FRI)) TOD - ordered(c(MORN,AFTN),levels=c(MORN,AFTN)) foo - expand.grid(SITE=SITE,WDAY=WDAY,TOD=TOD) foo[order(foo$SITE),] If this doesn't solve your problem, perhaps you could give us a minimal code snippet that demonstrates what exactly you are doing without getting what you are looking for? Bye, Stephan Hosack, Michael schrieb: Hello, I need to create a dataframe containing all possible combinations of three variables: SITE (101,102,103,104), WDAY (MON,TUE,WED,THR,FRI), and TOD (MORN, AFTN). There should be a total of 40 unique combinations in my dataframe. I used expand.grid() successfully(?) to create my dataframe, but then when I went to order it by SITE, the resultant dataframe only contained four rows, one for each site. There must be something about this function that I don't understand. Any advice would be appreciated. Thank you, Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Arima forecasting
Hi Matteo, just use forecast.Arima() with h=2 to get forecasts up to 2 steps ahead. R will automatically use forecast.Arima() if you call forecast() with an Arima object. library(forecast) model - auto.arima(AirPassengers) forecast(model,h=2) HTH, Stephan Matteo Bertini schrieb: Hello everyone, I'm doing some benchmark comparing Arima [1] and SVR on time series data. I'm using an out-of-sample one-step-ahead prediction from Arima using the fitted method [2]. Do someone know how to have a two-steps-ahead forecast timeseries from Arima? Thanks, Matteo Bertini [1] http://robjhyndman.com/software/forecast [2] AirPassengers example on page 5 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] for() loop
Hi Martin, it is slightly unclear to me what you are trying to achieve... are you trying to tabulate how often each value appears in datjan[,4]? Then table(datjan[,4]) may be what you want. HTH Stephan Schmidt Martin schrieb: Hello I'm working with R since a few month and have still many trivial questions - I guess! Here is what I want: I have this matrix: dim(datjan) [1] 899 4 The first 10 rows looks as follows: datjan[1:10,] V1 V2 V3 V4 1 1961 1 1 24 2 1961 1 2 24 3 1961 1 3 24 4 1961 1 4 24 5 1961 1 5 24 6 1961 1 6 27 7 1961 1 7 27 8 1961 1 8 27 9 1961 1 9 27 10 1961 1 10 27 I tried now to create a for() loop, which gives me the sum of the 30 different classes (1:30!) in [,4]. for(i in 1:30){ sum(datjan[,4]==i) } R is then actually calculating the sum of i which certainly doesn't exist and results in a 0 value t1-sum(datjan[,4]==1) t2-sum(datjan[,4]==2) .until '30' This way its working, but I won't find a end by doing all this by hand, because there are many other matrix waiting. So, how can I make work that loop?? thanks for helping me __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is this a bug (or a feature) in hist(x)$density ??
Hi Tal, basically, by summing over the (pointwise) density, you are approximating the integral over the density (which should be around 1) - but to really do a rectangular approximation, you will of course need to multiply each function value by the width of the corresponding rectangle. I'd recommend reading through Wikipedia's article on the Riemann integral, you may be enlightened. http://en.wikipedia.org/wiki/Riemann_integral HTH, Stephan Tal Galili schrieb: Hi all, A friend send me a question on why does this: x-rpois(100,1) sum( hist(x)$density ) Gives out 2 I tried this: sum( hist(x, freq =T)$density ) It didn't help. Then he came back with the following insight: # with breaks b-c(0,0.9,1:8) sum(hist(x,breaks=b)$density) # Much more then 2 # but if we add weights according to the interval length sum(hist(x,breaks=b)$density * diff(b)) # it works What do you think ? Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making multiple columns to a single column
Hi Hyo, how about as.vector(ttx1) or as.vector(t(ttx1))? HTH Stephan Hyo Lee schrieb: Hi guys, I have a very simple question. I'm trying to make multiple columns to a single column. For example, *ttx1* is a 46*72 matrix. so, I tried this. *d1=ttx1[,1] d2=ttx1[,2] ... d72=ttx1[,72]* now, d1, d2, ...,d72 become a 46*1 matrix. And then.. I tried.. *dd=rbind(d1, d2, ..., d72)* I thought *dd* should be 3312*1 matrix; but it becomes 72*46. I really wanted to make it a single column (3312*1). Do you know what is wrong in this code? Or, do you have a better idea in making multiple columns to a single column? Thank you so much. -Hyo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Randomly sampling subsets of dataframe variable
Hi Mike, take an index vector that selects Monday and Tuesday out of each week, and then run a restricted random permutation on this vector which only permutes indices within each week. rperm() is in the sna package. library(sna) foo - rep(c(TRUE,TRUE,FALSE,FALSE,FALSE),26) your.data[foo[rperm(rep(seq(1,26),each=5))],] HTH, Stephan Hosack, Michael schrieb: Fellow R users, I am stumped on what would seem to be something fairly simple. I have a dataframe that has a variable named 'WEEK' that takes the numbers 1:26 (26 week time-period) with each number repeated five times consecutively (once for each weekday, Monday through Friday). Ex. 123.2626262626. I would like to randomly extract two weekdays per five day week for each of 26 weeks and store this data as a separate dataframe. I have been unable to get the sample function to work properly. I have also tried using the runif function to assign random numbers to each row of my dataframe, sort the dataframe first by week number then by random number value, and finally select the first two elements from each week subset (26 weeks total, giving 52 randomly selected values). I can't figure out how to select the first two elements. My goal is to randomly select two weekdays per week (without replacement) for each of 26 consecutive weeks. Any advice would be greatly appreciated. Thank you, Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Shade area under curve
Hi, use dnorm() for the density and polygon() to shade the area underneath, with suitably many x values so your density looks smooth. HTH, Stephan claytonmccandless schrieb: I want to shade the area under the curve of the standard normal density. Specifically color to the left of -2 and on. How might i go about doing this? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data.frame question
Hi, data.frame(x=x,y=as.numeric(x%in%y)) HTH, Stephan joseph schrieb: hello can you show me how to create a data.frame from two factors x and y. column 1 should be equal to x and column 2 is 1 if it is common to y and 0 if it is not. x=factor(c(A,B,C,D,E,F,G)) y=factor(c(B,C,G)) the output should look like this: A0 B1 C1 D0 E0 F0 G1 Thanks Joseph Dhahbi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Equation for model generated by auto.arima
Hi, In that case, I'd recommend reading a good book on time series analysis. Forecasting: Methods and Applications by Makridakis, Wheelwright and Hyndman is very accessible. Alternatively, there are probably tons of webpages on ARIMA, so google around. Best, Stephan testuser schrieb: Thanks for the reply, Stephan. I don't want to use R to predict the future value. I am looking to write the logic in a programming language like Java to predict future values using the model coefficients generated by R. For this, I would like to know what formula to use to estimate the value at any time t. I looked at the forecast package but cannot find how to calculate the value. Any help is appreciated. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Equation for model generated by auto.arima
Hi, the help page for arima() suggests looking at predict.Arima(), so take a look at ?predict.Arima(). You will probably not use the coefficients, but just feed it the output from arima(). And take a look at auto.arima() in the forecast package. HTH Stephan testuser schrieb: I would like to know how to use the coefficients generated by the ARIMA model to predict future values. What formula should be used with the coeffcients to determine the future values. Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exporting Graphs
Hi Karthik, I think you will need to do something like jpeg(histograms.jpg) hist(rnorm(100)) dev.off() HTH Stephan Karthik schrieb: Hello Tal, This is the code. hist(rnorm(100)) jpeg(histogram.jpeg) --- Even when I decrease the quality, I still have the same problem. hist(rnorm(100)) jpeg(histogram.jpeg,quality=30) Thank you for taking a look. Karthik On Sun, Feb 21, 2010 at 11:32 AM, Tal Galili tal.gal...@gmail.com wrote: Hi Karthik, Please give a sample code of what it is that you are doing that is causing this. Also, have a look at: ?pdf Or ?png Cheers, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Feb 21, 2010 at 9:27 PM, Karthik kwr...@gmail.com wrote: I am a beginner to R, and am working on exporting graphs. Even when I reduce the quality, it takes 30 or 40 minutes to export the graph. Does anyone have suggestions on how to make it faster? Thank you! Karthik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GARCH AIC
Hi David, str(g) gives you a ton of output, and the @fit slot has a $ics component, part of which has the promising name of AIC... (g...@fit)$ics[1] HTH, Stephan David Rubins schrieb: Hi, Is there anyway to extract the AIC and BIC from the summary statistics in fGarch package? g - garchfit(data) summary(g) will display the AIC and BIC, but I can't find a way to store those in a variable (in order to loop through different parameters, investigating their effect). Thank you, David Rubins __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Comparing dates in dataframes
Hi, it looks like when you read in your data.frames, you didn't tell R to expect dates, so it treats the Date columns as factors. Judicious use of something along these lines before doing your comparisons may help: arr$Date - as.Date(as.character(arr$Date),format=something) Then again, it may be possible to do the actual merging using merge(). HTH Stephan James Rome schrieb: I have two data frames. One (arr) has all arrivals to an airport for a year, and the other (gw) has the dates and quarter hour of the day when the weather is good. arr has a Date and quarter hour column. names(arr) [1] Date weekday hour monthminute [6] quarter ICAO Flight AircraftType Tail [11] Arrived STA Runway FromTo Delay [16] Operator gw I added the gw column to arr and initialized it to all FALSE names(gw) [1] Date minute hour quarter [5] Efficiency.Val Weekly.Avg Arrival.ValWeekly.Avg.1 [9] Departure.Val Weekly.Avg.2 Num.of.HoldRunway [13] Weather First point of confusion: gw[1,1] [1] 1/1/09 353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09 Why do I get 353 levels? I am trying to identify the quarter hours with good weather in the arr data frame. What I want to do is to go through the rows in gw, and to set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw row. So I tried gooddates = function(all, good) { la = length(all) # All the flights lw = length(good) # The good 15-minute periods for(j in 1:lw) { d=good$Date[j] q=good$quarter[j] all[all$DateTime==d all$quarter==q,17]=TRUE } } but when I run this, I get Error in Ops.factor(all$DateTime, d) : level sets of factors are different I know the level sets are different, that is what I am trying to find. But I think I am comparing single elements from the data frames. So what am I doing wrong? And there ought to be a better way to do this. Thanks in advance, Jim Rome __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bootstrapping
Hi Aaron, try the argument statistic=mean. Then boot() will give you the mean turn angle in your actual data (which appears to be 6 degrees, judging from what you write), as well as the means of the bootstrapped data. Then you can get (nonparametric) bootstrap CIs by quantile(boot$t,probs=c(.025,.975)). As far as I can see, there is really no need to look at sd(). A more interesting question would be how to deal with the fact that -180=+180, there may be something to think about here... HTH, Stephan aaron.fo...@students.tamuk.edu schrieb: Hi All, I'm new to R so please bear with me. I have a dataset with 337 turn angles ranging from -180 to 180 degrees. I need to bootstrap (sample with replacement) 1,000 times to create expected average turn angle with 95% CIs. The code is pretty straightforward (-boot(data =, statistic = ,R =)) but I am unsure how to input my observed mean (6 degrees) and standard deviation (66 degrees) into the statistic component. I realize there is a 'function' code but I can't seem to carry the results over to the 'boot' code. Thanks, Aaron M. Foley PhD Candidate Caesar Kleberg Wildlife Research Institute Texas AM University - Kingsville Cousins Hall, Room 201 Kingsville, TX 78363 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] = returns wrong result? Why
Hi Trafim, take a look at FAQ 7.31. HTH Stephan Trafim Vanishek schrieb: Dear all, Does anybody know the probable reason why = gives false when it should give true? These two variables are of the same type, and everything works in the cycle but then it stops when they are equal. this is the output result Rk[47] = RB[21] [1] FALSE Rk[47] [1] 0.002842007 RB[21] [1] 0.002842007 Thanks a lot. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Operating on each row of data frame
Hi, does this do what you want? d - cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd)) HTH, Stephan Abhishek Pratap schrieb: Hi All I have a data frame in which there are 4 columns . Column 1 : name Column 2-4 : values I would like to calculate mean/Standard error of values in column 2-4 and store them in column 5,6 respectively. I have done the following but doesn't seem to work mean_N_SE -function(x) { name - x[1] vals - c(x[2:4]) temp_mean - mean(vals) SE - sqrt(var(x)/length(x)) } apply(d,1,mean_N_SE) where d = data frame. Can someone help me with this. Thanks! -Abhi [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reading a file with mixed cyrillic/latin characters
Dear useRs, I am trying to read a tab-delimited Unicode text file containing both latin and cyrillic characters and failing miserably. The file looks like this (I hope it comes across right): A B C 3 foo ФОО 5 bar БАР read.table(foo.txt,sep=\t,header=TRUE) I am guessing that I can use the fileEncoding argument to read.table() to read this, but I can find no list of supported values of fileEncoding, and fileEncoding=Unicode gives an error. The FAQ and the FAQ for Windows don't help. I have searched both the list archives and RSeek and am still seeking enlightenment. I am running R 2.10.1 on Windows XP, sessionInfo() below. Cheers Stephan R version 2.10.1 (2009-12-14) i386-pc-mingw32 locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginer data.frame
Hi Jean-Baptiste, two points: 1) Your variable df is a *local* variable which you define in your function myfunc(), so it is not known outside myfunc(). When you ask is.data.frame(df), R looks at the global definition of df - which is the density function of the F distribution. To make your function run (especially interactively) will require a major rewrite. 2) Generally, using variable names that are already used as R objects (like df in your example) is a bad idea. For an example of the problems you can run into, see 1) above. 3) Loops are not the R way. Depending on what you want to do with the subset of your data.frame, you may want to do something like this: x.df[substr(x.df$Code,1,1)==R,] Look at ?substr to learn more - this function is vectorized, meaning that it takes a vector input and returns a vector output. Look at section 2.7 in An introduction to R. Good luck! Stephan Jean-Baptiste Combes schrieb: Hello, I use R 2.10, and I am new in R (I used to use SAS and lately Stata), I am using XP. I have a data which has a data.frame format called x.df (read from a csv file). I want to take from this data observations for which the variable Code starts with an R. I took all the Code and put them into a vector vec-grep(R[A-Z][A-Z],x.df$Code,value=TRUE) Then I created a function that is supposed to take all the lines in the my data x.df for which Code equals one value of vec. See the code below where I created a loop to do that. myfunc-function(data,var2,var1) + { + i=1 + while (i632){ + line-subset(data,var2==var1[i]) + if (i==1){ + df-line + df-data.frame(df) + } + else { + line-data.frame(line) + df-rbind(df,line) + } + i-i+1 + } + fix(df) + } The results of my program higly depend on the few last lines of the program. If I put fix(df), as above, the function opens a window with my data and it seems a sensible results (I have not checked in details but I barely have what I am suppose to get). myfunc-function(data,var2,var1) ... + } + df-data.frame(df) + print(is.data.frame(df)) + } myfunc(x.df,x.df$Code,vec) [1] TRUE print(is.data.frame(df)) [1] FALSE In the case above I ask whether or not the df is a data.frame and the answer is true, when the program has ended, I ask again and the answer is false. Could anyone tell me what to do to get this data and could anyone tell me why those differences in the results? as.data.frame(df) Erreur dans as.data.frame.default(df) : impossible de convertir automatiquement la classe function en un tableau de données (data.frame) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] coerce vector into array - change filling sequence
Hi, you can permute array dimensions using aperm(): x - 1 : 24 z - array(x, dim=c(6,2,2)) y - aperm(z,perm=c(3,2,1)) y[1,1,] HTH, Stephan Kohleth Chia schrieb: Dear all, When I coerce a vector into a multi dimensional array, I would like R to start filling the array along the last dimension, then the 2nd last etc. Let's jump straight into an example. x - 1 : 24 y - array(dim=c(2,2,6)) I would like to have: y[1,1,1] = 1 y[1,1,2] = 2 ... y[1,1,6] = 6 y[1,2,1] = 7 y[1,2,2] = 8 ... y[2,1,1] = 13 ... y[2,2,1] = 19 if I do y- array(x, dim=c(2,2,6)), i think I will get y[1,1,1] = 1 y[2,1,1] = 2 (or something not I want) instead. Of course, I need a fast solution, as I am actually dealing with array of much larger size. Any input will be appreciated Thanks a lot __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question regarding if statement in while loop
Hi Stephanie, it sounds like R's exception handling may help, something like this: foo - try(eblest(i, dir5, sterr5, weight5, aux5)) if ( class(foo) == try-error ) next Take a look at ?try. HTH, Stephan Stephanie Coffey schrieb: Hi all, I'm running R version 2.9.2 on a PC. I'm having a problem with a loop, and have tried using an if statement within to fix it, but to no avail. Any advice would be appreciated. Here is my code: * eblest - function(i,dir, sterr, weight, aux) { n - nrow(dir) Y - as.matrix(dir[,i], ncol=1) sigma2ei - as.matrix(sterr[,i]^2, ncol=1) w - as.matrix((weight[,3])*(sigma2ei), ncol=1) X - as.matrix(subset(aux, select=c(3,5:7,9:10,13))) a - EBLUP.area(Y,cbind(w,1),sigma2ei,n)#The EBLUP.area function is a function already in R. } # It gives a bunch of output, some of what I need. #THIS IS THE LOOP I'M HAVING A PROBLEM WITH: results - data.frame(length=nrow(dir5)) i - 3 while (i =some number) { eblest(i, dir5, sterr5, weight5, aux5) out - cbind(i, a$EBLUP, a$mse) results - cbind(results, out) i - i+1 } *** I have tried running the eblest function for a specific set of input (i, dir5, etc as in the function) This function runs ok. However, sometimes eblest does not create the expected output, sometimes due to the solution being singular or other reasons. I'm not so concerned about that at this point - it's a function of the data. My problem is that, as you can see, after eblest runs, the out pieces of information are bound together in the results matrix. When the eblest does not run correctly, the loop stops. For example, when i=3 and i=4, eblest runs ok, and I get a maxtrix with the following columns: iEBLUP se.EBLUP i EBLUP se.EBLUP 3 xy4abh ...but when it loops back around to start i=5, I get an error because the solution is singular. But I don't want the loop to stop (I have ~100 of these i's for which I need to execute this function. So I would like to set an if condition that will cause the loop to step ahead (in this case, to 6), and continue looping...Something like if exists(out)=FALSE next, else... However, when I tried that, the results matrix was empty created at all. In case it helps, out has a numeric mode, and is a matrix... I've written these functions and loops using online help and examples I've found on websites, but clearly I'm missing something. Thanks for any help you can give!! ~Stephanie Coffey __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] boot() with an array-valued statistic?
Dear guRus, is there a version of boot() that deals with array data and array-valued statistics? For example: foo - array(rnorm(120),dim=c(3,5,8)) means - apply(foo, MARGIN=c(2,3), FUN=mean) means contains the means over the first dimension of foo: a 5x8 array. Now I would like to bootstrap this array, perhaps with something along the lines of some array.boot(), with an additional MARGIN parameter: means.boot - array.boot(foo, statistic=mean, MARGIN=c(2,3), R=1000) I would like means.boot to contain (e.g., in analogy to boot(), in a component $t) an array of dimensions 1000x5x8, containing componentwise means of sampled slices of foo, e.g., to get confidence intervals like this: apply(means.boot$t,MARGIN=c(2,3),FUN=quantile,probs=c(.975,.025)) I have been playing around with plyr and various flavors of apply, and searching only yielded lots of hits for boot.array(), which is something completely different... Any help would be greatly appreciated! Cheers Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting
Hi teo, try lines() instead of points(). HTH Stephan teo schrieb: Hi: Could you please guide me how to plot weekly.t2. Below are my values and I am only able to plot weekly t1 but failing to add a second line representng weekly t2. Thanks. weekly.t1 [1] 228.5204 326.1224 387.2449 415.4082 443.6735 454.6939 451.5306 475.1020 483.2653 483.0612 488.9362 488.3673 493.4694 515.3061 509.7959 [16] 501.0204 513.5714 516.5306 509.4898 516.4286 509.7959 498.6735 492.9592 473.6735 487.7551 482.9293 weekly.t2 [1] 71.57143 95.41667 110.11905 120.65476 124.3 130.11905 132.44048 137.38095 141.60714 135.35714 131.89873 126.7 125.47619 [14] 133.57143 134.34524 137.38095 134.04762 144.40476 148.27381 147.08333 146.42857 137.5 127.14286 127.20238 130.41667 127.67857 plot(c(1:26), weekly.t1, type=l, xlab=Week since treatment start, ylab= Daily Treatment Dose, main=RIOTT doses, col=blue, axes = FALSE) points(c(1:26), weekly.t2, type=l, col=red) axis(side=1, at= c(1:26), as.character(c(1:26))) axis(side=2, at = seq(from=70, to=700, by=10)) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generating permutations that always include one specific element
Hi Ray, First possibility: just select those combinations that contain AL: combos.with.AL - possible.combos[rowSums(possible.combos==AL)0,] Second possibility: create all 3-combos *without* AL: bands.without.AL - c(B, DB, DG, G, K, LB, LG, MG, O, P, PI, PK, PU, R, V, W, Y) possible.combos.without.AL - permutations(17, 3, bands.without.AL) Then insert AL in the first, second, third and fourth position of this matrix: foo - cbind(AL,possible.combos.without.AL) combos.with.AL - rbind(foo,foo[,c(2,1,3,4)],foo[,c(2,3,1,4)],foo[,c(2,3,4,1)]) The first one is easier to understand but requires you to first build the big object possible combos, most of which you discard. May be a problem in larger instances. HTH, Stephan Raymond Danner schrieb: Dear R community, I am trying to create a matrix of permutations of a vector: bands - c(AL, B, DB, DG, G, K, LB, LG, MG, O, P, PI, PK, PU, R, V, W, Y) Each permutation must be 4 characters long. permutations() from the gtools package does this easy enough: possible.combos - permutations(18, 4, bands) However, ³AL² must be one of the elements in each permutation. Any ideas? Thanks in advance, Ray [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] integer(0) and NA do not equal FALSE
Hi Jonathan, grep() returns a vector giving either the indices of the elements of 'x' that yielded a match or, if 'value' is 'TRUE', the matched elements of 'x' (quoting from the help page, see ?grep). So you probably want to test whether this vector is empty or not - in other words, whether it has length zero or not: if( length(grep(hi, hop, fixed = TRUE)) 0 ) print('yes, your substring is in your string') else print('no, your substring is not in your string') (And you can remove the fixed = TRUE if you are only interested in whether the expression matches or not.) One off-topic point: you want your else at the end of the second line, not the beginning of the third. R evaluates line by line, and when it gets to the end of the second line and doesn't see your else, it has no way of knowing that the if is not yet finished. I've found that liberal use of curly braces makes life much easier. HTH, Stephan Jonathan schrieb: Hi, A noobie question: I'm simply trying to run a conditional statement that evaluates if a substring is found within a larger string. I find that if it IS found, my function returns TRUE (great!), but if not, the condition does not evaluate to FALSE. ex): if( grep(hi, hop, fixed = TRUE) ) print('yes, your substring is in your string') else print('no, your substring is not in your string') alternatively, I could replace grep with pmatch: if (pmatch('hi','hop')) print('yes, your substring is in your string') else print('no, your substring is not in your string') The first example, using grep, returns logical(0). The second, using pmatch, returns NA. Any idea how to convert either of those to FALSE, or else a different function that would do the trick? Thanks, Jon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] forecasting
?predict HTH Stephan DispersionMap schrieb: I have some data that i ran a regression on and got the usual r output, with the a intercept and b coefficient for my independent variable. i want to forecast the number of future events using these parameters. What function / packages in R would you recommend to be used in good effect for these purposes?? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Literature analysis
Hi, from what I understand, you may be interested in text mining, so perhaps you want to look at the tm package. Then again, depending on what you are really trying to do, you may be better served with perl, awk and similar tools than with R... HTH, Stephan Schwan schrieb: Dear all, i am new in R. I am writing a review paper about batteries. However, i am interested in analyzing all the papers by keywords, author, references and year. This could be done by refviz a software, which is only running on windows machines and which is not free. So my question to you is, is it somehow possible to write a script that can do all of this work? And if yes, with what i should start? Thanks a lot in advance, Schwan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Packages Crack the 3,000 Mark!
Hi Bob, Muenchen, Robert A (Bob) wrote: Does anyone have a program that graphs the growth of R packages? I don't know if that historical data is around. John Fox had a slide on this in his useR 2008 talk The Social Organization of the R Project (page 7), with package counts up to March 2008. As Source of Data he gave https://svn.r-project.org/R/branches/. I've been digging around in there but really have no idea how he found the relevant data there. I'd be quite interested in this, too, so if you find out anything please drop me a line... Best, Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] modeling and forecasting commodity time series?
Hi Luna, you may want to look at the IIF website, http://www.forecasters.org They have a mailing list for forecasters - you may get more of a response there than on a dedicated R list. HTH, Stephan Luna Moon schrieb: Hi all, Could anybody please shed some lights on me about good books/literature about modeling and forecasting financial time series in the commodity space? Thanks so much! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] exclude data for boxplot stats using mathematical operator
Try this: boxplot.stats(x[x[,2]0,2],do.conf=FALSE) HTH, Stephan e-letter schrieb: Readers, I have a data set as follows: 1,1 2,2 3,3 4,4 5,3 6,2 7,-10 8,-9 9,-3 10,2 11,3 12,4 13,5 14,4 15,3 16,2 17,1 I entered this data set using the command 'read.csv'. I want to exclude values fewer than zero in column 2 so then I tried the following command: boxplot.stats(x[0,2],do.conf=FALSE) Error: syntax error, unexpected GT, expecting ',' in boxplot.stats(x[ Any help please? Yours, rhelp at conference.jabber.org r 251 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] State Space models in R
Hi, Rob Hyndman's forecast package does exponential smoothing forecasting based on state space models (and lots of other stuff, ARIMA et al.). It's not exactly the companion package to his book, but it comes close. The book's (Forecasting with Exponential Smoothing - The State Space Approach) webpage is here: http://www.exponentialsmoothing.net/ HTH, Stephan Giovanni Petris schrieb: Hello everybody, I am writing a review paper about State Space models in R, and I would like to cover as many packages as I reasonably can. So far I am familiar with the following tools to deal with SS models: * StructTS, Kalman* (in stats) * packages dse[1-2] * package sspir * package dlm I would like to have some input from users who work with SS models: are there any other packages for SS models that I am missing?, which package do you use and why?, what do you think are advantages/ disadvantages of the package you use? Of course I do have my own preferences (biased, of course) and opinions about the different packages, but I would also like to summarize in my paper the feedback I get from the R community. Thank you in advance. Best, Giovanni Petris __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2: deterministic position_jitter geom_line with position_jitter
Dear guRus, I am starting to work with the ggplot2 package and have two very dumb questions: 1) deterministic position_jitter - the jittering is stochastic; is there any way to get a deterministic jittering? For instance: example.data - data.frame(group=c(foo,bar,foo,bar,foo,bar),x=c(1,1,2,2,3,3),y=c(1,1,0,2,1,1)) set.seed(2009) qplot(x,y,data=example.data,shape=group,position=position_jitter(w=0.1,h=0)) For x=1, the foo point is to the left of the bar point, and for x=3 the other way around. I would like to have all foo points at seq(1,3)-epsilon and all bar points at seq(1,3)+epsilon. Do I need to manually modify example.data$x groupwise for this? 2) geom_line with position_jitter - when I call multiple geoms with position_jitter, each geom gets its own jittering. For example (continuing with example.data above): set.seed(2009) qplot(x,y,data=example.data,geom=c(point,line),shape=group,position=position_jitter(w=0.1,h=0)) The lines do not connect the points - is there any way to have the geom_line connect all the foo points on the one hand and all the bar points on the other hand? -- What I've done: searched through HW's book, googled, searched RSeek. For point 2) above, I tried using multiple layers and resetting the seed in between, to wit: pp - ggplot(example.data,aes(x,y,shape=group)) set.seed(2009) pp - pp+layer(geom=point,position=position_jitter(w=0.1,h=0)) set.seed(2009) pp - pp+layer(geom=line,position=position_jitter(w=0.1,h=0)) print(pp) This doesn't do what I want, either... Why I'm doing this: example.data actually contains group means across a covariate x, and they need to be plotted as dots plus error bars (psychologists' convention), so use boxplots instead is a perfectly correct reply to my questions which unfortunately does not help me. Thanks for your time! Stephan -- sessionInfo(): R version 2.9.2 (2009-08-24) i386-pc-mingw32 locale: LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252 attached base packages: [1] grid grDevices datasets splines graphics stats tcltk utils methods base other attached packages: [1] ggplot2_0.8.3 reshape_0.8.3 plyr_0.1.9 proto_0.3-8 svSocket_0.9-43 svMisc_0.9-48 TinnR_1.0.3 R2HTML_1.59-1 Hmisc_3.7-0 [10] survival_2.35-4 loaded via a namespace (and not attached): [1] cluster_1.12.0 lattice_0.17-25 tools_2.9.2 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Normalize data set
Are you looking for reshape()? HTH, Stephan Edward Chen schrieb: Hi all, I have a mxn matrix that consists of 28077 rows of features and 30 columns of samples. I want to normalize each row for the samples for each feature. I have tried normalize and scale functions but they don't seem to work out the way I want to. Thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting the graph of density with an unknown distribution
Hi, Try kernel smoothing via the density() function. And take a look at ecdf(). HTH, Stephan sendona essile schrieb: How can I plot the graph of a density of a sample with an unknown distribution? I can provide any sample size which is required. I want to have a smooth density graph of my data. New Email addresses available on Yahoo! Get the Email name you#39;ve always wanted on the new @ymail and @rocketmail. Hurry before someone else does! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] splitting a string up
strsplit(1041281__2009_08_20_.lev, split=_)[[1]][1] HTH, Stephan stephen sefick schrieb: x - 1041281__2009_08_20_.lev I would like to split this string up and only extract the leading numbers. 1041281 to use as a label for a data column in a bigger for loop function to read in data. regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Several simple but hard tasks to do with R
Hi Rakknar, I believe that the menu command File - Save to File (or some such, I use the German version) in the R GUI for Windows (I'm unclear on your OS) has not yet been suggested. This writes a file containing the entire R session. Does this create the kind of log you are looking for? HTH, Stephan Rakknar schrieb: This is why sink() has the split=TRUE option, which logs output to a connection and sends it to the screen. The R sink(analysis.log,split=TRUE) is almost identical to the Stata log using analysis.log with sink() to close the log and flush to disk at the end of the session. I've already used that and it doesn't work. It only register the output, not the inputs. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why are there small circles in my plot
Hi, Mao Jianfeng schrieb: plot(c(min(myda$traits),max(myda$traits)),c(-0.03,0.5), xlab='State', ylab='ylab') Here, you are plotting two data points: (min(myda$traits),-0.03) and (max(myda$traits),0.5). Try this: plot(c(min(myda$traits),max(myda$traits)),c(-0.03,0.5), xlab='State', ylab='ylab',type='n') HTH, Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Collinearity in Linear Multiple Regression
Hi Alex, I personally have had more success with the (more complicated) collinearity diagnostics proposed by Belsley, Kuh Welsch in their book Regression Diagnostics than with Variance Inflation Factors. See also: Belsley, D. A. A Guide to using the collinearity diagnostics. Computational Economics, 1991, 4, 33-50 However, I know of no R package that implements these diagnostics. Anyway, it's not hard to do so oneself. Good luck! Stephan Alex Roy schrieb: Dear all, How can I test for collinearity in the predictor data set for multiple linear regression. Thanks Alex [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Collinearity in Linear Multiple Regression
Hi Tim, Variance proportions (and condition indices) are exactly the tools described in Belsley, Kuh Welsch, Regression Diagnostics - see my previous post. Good to see I'm not the only one to use them! BKW also describe in detail how to calculate all this using SVD, so you don't need to use SAS... And I certainly agree that a problematic system means that you need to do more work - probably either collect more data or refine your research agenda, as collinearity may just be inherent in the independent variables you have been collecting. Best, Stephan Tim Paysen schrieb: Actually, the CI index and VIF are just a start. It is best to look at what they call a matrix of variance proportions (found in SAS and a few other places...)--which hardly anyone understands (including the SAS folks). It is a matrix of estimates of what the variences of the regression coefficients would be if you could figure them out in the first place. It shows which factors dominate over others IN THE PARTICULAR SETUP you are analyzing. The matrix is often calculated using eigenvalues, but is best done with Singular Value Decomposition techniques (you don't have to have a square matrix, and you maintain better precision). Analysts will say that it can display an unstable system -- which is correct, but they generally say that, if its true, you have bad data and should throw it out--or collect more. I suggest care, because it may be illustrating the nature of the system you are studying. The only decent reference that I know of is a little book (hard to read) that I can't remember off the top of my head. Have to look it up. Timothy E. Paysen, Phd Research Forester (ret.) From: John Sorkin jsor...@grecc.umaryland.edu To: Alex Roy alexroy2...@gmail.com; r-help@r-project.org Sent: Tuesday, July 21, 2009 4:19:11 AM Subject: Re: [R] Collinearity in Linear Multiple Regression I suggest you start by doing some reading about Condition index (CI) and variation inflation factor (VIF). Once you have reviewed the theory, a search of search.r-project.org (under the help menu in a windows-based R installation) for VIF will help you obtain values for VIF, c.f. http://finzi.psych.upenn.edu/R/library/HH/html/vif.html John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Alex Roy alexroy2...@gmail.com 7/21/2009 7:01 AM Dear all, How can I test for collinearity in the predictor data set for multiple linear regression. Thanks Alex [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] quadratic programming
Hi, The CRAN Task View on Optimization may help: http://stat.ethz.ch/CRAN/web/views/Optimization.html HTH, Stephan barbara.r...@uniroma1.it schrieb: Devo risolvere un problema di minimo vincolato con vincoli di uguaglianza e un altro con vincoli di uguaglianza e disuguaglianza. Cosa posso utilizzare? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Deleting rows or cols that do not meet cut off
f[rowSums(f=1)0,colSums(f=1)0] Judging from your result, you want less than or equal to 1. HTH, Stephan Crosby, Jacy R schrieb: How can I delete both rows and columns that do not meet a particular cut off value. Example: d - rbind(c(0,1,6,4), + c(2,5, 7,5), + c(3,6,1,6), + c(4,4,4,4)) f - as.matrix(d) f [,1] [,2] [,3] [,4] [1,]0164 [2,]2575 [3,]3616 [4,]4444 I would like to delete all rows and columns that do not contain at least one element with a value less than 1. So I'd end up with: f [,1] [,2] [,3] [1,]016 [3,]361 Note: 1 is an arbitrary cut-off value. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Deleting rows or cols that do not meet cut off
Hi Wacek, Wacek Kusnierczyk schrieb: ... or gain a bit on performance by doing the threshold comparison on the whole matrix just once at once: dd = d = 1 d[apply(d, 1, any), apply(d, 2, any)] d[apply(dd, 1, any), apply(dd, 2, any)] Or not? Cheers, Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] bug when subtracting decimals?
Hi, Gavin Simpson wrote: I bemoan the apparent inability of those asking such questions to use the resources provided to solve these problems for themselves... Looking at all the people who quite obviously do NOT read the posting guide and provide commented, minimal, self-contained, reproducible code, I wonder whether the mailing list could be configured to reply to each new mail (not replies in a thread) with an automated mail like this: Have you read the posting guide and the FAQs? If you do not get a reply within two days, you may want to look at both and think about reformulating your query. Oh, and while you are at it, look through the archives, a lot of questions have already been asked and answered before. Just a thought, Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Citing R
citation() HTH, Stephan Tom Backer Johnsen schrieb: What is the correct citation to R in BibTeX format? I have looked in the R pages but so far without any luck. Tom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time series forecasting
Hi Perry, my impression after a very cursory glance: this looks like noise. Perhaps you should think a little more about your series - what kind of seasonality there could be (is this weekly data? or monthly?), whether the peaks and troughs could be due to some kind of external driver, whether you really have count data, that kind of thing. Until then, there is little else to do than to use a very simple method, e.g. forecast the last observation (random walk) or the mean of the observations (white noise), or the median. All of these benchmarks can be surprisingly hard to beat. If you have seasonality but no external influence, you could look at smoothing methods, they are nice to interpret and usually perform very well. I'd recommend Hyndman et al., Forecasting with Exponential Smoothing: The State Space Approach and the accompanying forecast R package, mainly with the ets() function. You could also look at arima(). I fitted an ARIMA model to your data, and as expected, it returned a simple mean (not that I would recommend blindly fitting ARIMA to just any data): Call: arima(x = foo[, 2]) Coefficients: intercept 7.2333 s.e. 0.8009 sigma^2 estimated as 19.25: log likelihood = -86.93, aic = 177.85 And for count data, you could use some variants of ARIMA, e.g., INAR. HTH, Stephan pg...@gol.com schrieb: Dear all: I'm a newbie and an amateur seeking help with forecasting the next in a non-stationary time series, with constraints of 1 (low) and 27 (high) applicable to all. What I need help with is the solution concept. The series has 439 observations as of last week. I'd like to analyze obs 1 - 30 (which are historical and therefore invariate), to solve for 31. The history: Obs 12 21 31 416 59 66 77 811 911 101 1112 1214 1313 142 154 165 1714 186 194 207 215 228 237 2415 2511 263 274 286 298 304 31?? (a known) For backtesting of forecasting accuracy, I can use either a sliding window ( 1 - 30 to solve for 31, 2 - 31 to solve for 32, 3 - 32 to solve for 33, etc.) OR a cumulative window (1 - 30 to solve for 31, 1 - 31 to solve for 32, 1 - 32 to solve for 33, etc.), whichever works better. I can also supply different windows if deemed appropriate, e.g., 50 or 75 or 100 obs, whatever, in either configuration. The 30 obs window is selected for this list query only so as not to take up too much message space. Query: How would you solve for ob 31 in the above series, with the constraints stated? (If you need a longer history, say, 50 obs or more, I can supply it off-list.) I've tried all the relevant Excel functions with no success, and suspect that the solution lies in some combination of them. Here I defer to the collective wisdom of you all. Once the correct concept is established, I can proceed to set it up in R for this and other similar series. Many TIA and regards, Perry E. Gary Tokyo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to generate natural cubic spline in R?
Hi, if you are looking for *natural* cubic splines (linear beyond the outer knots), you could use rcs() in Frank Harrell's Design package. HTH, Stephan David Winsemius schrieb: If one enters: ??spline ... You get quite a few matches. The one in the stats functions that probably answers your specific questions is: splinefun {stats} R Documentation Interpolating Splines Description Perform cubic (or Hermite) spline interpolation of given data points, returning either a list of points obtained by the interpolation or a function performing the interpolation. splinefun returns a function with formal arguments x and deriv, the latter defaulting to zero. This function can be used to evaluate the interpolating cubic spline (deriv=0), or its derivatives (deriv=1,2,3) at the points x, where the spline function interpolates the data points originally specified. This is often more useful than spline. Perhaps you need to review from you basic intro material regarding help.search(text) # or ??text# possibilities. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Does R support double-exponential smoothing?
Hi, ets() in Hyndman's forecast package allows you to specify which one of the many smoothing variants (additive/multiplicative season, damped trend, additive/multiplicative errors) you want. HTH, Stephan minben schrieb: I want to use double-exponential smoothing to forecast time series datas,but I couldn't find it in the document,does R support this method? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] model selection using ANOVA
Hi Alina, your approach sounds problematic - you can always get a smaller RSS if you add terms to your model, so your approach will always go for larger models, and you will end up overfitting. Consider information criteria, e.g., AIC or BIC, which penalize larger models. References for AIC are Burnham Anderson; other people prefer BIC. Then you can do something like models - list() AICs - rep(NA, n) models[[1]] - lm(...); AICs[1] - AIC(model[[1]]) ... models[[n]] - lm(...); AICs[n] - AIC(model[[n]]) which.min(AICs) depending on your specific needs. HTH, Stephan Alina Sheyman schrieb: I've created a number of models using lm and now want to pick one with the smallest standard error or the smallest RSS, I can get a list of RSS using anova function, but is the any way I can then select one with the smallest RSS from the list? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to generate natural cubic spline in R?
Hi David, David Winsemius schrieb: The splinefun documentation indicates that natural is one of the types of cubic spline options available. That sounds good, didn't know that... rcs() has the advantage of coming with a book (Harrell's Regression Modeling Strategies). Does rcs actually do fitting? Such would not be my expectation on reading the documentation and I do not see any examples of such functionality in the help pages. Nope, but you can include rcs() within fitting functions, lm(foo~rcs(bar,3)), which makes more sense to me than having a spline function fit... Looks like better encapsulation to me. Best, Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] add variable in for loop
Hi Paul, you do *not* want to do this, it takes too long and may lead to rounding errors. Vectorize everything, e.g., use sum(meanrotation). And look into ?apply, and google for the R Inferno. And no, there is no +=... Good luck! Stephan pgseye schrieb: Hi, I'm learning to write some basic functions in R. For some data I have I'd like to be able to add a variable to itself after each iteration in a for loop to obtain a grandtotal for that variable so I can calculate a mean. test-function(data){ for (i in 1:80){ meanrotation-(abs(data[i,3]-data[i,2])+abs(data[i,4]-data[i,2])+abs(data[i,5]-data[i,2])+abs(data[i,6]-data[i,2]))/4 cat(i,meanrotation,\n) #total+=meanrotation } #print (total/80) } In perl there's an assignment operator variable+=variable2. Is there anything like this in R to do as illustrated in the code above. thanks a lot, Paul Edit - I guess the other way to do this which I just realised is to assign the output of the function to a vector and then do a summary(), but I don't know how to do this either - help is appreciated __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] text matching and substitution
Hi Simeon, ?gsub HTH, Stephan simeon duckworth schrieb: I am trying to simplify a text variable by matching and replacing it with a string in another vector so for example in colours - paste(letters,colours(),stuff,LETTERS) find and replace with (red,blue,green,gray,yellow,other) - irrespective of case its a large dataset, so i'd like to be able to do this as efficiently as possible. thanks for any help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] text matching and substitution
Hi Simeon, I'm slightly unclear on what exactly you are trying to achieve... Are you trying to replace every entry of colours which *contains* red by red, dropping the rest of the entry? And same with blue? A short example before after would be helpful... Best, Stephan simeon duckworth schrieb: thanks stephan. i'd been trying to make gsub work, but couldnt make it replace the whole expression. so i'd resorted to trying to loop with grep - but with two problems. firstly, i cant seem to make the loop 'remember' the substitutions it makes (see below). secondly, it feels like this is a really inefficient way of doing something quite simple anyhow. colours - as.character(paste(letters,colours(),stuff,LETTERS)) target - c(red,blue,green,gray) new.colour -colours for (i in length(target)) { x - target[i] new.colour[grep((x),new.colour)] - x return(new.colour) } On Sat, Mar 28, 2009 at 9:45 AM, Stephan Kolassa stephan.kola...@gmx.dewrote: Hi Simeon, ?gsub HTH, Stephan simeon duckworth schrieb: I am trying to simplify a text variable by matching and replacing it with a string in another vector so for example in colours - paste(letters,colours(),stuff,LETTERS) find and replace with (red,blue,green,gray,yellow,other) - irrespective of case its a large dataset, so i'd like to be able to do this as efficiently as possible. thanks for any help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] text matching and substitution
Hi Simeon, I played around a little with Vectorize and mapply, but I couldn't make it work :-( So, my best guess would be a simple loop like this: result - as.character(paste(letters,colours(),stuff,LETTERS)) target - c(red,blue,green,gray) for ( new.color in target ) { result[grep(new.color,result)] - new.color } Best of luck, Stephan simeon duckworth schrieb: stephan sorry for not being clear - but thats exactly what i want. i'd like to replace every complex string that contains red with just red, and then so on with blue, yellow etc my data is of the form x xx xx x red xx xxx xx xx xxx xxx xx blue xx xx xx xx x x xx xx xx xx red red xx xx xx xx xx xx xx xx xx xx xx xx x x x x which i'd like to replace with red blue red other other thanks On Sat, Mar 28, 2009 at 2:38 PM, Stephan Kolassa stephan.kola...@gmx.dewrote: Hi Simeon, I'm slightly unclear on what exactly you are trying to achieve... Are you trying to replace every entry of colours which *contains* red by red, dropping the rest of the entry? And same with blue? A short example before after would be helpful... Best, Stephan simeon duckworth schrieb: thanks stephan. i'd been trying to make gsub work, but couldnt make it replace the whole expression. so i'd resorted to trying to loop with grep - but with two problems. firstly, i cant seem to make the loop 'remember' the substitutions it makes (see below). secondly, it feels like this is a really inefficient way of doing something quite simple anyhow. colours - as.character(paste(letters,colours(),stuff,LETTERS)) target - c(red,blue,green,gray) new.colour -colours for (i in length(target)) { x - target[i] new.colour[grep((x),new.colour)] - x return(new.colour) } On Sat, Mar 28, 2009 at 9:45 AM, Stephan Kolassa stephan.kola...@gmx.de wrote: Hi Simeon, ?gsub HTH, Stephan simeon duckworth schrieb: I am trying to simplify a text variable by matching and replacing it with a string in another vector so for example in colours - paste(letters,colours(),stuff,LETTERS) find and replace with (red,blue,green,gray,yellow,other) - irrespective of case its a large dataset, so i'd like to be able to do this as efficiently as possible. thanks for any help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ApEn (Approximate Entropy), Total Corr, Information Interaction
Hi Vishal, re 1]: Ben Bolker very kindly shared an R reimplementation of Kaplan's Matlab code a little while ago: http://www.nabble.com/Approximate-Entropy--to21144062.html#a21149402 Best wishes Stephan Vishal Belsare schrieb: Is there any existing implementation in R/S of : 1] Pincus Kalman's approximate entropy (ApEn) measure 2] Total Correlation / Multiinformation 3] Information Interaction A search doesn't quite reveal anything, but I'd be keen to not reinvent in case someone has worked on it. Many thanks in anticipation. Best, Vishal Belsare __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using a noisy variable in regression (not an R question)
Hi Juliet, Juliet Hannah schrieb: One simple thing to try would be to form categories Simple but problematic. Frank Harrell put together a wonderful page detailing all the issues with categorizing continuous data: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous So: keep your data continuous. Apart from that, I would second John's recommendation to try to get samples at the same point in time (and, if it is cortisol, stay away from smokers etc.). Best wishes Stephan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using a noisy variable in regression (not an R question)
Hi Juliet, Juliet Hannah schrieb: I should have emphasized, I do not intend to categorize -- mainly because of all the discussions I have seen on R-help arguing against this. Sorry that we all jumped on this ;-) I just thought it would be problematic to include the variable by itself. Take other variables, such as a genotype or BMI. If we measure this variable the next day, it would be the same. However, a hormone's level would not be the same. I thought this error must be accounted for somehow. You are quite correct that fluctuating hormone levels are a problem (although, strictly speaking, measuring BMI and even genotyping will not yield exactly the same results the next day, measurement error is always present). And there may be methods dealing with this, but I don't know of any. If you have any idea about the variability of your hormone, you could always take your data, perturb the hormone levels and run the analysis again to get a feeling for the stability of your results. This is quite ad hoc, but if I were the reviewer, a perturbation analysis like this would greatly reassure me. However, I recently worked with hormones and had exactly your problem, and we couldn't find any published data on day-to-day variability, so this was not an option - we finally went ahead and simply plugged the measurements into R. Good luck! Stephan Thanks again! Regards, Juliet On Sat, Mar 7, 2009 at 1:21 PM, Jonathan Baron ba...@psych.upenn.edu wrote: If you form categories, you add even more error, specifically, the variation in the distance between each number and the category boundary. What's wrong with just including it in the regression? Yes, the measure X1 will account for less variance than the underlying variable of real interest (T1, each individual's mean, perhaps), but X1 could still be useful in two ways. One, it might be a significant predictor of the dependent variable Y despite the error. Two, it might increase the sensitivity of the model to other predictors (X2, X3...) by accounting for what would otherwise be error. What you cannot conclude in this case (when you measure a predictor with error) is that the effect of (say) X2 is not accounted for by its correlation with T1. Some people try to conclude this when X2 remains a significant predictor of Y when X1 is included in the model. The trouble is that X1 is an error-prone measure of T1, so the full effect of T1 is not removed by inclusion of X1. Jon On 03/07/09 12:49, Juliet Hannah wrote: Hi, This is not an R question, but I've seen opinions given on non R topics, so I wanted to give it a try. :) How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day. Nevertheless, its levels differ substantially between individuals so that there is information there to use. One simple thing to try would be to form categories, but I assume there are better ways to handle this. Has anyone worked with such data, or could anyone suggest some keywords that may be helpful in searching for this topic. Thanks for your input. Regards, Juliet __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jonathan Baron, Professor of Psychology, University of Pennsylvania Home page: http://www.sas.upenn.edu/~baron Editor: Judgment and Decision Making (http://journal.sjdm.org) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to comment in R
Hi Mihai, one (very bad style) way would be if (FALSE) { comment comment comment } But putting a # in front of every line is easier to spot in the code. HTH, Stephan mihai.mira...@bafin.de schrieb: Hi everybody, I use for the moment # at the begining of each line for comments. Is there any possibility to comment more than one line, like something which shows the beggingng and the end of the comment? Or is there a possibility to comment only a part of a line? Thanks, Mihai [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R system command on windows
?shell HTH, Stephan Aurelie Labbe, Dr. schrieb: Hi, I am trying to use the R command system under windows (XP). If I try the simple command system(mkdir toto) to create a directory toto, it tells me that it cannot find the command mkdir... Does anybody knows how it works ? Is it a path problem ? Maybe the answer is simple: I am a R user under Unix/Linux and I don't know well R under windows. Thanks ! Aurelie __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] subset problem (reducing from six to two levels)
Hi, does this help? http://www.nabble.com/factor-question-to18638814.html#a18638814 HTH, Stephan Ine schrieb: Hi all, I have got a seemingly simple problem (I am an R starter) with subsetting my data set, but cannot figure out the solution: I want to subset a data set from six to two levels, so that all analyses are done only with these two remaining levels. I tried TOTAL-read.delim('total.csv',header=T) SUBSET.OF.TOTAL-subset(TOTAL, FactorX %in% c(Level1,Level2)) attach(SUBSET.OF.TOTAL) but R does not eliminate the remaining levels of FactorX, just assigns 'not available' to the data. Like this, the other levels still show up in plots etc., but without data entries. Anybody got a solution how to subset the data so that I eliminate the other levels completely? Thanks a lot for the help, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.