[R] Downloading Reuters data from R
Hi R, Can we download Reuters (3000 Xtra) data from R? Does the ODBC package help me with this? Or otherwise, is there a way to extract Reuters daily closing price data from R? Thank you very much, Shubha __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2-Y-axes on same plot
Joe Trubisz wrote: Hi... Is this possible in R? I have 2 sets of data that were collected simultaneously using 2 different data acquisition schemes. The x-values are the same for both. The y-values have different ranges (16.4-37.5 using one method, 557-634 using another). In theory, if you plot both plots on top of each other, the graphs should overlap. The problem I'm having is trying to have two different sets of y-values appear in the same graph, but scaled in the same vertical space. I've seen this done in publications, but not sure if it can be done in R. Hi Joe, Check out twoord.plot in the plotrix package. Jim
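A minimal sketch of Jim's suggestion, with invented stand-in data (the real x-values and measurements are not in the post; only the two y-ranges are):

```r
library(plotrix)
x  <- 1:20
y1 <- seq(16.4, 37.5, length.out = 20)   # method 1, left axis
y2 <- seq(557, 634, length.out = 20)     # method 2, right axis
## left series against the left axis, right series against the right axis
twoord.plot(x, y1, x, y2, ylab = "method 1", rylab = "method 2")
```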
Re: [R] A package to set up a questionnaire enter data
CE.KA wrote: Hi R users, Is there a package in R to set up a questionnaire and enter data? Hi CE.KA, I don't know, but I have written a general-purpose questionnaire program in Tcl/Tk that will administer the questionnaire and record the responses. If you don't find anything, let me know. Jim
Re: [R] means of a column within groups of a data frame
John Sorkin wrote: R 2.8.0, Windows XP. I would like to divide the rows of a data frame into five groups and then get the mean of one column within each of the five groups. I have accomplished this using the code below, but I hope there is an easier way, i.e. some function that I can call:

# create five groups
data$BMIcuts <- cut(data$BMI, 5)
# get mean of AAMTCARE within each of the five groups
mean(data[data[, "BMIcuts"] == "(13.3,21.9]", "AAMTCARE"])
mean(data[data[, "BMIcuts"] == "(21.9,30.5]", "AAMTCARE"])
mean(data[data[, "BMIcuts"] == "(30.5,39.1]", "AAMTCARE"])
mean(data[data[, "BMIcuts"] == "(39.1,47.7]", "AAMTCARE"])
mean(data[data[, "BMIcuts"] == "(47.7,56.3]", "AAMTCARE"])

Hi John, Have a look at brkdn in the prettyR package:

data$BMIcuts <- cut(data$BMI, 5)
brkdn(AAMTCARE ~ BMIcuts, data)

Jim
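The same breakdown can be done in base R with tapply(); the column names below come from John's post, but the values are made up for illustration:

```r
## stand-in data frame (BMI range chosen to match the cut points above)
set.seed(1)
data <- data.frame(BMI      = runif(100, 13.3, 56.3),
                   AAMTCARE = rnorm(100, 1000, 200))
## five equal-width BMI groups, then the mean of AAMTCARE in each
data$BMIcuts <- cut(data$BMI, 5)
tapply(data$AAMTCARE, data$BMIcuts, mean)
```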
[R] Fwd: Jacobi Plane Rotations in R
http://idisk.mac.com/jdeleeuw-Public/jacobi This is paper/software for various techniques based on Jacobi plane rotations. There is R code for -- classical cyclical Jacobi eigen-diagonalization -- Jacobi-based SVD diagonalization -- approximate simultaneous diagonalization of symmetric matrices (De Leeuw/Pruzansky 1978) -- approximate simultaneous diagonalization of rectangular matrices (TUCKER-2) -- approximate body diagonalization of three-way arrays (orthogonal INDSCAL) -- TUCKER-3 for three-way arrays (three rotations meet a core) -- PREHOM/MCA approximate KPL-diagonalization of the Burt matrix (De Leeuw/Bekker 1982) == Jan de Leeuw, 11667 Steinhoff Rd, Frazier Park, CA 93225 home 661-245-1725 skype 661-347-0667 global 254-381-4905 .mac: jdeleeuw +++ aim: deleeuwjan +++ skype: j_deleeuw == Many nights on the road and not dead yet --- the end of autumn. (Basho 1644-1694)
Re: [R] Logical inconsistency
Rounding can do no good, because

round(8.8, 1) - round(7.8, 1) > 1   # still TRUE
round(8.8) - round(7.7) > 1         # FALSE

What you might do is compute a - b - 1 and compare it to a very small number:

(8.8 - 7.8 - 1) < 1e-10   # TRUE

K

On Wed, Dec 10, 2008 at 11:47 AM, emma jane [EMAIL PROTECTED] wrote: Thanks Greg, that does make sense. And I've solved the problem by rounding the variables before taking the difference between them. Thanks to all who replied. Emma Jane

From: Greg Snow [EMAIL PROTECTED] To: ...com.br; Wacek Kusnierczyk [EMAIL PROTECTED]; Chuck Cleland [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Tuesday, 9 December, 2008 16:30:08 Subject: RE: [R] Logical inconsistency

Some (possibly all) of those numbers cannot be represented exactly, so there is a chance of round-off error whenever you do some arithmetic; sometimes the errors cancel out, sometimes they don't. Consider:

print(8.3 - 7.3, digits = 20)
[1] 1.0000000000000008882
print(11.3 - 10.3, digits = 20)
[1] 1

So in the first case the rounding error gives a value that is slightly greater than 1, so the greater-than test returns TRUE (if you round the result before comparing to 1, then it will return FALSE). In the second case the uncertainties cancelled out, so you get exactly 1, which is not greater than 1, and so the comparison returns FALSE. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of emma jane Sent: Tuesday, December 09, 2008 7:02 AM To: Bernardo Rangel Tura; Wacek Kusnierczyk; Chuck Cleland Cc: R help Subject: Re: [R] Logical inconsistency

Many thanks for your help; perhaps I should have set my query in context! I'm simply calculating an indicator variable [0, 1] based on whether the difference between two measured variables is > 1 or <= 1.
I understand the FAQ about floating-point arithmetic, but am still puzzled that it only apparently applies to certain elements, as follows:

8.8 - 7.8 > 1    # TRUE
8.3 - 7.3 > 1    # TRUE

However,

10.2 - 9.2 > 1   # FALSE
11.3 - 10.3 > 1  # FALSE

Emma Jane

From: Bernardo Rangel Tura [EMAIL PROTECTED] To: Wacek Kusnierczyk [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Saturday, 6 December, 2008 10:00:48 Subject: Re: [R] Logical inconsistency

On Fri, 2008-12-05 at 14:18 +0100, Wacek Kusnierczyk wrote: Berwin A Turlach wrote: Dear Emma, On Fri, 5 Dec 2008 04:23:53 -0800 (PST): Please could someone kindly explain the following inconsistencies I've discovered when performing logical calculations in R:

8.8 - 7.8 > 1    # TRUE
8.3 - 7.3 > 1    # TRUE

Gladly: FAQ 7.31 http://cran.at.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Well, this answers the question only partially. It explains why a system with finite-precision arithmetic, such as R, will fail to be logically correct in certain cases. It does not explain why R, a language said to isolate a user from the underlying implementational choices, would have to fail this way. There is, in principle, no problem in having a high-level language perform the computation in a logically consistent way. For example, bc is an arbitrary-precision calculator language, and has no problem with examples like the above:

8.8 - 7.8 > 1    # 0, meaning 'no'
8.3 - 7.3 > 1    # 0, meaning 'no'
8.8 - 7.8 == 1   # 1, meaning 'yes'

The fact that R (and many others, including Matlab and Sage, perhaps not Mathematica) does not perform logically here is a consequence of its implementation of floating-point arithmetic. The FAQ you were pointed to, and its reference to Goldberg's article, show that R does not successfully isolate a user from details of the lower-level implementation.
vQ

Well, first of all, for computers 8.3 - 7.3 is not equal to 1:

8.3 - 7.3 - 1
[1] 8.881784e-16

But if you use only one-digit precision:

round(8.3 - 7.3, 1) - 1
[1] 0
round(8.3 - 7.3, 1) - 1 > 0
[1] FALSE
round(8.3 - 7.3, 1) == 1
[1] TRUE

So the problem is in how the code is written, not in the software. -- Bernardo Rangel Tura, M.D, MPH, Ph.D National Institute of Cardiology Brazil
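The advice in this thread boils down to comparing with a tolerance instead of an exact `>` or `==`; a short sketch using R's usual default tolerance:

```r
eps <- sqrt(.Machine$double.eps)   # ~1.5e-8, the default all.equal tolerance
diffs <- c(8.8 - 7.8, 8.3 - 7.3, 10.2 - 9.2, 11.3 - 10.3)
diffs > 1 + eps                    # all FALSE: none exceeds 1 beyond rounding error
isTRUE(all.equal(8.8 - 7.8, 1))    # TRUE: equal up to tolerance
```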
[R] Coxian Distribution
Hello R users. I want to know whether R has distribution and density functions for the Coxian distribution. If it doesn't, it would help if someone has written these functions. Thanks, Borja
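I am not aware of a built-in Coxian in base R. Since a Coxian is a phase-type distribution, its density can be written as f(t) = alpha exp(St) s0; a sketch via Matrix::expm, where the rates `lambda` and continuation probabilities `p` are made-up example parameters:

```r
library(Matrix)

## density of a Coxian with phase rates lambda[i] and probability p[i]
## of continuing from phase i to phase i+1 (otherwise the process exits)
dcoxian <- function(t, lambda, p) {
  k <- length(lambda)
  S <- diag(-lambda, k)                       # subgenerator matrix
  for (i in seq_len(k - 1)) S[i, i + 1] <- lambda[i] * p[i]
  s0    <- -S %*% rep(1, k)                   # exit rates per phase
  alpha <- c(1, rep(0, k - 1))                # always start in phase 1
  sapply(t, function(ti)
    as.numeric(alpha %*% as.matrix(expm(Matrix(S * ti))) %*% s0))
}

dcoxian(c(0.5, 1, 2), lambda = c(2, 1), p = 0.5)
```

With a single phase (k = 1) this reduces to the exponential density, which is a quick sanity check.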
[R] Rotate basic plot (scatter) in R
Dear all, I was trying to rotate a plot using R. I tried most of the examples offered here in this forum, but for some reason it is not working for a scatterplot, or in my case plot(x, y). With a histogram I had no problems. Is it not possible to rotate a simple plot? Thanks a lot, Michael
Re: [R] snowfall sfInit error
Dear Mr. Ripley, indeed, that is true. sfInit() currently has a bug on Windows, depending on the use of Unix tools and broken exception handling. Too bad I never tested it accordingly on Windows (as we do not have any Windows machines in our institute). snowfall 1.62 is in the pipeline with many other fixes (e.g. NetWorkSpaces usage) and I will include a Windows workaround in it. It will go out for testing at the beginning of the week and should be on CRAN at the end of the week. Best regards, Jochen Knaus

On Sat, 6 Dec 2008, [EMAIL PROTECTED] wrote: Dear all, I am trying to execute the simple example in snowfall http://cran.r-project.org/web/packages/snowfall/vignettes/snowfall.pdf

require(snow)
require(snowfall)
sfInit(parallel = TRUE, cpus = 2)
sfLapply(1:10, exp)
sfStop()

I have installed the snow and snowfall packages in R on a machine with Windows XP; however, after running the sfInit(parallel = TRUE, cpus = 2) line I get an error:

Error in system("whoami", intern = TRUE, ignore.stderr = TRUE) : 'whoami' not found
Error in paste(sep = "_", "R", uname, format(Sys.time(), "%H%M%S_%m%d%y")) : object 'uname' not found

I am the only (administrator) user of the computer. It has a dual-core processor and is not networked. I would be grateful if someone could tell me how to proceed.

Follow the posting guide (see the footer of this message) and talk to the maintainer of 'snowfall'. Most likely it is not intended to be used on Windows, but has not declared that. 'whoami' and 'uname' are Unix programs, not Windows ones, but R's Sys.info() provides equivalent information.

Kind regards, Chibisi
[R] call lattice function in a function passing groups argument
I'm trying to use a lattice function within a function and have problems passing the groups argument properly. Let's say I have a data frame

d <- data.frame(x = rnorm(100), y = c("a", "b"))

and want to plot variable x in a densityplot, grouped by the variable y; then I would do something like

densityplot(~ x, d, groups = y)

If however I wanted to call the function densityplot within a function and pass the groups argument as an argument of that function, how would I have to proceed? It is not as straightforward as

f <- function(data, groupvar) {
  densityplot(~ x, data, groups = groupvar)
}

probably because the lattice function densityplot.formula preprocesses the groups argument with

groups <- eval(substitute(groups), data, environment(formula))

Is there a way I could pass the groups argument in the function f? Thanks for any hints, Thomas Zumbrunn
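One workaround that seems to work here: pass the grouping variable by name (as a character string) and index into the data frame. The expression `data[[groupvar]]` is then evaluated in the environment of the formula, which is the function's own frame, where both `data` and `groupvar` exist:

```r
library(lattice)
d <- data.frame(x = rnorm(100), y = c("a", "b"))

## groupvar is a column name given as a string, e.g. "y"
f <- function(data, groupvar) {
  densityplot(~ x, data, groups = data[[groupvar]])
}
f(d, "y")
```

This sidesteps lattice's non-standard evaluation rather than cooperating with it; a `substitute()`-based wrapper is the other common route.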
Re: [R] Rotate basic plot (scatter) in R
I was trying to rotate a plot using R. I tried most of the examples offered here in this forum, but for some reason it is not working for a scatterplot, or in my case plot(x, y). With a histogram I had no problems. Is it not possible to rotate a simple plot? I'm not 100% sure what exactly you mean by 'rotate'. I'll assume you mean rotation of the entire canvas on the output device. For the postscript device you can use the 'horizontal' option to switch between portrait and landscape orientation. Is that what you meant? Probably not, because you mentioned that you succeeded with histograms. Maybe you should give us a little more information, e.g. what exactly you need and what you did in the case of histograms. cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://mips.gsf.de/staff/pagel
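If "rotate" simply means turning the scatterplot 90 degrees, exchanging the axes often suffices (x and y below are stand-ins for the poster's data):

```r
x <- rnorm(50)
y <- 2 * x + rnorm(50)
op <- par(mfrow = c(1, 2))
plot(x, y, main = "original")
plot(y, x, main = "rotated 90 degrees")   # axes exchanged
par(op)
```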
Re: [R] Simplex function in R
Hi, in the first example your feasible set is just one point (the one that fulfils the 3 equations), and thus there is only this one point that can maximize the objective function. In the second case, the feasible set is a line. But the simplex algorithm tries to find an optimizing value of the objective on a convex polyhedron, so the simplex function may be inappropriate for finding all feasible solutions. HTH, Armin
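Armin's point for Case 1 can be checked numerically: the three equality constraints pin down a single point, which solve() finds directly (matrix written row-wise, matching the equations in the original post):

```r
A <- rbind(c(1, 1, 1),
           c(1, 1, 0),
           c(0, 1, 1))
b <- c(5, 4, 4)
solve(A, b)   # a = 1, b = 3, c = 1: the single feasible point
```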
[R] Principal Component Analysis - Selecting components? + right choice?
Dear R gurus, I have some climatic data for a region of the world. They are monthly averages 1950-2000 of precipitation (12 months), minimum temperature (12 months), and maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and I have around 75,000 cells. I need to feed them into a statistical model as covariates, to use them to predict a response variable. The climatic data are obviously correlated: precipitation for January is correlated to precipitation for February, and so on; even precipitation and temperature are heavily correlated. I did some correlation analysis and they are all strongly correlated. I thought of running PCA on them, in order to reduce the number of covariates I feed into the model. I ran the PCA using prcomp, quite successfully. Now I need to use a criterion to select the right number of PCs (that is: is it 1, 2, 3, 4?). What criterion would you suggest? At the moment I am using a threshold-based criterion, but that is highly subjective, even if there are some rules of thumb (Jolliffe, Principal Component Analysis, 2nd Edition, Springer Verlag, 2002). Could you suggest something more rigorous? By the way, do you think I would have been better off using something different from PCA? Best, -- Corrado Topi Global Climate Change & Biodiversity Indicators Area 18, Department of Biology University of York, York, YO10 5YW, UK Phone: +44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
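A common starting point, whatever formal criterion is chosen later: look at the cumulative proportion of variance explained and the scree plot. Here `X` is a random stand-in for the 36-column climate matrix described above:

```r
X <- matrix(rnorm(1000 * 36), ncol = 36)
p <- prcomp(X, scale. = TRUE)    # scale: the variables have different units
summary(p)                       # see the "Cumulative Proportion" row
screeplot(p, type = "lines")     # look for the "elbow"
```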
Re: [R] A package to set up a questionnaire enter data
Hello, For entering data alone, you would not need a package. For simple questionnaires, you could write a function. It could go like this; for example, you want to record people's names and their ages:

# Sets up an empty database
database <- c()

enter_data <- function() {
  show("Enter name:")
  name <- as.character(readline())
  show("Enter age:")
  age <- as.numeric(readline())
  # Appends data from one questionnaire to the
  # database (note the global assignment)
  database <<- rbind(database, data.frame(name, age))
  # Calls the function again in order to proceed with another
  # questionnaire; stop it with the stop button when you are finished
  enter_data()
}

Exporting the database into a CSV or a text file should not be a problem with write.csv() or write.csv2(). Kind regards, David Croll
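A non-recursive variant of the same idea (still base R only; the function name is just an example), where entering an empty name ends the loop instead of requiring the stop button:

```r
enter_data2 <- function() {
  database <- data.frame()
  repeat {
    name <- readline("Enter name (blank to stop): ")
    if (name == "") break
    age <- as.numeric(readline("Enter age: "))
    database <- rbind(database, data.frame(name, age))
  }
  database
}
## answers <- enter_data2()
## write.csv(answers, "answers.csv")
```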
Re: [R] Simplex function in R
Try the pseudo-inverse:

library(MASS)
m <- rbind(c(1, 1, 1), c(1, 0, 1), c(0, 1, 0))
b <- c(5, 2, 3)
ginv(m) %*% b

On Thu, Dec 11, 2008 at 2:20 AM, Chris Line [EMAIL PROTECTED] wrote: I have a set of linear equations and would like to find any feasible solution. A simplex solution works in Case 1 below, but not in Case 2. I would be grateful for any help.

Case 1: Find any feasible solution for the set of linear equations:
a + b + c = 5
a + b + 0c = 4
0a + b + c = 4
Solution - a feasible (and unique) solution is a=1, b=3, c=1. The following R code returns a feasible solution:

A3M <- matrix(c(1,1,0, 1,1,1, 1,0,1), nrow = 3)
b3M <- matrix(c(5, 4, 4), ncol = 1)
A1M <- matrix(c(1,0,0, 0,1,0, 0,0,1), nrow = 3)
b1M <- matrix(c(10, 10, 10), ncol = 1)
AM  <- matrix(c(1, 1, 1), nrow = 1)
simplex(a = AM, A1 = A1M, b1 = b1M, A2 = NULL, b2 = NULL, A3 = A3M, b3 = b3M, maxi = TRUE)

Case 2: Find any feasible solution for the set of linear equations:
a + b + c = 5
a + 0b + c = 2
0a + b + 0c = 3
Solution - one feasible solution of many is a=1, b=3, c=1. There are infinitely many possible solutions in Case 2. However, the following R code fails to return any feasible solution:

A3M <- matrix(c(1,1,0, 1,0,1, 1,1,0), nrow = 3)
b3M <- matrix(c(5, 2, 3), ncol = 1)
A1M <- matrix(c(1,0,0, 0,1,0, 0,0,1), nrow = 3)
b1M <- matrix(c(10, 10, 10), ncol = 1)
AM  <- matrix(c(1, 1, 1), nrow = 1)
simplex(a = AM, A1 = A1M, b1 = b1M, A2 = NULL, b2 = NULL, A3 = A3M, b3 = b3M, maxi = TRUE)

The code returns the error:

Error in A.out[, basic] <- iden(M) : subscript out of bounds

Am I using the simplex function incorrectly? There may be a better way to approach the problem of finding a feasible solution. Cheers, Chris.
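It is worth checking that the pseudo-inverse answer is in fact feasible: for a consistent system, ginv(m) %*% b is the minimum-norm solution, so m %*% x should reproduce b exactly (up to rounding):

```r
library(MASS)
m <- rbind(c(1, 1, 1), c(1, 0, 1), c(0, 1, 0))
b <- c(5, 2, 3)
x <- ginv(m) %*% b                   # one feasible solution (minimum norm)
isTRUE(all.equal(as.numeric(m %*% x), b))   # TRUE: the constraints hold
```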
Re: [R] Principal Component Analysis - Selecting components? + right choice?
You can have a look at: S. Dray. On the number of principal components: a test of dimensionality based on measurements of similarity between matrices. Computational Statistics and Data Analysis, 52:2228-2237, 2008. It is implemented in the testdim function of the ade4 package. Cheers.

-- Stéphane DRAY ([EMAIL PROTECTED]) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I, 43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France Tel: 33 4 72 43 27 57 Fax: 33 4 72 43 13 88 http://biomserv.univ-lyon1.fr/~dray/
Re: [R] ref card for data manipulation?
On Wed, 10 Dec 2008 18:29:43 +0100, Peter Dalgaard [EMAIL PROTECTED] wrote: You (as many before you) have overlooked the ave() function, which can replace the ordering as well as the do.call(c, tapply()) construct.

The majority of questions on this list concern data manipulation, and many are repetitive. Overlooking like that will always happen unless some comprehensive data manipulation documentation is written. I think many people would benefit if a specialized data-manipulation reference card were conceived. Tom Short's card is an excellent one, but it does not cover high-level packages like plyr, reshape and doBy, and a few base data-manipulation functions are missing as well. Vitalie.
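The ave() idiom Peter mentions returns a per-group statistic in the original row order, with no manual sorting or re-merging (toy data for illustration):

```r
df <- data.frame(g = c("a", "b", "a", "b"),
                 x = c(1, 2, 3, 4))
df$groupmean <- ave(df$x, df$g, FUN = mean)   # group mean repeated per row
df$centered  <- df$x - ave(df$x, df$g)        # default FUN is mean
df
```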
[R] getting ISO week
Hi all, Is there a simple function already implemented for getting the ISO weeks of a Date object? I couldn't find one, and so wrote my own function to do it, but would appreciate a pointer to the default way. If a function is not yet implemented, could the code below be of interest to submit to CRAN? Best Regards, Gustaf

getweek <- function(Y, M = NULL, D = NULL) {
  if (!class(Y)[1] %in% c("Date", "POSIXt")) {
    date.posix <- strptime(paste(c(Y, M, D), collapse = "-"), "%Y-%m-%d")
  }
  if (class(Y)[1] %in% c("POSIXt", "Date")) {
    date.posix <- as.POSIXlt(Y)
    Y <- as.numeric(format(date.posix, "%Y"))
    M <- as.numeric(format(date.posix, "%m"))
    D <- as.numeric(format(date.posix, "%d"))
  }
  LY      <- (Y %% 4 == 0 & !(Y %% 100 == 0)) | (Y %% 400 == 0)
  LY.prev <- ((Y - 1) %% 4 == 0 & !((Y - 1) %% 100 == 0)) | ((Y - 1) %% 400 == 0)
  date.yday <- date.posix$yday + 1
  jan1.wday <- strptime(paste(Y, "01-01", sep = "-"), "%Y-%m-%d")$wday
  jan1.wday <- ifelse(jan1.wday == 0, 7, jan1.wday)
  date.wday <- date.posix$wday
  date.wday <- ifelse(date.wday == 0, 7, date.wday)
  ## If the date is at the beginning or end of the year,
  ## does it fall into a week of the previous or next year?
  Yn <- ifelse(date.yday <= (8 - jan1.wday) & jan1.wday > 4, Y - 1, Y)
  Yn <- ifelse(Yn == Y & ((365 + LY - date.yday) < (4 - date.wday)), Y + 1, Yn)
  ## Set the week differently if the date is at the beginning,
  ## middle or end of the year
  Wn <- ifelse(Yn == Y - 1,
               ifelse(jan1.wday == 5 | (jan1.wday == 6 & LY.prev), 53, 52),
               ifelse(Yn == Y + 1, 1,
                      (date.yday + (7 - date.wday) + (jan1.wday - 1)) / 7 -
                        (jan1.wday > 4)))
  return(list(Year = Yn, ISOWeek = Wn))
}

-- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik
[R] merging data.frames columnwise (rbind with different variables, lengths)
Dear List, I have two data frames with overlapping colnames and want to merge them. Actually, what I want is more similar to rbind, but the data frames differ in their columns. Here are the examples:

df1 <- data.frame(A = c(1, 2), B = c("m", "f"), C = c("at home", "away"))
df2 <- data.frame(A = c(2), C = c("at home"))

Here the desired result:

  A  B       C
1 1  m at home
2 2  f    away
3 2 NA at home

Thanks for any help, Stefan
Re: [R] merging data.frames columnwise (rbind with different variables, lengths)
Dear Stefan, Why not use merge() if you want to merge two datasets? ;-)

df1 <- data.frame(A = c(1, 2), B = c("m", "f"), C = c("at home", "away"))
df2 <- data.frame(A = c(2), C = c("at home"))
merge(df1, df2, all = TRUE)

HTH, Thierry

ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4, 9500 Geraardsbergen, Belgium tel. +32 54/436 185 [EMAIL PROTECTED] www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
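The same result can also be obtained without merge(), by padding each frame's missing columns with NA before rbind() (a generic two-frame sketch; the helper name is made up):

```r
rbind_fill2 <- function(d1, d2) {
  d1[setdiff(names(d2), names(d1))] <- NA   # columns only in d2
  d2[setdiff(names(d1), names(d2))] <- NA   # columns only in d1
  rbind(d1, d2[names(d1)])                  # align column order, stack rows
}

df1 <- data.frame(A = c(1, 2), B = c("m", "f"), C = c("at home", "away"))
df2 <- data.frame(A = 2, C = "at home")
rbind_fill2(df1, df2)
```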
[R] Extracting the name of an object into a character string
Dear List, I am writing a function in R with the facility to store models for later use in scoring. It would be very useful if I could include in the name of the stored file the name of the model object being stored, this name being chosen by the user in the function call. A simple function to store the name of an object as a character string would fit the bill, but I have not found one. name() doesn't appear to do what I want; maybe I'm using it wrongly. Any suggestions please? I'm running 2.8.0 on Windows XP. Thanks, Philip
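The usual idiom for this is deparse(substitute()), which turns the argument's name into a character string inside the function; a sketch (the wrapper name and file-name scheme are examples, not from the post):

```r
store_model <- function(model) {
  modelname <- deparse(substitute(model))       # name used in the call
  filename  <- paste(modelname, "RData", sep = ".")
  save(model, file = filename)                  # saved under the name 'model'
  filename
}

fit <- lm(dist ~ speed, data = cars)
store_model(fit)   # writes "fit.RData"
```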
Re: [R] getting ISO week
strftime(x, "%V")

E.g. strftime(as.POSIXlt(Sys.Date()), "%V") is "50", and you might want as.numeric() on it. Note that this is OS-dependent, and AFAIR Windows does not have it.

On Thu, 11 Dec 2008, Gustaf Rydevik wrote: Hi all, Is there a simple function already implemented for getting the ISO weeks of a Date object? I couldn't find one, and so wrote my own function to do it, but would appreciate a pointer to the default way. If a function is not yet implemented, could the code below be of interest to submit to CRAN? Best Regards, Gustaf [getweek function snipped] -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA), 1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
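Building on the reply above, "%V" can be combined with "%G" (the ISO week-based year), since the ISO year can differ from the calendar year at the year boundary. Like "%V", this is OS-dependent and works with glibc:

```r
d <- as.Date("2008-12-29")   # a Monday at the year boundary
format(d, "%V")              # ISO week: "01"
format(d, "%G")              # ISO week-based year: "2009"
```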
Re: [R] getting ISO week
A slightly simpler version is format(Sys.Date(), "%V"). On Thu, 11 Dec 2008, Prof Brian Ripley wrote: strftime(x, "%V"). E.g. strftime(as.POSIXlt(Sys.Date()), "%V") is "50", and you might want as.numeric() on it. Note that this is OS-dependent, and AFAIR Windows does not have it. [remainder of the quoted thread, repeating the getweek() code and signatures, elided]
Re: [R] getting ISO week
format(d, "%U") and format(d, "%W") give week numbers using different conventions. See ?strptime. On Thu, Dec 11, 2008 at 7:43 AM, Gustaf Rydevik [EMAIL PROTECTED] wrote: Hi all, Is there a simple function already implemented for getting the ISO weeks of a Date object? I couldn't find one, and so wrote my own function to do it, but would appreciate a pointer to the default way. If a function is not yet implemented, could the code below be of interest to submit to CRAN? Best Regards, Gustaf [the getweek() code quoted earlier in the thread, elided] -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik
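As a footnote to the thread above: where the OS strftime supports them, the "%V" (ISO week) and "%G" (ISO-week-based year) formats can be combined and converted to numeric for whole vectors of dates. A small sketch (the year-end dates are chosen because they illustrate the ISO convention; availability of "%V"/"%G" was platform-dependent in 2008-era R, as noted above):

```r
# Numeric ISO week and ISO year for a vector of dates via format()
d <- as.Date(c("2008-12-31", "2009-01-01"))
iso_week <- as.numeric(format(d, "%V"))
iso_year <- as.numeric(format(d, "%G"))
# Both dates fall in ISO week 1 of ISO year 2009
data.frame(d, iso_year, iso_week)
```

Note that "%G" (not "%Y") must be used together with "%V", since the ISO year of a late-December or early-January date can differ from its calendar year.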
Re: [R] merging data.frames columnwise (rbind with different variables, lengths)
Dear Stefan, You could use rbind.fill() from the reshape package:

install.packages("reshape")
library(reshape)
df1 <- data.frame(A = c(1, 2), B = c("m", "f"), C = c("at home", "away"))
df2 <- data.frame(A = c(2), C = c("at home"))
rbind.fill(df1, df2)
  A    B       C
1 1    m at home
2 2    f    away
3 2 <NA> at home

2008/12/11 Stefan Uhmann [EMAIL PROTECTED]: Dear List, I have two data frames with overlapping colnames and want to merge them. Actually, what I want is more similar to rbind, but the data frames differ in their columns. Here are the examples:

df1 <- data.frame(A = c(1, 2), B = c("m", "f"), C = c("at home", "away"))
df2 <- data.frame(A = c(2), C = c("at home"))

Here the desired result:
  A    B       C
1 1    m at home
2 2    f    away
3 2 <NA> at home

Thanks for any help, Stefan
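For completeness, the same padding can be done in base R without the reshape package. rbind_fill2() below is a hypothetical minimal sketch (the name is mine, not from the thread), limited to two data frames:

```r
# Minimal base-R sketch of rbind.fill: add the missing columns as NA
# to each data frame, then rbind with the columns in a common order
rbind_fill2 <- function(d1, d2) {
  for (nm in setdiff(names(d2), names(d1))) d1[[nm]] <- NA
  for (nm in setdiff(names(d1), names(d2))) d2[[nm]] <- NA
  rbind(d1, d2[names(d1)])
}

df1 <- data.frame(A = c(1, 2), B = c("m", "f"), C = c("at home", "away"))
df2 <- data.frame(A = 2, C = "at home")
rbind_fill2(df1, df2)  # third row gets B = NA
```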
Re: [R] Simplex function in R
re pseudo inverse: On the point of generalised inverses - ginv() is usually taken to mean the Moore-Penrose pseudoinverse - this is the least-squares projection. There are others - e.g. the Drazin inverse, which amounts to diagonalisation - of course this inverse may not be available in R. Gerard

On 11/12/2008 12:14, Gabor Grothendieck wrote: Try the pseudoinverse:

m <- rbind(c(1, 1, 1), c(1, 0, 1), c(0, 1, 0))
b <- c(5, 2, 3)
library(MASS)
ginv(m) %*% b

On Thu, Dec 11, 2008 at 2:20 AM, Chris Line [EMAIL PROTECTED] wrote: I have a set of linear equations and would like to find any feasible solution. A simplex solution works in Case 1 below, but not in Case 2. I would be grateful for any help.

Case 1: Find any feasible solution for the set of linear equations:
a + b + c = 5
a + b + 0c = 4
0a + b + c = 4
Solution - a feasible (and unique) solution is a=1, b=3, c=1. The following R code returns a feasible solution:

library(boot)  # for simplex()
A3M <- matrix(c(1,1,0, 1,1,1, 1,0,1), nrow = 3)
b3M <- matrix(c(5, 4, 4), ncol = 1)
A1M <- matrix(c(1,0,0, 0,1,0, 0,0,1), nrow = 3)
b1M <- matrix(c(10, 10, 10), ncol = 1)
AM  <- matrix(c(1, 1, 1), nrow = 1)
simplex(a = AM, A1 = A1M, b1 = b1M, A2 = NULL, b2 = NULL, A3 = A3M, b3 = b3M, maxi = TRUE)

Case 2: Find any feasible solution for the set of linear equations:
a + b + c = 5
a + 0b + c = 2
0a + b + 0c = 3
Solution - one feasible solution of many is a=1, b=3, c=1. There are infinitely many possible solutions in Case 2.
However, the following R code fails to return any feasible solution:

A3M <- matrix(c(1,1,0, 1,0,1, 1,1,0), nrow = 3)
b3M <- matrix(c(5, 2, 3), ncol = 1)
A1M <- matrix(c(1,0,0, 0,1,0, 0,0,1), nrow = 3)
b1M <- matrix(c(10, 10, 10), ncol = 1)
AM  <- matrix(c(1, 1, 1), nrow = 1)
simplex(a = AM, A1 = A1M, b1 = b1M, A2 = NULL, b2 = NULL, A3 = A3M, b3 = b3M, maxi = TRUE)

The code returns the error: Error in A.out[, basic] <- iden(M) : subscript out of bounds. Am I using the simplex function incorrectly? There may be a better way to approach the problem of finding a feasible solution. Cheers, Chris.
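To make the pseudoinverse suggestion above concrete for Case 2: because the system is consistent, ginv() returns a feasible solution directly (the minimum-norm one). A short sketch:

```r
library(MASS)  # for ginv(), the Moore-Penrose pseudoinverse

A <- rbind(c(1, 1, 1),   # a + b + c = 5
           c(1, 0, 1),   # a     + c = 2
           c(0, 1, 0))   #     b     = 3
b <- c(5, 2, 3)

sol <- as.vector(ginv(A) %*% b)
sol        # minimum-norm feasible solution: a = 1, b = 3, c = 1
A %*% sol  # reproduces b, confirming feasibility
```

For an inconsistent system, the same expression would instead give the least-squares solution, so checking A %*% sol against b is worthwhile.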
[R] Property of cov(x) - matrix, some elements of x missing
Dear List Members, this question is not directly related to R but follows up a comment in the documentation of 'cov'. In the case of missing values in the input matrix 'x' there are various options specified via the parameter 'use', one of them being 'pairwise.complete.obs'. In the 'Details' section it is mentioned that this may result in covariance matrices that are not positive semi-definite. I assume that this is a well-known property; however, I have not been successful in finding suitable literature explaining/documenting this phenomenon. If someone could give me a hint, I would be very grateful. Best, Lukas
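A small constructed example (hypothetical data, chosen so that the pairwise overlaps produce mutually inconsistent correlations of +1, +1 and -1) shows the phenomenon directly — no genuine covariance matrix can carry that pattern, so the pairwise-complete estimate fails to be positive semi-definite:

```r
# Each pair of columns overlaps on a different pair of rows:
# cols 1-2 correlate +1, cols 2-3 correlate +1, cols 1-3 correlate -1
x <- cbind(c( 1,  2,  1,  2, NA, NA),
           c( 1,  2, NA, NA,  1,  2),
           c(NA, NA,  2,  1,  1,  2))

V <- cov(x, use = "pairwise.complete.obs")
eigen(V)$values  # one eigenvalue is negative, so V is not PSD
```

Because each pairwise covariance is estimated from a different subset of rows, the entries of V need not be jointly realisable as moments of any single distribution, which is exactly why positive semi-definiteness can fail.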
Re: [R] A package to set up a questionnaire enter data
Not an R package, but EpiData is free software designed to do exactly this. It's a wonderful piece of software. Define fields, add annotations, provide defaults, provide allowable values or ranges, calculate one field based on entry to another, conditional skipping from one question to another depending on the answer, relational database capability, double-entry verification, and more. I don't think it exports directly to Rdata format, but it can export to plain text, SAS, Stata, dBase, and Excel, any of which I think can be read into R. --Chris Ryan

Original message: Date: Thu, 11 Dec 2008 12:51:10 +0100 From: David Croll [EMAIL PROTECTED] Subject: Re: [R] A package to set up a questionnaire enter data To: r-help@r-project.org Hello, For entering data alone, you would not need a package. For simple questionnaires, you could write a function. It could go like this. For example, you want to record people's names and their ages:

# Sets up an empty database
database <- c()

enter_data <- function() {
  show("Enter name:")
  name <- as.character(readline())
  show("Enter age:")
  age <- as.numeric(readline())
  # Appends data from one questionnaire to the database
  # (note <<- so that the global 'database' is updated)
  database <<- rbind(database, data.frame(name, age))
  # Calls the function again in order to proceed with another
  # questionnaire; stop the function using the stop button
  # when you are finished
  enter_data()
}

Exporting the database to a CSV or a text file should not be a problem with write.csv() or write.csv2(). Kind regards, David Croll
Re: [R] ref card for data manipulation?
Hi, Good idea; what do you say we try and write a page on this in the R wiki? I started the topic: http://wiki.r-project.org/rwiki/doku.php?id=guides:overview-data-manip Once the content is there, it wouldn't be much of an effort to create a reference-card format if required. Best wishes, baptiste On 11 Dec 2008, at 12:38, Vitalie Spinu wrote: On Wed, 10 Dec 2008 18:29:43 +0100, Peter Dalgaard [EMAIL PROTECTED] wrote: You (as many before you) have overlooked the ave() function, which can replace the ordering as well as the do.call(c, tapply()) construct. The majority of questions on this list concern data manipulation. Many are repetitive. Overlooking like that will always happen unless some comprehensive data manipulation documentation is made. I think many people would benefit if a specialized data manipulation reference card were conceived. Tom Short's card is an excellent one, but it does not cover high-level packages like plyr, reshape, doBy, and a few base data manipulation functions are not there either. Vitalie. _ Baptiste Auguié, School of Physics, University of Exeter, Stocker Road, Exeter, Devon, EX4 4QL, UK. Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag
[R] how to get the CDF of a density() estimation?
Hi, I've estimated a simple kernel density of a univariate variable with density(), but now I would like to find out the CDF at specific values. How can I do it? Thanks for your help; with it I am very close to finishing my first somewhat more serious piece of work in R. Viktor
Re: [R] how to get the CDF of a density() estimation?
On Thu, 11 Dec 2008 14:28:31 +0100, Viktor Nagy wrote:
VN> Hi, I've estimated a simple kernel density of a univariate variable with
VN> density(), but now I would like to find out the CDF at specific values.
VN> How can I do it?

Answer 1. Use approxfun() to interpolate the outcome from density() and then use integrate(). The following lines show a *crude* coding of this idea:

R> x <- rnorm(200)
R> pdf <- density(x)
R> f <- approxfun(pdf$x, pdf$y, yleft = 0, yright = 0)
R> cdf <- integrate(f, -Inf, 2)  # replace '2' by any other value

Answer 2. Do not integrate the estimated density, since this is not the most efficient estimate of the underlying CDF. Instead, smooth the empirical distribution function, using a smaller bandwidth of the kernel. The optimal bandwidth for kernel density estimation is of order O(n^{-1/5}), while for CDF estimation it is O(n^{-1/3}), if n denotes the sample size. In practical terms you can still use density(), as indicated above, but selecting a suitably smaller bandwidth compared to the one used for density estimation. Best wishes, Adelchi Azzalini -- Adelchi Azzalini [EMAIL PROTECTED] Dipart. Scienze Statistiche, Università di Padova, Italia. tel. +39 049 8274147, http://azzalini.stat.unipd.it/
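A third option, in the spirit of Answer 2, is to evaluate the kernel CDF estimate in closed form as the average of the Gaussian kernel CDFs centred at the data points; this avoids both the interpolation and the numerical integration. A sketch (using the default density() bandwidth rule for simplicity, though per Answer 2 a smaller bandwidth would be preferable for CDF estimation):

```r
set.seed(1)
x  <- rnorm(200)
bw <- bw.nrd0(x)  # the same default bandwidth rule that density() uses

# Kernel CDF estimate at points q: mean of the Gaussian kernel CDFs
kcdf <- function(q) sapply(q, function(qi) mean(pnorm(qi, mean = x, sd = bw)))

kcdf(c(-2, 0, 2))  # CDF estimates at specific values
```

The result is a proper, monotone CDF (it tends to 0 on the left and 1 on the right), which integrate()-based approximations only achieve up to numerical error.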
Re: [R] Logical inconsistency
Hi Kenn, Well, I think your use of round() isn't the optimal solution. If you use round(x,1) - round(x,1) you create two problems. First: error propagation, because you round twice. Second: you don't use the guard-digits approach. The optimal use of round() is in the last calculation. Look at this:

round(8.8, 1) - round(7.8, 1) > 1
[1] TRUE
round(8.8 - 7.8, 1) > 1
[1] FALSE
round(8.8 - 7.8, 1) == 1
[1] TRUE

Bernardo Rangel Tura, M.D, MPH, Ph.D, National Institute of Cardiology, Brazil

--- Original Message --- From: Kenn Konstabel [EMAIL PROTECTED] To: emma jane [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Thu, 11 Dec 2008 11:53:01 +0200 Subject: Re: [R] Logical inconsistency. Rounding can do no good because

round(8.8, 1) - round(7.8, 1) > 1  # still TRUE
round(8.8) - round(7.7) > 1        # FALSE

What you might do is compute a - b - 1 and compare it to a very small number:

(8.8 - 7.8 - 1) < 1e-10  # TRUE

K On Wed, Dec 10, 2008 at 11:47 AM, emma jane [EMAIL PROTECTED] wrote: Thanks Greg, that does make sense. And I've solved the problem by rounding the variables before taking the difference between them. Thanks to all who replied. Emma Jane From: Greg Snow [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Tuesday, 9 December, 2008 16:30:08 Subject: RE: [R] Logical inconsistency. Some (possibly all) of those numbers cannot be represented exactly, so there is a chance of round-off error whenever you do some arithmetic; sometimes the errors cancel out, sometimes they don't. Consider:

print(8.3 - 7.3, digits = 20)
[1] 1.0000000000000008882
print(11.3 - 10.3, digits = 20)
[1] 1

So in the first case the rounding error gives a value that is slightly greater than 1, so the greater-than test returns TRUE (if you round the result before comparing to 1, then it will return FALSE). In the second case the uncertainties cancelled out, so that you get exactly 1, which is not greater than 1, and so the comparison returns FALSE. Hope this helps, -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center, Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of emma jane Sent: Tuesday, December 09, 2008 7:02 AM To: Bernardo Rangel Tura; Wacek Kusnierczyk; Chuck Cleland Cc: R help Subject: Re: [R] Logical inconsistency. Many thanks for your help; perhaps I should have set my query in context! I'm simply calculating an indicator variable [0,1] based on whether the difference between two measured variables is > 1 or <= 1. I understand the FAQ about floating point arithmetic, but am still puzzled that it only apparently applies to certain elements, as follows:

8.8 - 7.8 > 1    TRUE
8.3 - 7.3 > 1    TRUE

However,

10.2 - 9.2 > 1   FALSE
11.3 - 10.3 > 1  FALSE

Emma Jane From: Bernardo Rangel Tura [EMAIL PROTECTED] To: Wacek Kusnierczyk [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Saturday, 6 December, 2008 10:00:48 Subject: Re: [R] Logical inconsistency On Fri, 2008-12-05 at 14:18 +0100, Wacek Kusnierczyk wrote: Berwin A Turlach wrote: Dear Emma, On Fri, 5 Dec 2008 04:23:53 -0800 (PST): Please could someone kindly explain the following inconsistencies I've discovered when performing logical calculations in R: 8.8 - 7.8 > 1 TRUE; 8.3 - 7.3 > 1 TRUE. Gladly: FAQ 7.31 http://cran.at.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f well, this answers the question only partially. this explains why a system with finite-precision arithmetic, such as r, will fail to be logically correct in certain cases. it does not explain why r, a language said to isolate a user from the underlying implementational choices, would have to fail this way. there is, in principle, no problem in having a high-level language perform the computation in a logically consistent way.
for example, bc is an arbitrary-precision calculator language, and has no problem with examples such as the above:

bc> 8.8 - 7.8 > 1
0  # meaning 'no'
bc> 8.3 - 7.3 > 1
0  # meaning 'no'
bc> 8.8 - 7.8 == 1
1  # meaning 'yes'

the fact that r (and many others, including matlab and sage, perhaps not mathematica) does not perform logically here is a consequence of its implementation of floating point arithmetic. the faq you were pointed to, and its referring to goldberg's article, show that r does not successfully isolate a user from details of the
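In the same spirit as the workarounds discussed above, Kenn's "compare to a very small number" idea is exactly what R's own tolerance-based comparison provides; a brief sketch:

```r
a <- 8.8 - 7.8
a > 1                            # TRUE, due to floating-point representation
isTRUE(all.equal(a, 1))          # TRUE: equal within a small tolerance
a - 1 > .Machine$double.eps^0.5  # FALSE: the excess is below the tolerance
```

all.equal() uses a default tolerance of about 1.5e-8, so it treats the tiny representation error as equality while still distinguishing genuinely different values.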
Re: [R] ref card for data manipulation?
You (as many before you) have overlooked the ave() function, which can replace the ordering as well as the do.call(c, tapply()) construct. The majority of questions on this list concern data manipulation. Many are repetitive. Overlooking like that will always happen unless some comprehensive data manipulation documentation is made. I think many people would benefit if a specialized data manipulation reference card were conceived. I like the idea, but is a reference card really enough? To me, what most people need to tackle data manipulation problems is a broad strategy, not a list of useful functions. plyr is a codification of my most recent ideas on one such strategy: splitting a big data structure into smaller pieces, applying a function to each piece and then joining them back together. Just recognising that your problem can be solved with this strategy is a big step forward; the functions in plyr just save you some typing and a bit of thought compared to doing it in base R. Recognising this strategy has helped me in my own data manipulation problems - many tasks with which I used to struggle are now easy to solve, not just because of plyr, but because I have a framework in which to think about the problem. But this is just one strategy, and there must be many more common strategies waiting to be identified. I think working on this would be time better spent - describing a strategy gives people the tools to help themselves. (Of course this doesn't help the people who just want canned answers, but I'm less interested in helping them.) Hadley -- http://had.co.nz/
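The split-apply-combine strategy described above can be sketched in a few lines of base R (per-group means on a toy data frame), which is the pattern that plyr codifies and generalizes:

```r
d <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 3, 5, 7))

# split: one piece per group
pieces <- split(d, d$g)
# apply: summarise each piece independently
results <- lapply(pieces, function(p) data.frame(g = p$g[1], mean_v = mean(p$v)))
# combine: stack the per-group results back together
do.call(rbind, results)
```

Each of the three steps can be swapped out independently (different splitting variable, different per-piece function, different combining rule), which is what makes the strategy so widely applicable.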
Re: [R] how to get the CDF of a density() estimation?
On Thu, 11 Dec 2008 15:11:06 +0100, Adelchi Azzalini wrote: AA In practical terms you can still use density(), as indicated above, but AA selecting a suitably smaller bandwidth compared to the one used for AA density estimation. PS. Of course this is numerically most inefficient... Adelchi Azzalini -- Adelchi Azzalini [EMAIL PROTECTED] Dipart. Scienze Statistiche, Università di Padova, Italia. tel. +39 049 8274147, http://azzalini.stat.unipd.it/
Re: [R] Principal Component Analysis - Selecting components? + right choice?
If you're intending to create a model using PCs as predictors, select the PCs based on whether they contribute significantly to the model fit. In chemometrics (multivariate stats in chemistry, among other things), if we're expecting 3 or 4 PCs to be useful in a principal component regression, we'd generally start with at least the first half-dozen or so and let the model fit sort them out. The reason for not preselecting too rigorously early on is that there's no guarantee at all that the first couple of PCs are good predictors for what you're interested in. They're properties of the predictor set, not of the response set. Mind you, there used to be something of a gap between chemometrics and proper statistics; I'm sure chemometricians used to do things with data that would turn a statistician pale. You could also look at a PLS model, which (if I recall correctly) actually uses the response data to select the latent variables used for prediction. S

Corrado [EMAIL PROTECTED] wrote on 11/12/2008 11:46:37: Dear R gurus, I have some climatic data for a region of the world. They are monthly averages 1950-2000 of precipitation (12 months), minimum temperature (12 months), maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and I have around 75,000 cells. I need to feed them into a statistical model as covariates, to use them to predict a response variable. The climatic data are obviously correlated: precipitation for January is correlated to precipitation for February, and so on; even precipitation and temperature are heavily correlated. I did some correlation analysis and they are all strongly correlated. I thought of running PCA on them, in order to reduce the number of covariates I feed into the model. I ran the PCA using prcomp, quite successfully. Now I need a criterion to select the right number of PCs (that is: is it 1, 2, 3, 4?). What criterion would you suggest?
At the moment, I am using a criterion based on a threshold, but that is highly subjective, even if there are some rules of thumb (Jolliffe, Principal Component Analysis, 2nd Edition, Springer Verlag, 2002). Could you suggest something more rigorous? By the way, do you think I would have been better off using something different from PCA? Best, -- Corrado Topi, Global Climate Change Biodiversity Indicators, Area 18, Department of Biology, University of York, York, YO10 5YW, UK. Phone: +44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
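The threshold criterion mentioned above can at least be computed reproducibly from the prcomp output via the cumulative proportion of variance. A sketch on synthetic data standing in for the 36 monthly climate covariates (the real data are not available here, so the matrix below is simulated with 4 underlying signals):

```r
set.seed(1)
# Synthetic stand-in for 36 correlated monthly climate covariates,
# driven by 4 latent signals plus a little noise
z    <- matrix(rnorm(500 * 4), 500, 4)
clim <- z %*% matrix(rnorm(4 * 36), 4, 36) +
        matrix(rnorm(500 * 36, sd = 0.1), 500, 36)

pc     <- prcomp(clim, scale. = TRUE)
cumvar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
which(cumvar >= 0.9)[1]  # smallest number of PCs explaining 90% of variance
```

As the reply above stresses, variance explained is a property of the predictors only, so a variance threshold is a starting point rather than a substitute for checking which PCs actually help predict the response.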
Re: [R] R and Scheme
Peter Dalgaard wrote: Wacek Kusnierczyk wrote: Peter Dalgaard wrote: Johannes Huesing wrote: Stavros Macrakis [EMAIL PROTECTED] [Wed, Dec 10, 2008 at 04:59:25AM CET]: So I conclude that what is really meant by "R semantics are based on Scheme semantics" is "R has functions as first-class citizens and a correct implementation of lexical scope, including upwards funarg". One other thing reminiscent of Lisp is the infix notation (as in `+`(1, 3)), which the authors have sprinkled with enough syntactic sugar that the users needn't be bothered with it. To the benefit of ubiquity, I'd think. That's prefix notation, infix is 1+3 (and postfix is 1,3,+ as in old HP calculators). But you're right that R has Lisp-like parse trees with a thin layer of syntactic sugar: Lisp writes function calls as (f x y) for f(x,y) and (+ 1 3) for 1+3. In R we have

> e <- quote(f(x,y))
> e[[1]]; e[[2]]; e[[3]]
f
x
y
> e <- quote(1+3)
> e[[1]]; e[[2]]; e[[3]]
`+`
[1] 1
[1] 3

the reminiscence is limited, though. the following won't do:

`+`(1,2,3)

and quote(1+2+3) is not a list of `+`, 1, 2, and 3. vQ

Essentially irrelevant. You have to distinguish between form and function, and it is not important that the two languages contain slightly different definitions and semantics of particular functions. The point is that the general _form_ of the parse tree is the same.

and what does the form of the syntax tree have to do with lisp-likeness? in java, c, etc., the string "1 + 2 + 3" would be parsed into a tree that has the same shape as in the case of r: a '+' in the root, '1' in one branch, and in the other branch a tree with a '+' as the root and '2' and '3' in the branches. "The point is that the general _form_ of the parse tree is the same" -- does it make java or c resemble lisp?

Because of the syntactic sugar, R does not have `+` equivalent to `sum` like LISP does. `+` is a binary (or unary) operator, and 1+2+3 parses as LISP's (+ (+ 1 2) 3).
the point is, it's not a problem of the binary-ness of the particular function `+`. it's that an expression where the same operator is used infix more than once in a row is parsed and evaluated as a nested sequence of calls, not as a single call with multiple arguments:

`%+%` <- function(a, b, c = 0) { print(c(a, b, c)); a + b + c }
1 %+% 2 %+% 3   # two calls: prints c(1, 2, 0), then c(3, 3, 0)
`%+%`(1, 2, 3)  # one call: prints c(1, 2, 3)

vQ
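The left-nested parsing being argued about can be seen directly by taking the expression apart, extending the quote() examples earlier in the thread:

```r
e <- quote(1 + 2 + 3)
e[[1]]         # the outer function: `+`
e[[2]]         # its first argument: the nested call 1 + 2
class(e[[2]])  # "call" - a subtree, not a number
e[[3]]         # the second argument: 3
```

So quote(1 + 2 + 3) really is (+ (+ 1 2) 3) in Lisp terms: a two-argument call whose first argument is itself a call.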
Re: [R] repeated searching of no-missing values
dear Hadley and Bert, thank you very much for your suggestions. I asked one question and I learned 2 things: 1. Hadley:

library(plyr)
ddply(data, .(V1), colwise(cl))

that is exactly what I was searching for. 2. Bert: ?tapply says that the first argument is an **atomic** vector. A factor is not an atomic vector, so tapply interprets it as such by looking only at its representation, which is as integer values. I admit I paid not enough attention to the definition of an **atomic** vector. That implies a deeper understanding of the structures of the data I'm working with! Many thanks, Patrizio
Re: [R] call lattice function in a function passing
If however I wanted to call the function densityplot within a function and pass the groups argument as an argument of that function, how would I have to proceed? It is not as straightforward as

f <- function(data, groupvar) {
  densityplot(~ x, data, groups = groupvar)
}

probably because the lattice function densityplot.formula preprocesses the groups argument with

groups <- eval(substitute(groups), data, environment(formula))

Is there a way I could pass the groups argument in the function f? Here's one approach. Pass the 'groups' variable as a character string, then find that variable in the data frame and rename it:

d <- data.frame(x = rnorm(100), y = c("a", "b"))
f <- function(data, groupvar) {
  names(data)[which(names(data) == groupvar)] <- "gp"
  densityplot(~ x, data, groups = gp)
}
f(d, groupvar = "y")
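An alternative sketch (not from the thread, and relying on the fact that the substituted groups expression is evaluated with the calling frame as enclosure): pass the column name as a string and hand densityplot the already-extracted vector, avoiding the renaming step:

```r
library(lattice)

d <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))

# Extract the grouping column ourselves and pass the vector directly
f2 <- function(data, groupvar) {
  densityplot(~ x, data, groups = data[[groupvar]])
}

p <- f2(d, groupvar = "y")
class(p)  # a "trellis" object, ready to print/plot
```

This works because data[[groupvar]] evaluates to the grouping vector itself, so lattice's non-standard evaluation of groups has nothing left to resolve inside the data frame.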
Re: [R] Validity of GLM using Gaussian family with sqrt link
Dear Prof. Ripley, Thank you for your quick response. (A) "link=sqrt is a name and not accepted. link="sqrt" is a literal character string, and is." I am not entirely sure whether I understand that statement, but this is what I found out. If I specify family=gaussian(link=sqrt), glm() fails to run because it is not a default link (so I understand this part). Following Venables and Ripley (2002):

summary(glm(cnt ~ herbc + herbht, data = sotr, family = gaussian(link = "sqrt"), start = c(0.1, -0.004, 0.01)))

Call: glm(formula = cnt ~ herbc + herbht, family = gaussian(link = "sqrt"), data = sotr, start = c(0.1, -0.004, 0.01))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.462211   0.043475  10.632  < 2e-16 ***
herbc       -0.003315   0.001661  -1.996   0.0461 *
herbht       0.010241   0.001291   7.935 4.86e-15 ***

AIC: 3235.0

summary(glm(cnt ~ herbc + herbht, data = sotr, family = quasi(link = power(0.5), variance = "constant"), start = c(0.1, -0.004, 0.01)))

Call: glm(formula = cnt ~ herbc + herbht, family = quasi(link = power(0.5)), data = sotr, start = c(0.1, -0.004, 0.01))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.462211   0.043475  10.632  < 2e-16 ***
herbc       -0.003315   0.001661  -1.996   0.0461 *
herbht       0.010241   0.001291   7.935 4.86e-15 ***

AIC: NA

Notice that the parameter estimates and corresponding standard errors are identical. So, my interpretation is that specifying family=gaussian(link="sqrt") is identical to specifying family=quasi(link=power(0.5)) in glm(). The exception is that AIC (and thus the maximized log-likelihood) can be computed for family=gaussian(link="sqrt"). The questions are: (A.1) Is this interpretation correct? (A.2) If (A.1) is true, does family=gaussian(link="sqrt") imply that I am doing a generalized linear model with a normal distribution and the link function sqrt(mu) = b0 + b1*herbc + b2*herbht? (B) "In less technical terms, in model 1 you compute the likelihood from probabilities and in model 2 from probability densities, and the latter depend on the units of measurement."
Yes, you are correct and I understand it now. Although not as common these days, some small mammal studies still use a sqrt transformation of the count as the response variable and fit a linear model to the predictors (via least squares). So the exercise I got into is to compare the performance of a linear model with a sqrt-transformed count against a GLM with Poisson errors, knowing that we can't compare logLik or AIC based on different measures of the response. I thought that a comparison within the GLM framework might be an approach closer to that intention. Thanks again for your quick response and advice. I appreciate it very much. Best regards, TzengYih Lam --- Ph.D. student College of Forestry Oregon State University -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Wed 12/10/2008 11:45 PM To: Lam, Tzeng Yih Cc: r-help@r-project.org Subject: Re: [R] Validity of GLM using Gaussian family with sqrt link a) There is a difference between link=sqrt and link="sqrt". link: a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector or an object of class 'link-glm' (such as generated by 'make.link') provided it is not specified _via_ one of the standard names given next. link=sqrt is a name and not accepted. link="sqrt" is a literal character string, and is. b) Your first model is a model for integer observations, the second for continuous observations. As such, the log-likelihoods are computed with respect to different reference measures and are not comparable. In less technical terms, in model 1 you compute the likelihood from probabilities and in model 2 from probability densities, and the latter depend on the units of measurement.
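The equivalence discussed in this thread can be checked on simulated data. A sketch: the poster's sotr data are not available, so the data below are made up and the fitted coefficients are illustrative only; the start values are reused from the thread.

```r
# Simulate Poisson counts whose mean is quadratic on the sqrt scale,
# then fit the two families discussed above. The coefficient estimates
# agree; only the gaussian fit reports an AIC (the quasi fit gives NA).
set.seed(42)
n <- 200
herbc  <- runif(n, 20, 60)
herbht <- runif(n, 20, 60)
cnt <- rpois(n, lambda = (0.5 - 0.003 * herbc + 0.01 * herbht)^2)
f1 <- glm(cnt ~ herbc + herbht, family = gaussian(link = "sqrt"),
          start = c(0.1, -0.004, 0.01))
f2 <- glm(cnt ~ herbc + herbht,
          family = quasi(link = power(0.5), variance = "constant"),
          start = c(0.1, -0.004, 0.01))
all.equal(coef(f1), coef(f2))
c(AIC(f1), AIC(f2))
```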
On Wed, 10 Dec 2008, Lam, Tzeng Yih wrote: Dear all, I have the following dataset: each row corresponds to a count of forest floor small mammals captured in a plot, plus vegetation characteristics measured at that plot:

sotr
  plot cnt herbc herbht
1  1A1   0 37.08  53.54
2  1A3   1 36.27  26.67
3  1A5   0 32.50  30.62
4  1A7   0 56.54  45.63
5  1B2   0 41.66  38.13
6  1B4   0 32.08  37.79
7  1B6   0 33.71  30.62
...

I am interested in comparing the fit of different specifications of generalized linear models (there are some issues with using AIC or BIC for the comparison, but that is the question I would like to post here). Here are two of the several models I am interested in: (1) Poisson log-linear model

pois <- glm(cnt ~ herbc + herbht, family = poisson, data = sotr)
summary(pois)

Call: glm(formula = cnt ~ herbc + herbht, family = poisson, data = sotr)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.341254   0.089969
Re: [R] Logical inconsistency
Hi, I agree -- and my examples using round were meant as bad and dangerous examples. Using round at the last step is better and may solve the problem, but in your example ... round(8.8-7.8,1)==1 [1] TRUE ... you have to know in advance how many decimal places can possibly make a difference (is it just one? maybe 2? 3? 15?). round(8.8-7.8,14)==1 [1] TRUE round(8.8-7.8,15)==1 [1] FALSE ... or, equivalently, 8.8-7.8-1 < 1e-15 [1] TRUE 8.8-7.8-1 < 1e-16 [1] FALSE Best regards, Kenn On Thu, Dec 11, 2008 at 4:18 PM, Bernardo Rangel Tura [EMAIL PROTECTED] wrote: Hi Kenn, I think your use of round isn't the optimal solution. If you use round(x,1)-round(y,1) you create two problems. First: error propagation, because you round twice. Second: you don't use the guard-digits approach. The optimal use of round is in the last calculation. Look at this: round(8.8,1)-round(7.8,1)>1 [1] TRUE round(8.8-7.8,1)>1 [1] FALSE round(8.8-7.8,1)==1 [1] TRUE Bernardo Rangel Tura, M.D, MPH, Ph.D National Institute of Cardiology Brazil -- Original Message --- From: Kenn Konstabel [EMAIL PROTECTED] To: emma jane [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Thu, 11 Dec 2008 11:53:01 +0200 Subject: Re: [R] Logical inconsistency Rounding can do no good because round(8.8,1)-round(7.8,1)>1 # still TRUE round(8.8)-round(7.7)>1 # FALSE What you might do is compute a-b-1 and compare it to a very small number: (8.8-7.8-1) < 1e-10 # TRUE K On Wed, Dec 10, 2008 at 11:47 AM, emma jane [EMAIL PROTECTED] wrote: Thanks Greg, that does make sense. And I've solved the problem by rounding the variables before taking the difference between them. Thanks to all who replied.
Emma Jane From: Greg Snow [EMAIL PROTECTED] To: emma jane; Wacek Kusnierczyk [EMAIL PROTECTED]; Chuck Cleland [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Tuesday, 9 December, 2008 16:30:08 Subject: RE: [R] Logical inconsistency Some (possibly all) of those numbers cannot be represented exactly, so there is a chance of round-off error whenever you do some arithmetic; sometimes the errors cancel out, sometimes they don't. Consider: print(8.3-7.3, digits=20) [1] 1.0000000000000008882 print(11.3-10.3, digits=20) [1] 1 So in the first case the rounding error gives a value that is slightly greater than 1, so the greater-than test returns TRUE (if you round the result before comparing to 1, it will return FALSE). In the second case the uncertainties cancelled out, so you get exactly 1, which is not greater than 1 and so the comparison returns FALSE. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] 801.408.8111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] project.org] On Behalf Of emma jane Sent: Tuesday, December 09, 2008 7:02 AM To: Bernardo Rangel Tura; Wacek Kusnierczyk; Chuck Cleland Cc: R help Subject: Re: [R] Logical inconsistency Many thanks for your help, perhaps I should have set my query in context! I'm simply calculating an indicator variable [0,1] based on whether the difference between two measured variables is > 1 or <= 1.
I understand the FAQ about floating-point arithmetic, but am still puzzled that it only apparently applies to certain elements, as follows: 8.8 - 7.8 > 1 TRUE 8.3 - 7.3 > 1 TRUE However, 10.2 - 9.2 > 1 FALSE 11.3 - 10.3 > 1 FALSE Emma Jane From: Bernardo Rangel Tura [EMAIL PROTECTED] To: Wacek Kusnierczyk [EMAIL PROTECTED] Cc: R help [EMAIL PROTECTED] Sent: Saturday, 6 December, 2008 10:00:48 Subject: Re: [R] Logical inconsistency On Fri, 2008-12-05 at 14:18 +0100, Wacek Kusnierczyk wrote: Berwin A Turlach wrote: Dear Emma, On Fri, 5 Dec 2008 04:23:53 -0800 (PST) Please could someone kindly explain the following inconsistencies I've discovered when performing logical calculations in R: 8.8 - 7.8 > 1 TRUE 8.3 - 7.3 > 1 TRUE Gladly: FAQ 7.31 http://cran.at.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f well, this answers the question only partially. this explains why a system with finite-precision arithmetic, such as r, will fail to be logically correct in certain cases. it does not explain why r, a language said to isolate a user from the underlying implementational choices, would have to fail this way. there is, in principle, no problem
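The tolerance approach discussed in this thread can be wrapped in a small helper. A sketch: the function name and the default tolerance are choices of this example, not taken from the thread.

```r
# TRUE when a - b exceeds 1 by more than a tolerance; the default
# tolerance mirrors the one all.equal() uses for numeric comparison.
gt1 <- function(a, b, tol = sqrt(.Machine$double.eps)) {
  (a - b - 1) > tol
}
gt1(8.8, 7.8)  # FALSE: the difference is 1 up to floating-point error
gt1(8.8, 7.7)  # TRUE: the difference genuinely exceeds 1
```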
[R] Sorting problem
Sys.setlocale(, "C")
x1 <- as.character(date()) # I use date() to record the time and save it to an sqlite database, so it is converted to character
x1_2 <- strptime(x1, "%a %b %d %H:%M:%S %Y")
x2 <- as.character(date())
x2_2 <- strptime(x2, "%a %b %d %H:%M:%S %Y")
X <- c(x1_2, x2_2)
order(X) ## I want to get the permutation rather than the sorted vector.
## order(X) works in Windows but not Linux. Any alternative way to get the permutation? -- HUANG Ronggui, Wincent Tel: (00852) 3442 3832 PhD Candidate, City University of Hong Kong Website: http://ronggui.huang.googlepages.com/ RQDA project: http://rqda.r-forge.r-project.org/
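One platform-independent route, assuming the underlying problem is that POSIXlt (a list) cannot be ordered directly: convert to POSIXct first. A sketch with fixed example timestamps (English month/day names, so it assumes a C or English locale as in the post):

```r
# POSIXct stores times as atomic numeric values, which order() handles.
t1 <- strptime("Thu Dec 11 23:35:00 2008", "%a %b %d %H:%M:%S %Y")
t2 <- strptime("Thu Dec 11 23:34:00 2008", "%a %b %d %H:%M:%S %Y")
X <- c(as.POSIXct(t1), as.POSIXct(t2))
order(X)  # the permutation of indices: 2 1, since t2 is earlier
```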
[R] Combining tplot and barplot
Hello! I am pretty new to R and trying to figure out how to show a barplot and the single data points in one plot. I am using the tplot code from the following site for showing the single data points: http://biostat.mc.vanderbilt.edu/twiki/pub/Main/TatsukiRcode/TatsukiRcodeTplot.r I have two sets of data (x, y) that I want to compare by plotting a barplot and the single data points. Unfortunately they are shifted, so that the data points are not right on top of the bars. The code I used for plotting the data:

x <- c(835, 728, 2281, 1049, 1574, 1340, 621)
y <- c(466, 922, 2647, 914, 1342, 998, 655)
aver <- c(mean(x), mean(y))
barplot(aver, col = 0, ylim = range(0, 2750))
tplot(x, y, add = TRUE, axes = FALSE)

I would be grateful if someone could tell me what the problem (and hopefully the solution) might be! Thank you very much!
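A base-graphics workaround (using points() instead of tplot, so this is only a sketch of the alignment fix, not the poster's exact setup): barplot() invisibly returns the x positions of the bar centres, and plotting the raw data at those coordinates keeps the points on top of the bars.

```r
x <- c(835, 728, 2281, 1049, 1574, 1340, 621)
y <- c(466, 922, 2647, 914, 1342, 998, 655)
aver <- c(mean(x), mean(y))
# barplot() returns the midpoints of the bars on the x axis
mids <- barplot(aver, col = 0, ylim = c(0, 2750))
points(rep(mids[1], length(x)), x)
points(rep(mids[2], length(y)), y)
```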
Re: [R] tapply within a data.frame: a simpler alternative?
You might take a look at the transformBy function in the doBy package. For example: new.df <- transformBy(~group, data=my.df, new=y/max(y)) David Freedman baptiste auguie-2 wrote: Dear list, I have a data.frame with x, y values and a 3-level factor group, say. I want to create a new column in this data.frame with the values of y scaled to 1 by group. Perhaps the example below describes it best:

x <- seq(0, 10, len=100)
my.df <- data.frame(x = rep(x, 3),
                    y = c(3*sin(x), 2*cos(x), cos(2*x)), # note how the y values have a different maximum depending on the group
                    group = factor(rep(c("sin", "cos", "cos2"), each=100)))
library(reshape)
df.melt <- melt(my.df, id=c("x", "group")) # make a long format
df.melt <- df.melt[order(df.melt$group), ] # order the data.frame by the group factor
df.melt$norm <- do.call(c, tapply(df.melt$value, df.melt$group, function(.v) .v / max(.v))) # calculate the normalised value per group and assign it to a new column
library(lattice)
xyplot(norm + value ~ x, groups=group, data=df.melt, auto.key=TRUE) # check that it worked

This procedure works, but it feels like I'm reinventing the wheel using hammer and saw. I tried to use aggregate, by, ddply (plyr package), but I couldn't find anything straightforward. I'll appreciate any input, Baptiste _ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag - David Freedman Atlanta
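As suggested elsewhere on the list, ave() does this per-group scaling in one step, with no melting or reordering. A sketch using the same example data:

```r
x <- seq(0, 10, len = 100)
my.df <- data.frame(x = rep(x, 3),
                    y = c(3 * sin(x), 2 * cos(x), cos(2 * x)),
                    group = factor(rep(c("sin", "cos", "cos2"), each = 100)))
# ave() applies FUN within each group and returns a vector aligned
# with the original rows, so no ordering step is needed.
my.df$norm <- ave(my.df$y, my.df$group, FUN = function(v) v / max(v))
```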
[R] passing on the ... arguments in a progammed graphics function
Hi everyone, I want to write a wrapper function that uses the hist() function. I want to allow hist's breaks argument to be optional in my function: when my function is given a breaks argument, hist() should use it; if not, hist() should use its default for breaks. How can I do that?

myFunction <- function(data, ...) # breaks= as optional here
{
  hist(data, breaks = ???)
}

TIA Mark
Re: [R] Downloading Reuters data from R
Hi Shubha, I have created an extension DLL for downloading time series data from Reuters. You can download it from here: http://www.theresearchkitchen.com/blog/archives/287 There is also a short manual available at the same location: http://www.theresearchkitchen.com/blog/wp-content/uploads/2008/12/intro.pdf I am currently in the process of uploading a separate extension DLL for retrieval of real-time data from Reuters. Thanks Rory Rory Winston RBS Global Banking Markets Office: +44 20 7085 4476
Re: [R] Downloading Reuters data from R
Hi Shubha, I'm replying off-list because although I don't have an answer to your question, I thought I would point you in the direction of Rmetrics (http://www.rmetrics.org/rmetricsPackages.htm), as there might be something in there that is helpful if you don't get a more direct answer from your thread. Or maybe a search on r-sig-finance (https://stat.ethz.ch/pipermail/r-sig-finance/) might bring up something helpful to you. Hope that helps a little bit :-) Tony Breyal.
Re: [R] ref card for data manipulation?
On Thu, 11 Dec 2008 15:19:03 +0100, hadley wickham [EMAIL PROTECTED] wrote: You (as many before you) have overlooked the ave() function, which can replace the ordering as well as the do.call(c, tapply()) construct. The majority of questions on this list concern data manipulation, and many are repetitive. Overlooking like that will always happen unless some comprehensive data-manipulation documentation is written. I think many people would benefit if a specialized data-manipulation reference card were conceived. I like the idea, but is a reference card really enough? To me, what most people need to tackle data manipulation problems is a broad strategy, not a list of useful functions. Absolutely agree. A list of useful strategies is an excellent idea :). A wiki page for discussion? Maybe clipping some conceptual passages from your reshape and plyr documentation is a good start? And other strategy ideas will start flowing soon. Vitalie.
Re: [R] motif search
Dear Alessia, I am very new to R and wanted to know if there is a package that, given very long nucleotide sequences, searches and identifies short (7-10nt) motifs. I would like to look for enrichment of certain motifs in genomic sequences. I tried using MEME (not an R package, I know), but the online version only allows motifs up to MAX 6 nucleotides, and that's too short for my needs. You may try this:

# Load the seqinr package:
library(seqinr)
# A FASTA file example - that ships with seqinr - which contains
# the complete genome sequence of Chlamydia trachomatis:
fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
# Import the sequence as a string of characters:
myseq <- read.fasta(fastafile, as.string = TRUE)
nchar(myseq) # 1042519, that is a 1 Mb sequence
# Look for motif "atatatat", with possible overlap:
words.pos("atatatat", myseq, extended = TRUE)
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
substr(myseq, 236501, 236501 + 8)
# Should be
# [1] "atatatata"

HTH, Jean -- Jean R. Lobry ([EMAIL PROTECTED]) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I, 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE allo: +33 472 43 27 56 fax: +33 472 43 13 88 http://pbil.univ-lyon1.fr/members/lobry/
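For completeness, overlapping motif hits can also be found in base R with gregexpr() and a zero-width lookahead. A sketch on a made-up toy sequence, not the seqinr example above:

```r
myseq <- "ggatatatatatcc"  # toy sequence, not the C. trachomatis genome
# A perl lookahead matches at every start position, including overlaps,
# because the match itself consumes no characters.
hits <- as.integer(gregexpr("(?=atatatat)", myseq, perl = TRUE)[[1]])
hits  # start positions of the motif in the sequence
```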
Re: [R] passing on the ... arguments in a progammed graphics function
On 12/11/2008 11:01 AM, Mark Heckmann wrote: Hi everyone, I want to write a wrapper function that uses the hist() function. I want to allow hist's breaks argument to be optional in my function: when my function is given a breaks argument, hist() should use it; if not, hist() should use its default for breaks. How can I do that? myFunction <- function(data, ...) # breaks= as optional here { hist(data, breaks=???) } If you make your call as hist(data, ...) then a breaks arg would be passed along (as would any other). Duncan Murdoch
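Spelling out Duncan's suggestion (the wrapper name comes from the question):

```r
# Anything the caller passes through ... (breaks included) reaches
# hist(); when breaks is omitted, hist()'s own default applies.
myFunction <- function(data, ...) {
  hist(data, ...)
}
myFunction(rnorm(100))               # default breaks
myFunction(rnorm(100), breaks = 5)   # user-chosen breaks
```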
Re: [R] Validity of GLM using Gaussian family with sqrt link
Hi all, Just on this question: can I assume that any function defined in R can be used to describe the link (e.g. arctan), so long as it is increasing and monotone? How might abs work, for example (except at 0)? And finally, can I define any old function in R called myfun and use link=myfun, provided myfun is a sort of nice function? Gerard
Re: [R] A package to set up a questionnaire enter data
Thank you ryancw wrote: Not an R package, but EpiData is free software designed to do exactly this. It's a wonderful piece of software. Define fields, add annotations, provide defaults, provide allowable values or ranges, calculate one field based on entry to another, conditional skipping from one question to another depending on answer, relational database capability, double-entry verification, and more. I don't think it exports directly to Rdata format, but it can export to plain text, SAS, Stata, dBase, and Excel, any of which I think can be read into R. --Chris Ryan Original message Date: Thu, 11 Dec 2008 12:51:10 +0100 From: David Croll [EMAIL PROTECTED] Subject: Re: [R] A package to set up a questionnaire enter data To: r-help@r-project.org Hello, For entering data alone, you would not need a package. For simple questionnaires, you could write a function. It could go like this. For example, you want to record people's names and their ages:

# Sets up an empty database
database <- c()
enter_data <- function() {
  show("Enter name:")
  name <- as.character(readline())
  show("Enter age:")
  age <- as.numeric(readline())
  # Appends data from one questionnaire to the database
  # (<<- so the assignment reaches the global variable)
  database <<- rbind(database, data.frame(name, age))
  # Calls the function again in order to proceed
  # with another questionnaire
  enter_data() # stop the function using the stop button when you are finished
}

Exporting the database into a CSV or a text file should not be a problem with write.csv() or write.csv2(). Kind regards, David Croll
Re: [R] Sorting problem
Sorry for the last post. I didn't use the latest version of R. It works under Linux as well for R-2.8.0 patched. Best On Thu, Dec 11, 2008 at 11:34 PM, ronggui [EMAIL PROTECTED] wrote:

Sys.setlocale(, "C")
x1 <- as.character(date()) # I use date() to record the time and save it to an sqlite database, so it is converted to character
x1_2 <- strptime(x1, "%a %b %d %H:%M:%S %Y")
x2 <- as.character(date())
x2_2 <- strptime(x2, "%a %b %d %H:%M:%S %Y")
X <- c(x1_2, x2_2)
order(X) ## I want to get the permutation rather than the sorted vector.
## order(X) works in Windows but not Linux. Any alternative way to get the permutation?

-- HUANG Ronggui, Wincent Tel: (00852) 3442 3832 PhD Candidate, City University of Hong Kong Website: http://ronggui.huang.googlepages.com/ RQDA project: http://rqda.r-forge.r-project.org/
Re: [R] Extracting the name of an object into a character string
Do you mean something like the following?

f <- function(x) {
  deparse(substitute(x))
}
x <- 5
y <- 6
z <- 7
f(x)
f(y)
f(z)

I hope it helps. Best, Dimitris Philip Whittall wrote: Dear List, I am writing a function in R with the facility to store models for later use in scoring. It would be very useful if I could include the name of the model object being stored in the name of the stored file, this name being chosen by the user in the function call. A simple function to store the name of an object as a character string would fit the bill, but I have not found one. name() doesn't appear to do what I want; maybe I'm using it wrongly. Any suggestions please? I'm running 2.8.0 on Windows XP. Thanks, Philip -- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
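Applied to the original file-naming problem, deparse(substitute()) can supply the file name, and assign() with save(list=) keeps the object stored under its original name. A sketch; the helper name and example model are made up:

```r
# Save a model to "<objectname>.RData", preserving the object's name
# inside the file (save(list=) saves by name, not by value).
saveModel <- function(model) {
  nm <- deparse(substitute(model))   # e.g. "myFit"
  assign(nm, model)                  # make the object visible under nm
  save(list = nm, file = paste(nm, ".RData", sep = ""))
}
myFit <- lm(dist ~ speed, data = cars)
saveModel(myFit)  # writes myFit.RData containing an object named myFit
```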
Re: [R] Validity of GLM using Gaussian family with sqrt link
Please do read the help page: fortune("WTFM") applies. On Thu, 11 Dec 2008, Gerard M. Keogh wrote: Hi all, Just on this question: can I assume any function defined in R can be used to describe the link (e.g. arctan) so long as it is increasing and monotone? How might abs work, for example (except at 0)? No. And finally, can I define any old function in R called myfun and use link=myfun, provided myfun is a sort of nice function? No. From the help page: link: a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector or an object of class 'link-glm' (such as generated by 'make.link') provided it is not specified _via_ one of the standard names given next. You have to specify a *model link function*, as in gaussian(link=arctan) Error in switch(link, logit = { : 'arctan' link not recognised Only those known by name (to make.link) or an object of the specified class can be used.
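To make the rule concrete, a sketch of what the help-page wording quoted above does and does not accept:

```r
fam1 <- gaussian(link = "sqrt")      # accepted: a link name known to make.link()
fam2 <- gaussian(link = power(1/3))  # accepted: power() returns a "link-glm" object
# gaussian(link = arctan)            # rejected: an arbitrary R function is not a link
```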
[R] very long integers
A quick question really: I have a database with extremely long integer IDs (e.g. 588848900971299297), which are too big for R to cope with internally (it appears to store them as doubles), and when I do any frequency tables erroneous results appear. Does anyone know of a package that extends internal storage up to a long integer, or is the only solution to read them in as character from the original data? In case anyone is curious, I didn't create the IDs, and in some form I must conserve all of the ID information for later use. Thanks, Aaron
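To make the failure concrete, here is a small base-R sketch of why a double cannot distinguish IDs of this length (the two literals below are the poster's ID and the next integer up):

```r
# Doubles carry 53 significant bits (about 16 decimal digits), so
# neighbouring 18-digit integers collapse onto the same value.
x <- c(588848900971299297, 588848900971299298)  # differ in the last digit
x[1] == x[2]               # TRUE -- both literals round to one double
length(unique(x))          # 1, which is why frequency tables go wrong
# Reading the IDs in as character (e.g. colClasses = "character" in
# read.table) keeps every digit intact.
```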
[R] Logical in test
OK, this should be trivial but I'm not finding it. I want to compress the test

  if (i==7 | i==10 | i==30 | i==50) {}

into something like

  if (i in c(7,10,30,50)) {}

so I can build an excludes vector

  excludes <- c(7,10,30,50)

and test

  if (i in excludes) {}

However, I'm not finding a clue on how to accomplish this, if it can be done. Would someone with more R experience lend a helping hand please? A reference (so I can continue learning) would also be appreciated. Thanks... -=d David Thompson, Ph.D., P.E., D.WRE, CFM Civil Engineer/Hydrologist
[R] help with predict in stats4
Hi, We're using stats4 for a logistic regression. The code is

  chdreg.logit2 <- glm(chd ~ age + sex, family = binomial)
  summary(chdreg.logit2)
  oddsratios <- coef(chdreg.logit2)
  exp(oddsratios)
  # Calculate model predicted values
  pred <- predict(chdreg.logit2, type = "response")

The glm part runs fine, and up to now so has the predict function. However, now we're getting the following error:

  Error in function (classes, fdef, mtable) :
    unable to find an inherited method for function "predict", for signature "glm"

Any thoughts about why this is now appearing? Thanks in advance. -- David Kaplan, Ph.D. Professor Department of Educational Psychology University of Wisconsin - Madison Educational Sciences, Room 1061 1025 W. Johnson Street Madison, WI 53706 email: [EMAIL PROTECTED] homepage: http://www.education.wisc.edu/edpsych/default.aspx?content=kaplan.html Phone: 608-262-0836
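A likely explanation, sketched below with made-up toy data (the original chd/age/sex values are not shown in the post): some attached package has promoted predict to an S4 generic that has no method for class "glm". A namespace-qualified call to the S3 version sidesteps the clash:

```r
# Toy data standing in for the poster's chd/age/sex (hypothetical values)
chd <- c(0, 1, 1, 0, 1, 0, 1, 0)
age <- c(40, 60, 45, 70, 65, 50, 55, 38)
sex <- factor(c("m", "f", "m", "f", "m", "f", "m", "f"))
fit <- glm(chd ~ age + sex, family = binomial)
# Qualifying the call avoids any masking S4 predict() generic:
pred <- stats::predict(fit, type = "response")
range(pred)   # fitted probabilities on the response scale
```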
Re: [R] merging data.frames columnwise (rbind with different variables, lengths)
have a look at merge(), e.g.,

  df1 <- data.frame(A = c(1,2), B = c("m","f"), C = c("at home","away"))
  df2 <- data.frame(A = c(2), C = c("at home"))
  merge(df1, df2, all = TRUE, sort = FALSE)

I hope it helps. Best, Dimitris

Stefan Uhmann wrote: Dear List, I have two dataframes with overlapping colnames and want to merge them. Actually, what I want is more similar to rbind, but the dataframes differ in their columns. Here are the examples:

  df1 <- data.frame(A = c(1,2), B = c("m","f"), C = c("at home","away"))
  df2 <- data.frame(A = c(2), C = c("at home"))

Here the desired result:

    A    B       C
  1 1    m at home
  2 2    f    away
  3 2 <NA> at home

Thanks for any help, Stefan

-- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
Re: [R] very long integers
Have you looked at the R interface to gmp? http://cran.r-project.org/web/packages/gmp/index.html Rory Winston RBS Global Banking Markets Office: +44 20 7085 4476 -Original Message- On Behalf Of Aaron Robotham Sent: 11 December 2008 17:07 To: r-help@r-project.org Subject: [R] very long integers [original question snipped]
Re: [R] getting ISO week
On Thu, Dec 11, 2008 at 2:10 PM, Prof Brian Ripley [EMAIL PROTECTED] wrote: A slightly simpler version is format(Sys.Date(), "%V") On Thu, 11 Dec 2008, Prof Brian Ripley wrote: strftime(x, "%V") E.g. strftime(as.POSIXlt(Sys.Date()), "%V") is "50", and you might want as.numeric() on it. Note that this is OS-dependent, and AFAIR Windows does not have it. - On Thu, Dec 11, 2008 at 2:15 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: format(d, "%U") and format(d, "%W") give week numbers using different conventions. See ?strptime Thank you both for your replies! I'm on Windows, so Prof Ripley's solution does not work (why is this OS-dependent?). Regarding Gabor's solution, neither convention follows the ISO 8601 standard, which is used in Europe (and Sweden in particular); see http://en.wikipedia.org/wiki/ISO_8601#Week_dates . So it seems that my function does fill a hole, however small. I know that for me, working with week numbers, which are used quite heavily in Sweden, has always been a major frustration. Would it be possible to implement something similar to my solution in base, and how should I go about making it fit in with the rest of the date functions? /Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik
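Since "%V" is OS-dependent, here is a portable base-R sketch of the ISO 8601 rule (week 1 is the week containing January 4th, equivalently the week containing the year's first Thursday; the function name is mine):

```r
# ISO 8601 week number: the ISO week of a date is the week of its
# Thursday, counted from the January 1st of that Thursday's year.
iso_week <- function(d) {
  d <- as.Date(d)
  wday <- as.POSIXlt(d)$wday          # 0 = Sunday ... 6 = Saturday
  wday <- ifelse(wday == 0, 7, wday)  # recode to ISO: 1 = Mon ... 7 = Sun
  thursday <- d + (4 - wday)          # Thursday of d's Mon-Sun week
  jan1 <- as.Date(paste0(format(thursday, "%Y"), "-01-01"))
  1 + as.integer(thursday - jan1) %/% 7
}
iso_week("2008-12-11")  # 50
iso_week("2008-01-01")  # 1 (a Tuesday, same ISO week as Thursday Jan 3)
iso_week("2008-12-29")  # 1 (a Monday, already in ISO week 1 of 2009)
```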
Re: [R] Logical in test
Take a look at ?any. On Thu, Dec 11, 2008 at 3:11 PM, David B. Thompson, Ph.D., P.E., D.WRE, CFM [EMAIL PROTECTED] wrote: [original question snipped]
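Both hints in this thread lead to the same one-liner; a quick sketch with the poster's own excludes vector:

```r
excludes <- c(7, 10, 30, 50)
i <- 30
any(i == excludes)   # TRUE -- the ?any route
i %in% excludes      # TRUE -- the usual idiom (see ?"%in%")
42 %in% excludes     # FALSE
```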
[R] generate combination multiset (set with repetition)
Hi, This has been asked before but not sufficiently answered from what I could find. How do you create combinations with repetitions (multisets) in R? If I have

  set <- array(1:3)

and I want to choose all combinations of picking 2 numbers, I want to get a print out like

       [,1] [,2]
  [1,]    1    1
  [2,]    1    2
  [3,]    1    3
  [4,]    2    2
  [5,]    2    3
  [6,]    3    3

subsets(set, 2, allow.repeat=T) should work, but I can't get the multic package to install; t(combn(set,2)) was suggested but it doesn't produce repetitions; expand.grid(rep(list(1:3), 2)) was also suggested, but it produces permutations, not combinations. Additionally, I would like to iterate through each resultant set for large n (similar to the description for getNextSet {pcalg}). Any suggestions? Reuben Cummings
[R] problem with legend
hi all, I want to do a plot and put the legend on the left of the y axis. This is my code:

  x <- seq(1980, 2005, 1)
  plot(x, tfa_ita, type = "l", col = 1, xlim = c(1979, 2005),
       ylim = c(0.2, 1.7), xlab = "", ylab = "",
       main = "Totale Attivita` Finanziarie")
  lines(x, tfa_spa, type = "l", col = 2)
  lines(x, tfa_aus, type = "l", col = 3)
  lines(x, tfa_uk,  type = "l", col = 4)
  lines(x, tfa_ger, type = "l", col = 5)
  lines(x, tfa_usa, type = "l", col = 1, lty = 4)
  lines(x, tfa_jap, type = "l", col = 2, lty = 4)
  lines(x, tfa_can, type = "l", col = 3, lty = 4)
  lines(x, tfa_fra, type = "l", col = 4, lty = 4)
  legend(locator(1),
         c("ita","spa","aus","uk","ger","usa","jap","can","fra"),
         col = c(1,2,3,4,5,1,2,3,4), lty = c(1,1,1,1,1,4,4,4,4))

When I click on the graphics device to place the legend, R puts it on my lines and not on the left of the y axis. Any idea where I am making a mistake? Thank you very much, Valeria
Re: [R] Logical in test
> OK, this should be trivial but I'm not finding it. I want to compress the test if (i==7 | i==10 | i==30 | i==50) {} into something like if (i in excludes) {} [rest of original question snipped]

Works for me:

  excludes <- c(7, 10, 30, 50)
  for (i in excludes) { print(i) }
  for (i in 5:30) { if (i %in% excludes) { print(i) } }

What error messages are you getting? -- Curt Seeliger, Data Ranger Raytheon Information Services - Contractor to ORD [EMAIL PROTECTED] 541/754-4638
Re: [R] Principal Component Analysis - Selecting components? + right choice?
Hi, It is generally not the case that the best PC set, say, the top k PCs (where k < p, p being the number of predictors), contains the best predictor subset in linear regression. Hadi and Ling (Amer Stat, 1998) show that it is even possible to have an extreme situation where the first (p-1) PCs contribute nothing towards explaining the variation in the response, yet the last PC alone contributes everything. Their theorem is that if the true vector of regression coefficients is in the direction of the j-th eigenvector (of the correlation matrix), then the j-th PC alone will contribute everything to the model fit, while the remaining PCs will contribute zilch. They illustrate this phenomenon with a real data set from a classic text on regression, Draper and Smith. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of S Ellison Sent: Thursday, December 11, 2008 9:37 AM To: r-help@r-project.org; Corrado Subject: Re: [R] Principal Component Analysis - Selecting components? + right choice? If you're intending to create a model using PCs as predictors, select the PCs based on whether they contribute significantly to the model fit. In chemometrics (multivariate stats in chemistry, among other things), if we're expecting 3 or 4 PCs to be useful in a principal component regression, we'd generally start with at least the first half-dozen or so and let the model fit sort them out. The reason for not preselecting too rigorously early on is that there's no guarantee at all that the first couple of PCs are good predictors for what you're interested in. They're properties of the predictor set, not of the response set. 
Mind you, there used to be something of a gap between chemometrics and proper statistics; I'm sure chemometricians used to do things with data that would turn a statistician pale. You could also look at a PLS model, which (if I recall correctly) actually uses the response data to select the latent variables used for prediction. S Corrado [EMAIL PROTECTED] 11/12/2008 11:46:37 Dear R gurus, I have some climatic data for a region of the world. They are monthly averages 1950-2000 of precipitation (12 months), minimum temperature (12 months), and maximum temperature (12 months). I have scaled them to 2 km x 2 km cells, and I have around 75,000 cells. I need to feed them into a statistical model as covariates, to use them to predict a response variable. The climatic data are obviously correlated: precipitation for January is correlated with precipitation for February, and so on; even precipitation and temperature are heavily correlated. I did some correlation analysis and they are all strongly correlated. I thought of running PCA on them, in order to reduce the number of covariates I feed into the model. I ran the PCA using prcomp, quite successfully. Now I need a criterion to select the right number of PCs (that is: is it 1, 2, 3, 4?). What criterion would you suggest? At the moment, I am using one based on a threshold, but that is highly subjective, even if there are some rules of thumb (Jolliffe, Principal Component Analysis, II Edition, Springer Verlag, 2002). Could you suggest something more rigorous? By the way, do you think I would have been better off using something different from PCA? 
Best, -- Corrado Topi Global Climate Change Biodiversity Indicators Area 18, Department of Biology University of York, York, YO10 5YW, UK Phone: +44 (0) 1904 328645, E-mail: [EMAIL PROTECTED]
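S Ellison's advice above -- feed several PCs into the regression and let the model fit decide which ones matter -- can be sketched in base R with simulated data (all names and numbers below are illustrative, not the poster's climate data):

```r
set.seed(1)
X <- matrix(rnorm(200 * 6), 200, 6)
X[, 2] <- X[, 1] + rnorm(200, sd = 0.1)      # make two predictors correlated
y <- X %*% c(0, 0, 1, 0, -1, 0) + rnorm(200) # response loads on PCs, not rank
pc <- prcomp(X, scale. = TRUE)
fit <- lm(y ~ pc$x)     # all 6 PC scores as covariates
summary(fit)            # each PC's significance, not its variance share,
                        # is what guides the selection
```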
Re: [R] generate combination multiset (set with repetition)
Hi, Perhaps you can use expand.grid and then remove the mirror combinations:

  values <- 1:3
  tmp <- expand.grid(values, values)
  unique.combs <- tmp[tmp[, 1] <= tmp[, 2], ]
  unique.combs[do.call(order, unique.combs), ]  # reorder if you wish

    Var1 Var2
  1    1    1
  4    1    2
  7    1    3
  5    2    2
  8    2    3
  9    3    3

I vaguely recall a discussion a few months ago on extending this approach to a variable number of arguments to expand.grid. Hope this helps, baptiste On 11 Dec 2008, at 17:00, Reuben Cummings wrote: [original question snipped] _ Baptiste Auguié School of Physics University of Exeter Stocker Road, Exeter, Devon, EX4 4QL, UK Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag
Re: [R] Logical in test
i %in% c(7,10,30,50)

On Thu, Dec 11, 2008 at 12:11 PM, David B. Thompson, Ph.D., P.E., D.WRE, CFM drdbthomp...@gmail.com wrote: [original question snipped]
Re: [R] very long integers
If they are IDs, you presumably don't need to perform arithmetic on them, so why not store them as strings? If you're reading them with read.table, see the colClasses parameter. I am not sure how to do this in RODBC; as.is there (as in read.table) does not affect columns that look like numbers -- perhaps you have to convert on the DBMS side? -s On Thu, Dec 11, 2008 at 11:57 AM, Aaron Robotham smashb...@gmail.com wrote: [original question snipped]
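A minimal sketch of the colClasses route (the column names are made up; `text =` stands in for a file path):

```r
# Force the ID column to character at read time, so the digits never
# pass through a double.
txt <- "id,val\n588848900971299297,1\n588848900971299298,2"
d <- read.table(text = txt, header = TRUE, sep = ",",
                colClasses = c("character", "numeric"))
table(d$id)   # two distinct IDs, as it should be
```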
[R] how to plot implicit functions
Dear R users -- I think this question was asked before but there was no reply to it. I would appreciate any suggestions you might have. I am interested in plotting several implicit functions (F(x,y,z) = 0) on the same figure. Is there anyone who has example code for how to do this? Thank you, Yihsu
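Not a full answer to the 3-D case, but for implicit curves F(x, y) = 0 a common base-R trick is to evaluate F on a grid and draw its zero contour; repeated calls with add = TRUE overlay several curves on one figure. (The circle and ellipse below are made-up examples.)

```r
F1 <- function(x, y) x^2 + y^2 - 1       # implicit unit circle
F2 <- function(x, y) x^2/4 + y^2 - 1     # implicit ellipse
x <- y <- seq(-2.5, 2.5, length.out = 200)
z1 <- outer(x, y, F1)
z2 <- outer(x, y, F2)
contour(x, y, z1, levels = 0, drawlabels = FALSE)
contour(x, y, z2, levels = 0, drawlabels = FALSE, add = TRUE, col = 2)
# Genuine surfaces F(x, y, z) = 0 need an isosurface tool, e.g.
# contour3d() in the misc3d package.
```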
[R] Complex integration in R
Dear R-user, I need a function to approximate a complex integral. My function is:

  aprox2 <- function(s, x, rate) {
    dexp(x, rate) * exp(-s*x)
  }

where argument s is a complex number. I can't use the integrate function because it only works with numeric arguments. Does anyone know of a function to approximate complex integrals? Thanks, Borja
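One standard workaround (a sketch, not the only way): integrate() handles only real-valued integrands, so integrate the real and imaginary parts separately and recombine. The helper name below is made up; the check uses the known Laplace transform of the exponential density, rate/(rate + s):

```r
integrate_complex <- function(f, lower, upper, ...) {
  re <- integrate(function(x) Re(f(x, ...)), lower, upper)$value
  im <- integrate(function(x) Im(f(x, ...)), lower, upper)$value
  complex(real = re, imaginary = im)
}
# The poster's integrand, with x as the variable of integration:
f <- function(x, s, rate) dexp(x, rate) * exp(-s * x)
z <- integrate_complex(f, 0, Inf, s = 1 + 2i, rate = 3)
z               # should be close to 3 / (3 + (1 + 2i)) = 0.6 - 0.3i
```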
Re: [R] generate combination multiset (set with repetition)
Dear Reuben, On Thu, Dec 11, 2008 at 12:53 PM, baptiste auguie ba...@exeter.ac.uk wrote: [expand.grid suggestion snipped] Here is another way:

  library(prob)
  urnsamples(1:3, size = 2, ordered = FALSE, replace = TRUE)

You can convert to a matrix with as.matrix(), if desired. Regards, Jay -- G. Jay Kerns, Ph.D. Associate Professor Department of Mathematics & Statistics Youngstown State University Youngstown, OH 44555-0002 USA Office: 1035 Cushwa Hall Phone: (330) 941-3310 Office (voice mail) -3302 Department -3170 FAX E-mail: gke...@ysu.edu http://www.cc.ysu.edu/~gjkerns/
Re: [R] generate combination multiset (set with repetition)
On Thu, 11 Dec 2008, Reuben Cummings wrote: > If I have set <- array(1:3)

Why wrap 1:3 in array()??

> And I want to choose all combinations of picking 2 numbers [rest snipped]

For small problems (n < 100, say):

  which(lower.tri(diag(n), diag = TRUE), arr.ind = TRUE)[, 2:1]

For larger problems, something like:

  foo <- function(n) {
    brks <- cumsum(n:1)
    k <- 1:choose(n+1, 2)
    j <- findInterval(k, brks+1) + 1
    i <- k - (brks - brks[1])[j]
    cbind(j, i)
  }

If the numbers in 'set' are not 1:n, you can do a lookup using the results from above. HTH, Chuck Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
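For what it's worth, another base-R route (a sketch): a k-multiset from 1..n corresponds to an ordinary k-combination from 1..(n+k-1) after adding 0:(k-1), so combn() can both enumerate the multisets and, via its FUN argument, process them one at a time for large n:

```r
multiset <- function(n, k) {
  # undo the +0:(k-1) shift on each strictly increasing combination
  t(combn(n + k - 1, k, FUN = function(v) v - 0:(k - 1)))
}
multiset(3, 2)   # the six multisets: 11, 12, 13, 22, 23, 33
```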
[R] help: programming loop, binding columns between data sets
Hi, I often have the problem of combining data sets of different lengths together. Simple example: I have data frame a, with two columns C1 and C2, and another data frame b with only one column V1. Data frame b is much bigger than a, but C1 of a has the same levels as V1 of b (so in other words there are multiple instances of a$C1 in b$V1). I wish to paste a$C2 into a new column in b, where a$C1 == b$V1. I have always done it this way:

  for (i in 1:dim(a)[1]) {
    b[b$V1 == a$C1[i], "V2"] <- a[i, "C2"]
  }

However, 1. It is very slow. 2. It is unreliable (in that for no reason at all, I often get NA's in the new column of b) -- this usually happens when the code is within a loop, or I have to paste multiple columns from a across to b all at once. In this case I often have to paste each column one at a time, which takes forever. I often am dealing with very large data sets. I am using R 2.1.1 on Windows Vista. Can anyone suggest a faster/more reliable alternative please? Needless to say I am a programming novice. Thanks in advance, Simon Pickett.
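The loop can usually be replaced by a single vectorised lookup with match(), which is both fast and predictable; a sketch with toy data (column names as in the post, values made up):

```r
a <- data.frame(C1 = c("x", "y", "z"), C2 = c(10, 20, 30))
b <- data.frame(V1 = c("y", "x", "y", "z", "x"))
# one lookup instead of a row-by-row loop; unmatched rows get NA
b$V2 <- a$C2[match(b$V1, a$C1)]
b$V2
# [1] 20 10 20 30 10
```

The same pattern extends to several columns at once by indexing a with match(b$V1, a$C1) row-wise; merge(b, a, by.x = "V1", by.y = "C1") is the other common route.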
Re: [R] repeated searching of no-missing values
On Wed, Dec 10, 2008 at 6:39 PM, Bert Gunter gunter.ber...@gene.com wrote:

> ...?tapply says that the first argument is an **atomic** vector. A factor is not an atomic vector. So tapply interprets it as such by looking only at its representation, which is as integer values.

What is the rationale for this? If it is just backwards compatibility with some long-ago implementation decision, perhaps tapply should be deprecated and replaced by something cleaner (perhaps plyr). If it is something deeper than that, it would be useful to know what.

> I admit that these details are somewhat obscure and even annoying -- but they **are** documented.

No question that it is a good thing that things like this are documented. I think that's all we can expect.

> Some have lamented the lack of the language's perfect consistency in these matters, but I cannot understand how that would be possible given its nature, intended, as it is, to be **easily** used for high level data manipulation, graphics, statistical analysis etc. as well as programming.

As a general rule, consistency makes it *easier* to learn and use a language.

> There are just too many possible data structures to expect logical consistency in their handling throughout...

I am not sure what you mean here. There has been a lot of work in the programming language community on consistent handling of abstract structures of various types. Some of their insights may be applicable to future versions of R. -s
[R] check if a certain ... argument has been passed on to my user-defined function
Hi, How can I check if a certain ... argument has been passed on to my user-defined function or not?

  foo <- function(data, ...) {
    ### here I want to check whether xlab was passed with the ... arguments
    ### or if the ... arguments did not contain an xlab argument
  }

I tried missing(xlab), exists("xlab") and several other things but did not find a solution. TIA, Mark
Re: [R] check if a certain ... argument has been passed on to my user-defined function
Try this:

  foo <- function(data, ...) {
    ### check whether xlab was passed with the ... arguments
    args <- list(...)
    return(ifelse("xlab" %in% names(args), "Exists", "Missing"))
  }

On Thu, Dec 11, 2008 at 4:22 PM, Mark Heckmann mark.heckm...@gmx.de wrote: [original question snipped] -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] check if a certain ... argument has been passed on to my user-defined function
See ?match.call and note the expand.dots arg. HTH, Chuck On Thu, 11 Dec 2008, Mark Heckmann wrote: [original question snipped] Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
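A sketch of the match.call() route Chuck points to: with expand.dots = TRUE (the default) the named ... arguments appear in the returned call object, so they can be tested by name:

```r
foo <- function(data, ...) {
  # names() of the matched call include every named ... argument
  "xlab" %in% names(match.call(expand.dots = TRUE))
}
foo(1, xlab = "time")    # TRUE
foo(1, ylab = "count")   # FALSE
```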
Re: [R] getting ISO week
Gabor Grothendieck ggrothendieck at gmail.com writes: > format(d, "%U") and format(d, "%W") give week numbers using different conventions. See ?strptime

Gabor, the results of format(aDate, "%W") appear to be incorrect anyway, see:

  format(as.Date("2008-01-01"), "%W")  #->  "00"

There is never a week 0; this should be week 1.

  format(Sys.Date(), "%W")  #->  "49"

but my business calendar says today's (Dec. 11, 2008) week is week 50, which is what Brian Ripley's proposed strftime(x, "%V") returns. There could be a format %E (not used up to now) for returning a correct week number according to the European standard. Yours, Hans Werner On Thu, Dec 11, 2008 at 7:43 AM, Gustaf Rydevik gustaf.rydevik at gmail.com wrote: Hi all, Is there a simple function already implemented for getting the ISO weeks of a Date object? I couldn't find one, and so wrote my own function to do it, but would appreciate a pointer to the default way. If a function is not yet implemented, could the code below be of interest to submit to CRAN? Best Regards, Gustaf ... [rest deleted]
Re: [R] getting ISO week
According to the definition in ?strptime (which is not the same as the ISO definition): format(x, "%W") returns Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention. The first day of 2008 is a Tuesday, which means that 2008 starts in week 0. On Thu, Dec 11, 2008 at 2:31 PM, Hans W. Borchers hwborch...@gmail.com wrote: Gabor Grothendieck ggrothendieck at gmail.com writes: format(d, "%U") and format(d, "%W") give week numbers using different conventions. See ?strptime Gabor, the results of format(aDate, "%W") appear to be incorrect anyway, see: format(as.Date("2008-01-01"), "%W")  # "00" There is never a week 0, this should be week 1. format(Sys.Date(), "%W")  # "49" but my business calendar says today's (Dec. 11, 2008) week is week 50, which is what Brian Ripley's proposed strftime(x, "%V") returns. There could be a format %E (not used up to now) for returning a correct week number according to the European standard. Yours, Hans Werner On Thu, Dec 11, 2008 at 7:43 AM, Gustaf Rydevik gustaf.rydevik at gmail.com wrote: Hi all, Is there a simple function already implemented for getting the ISO weeks of a Date object? I couldn't find one, and so wrote my own function to do it, but would appreciate a pointer to the default way. If a function is not yet implemented, could the code below be of interest to submit to CRAN? Best Regards, Gustaf ... [rest deleted] __ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
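The competing conventions can be compared side by side; the dates are the ones from this thread, and %V requires a C library that supports it (e.g. glibc):

```r
d <- as.Date("2008-01-01")           # a Tuesday
format(d, "%U")                      # "00": weeks start on Sunday (US convention)
format(d, "%W")                      # "00": weeks start on Monday (UK convention)
format(d, "%V")                      # "01": ISO 8601, which has no week 0
format(as.Date("2008-12-11"), "%V")  # "50", matching the business calendar
```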
[R] R imperfections? -- was: repeated searching of no-missing values
Replies inline below. Best regards, -- Bert ___ From: macra...@gmail.com [mailto:macra...@gmail.com] On Behalf Of Stavros Macrakis Sent: Thursday, December 11, 2008 10:53 AM To: Bert Gunter Cc: Patrizio Frederic; r-help@r-project.org Subject: Re: [R] repeated searching of no-missing values On Wed, Dec 10, 2008 at 6:39 PM, Bert Gunter gunter.ber...@gene.com wrote: ...?tapply says that the first argument is an **atomic** vector. A factor is not an atomic vector. So tapply interprets it as such by looking only at its representation, which is as integer values. What is the rationale for this? If it is just backwards compatibility with some long-ago implementation decision, perhaps tapply should be deprecated and replaced by something cleaner (perhaps plyr). If it is something deeper than that, it would be useful to know what. Rationale? -- you'll have to ask the developers. As for deprecating (or changing) tapply: do you have any idea how much code that could break?! I think that is probably a wholly unrealistic suggestion. The way forward is through efforts like Hadley's plyr package. Among other things, that's what packages are for. Indeed, as you probably know, packages like R.oo and proto allow one to use a whole different programming language/paradigm within R, while still taking advantage of all of R's existing built-in functionality. Except for possible performance penalties, I don't see how you can ask for much more than that. So, no, R is certainly not perfect. I'm sure that if they could go back 20 years with today's knowledge and experience, the developers would do some things differently. That's life -- and progress! But I think any objective assessment -- and certainly those of us who use it day in and day out in our work -- would consider R a truly amazing software product, warts or no. 
Hence, may I suggest that instead of merely pointing out its (often well known, btw) imperfections and inelegancies, you instead move to the developers' forum and contribute improvements. This is, I believe, a standard way for people with programming expertise like yourself to contribute to open source development. Although the developers may be a bit crotchety at times (I think often appropriately so given the extraordinary effort they've put in), I think you would find that they would welcome sincere efforts to help them improve R. *** I admit that these details are somewhat obscure and even annoying -- but they **are** documented. No question that it is a good thing that things like this are documented. I think that's all we can expect. Some have lamented the lack of the language's perfect consistency in these matters, but I cannot understand how that would be possible given its nature, intended, as it is, to be **easily** used for high level data manipulation, graphics, statistical analysis etc. as well as programming. As a general rule, consistency makes it *easier* to learn and use a language. *** Of course! *** There are just too many possible data structures to expect logical consistency in their handling throughout... I am not sure what you mean here. There has been a lot of work in the programming language community on consistent handling of abstract structures of various types. Some of their insights may be applicable to future versions of R. *** No doubt. That's progress. Are you going to write this future version? I certainly am not -- and CAN not (being a bear of but little brain)! *** -s __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Row order in plot
I'm new to R so forgive me if this seems like a simple question: I have a table where the row titles are string variables. When I plot the data with rows along the x-axis, the data is ordered alphabetically as opposed to the order of the table. How can I preserve the row order of the table in the plot? Thanks in advance. -- View this message in context: http://www.nabble.com/Row-order-in-plot-tp20962774p20962774.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Row order in plot
It would be easier to answer your question if we knew what your data look like, what R commands you've tried, and what result you want. One possibility: plot the data against 1:nrow(yourdata), and add the row names as labels. Sarah On Thu, Dec 11, 2008 at 2:35 PM, qroberts lvaic...@bu.edu wrote: I'm new to R so forgive me if this seems like a simple question: So I have table where the row titles are string variables. When I plot the data with rows along the x-axis, the data is ordered alphabetically as opposed to the order of the table. How can I preserve the row order of the table in the plot? Thanks in advance. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
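To make that concrete, a minimal sketch with made-up data (row names chosen so alphabetical order differs from table order):

```r
dat <- data.frame(value = c(5, 2, 8),
                  row.names = c("zebra", "apple", "mango"))
# Plot against the row index, suppress the default x-axis, then label
# it with the row names -- the table order is preserved:
plot(seq_len(nrow(dat)), dat$value, xaxt = "n",
     xlab = "", ylab = "value", type = "b")
axis(1, at = seq_len(nrow(dat)), labels = rownames(dat))
```

If the names live in a column rather than in the row names, converting that column with factor(x, levels = unique(x)) before plotting has the same order-preserving effect.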
[R] is there a way to recursively lapply
for a simple example: x <- list() x[["a"]] <- list(a=c(1,2,3), b=c(3,4,5)) x[["b"]] <- list(a=c(6,7,8), b=c(9,10,11)) lapply(x, sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering if I have overlooked something obvious. one can also do: lapply(x, lapply, sum) but that assumes that you already know how many levels you have, and that all the levels are consistent. -Whit __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] getting ISO week
Gabor Grothendieck ggrothendieck at gmail.com writes: According to the definition in ?strptime (which is not the same as the ISO definition): format(x, "%W") returns Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention. The first day of 2008 is a Tuesday, which means that 2008 starts in week 0. Yes, I read that, but it is still misleading and -- I think -- incorrect. See www.dateandtime.org/calendar to find out that this is week 50 even in the UK. We would have had a lot of misplaced business meetings in our company if the week numbers in Great Britain, Germany, and Sweden were actually different. Hans Werner ... [rest deleted] __ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] is there a way to recursively lapply
This will recursively lapply and may or may not be what you are looking for: rapply(x, sum) On Thu, Dec 11, 2008 at 2:59 PM, Whit Armstrong armstrong.w...@gmail.com wrote: for a simple example: x <- list() x[["a"]] <- list(a=c(1,2,3), b=c(3,4,5)) x[["b"]] <- list(a=c(6,7,8), b=c(9,10,11)) lapply(x, sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering if I have overlooked something obvious. one can also do: lapply(x, lapply, sum) but that assumes that you already know how many levels you have, and that all the levels are consistent. -Whit __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
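For unevenly nested lists, a hand-rolled recursive wrapper is also easy to write; a sketch (the helper name rlapply is made up; unlike rapply(x, sum) it keeps the nested list structure, similar to rapply's how = "list"):

```r
# Recurse into sub-lists of any depth, applying f at the leaves
rlapply <- function(x, f) {
  if (is.list(x)) lapply(x, rlapply, f = f) else f(x)
}

x <- list(a = list(a = c(1, 2, 3), b = c(3, 4, 5)),
          b = list(a = c(6, 7, 8), b = c(9, 10, 11)))
str(rlapply(x, sum))   # nested list with a$a = 6, a$b = 12, b$a = 21, b$b = 30
```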
Re: [R] getting ISO week
Perhaps what you mean is that the definition ought to be otherwise, but at least according to one standard the definition is correct: http://www.opengroup.org/onlinepubs/009695399/functions/strptime.html On Thu, Dec 11, 2008 at 3:01 PM, Hans W. Borchers hwborch...@gmail.com wrote: Gabor Grothendieck ggrothendieck at gmail.com writes: According to the definition in ?strptime (which is not the same as the ISO definition): format(x, "%W") returns Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention. The first day of 2008 is a Tuesday, which means that 2008 starts in week 0. Yes, I read that, but it is still misleading and -- I think -- incorrect. See www.dateandtime.org/calendar to find out that this is week 50 even in the UK. We would have had a lot of misplaced business meetings in our company if the week numbers in Great Britain, Germany, and Sweden were actually different. Hans Werner ... [rest deleted] __ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] candisc plotting
Hello, I have a file with two dependent variables (three and five) and one independent variable. I do i.mod <- lm(cbind(three, five) ~ species, data=i.txt) and get the following output:

Coefficients:
             three   five
(Intercept)  9.949   9.586
species     -1.166  -1.156

I do i.can <- candisc(i.mod, data=i) and get the following output:

Canonical Discriminant Analysis for species:
    CanRsq  Eigenvalue  Difference  Percent  Cumulative
1 0.096506     0.10681                  100         100

Test of H0: The canonical correlations in the current row and all that follow are zero

  LR test stat  approx F  num Df  den Df    Pr(> F)
1        0.903    63.875       1     598  6.859e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

this is different than the output I get with SAS:

  Eigenvalue  Difference  Proportion  Cumulative  Likelihood Ratio  F Value  Num DF  Den DF  Pr > F
1    0.10681           .      1.0000      1.0000        0.90349416    31.88       2     597  <.0001

I am also wondering how to plot the can1*can1 like it is done in SAS. proc plot; plot can1*can1=species; format species spechar.; title2 'Plot of Constits_vs_cassettes'; run; Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
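I don't have the poster's data, but with a single canonical dimension (one CanRsq row) SAS's can1*can1 plot is effectively one-dimensional, so a plot of Can1 by group carries the same information. A base-R sketch with made-up scores (the real ones would come from the candisc fit):

```r
# Made-up stand-in for the canonical scores of two species groups
scores <- data.frame(species = factor(rep(c("A", "B"), each = 20)),
                     Can1 = c(rnorm(20, mean = -1), rnorm(20, mean = 1)))
# One canonical score per observation, separated by group:
boxplot(Can1 ~ species, data = scores, ylab = "Can1")
stripchart(Can1 ~ species, data = scores, vertical = TRUE,
           method = "jitter", pch = 16, add = TRUE)
```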
Re: [R] call lattice function in a function passing groups argument
On Thu, Dec 11, 2008 at 2:55 AM, Thomas Zumbrunn t.zumbr...@unibas.ch wrote: I'm trying to use a lattice function within a function and have problems passing the groups argument properly. Let's say I have a data frame d <- data.frame(x = rnorm(100), y = c("a", "b")) and want to plot variable x in a densityplot, grouped by the variable y, then I would do something like densityplot(~ x, d, groups = y) If however I wanted to call the function densityplot within a function and pass the groups argument as an argument of that function, how would I have to proceed? It is not as straightforward as f <- function(data, groupvar) { densityplot(~ x, data, groups = groupvar) } probably because the lattice function densityplot.formula preprocesses the groups argument with Yes, that's the price of non-standard evaluation. groups <- eval(substitute(groups), data, environment(formula)) Is there a way I could pass the groups argument in the function f? The obvious solution is to evaluate 'groupvar' yourself: f <- function(data, groupvar) { groupvar <- eval(substitute(groupvar), data, parent.frame()) densityplot(~ x, data, groups = groupvar) } A more general solution (where 'groupvar' may be missing) is to use match.call() etc. (e.g., see lattice:::dotplot.formula) -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] is there a way to recursively lapply
On Thu, 11 Dec 2008, Whit Armstrong wrote: for a simple example: x <- list() x[["a"]] <- list(a=c(1,2,3), b=c(3,4,5)) x[["b"]] <- list(a=c(6,7,8), b=c(9,10,11)) lapply(x, sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering if I have overlooked something obvious. rapply? Which is linked from ?lapply (I just checked). Perhaps rapply(x, sum) a.a a.b b.a b.b 6 12 21 30 or rapply(x, sum, how="list") $a $a$a [1] 6 . one can also do: lapply(x, lapply, sum) but that assumes that you already know how many levels you have, and that all the levels are consistent. -Whit __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R imperfections? -- was: repeated searching of no-missing values
Bert, Thanks for your reply. I suspect we agree more than you might think Comments inline below. I've snipped out parts. -s On Thu, Dec 11, 2008 at 2:45 PM, Bert Gunter gunter.ber...@gene.com wrote: Rationale? -- you'll have to ask the developers Hmm. It would be nice if this could be documented for new users -- and for posterity. But I admit that other projects I have worked on don't do a particularly good job of documenting such things, either. As for deprecating (or changing) tapply: do you have any idea how much code that could break?! I think that is probably a wholly unrealistic suggestion Ah, perhaps my terminology isn't clear. In the programming world, deprecating (as opposed to removing or changing) a feature means declaring it obsolete and not-to-be-recommended (or as you put it somewhat obscure and even annoying) while continuing to support it for backwards compatibility. So tapply would continue to exist and to work both for legacy code and for users who prefer it, but it would not be taught to new users, and its documentation would cross-reference the currently recommended approach. The way forward is through efforts like Hadley's plyr package Agreed, and ideally the user and developer community would eventually converge on one or another such package and integrate it into the core system, to avoid balkanization of the user community. ...packages like R.oo and proto allow one to use a whole different programming language/paradigm within R, while still taking advantage of all of R's existing built-in functionality. Except for possible performance penalties, I don't see how you can ask for much more than that I certainly agree that exploring other approaches in add-on packages is a good thing. Even better to progressively deprecate features which are obscure and even annoying at the same time. ...So, no, R is certainly not perfect. 
I'm sure that if they could go back 20 years with today's knowledge and experience, the developers would do some things differently We cannot change the past, but we can make a better future without hurting current users! ...any objective assessment -- and certainly those of us who use it day in and day out in our work -- would consider R a truly amazing software product, warts or no I agree entirely -- I have come to use R after considering a variety of alternatives for my work, and am delighted with its functionality and strong user community. I admit, though, that the learning curve has been surprisingly steep, largely because of design inconsistencies and idiosyncratic terminology. Hence, may I suggest that instead of merely pointing out its (often well known,btw) imperfections and inelegancies, you instead move to the developers' forum and contribute improvements. This is, I believe, a standard way for people with programming expertise like yourself to contribute to open source development Agreed, and I have contributed to Maxima over the years. I am quite new to R, though, and just getting my bearings. I haven't even looked at the underlying implementation yet. I do intend to contribute in the future. -s [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2-Y-axes on same plot
Joe Trubisz wrote: Hi... Is this possible in R? I have 2-sets of data, that were collected simultaneously using 2-different data acquisition schemes. The x-values are the same for both. The y-values have different ranges (16.4-37.5 using one method, 557-634 using another). In theory, if you plot both plots on top of each other, the graphs should overlap. The problem I'm having is trying to have two different sets of y-values appear in the same graph, but scaled in the same vertical space. I've seen this done in publications, but not sure if it can be done in R. Here's a brute force graph with two sets of y-axes:

# set up some fake test data
time <- seq(0, 72, 12)
betagal.abs <- c(0.05, 0.18, 0.25, 0.31, 0.32, 0.34, 0.35)
cell.density <- c(0, 1000, 2000, 3000, 4000, 5000, 6000)

# add extra space to right margin of plot within frame
par(mar = c(5, 4, 4, 4) + 0.1)

# Plot first set of data and draw its axis
plot(time, betagal.abs, pch = 16, axes = FALSE, ylim = c(0, 1),
     xlab = "", ylab = "", type = "b", col = "black",
     main = "Mike's test data")
axis(2, ylim = c(0, 1), col = "black")
mtext("Beta Gal Absorbance", side = 2, line = 2.5)
box()

# Allow a second plot on the same graph
par(new = TRUE)

# Plot the second plot and put axis scale on right
plot(time, cell.density, pch = 15, xlab = "", ylab = "",
     ylim = c(0, 7000), axes = FALSE, type = "b", col = "red")
mtext("Cell Density", side = 4, col = "red", line = 2.5)
axis(4, ylim = c(0, 7000), col = "red", col.axis = "red")

# Draw the time axis
axis(1, pretty(range(time), 10))
mtext("Time (Hours)", side = 1, col = "black", line = 2.5)

# Add Legend
legend(5, 7000, legend = c("Beta Gal", "Cell Density"),
       text.col = c("black", "red"), pch = c(16, 15), col = c("black", "red"))

HTH, Rob Baer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Resampling physiological data using R?
Dear all R users, I am going to use R to process some of my physiological data about eye. The problem is the recording machine does not sample at a reliably constant rate: the time intervals between sampled data can vary from 9 msec to ~120 msec, with most in the 15-30 msec range. Below is a fraction of a single data file from one trial:

Time     CursorX  CursorY  Pupilsize
1811543  -1       -1       -1
1811563  -1       -1       -1
1811584  511      370      4.175665
1811603  511      368      4.181973
1811624  521      368      4.210732
1811644  512      377      4.149632
1811664  524      377      4.275845
1811684  518      368      4.236212
1811703  516      370      4.238384
1811725  507      364      4.181157
1811744  509      371      4.185016
1811764  509      377      4.231987
1811784  514      387      4.252449
1811802  515      388      4.273726

My goal is to resample these data so that the Time column increments by a regular interval, and the other columns of data are the averages (or estimates) at that point in time according to the available data points. So far I have used a regular interval larger than the machine's natural ones (e.g. 120 msec) and taken the average of the available data points within each regular time interval. Now I need to resample at a smaller regular interval, e.g. 5 msec, and interpolate / extrapolate the missing data points from the available ones, i.e. I may have to split data points across the regular intervals they occupy in time. Do you know if there is any package that does something similar? And because of the size of the data and the computational demand (1500 files, each with 2000-8000+ lines), can you suggest some (algorithmically) more efficient way of doing this? Thanks a lot! Regards, John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
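For the interpolation step itself, base R's approx() performs linear resampling onto a regular grid and is implemented in C, so it should cope with 1500 files. A minimal sketch with a made-up fragment of one column (times in msec; column names follow the fragment above):

```r
# Irregularly sampled input (made-up values)
d <- data.frame(Time      = c(0, 9, 31, 62, 120),
                Pupilsize = c(4.0, 4.1, 4.3, 4.2, 4.4))

# Regular 5 msec grid spanning the recording
grid <- seq(min(d$Time), max(d$Time), by = 5)

# Linear interpolation of the measurement onto the grid
resampled <- data.frame(
  Time      = grid,
  Pupilsize = approx(d$Time, d$Pupilsize, xout = grid)$y)
head(resampled, 3)
```

Applying the same approx() call to each measurement column (e.g. with lapply()) gives the full resampled frame; spline() is an alternative if linear interpolation proves too crude.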
Re: [R] is there a way to recursively lapply
Thanks, Gabor and Prof. Ripley. Sorry for the oversight. I grepped the lapply help for recursive prior to sending my question. why does it appear as *r*ecursive in the help file? or is that just a formatting problem on my machine? -Whit On Thu, Dec 11, 2008 at 3:13 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Thu, 11 Dec 2008, Whit Armstrong wrote: for a simple example: x <- list() x[["a"]] <- list(a=c(1,2,3), b=c(3,4,5)) x[["b"]] <- list(a=c(6,7,8), b=c(9,10,11)) lapply(x, sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering if I have overlooked something obvious. rapply? Which is linked from ?lapply (I just checked). Perhaps rapply(x, sum) a.a a.b b.a b.b 6 12 21 30 or rapply(x, sum, how="list") $a $a$a [1] 6 . one can also do: lapply(x, lapply, sum) but that assumes that you already know how many levels you have, and that all the levels are consistent. -Whit __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Equivalent to Full Information Maximum Likelihood (FIML) in R?
Is there an equivalent to MPlus's Full Information Maximum Likelihood (FIML) missing data estimator for R? If so, is there a way to take covariance structures produced by such a package and perform multiple regression with these? If you are unfamiliar with Mplus' FIML, below is a link to their manual. Their estimation technology is discussed on page 25. I have asked the developer of the mvnmle package and he was unsure if this package is similar. http://www.statmodel.com/download/techappen.pdf Thanks, Sam -- http://theregressingpilgrim.blogspot.com/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R imperfections? -- was: repeated searching of no-missing values
replies inline below. Bert Gunter wrote: Replies inline below. [bert (?)]...?tapply says that the first argument is an **atomic** vector. A factor is not an atomic vector. So tapply interprets it as such by looking only at its representation, which is as integer values. [stavros] What is the rationale for this? If it is just backwards compatibility with some long-ago implementation decision, perhaps tapply should be deprecated and replaced by something cleaner (perhaps plyr). If it is something deeper than that, it would be useful to know what. [bert] Rationale? -- you'll have to ask the developers. As for deprecating (or changing) tapply: do you have any idea how much code that could break?! I think that is probably a wholly unrealistic suggestion. do you have any idea how much old code has been broken in the history of programming just because programming languages moved from version x to version x+1? the argument that old code would be broken is repeated here ad nauseam, literally. there always is a tradeoff between protecting the old developers against the need for reimplementation of existing code and protecting the future developers against the need to spend days on figuring out how to hack around broken designs and implementations. The way forward is through efforts like Hadley's plyr package. Among other things, that's what packages are for. packages play an important role in about every language. but packages, especially ones written by third parties, should serve as an *extension* of the core functionality, and not as a replacement. perhaps it is just fine to say that a function from plyr should be used instead of tapply (which, note, is in the base package). but perhaps the core stuff should rather evolve than be duplicated by external patches. as to the original problem, since you (bert) say: ?tapply says that the first argument is an **atomic** vector. A factor is not an atomic vector.
So tapply interprets it as such by looking only at its representation, which is as integer values. can you explain the following: is.atomic(as.factor(1:10)) # TRUE is.atomic(factor(0)) # TRUE ?is.atomic says: 'is.atomic' returns 'TRUE' if 'x' is an atomic vector (or 'NULL') and 'FALSE' otherwise. which seems incoherent with the above, and also with the following: f = factor(0) is.atomic(f) # TRUE is.vector(f) # FALSE ?vector says: Note that factors are _not_ vectors; 'is.vector' returns 'FALSE' if f is not a vector, how can it be an atomic vector? perhaps 'is.atomic' does not mean what i would naively assume reading the docs; with r, one has to learn not to use common sense, as in, e.g., the case of sort.list. Indeed, as you probably know, packages like R.oo and proto allow one to use a whole different programming language/paradigm within R, while still taking advantage of all of R's existing built-in functionality. Except for possible performance penalties, I don't see how you can ask for much more than that. given how comments such as those of stavros or mine are typically answered, indeed one cannot expect much more. the question is, why would one not want to ask for more? So, no, R is certainly not perfect. I'm sure that if they could go back 20 years with today's knowledge and experience, the developers would do some things differently. That's life -- and progress! But I think any objective assessment -- and certainly those of us who use it day in and day out in our work -- would consider R a truly amazing software product, warts or no. Hence, may I suggest that instead of merely pointing out its (often well known,btw) imperfections and inelegancies, you instead move to the developers' forum and contribute improvements. This is, I believe, a standard way for people with programming expertise like yourself to contribute to open source development. 
Although the developers may be a bit crotchety at times (I think often appropriately so given the extraordinary effort they've put in), I think you would find that they would welcome sincere efforts to help them improve R. again, same send-a-patch talk. can't you possibly dissect between design and implementation? should every conceptual discussion be replaced by a flow of patches? python's peps have already been mentioned; another counterexample is jcp. i agree that contributing code is desirable, but discarding any other initiative right away is plainly rude, even if not verbally. I think that's all we can expect. Some have lamented the lack of the language's perfect consistency in these matters, but I cannot understand how that would be possible given its nature, intended, as it is, to be **easily** used for high level data manipulation, graphics,statistical analysis etc. as well as programming. As a general rule, consistency makes it *easier* to learn and use a language. *** Of course! *** radio erewan strikes again?
Re: [R] is there a way to recursively lapply
On Thu, 11 Dec 2008, Whit Armstrong wrote: Thanks, Gabor and Prof. Ripley. Sorry for the oversight. I grepped the lapply help for recursive prior to sending my question. why does it appear as *r*ecursive in the help file? or is that just a formatting problem on my machine? It is marked as bold: I presume you are reading text help? -Whit On Thu, Dec 11, 2008 at 3:13 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Thu, 11 Dec 2008, Whit Armstrong wrote: for a simple example: x <- list() x[["a"]] <- list(a=c(1,2,3), b=c(3,4,5)) x[["b"]] <- list(a=c(6,7,8), b=c(9,10,11)) lapply(x, sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering if I have overlooked something obvious. rapply? Which is linked from ?lapply (I just checked). Perhaps rapply(x, sum) a.a a.b b.a b.b 6 12 21 30 or rapply(x, sum, how="list") $a $a$a [1] 6 . one can also do: lapply(x, lapply, sum) but that assumes that you already know how many levels you have, and that all the levels are consistent. -Whit __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Extract rows from data frame based on row names from another data frame
Hi all, Is there a function to extract rows from a data frame based on the row names of another data frame? I can write a loop to do this, but that may be inefficient in terms of processing. thanks for any information, Wade __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
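No loop is needed; subscripting with a name match is vectorised. A sketch with made-up frames:

```r
df1 <- data.frame(x = 1:4, row.names = c("a", "b", "c", "d"))
df2 <- data.frame(y = 5:6, row.names = c("b", "d"))

# rows of df1 whose names also occur in df2, in df1's order:
sub1 <- df1[rownames(df1) %in% rownames(df2), , drop = FALSE]

# or index by the common names directly (df2's order):
sub2 <- df1[intersect(rownames(df2), rownames(df1)), , drop = FALSE]
```

drop = FALSE keeps the result a data frame even when only one column is selected.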
[R] simulate spatial data
Hi, I am simulating 2-dimensional data using the RandomFields library and the gaussRF function therein. While this is done with the code below, I would like the landscape to be continuous or smooth at the edges. That is, I would like the upper edge to smoothly connect to the lower edge AND the right edge to smoothly connect to the left edge. I cannot figure out how to do this. I would greatly appreciate pointers to the right function, a different package, or any other viable approach. Best, Daniel

library(RandomFields)
PrintModelList()   ## the complete list of implemented models
model <- "stable"
mean <- 0
variance <- 10
nugget <- 0        # noise around the structure
scale <- 10        # structure/patchiness
alpha <- 2         ## see help(CovarianceFct) for additional
                   ## parameters of the covariance functions
step <- 1          ## nicer, but also time consuming, if step <- 0.1
x <- seq(0, 100, step)
y <- seq(0, 100, step)
f <- GaussRF(x = x, y = y, model = model, grid = TRUE,
             param = c(mean, variance, nugget, scale, alpha))
# par(mfcol=c(1,2))
image(x, y, f)
contour(f)

- cuncta stricte discussurus __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
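I don't know a RandomFields option for this, but a Gaussian field that wraps smoothly at all four edges can be built directly in base R by filtering white noise in the Fourier domain, which makes the result periodic by construction. A sketch (the Gaussian spectral filter below is an illustrative assumption, not the "stable" model above):

```r
n  <- 64
k  <- c(0:(n / 2), -(n / 2 - 1):-1)     # FFT frequency indices
k2 <- outer(k^2, k^2, "+")
filt <- exp(-k2 / (2 * (n / 20)^2))     # smooth spectral decay (assumed shape)
z <- matrix(rnorm(n * n), n, n)         # white noise
fld <- Re(fft(fft(z) * filt, inverse = TRUE)) / n^2
# fld is periodic: row 1 continues smoothly from row n, column 1 from column n
image(seq_len(n), seq_len(n), fld)
```

Tiling this landscape in either direction produces no visible seam, which is the torus-like behaviour asked for.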