Re: [R] Recursive partitioning algorithms in R vs. alia
Wensui Liu wrote: well, how difficult is it to code a random forest with a SAS macro + PROC SPLIT? if you lack SAS programming skill, then you are correct that you have to wait for 8 years :-)

It is true one can use the macro language to obtain some control flow that the plain SAS language and its PROCs are missing, and for manipulating matrices there is even a third language (IML), but my customers prefer to leverage community-tested open-source implementations as building blocks rather than spending unnecessary resources on writing things from scratch in their corner.

i don't know how much SAS experience you have. as far as i know, both bagging and boosting have been implemented in SAS EM for a while, together with other cutting-edge modeling tools such as SVM / nnet.

Fair enough, but whenever you need ensemble methods for survival data, or would like to escape the bias in variable importance in the presence of categorical predictors, you will (1) not be able to take something off the shelf and (2) not be able to programmatically tweak the SAS EM procedures (as they are not exposed but locked in the GUI), so there again your only option is to implement things from scratch.

Best, Tobias

On Fri, Jun 19, 2009 at 4:18 PM, Tobias Verbeke tobias.verb...@openanalytics.be wrote:

Wensui Liu wrote: in terms of the richness of features and the ability to handle large data (which is normal in banking), SAS EM should be on top of the others.

Should be? That is not at all my experience. SAS EM is very much lagging behind current research. You will find variants of random forests in R that will not be in SAS for the next 8 years, to give just one example.

however, it is not cheap. in terms of algorithms, the SPLIT procedure in SAS EM can do CHAID/CART/C4.5, if i remember correctly.

These are techniques of the '80s and '90s (which proves my point). CART is in rpart and an implementation of C4.5 can be accessed through RWeka.
For the oldest one (CHAID, 1980), there might be an implementation soon: http://r-forge.r-project.org/projects/chaid/ but again, there have been quite some improvements in the last decade as well: http://cran.r-project.org/web/views/MachineLearning.html

HTH, Tobias

On Fri, Jun 19, 2009 at 2:35 PM, Carlos J. Gil Bellosta c...@datanalytics.com wrote:

Dear R-helpers, I had a conversation with a guy working in a business intelligence department at a major Spanish bank. They rely on recursive partitioning methods to rank customers according to certain criteria. They use both SAS EM and Salford Systems' CART. I have used the package rpart in the past, but I could not provide any kind of feature comparison or the like, as I have no access to any installation of the two proprietary products. Has anybody experience with them? Is there any public benchmark available? Is there any very good --although solely technical-- reason to pay hefty software licences? How would the algorithms implemented in rpart compare to those in SAS and/or CART?

Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
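[Editorial sketch, not from the thread: the rpart route mentioned above, fitting a CART-style classification tree on the kyphosis data that ships with the recommended package rpart.]

```r
# A minimal CART-style classification tree with rpart,
# using the kyphosis data bundled with the package.
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")
printcp(fit)                     # cross-validated error vs. tree complexity
pred <- predict(fit, kyphosis, type = "class")
mean(pred == kyphosis$Kyphosis)  # apparent (resubstitution) accuracy
```

Pruning via the printed complexity table (prune(fit, cp = ...)) is the usual next step; the random-forest variants Tobias alludes to live in separate CRAN packages.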
Re: [R] Leave One Out Cross Validation
Hi Uwe, My apologies. Please can I be guided on what I am doing wrong in the code? I started my code as such:

# ypred is my leave one out estimator of x
cvhfunc <- function(y,x,h){
  ypred <- 0
  for (i in 1:n){
    for (j in 1:n){
      if (j!=i){
        ypred <- ypred + (y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h)
      }
    }
  }
  ypred
  # CVh is a
  cvh <- 0
  cvh <- cvh+((1/n)*(y-ypred)^2
  cvh
}
test2 <- cvhfunc(ydat,xdat,.2); test2

# I was experimenting with the following data:
library(datasets)
data(faithful)
ydat <- faithful$eruptions; ydat; plot(ydat); par(new=T)
xdat <- faithful$waiting; xdat; plot(xdat, col="blue")

# I want to minimize the CV function with respect to h. Thanks.

Uwe Ligges-3 wrote: See the posting guide: If you provide commented, minimal, self-contained, reproducible code some people may be willing to help on the list. Best, Uwe Ligges

muddz wrote: Hi All, I have been trying to get this LOO-Cross Validation method to work in R for the past 3 weeks but have had no luck. I am hoping someone would kindly help me. Essentially I want to code the LOO-Cross Validation for the 'Local Constant' and 'Local Linear' estimators. I want to find the optimal bandwidth h. Thank you very much! -M
[R] Plotting Cumulative Hazard Functions with Strata
Hello: So I've fit a hazard function to a set of data using

kmfit <- survfit(Surv(int, event) ~ factor(cohort))

This factor variable, cohort, has four levels, so naturally the strata variable has 4 values. I can use this data to estimate the hazard rate

haz <- n.event/n.risk

and calculate the cumulative hazard function by

H--log(haz)

Now, I would like to plot this cumulative hazard function by time and differentiate the curves by strata. For the life of me I cannot seem to do it. Any help would be greatly appreciated. Regards, j
[R] Maximum number of tables combined with rbind
Hello, I have been using read.table to read data files into R, then rbind to combine them into one table. The column headings are all identical, so it should be straightforward, and it seems to be working well so far. My question is: What is the maximum number of tables that can be combined with rbind? Is it driven by the number of rows in the tables, by a constraint in the syntax of rbind itself, by my PC's memory, or by some other constraint? At this point the data files I am reading in are approx 2,500 rows x 150 columns. Thanks, Guy Green
Re: [R] Splitting Data by Row
Ross Culloch wrote: Hello fellow R users! I wonder if someone can help with what I think should be a simple question, but I can't seem to find the answer or work it out. My data set is as such:

Day Time ID Behaviour
 1    9  A1     2
 1   10  A2     3
..   ..  ..    ..
 4   10  A1    10
 4   11  A2     1
..   ..  ..    ..
30    1  B1    14
30    2  C3     4

So basically I have data for several days, for several times, for several IDs and for several Behaviours. What I want to do is get an activity budget per ID from these data, e.g.:

data <- tapply(Behaviour, list(ID, Behaviour), length)

This will give me a count of the number of times an ID does a certain behaviour over all the data, which is great - but I want to work out seasonal and diurnal activity budgets too, therefore I need to break the data down not just by ID but by day and time, too. I've searched on here and found nothing I could adapt to my data - it may be that I can't see quite how the code would work and I've overlooked something of importance! If anyone can point me in the right direction I'd be most grateful!

Hi Ross, The brkdn function in the prettyR package does this, and I am currently working on a more efficient version that will drive an improved hierobarp function in the plotrix package. Jim
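[Editorial sketch, not from the thread: the tapply call Ross already has extends directly to extra grouping factors; the data frame below is a toy stand-in for his data set.]

```r
# Toy stand-in for the Day/Time/ID/Behaviour data in the post.
d <- data.frame(
  Day       = c(1, 1, 1, 4, 4, 30, 30),
  Time      = c(9, 10, 11, 10, 11, 1, 2),
  ID        = c("A1", "A2", "A1", "A1", "A2", "B1", "C3"),
  Behaviour = c(2, 3, 2, 10, 1, 14, 4)
)

# Counts of each behaviour per ID, as in the original call:
with(d, tapply(Behaviour, list(ID, Behaviour), length))

# Adding Day (or Time) to the grouping list breaks the budget
# down further; NA means that combination never occurred:
counts <- with(d, tapply(Behaviour, list(ID, Day), length))
counts
```

The same idea with three factors, list(ID, Day, Time), yields a 3-dimensional array.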
Re: [R] Leave One Out Cross Validation
On Jun 19, 2009, at 7:45 PM, muddz wrote:

Hi Uwe, My apologies. Please can I be guided on what I am doing wrong in the code? I started my code as such:

# ypred is my leave one out estimator of x

Estimator of x? Really?

cvhfunc <- function(y,x,h){
  ypred <- 0
  for (i in 1:n){   # how did you define n? It's not in your parameter list
    for (j in 1:n){
      if (j!=i){
        ypred <- ypred + (y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h)

At this point you are also using k as a function. Have you at any point defined k? Also, multiplying by the ratio of two identical numbers would generally give a result of y[i] for that second term. Unless, of course, any x[j] = x[i], in which case you will throw an error and everything will grind to a halt. It might help if you now explained what you think CV should be.

      }
    }
  }
  ypred   # not sure what that will do inside the function.
          # If it's there for debugging you may want print(ypred)

  # CVh is a

Yes? CVh is a what?

  cvh <- 0
  cvh <- cvh+((1/n)*(y-ypred)^2   # n again. R will still not know what that is.
  cvh   # ypred is a scalar, while y is a vector, so cvh will be a vector.
        # Is that what you want?
}
test2 <- cvhfunc(ydat,xdat,.2); test2

# I was experimenting with the following data:
library(datasets)
data(faithful)
ydat <- faithful$eruptions; ydat; plot(ydat); par(new=T)
xdat <- faithful$waiting; xdat; plot(xdat, col="blue")

# I want to minimize the CV function with respect to h. Thanks.

Uwe Ligges-3 wrote: See the posting guide: If you provide commented, minimal, self-contained, reproducible code some people may be willing to help on the list. Best, Uwe Ligges

muddz wrote: Hi All, I have been trying to get this LOO-Cross Validation method to work in R for the past 3 weeks but have had no luck. I am hoping someone would kindly help me. Essentially I want to code the LOO-Cross Validation for the 'Local Constant' and 'Local Linear' estimators. I want to find the optimal bandwidth h. Thank you very much!
-M

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
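[Editorial sketch, not from the thread: one working version of what the poster seems to be after - leave-one-out CV for the Nadaraya-Watson (local constant) estimator - with the kernel k and sample size n defined explicitly. The Gaussian kernel and the bandwidth grid are assumptions.]

```r
# Gaussian kernel (an assumption; the poster never defined k)
k <- function(u) dnorm(u)

# Leave-one-out CV criterion for the local constant estimator:
# predict y[i] from all points except i, then average squared error.
cvh <- function(y, x, h) {
  n <- length(y)
  err <- numeric(n)
  for (i in 1:n) {
    w <- k((x[-i] - x[i]) / h)                # kernel weights, point i left out
    err[i] <- y[i] - sum(w * y[-i]) / sum(w)  # leave-one-out residual
  }
  mean(err^2)
}

data(faithful)
h.grid  <- seq(1, 10, by = 0.5)               # assumed search grid
cv.vals <- sapply(h.grid,
                  function(h) cvh(faithful$eruptions, faithful$waiting, h))
h.opt   <- h.grid[which.min(cv.vals)]         # CV-optimal bandwidth
```

For a smooth minimum, optimize(function(h) cvh(...), interval = c(0.5, 15)) could replace the grid search.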
[R] modifying sound package function plot.Sample
Hi, I'm trying to modify this function. I want to remove the existing x-axis (the tick marks and the values below each tick) and make it dynamic, so that I can choose whether I want the x-axis at the top or bottom, but I can't seem to change that. Can somebody help me?

plot.Sample <- function(x, xlab=NULL, ylab=NULL, ...){
  sampletest <- is.Sample(x, argname='x')
  if (!sampletest$test) stop(sampletest$error)
  s <- loadSample(x, filecheck=FALSE)
  s <- normalize(s)
  if (channels(s)==1) {
    #if (is.null(ylab)) ylab <- "waveform"
    plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, axes=FALSE, ...)
    #axis(1)
    #axis(2, at=c(-1,0,1), labels=as.character(c(-1,0,1)))
    #abline(h=0)
    #abline(h=c(-1,1))
    #lines(par()$usr[1:2], y=c(rep(par()$usr[3],2)), xpd=TRUE)

-- Rajesh.J
Re: [R] Plotting Cumulative Hazard Functions with Strata
On Jun 20, 2009, at 12:23 AM, j0645073 wrote:

Hello: So I've fit a hazard function to a set of data using

kmfit <- survfit(Surv(int, event) ~ factor(cohort))

This factor variable, cohort, has four levels, so naturally the strata variable has 4 values. I can use this data to estimate the hazard rate

haz-n.event/n.risk

That's not valid, but it's close. Try instead:

haz <- kmfit$n.event/kmfit$n.risk

The problem will be to properly categorize the vectors in the survfit object with your covariate factor, cohort. (Strata is a somewhat different term in the language of Cox models.)

and calculate the cumulative hazard function by

H--log(haz)

Since that is not valid R syntax at all, and you say you want the cumulative hazard, why aren't you summing the hazards?

Now, I would like to plot this cumulative hazard function by time and differentiate the curves by strata. For the life of me I cannot seem to do it.

If you are being asked to do this from scratch and show your work for homework grading purposes, then by all means carry on. If not, then you should be trying:

plot(kmfit, fun = "cumhaz")

Any help would be greatly appreciated. Regards, j

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
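[Editorial sketch, not from the thread: the fun = "cumhaz" suggestion end to end, with the survival package's bundled lung data standing in for the poster's data and sex standing in for cohort.]

```r
library(survival)

# Kaplan-Meier fit with one stratum per level of the grouping factor;
# lung and sex are stand-ins for the poster's data and cohort.
kmfit <- survfit(Surv(time, status) ~ sex, data = lung)

# One cumulative-hazard curve per stratum:
plot(kmfit, fun = "cumhaz", col = 1:2, lty = 1:2,
     xlab = "Time", ylab = "Cumulative hazard")
legend("topleft", legend = names(kmfit$strata), col = 1:2, lty = 1:2)
```

With four cohort levels, col = 1:4 and lty = 1:4 distinguish the four curves the same way.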
Re: [R] Maximum number of tables combined with rbind
On Jun 20, 2009, at 1:02 AM, gug wrote:

Hello, I have been using read.table to read data files into R, then rbind to combine them into one table. The column headings are all identical, so it should be straightforward, and it seems to be working well so far. My question is: What is the maximum number of tables that can be combined with rbind? Is it driven by the number of rows in the tables, by a constraint in the syntax of rbind itself, by my PC's memory,

Yes, and the OS. "PC" is not a useful designation for an OS.

or by some other constraint? At this point the data files I am reading in are approx 2,500 rows x 150 columns.

That should not be any problem. On an Intel Mac w/ 8 GB there is no problem with dataframes that are 500 times that size. If this is on a Windows box, there is an RW-FAQ on the topic.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
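[Editorial sketch, not from the thread: rather than rbind-ing tables pairwise, it is usually cleaner to read all files into a list and bind them in one do.call. The file-reading lines are commented out because the file names are hypothetical; the demo below uses in-memory tables.]

```r
# Hypothetical file-based version (names are placeholders):
# files    <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
# tabs     <- lapply(files, read.table, header = TRUE)
# combined <- do.call(rbind, tabs)

# Self-contained demo with five small tables of identical columns:
tabs <- lapply(1:5, function(i) data.frame(id = 1:3, val = i))
combined <- do.call(rbind, tabs)
dim(combined)  # 15 rows, 2 columns
```

The binding cost is then paid once instead of once per table, which matters more than any hard limit on the number of tables.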
Re: [R] modifying sound package function plot.Sample
On Jun 20, 2009, at 7:05 AM, rajesh j wrote:

Hi, I'm trying to modify this function. I want to remove the existing x-axis (the tick marks and the values below each tick) and make it dynamic, so that I can choose whether I want the x-axis at the top or bottom, but I can't seem to change that. Can somebody help me?

plot.Sample <- function(x, xlab=NULL, ylab=NULL, ...){
  sampletest <- is.Sample(x, argname='x')
  if (!sampletest$test) stop(sampletest$error)
  s <- loadSample(x, filecheck=FALSE)
  s <- normalize(s)
  if (channels(s)==1) {
    #if (is.null(ylab)) ylab <- "waveform"
    plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, axes=FALSE, ...)

If I understand you correctly, then try (untested):

plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, xaxt="n", ...)

Although I'm not sure you can use that ellipsis in that context.

    #axis(1)
    #axis(2, at=c(-1,0,1), labels=as.character(c(-1,0,1)))
    #abline(h=0)
    #abline(h=c(-1,1))
    #lines(par()$usr[1:2], y=c(rep(par()$usr[3],2)), xpd=TRUE)

The locator function would let you get user input that could be used to drive alternative specifications to the axis() function.

-- Rajesh.J

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
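[Editorial sketch, not the sound package's code: the top-or-bottom axis idea in plain base graphics. The helper name plot_with_axis is hypothetical; side 1 is the bottom, side 3 the top.]

```r
# Draw a line plot with no axes, then add the x-axis on a chosen side.
plot_with_axis <- function(y, xside = 1) {  # xside = 1 bottom, 3 top
  plot(y, type = "l", axes = FALSE, xlab = "", ylab = "")
  axis(side = xside)  # x-axis where requested
  axis(2)             # y-axis as usual
  box()
  invisible(NULL)
}

plot_with_axis(sin(seq(0, 10, 0.1)), xside = 3)  # x-axis on top
```

Inside plot.Sample, the same pattern works: keep axes=FALSE in the plot() call and make the axis(side = ...) value a new function argument.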
[R] [R-pkgs] package JM -- version 0.3-0
Dear R-users, I'd like to announce the release of the new version of package JM (soon available from CRAN) for the joint modelling of longitudinal and time-to-event data using shared parameter models. These models are applicable mainly in two settings. First, when focus is on the time-to-event outcome and we wish to account for the effect of a time-dependent covariate measured with error. Second, when focus is on the longitudinal outcome and we wish to correct for nonrandom dropout. New features include:

* a relative risk model with a piecewise-constant baseline risk function is now available for the event outcome, using option 'piecewise-PH-GH' in the 'method' argument of jointModel().
* several types of residuals are supported for the longitudinal and time-to-event outcomes. Moreover, for the longitudinal outcome there is also the option to compute multiple-imputation-based residuals, as described in Rizopoulos, Verbeke and Molenberghs (Biometrics 2009, to appear).
* the Weibull submodel for the time-to-event outcome is now available under both the relative risk and accelerated failure time formulations.
* this new version of the package features new and more robust algorithms for numerical integration and optimization -- these updates could lead to different results, especially for the survival part, compared to the previous version of the package.

As always, any kind of feedback (e.g., questions, suggestions, bug reports, etc.) is more than welcome.
Best, Dimitris

-- Dimitris Rizopoulos, Assistant Professor, Department of Biostatistics, Erasmus University Medical Center. Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands. Tel: +31/(0)10/7043478, Fax: +31/(0)10/7043014

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] MS-VAR introduction
Actually there aren't, by now, so many packages for Markov regime-switching models in R, but I just saw that a talk about it will be held at the useR! 2009 conference; maybe that can help you, though the authors don't mention whether they use an existing package or developed new functionality. http://www2.agrocampus-ouest.fr/math/useR-2009/abstracts/pdf/Fontdecaba_SanchezEspigares_Munoz.pdf

Matthieu

Sandrine Lunven wrote:

Dear Henrique, I think that R is not actually the best statistical tool to model MS-VAR. Indeed, the package msvar only allows a simple specification of the model. One tool I have used is on Ox, with the package MSVAR built by Krolzig. This package allows a large variety of model specifications: you can choose the number of regimes, the regime dependence, etc. You can find more details on his site: http://www.krolzig.co.uk/index.html?content=/msvar.html However, it would be of great interest to develop a package in R. Maybe soon...

Best regards, Sandrine Lunven, Economist, TAC financial, www.tac-financial.com

Henrique C. de Andrade wrote:

Dear R community, I'm starting to learn the MS-VAR methodology and I would like to know what I need to download (e.g. packages) to make MS-VAR estimations using R. Best, Henrique C. de Andrade, Doutorando em Economia Aplicada, Universidade Federal do Rio Grande do Sul, www.ufrgs.br/ppge
[R] R-project mailing lists (and SVN) - timeout Sunday 21 June
Dear R users, unfortunately, tomorrow (Sunday) there will be a longish timeout of our mail services, notably affecting the R-foo and R-SIG-bar mailing lists @r-project.org, i.e., those hosted by the Statistics / Math Department of ETH, stat.math.ethz.ch. Note that svn.r-project.org will also suffer from the timeout. This should be a once-in-8-years event, hopefully, and we are sorry for any inconvenience. Well, enjoy a Sunday off-work :-)

--official text--

There will be upcoming maintenance work on the central computer services in our server room: Sunday, 21st June, 9 a.m. - 5 p.m. (MEST = GMT+2). During this time, we have to switch off all of our servers, since the main power supply will be turned off. Services like email or web will be down. This interruption will probably take less than 8 hours. Please check the website http://www.math.ethz.ch; as soon as that site is up again, all the servers and services should be back online (within a small delay).

- Martin Maechler, ETH Zurich http://stat.ethz.ch/people/maechler Seminar für Statistik, ETH Zürich, SWITZERLAND
Re: [R] Maximum number of tables combined with rbind
Thanks David - you're right, "PC" is not very informative. I am using XP Home with 4 GB, though I don't think XP accesses more than 3 GB. From following through the FAQs and memory functions (e.g. memory.limit(size = NA)) it looks like R is getting about 1535 Mb at the moment.

David Winsemius wrote:

My question is: What is the maximum number of tables that can be combined with rbind? Is it driven by the number of rows in the tables, by a constraint in the syntax of rbind itself, by my PC's memory,

Yes, and the OS. "PC" is not a useful designation for an OS.

or by some other constraint? At this point the data files I am reading in are approx 2,500 rows x 150 columns.

That should not be any problem. On an Intel Mac w/ 8 GB there is no problem with dataframes that are 500 times that size. If this is on a Windows box, there is an RW-FAQ on the topic.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
[R] string splitting and testing for enrichment
Hi List, I have data in the following form:

Gene    TFBS
NUDC    PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) CACBINDINGPROTEIN(149) T3R(167) HLF(191)
RPA2    STAT4(57) HEB(251)
TAF12   PAX3(53) YY1(92) BRCA(99) GLI(101)
EIF3I   NERF(10) P300(10)
TRAPPC3 HIC1(3) PAX5(17) PAX5(110) NRF1(119) HIC1(122)
TRAPPC3 EGR(26) ZNF219(27) SP3(32) EGR(32) NFKAPPAB65(89) NFKAPPAB(89) RFX(121) ZTA(168)
NDUFS5  WHN(14) ATF(57) EGR3(59) PAX5(99) SF1(108) NRSE(146)
TIE1    NRSE(129)

I would like to test the 2nd column (each value has letters followed by numbers in brackets) for enrichment via fisher.test. To that end I am trying to create two factors made up of column 1 (Gene) and column 2 (TFBS), where each Gene would have several entries matching each TFBS. My main problem just now is that I can't split the TFBS column into separate strings (at the moment that 2nd column is all one string for each Gene). Here's where I am just now:

test <- as.character(dataIn[,2])   # convert the 2nd column from factor to character
test2 <- unlist(strsplit(test[1], ' '))  # split the first element into individual
                                         # strings (only the first element just now,
                                         # because I'm just trying to get things working)
test3 <- unlist(strsplit(test2, '\\([0-9]\\)'))  # get rid of numbers and brackets

Now this does not behave as I hoped - it gives me:

> test3
[1] "PPARA"                  "HNF4(20)"               "HNF4(96)"
[4] "AHRARNT(104)"           "CACBINDINGPROTEIN(149)" "T3R(167)"
[7] "HLF(191)"

i.e. it only removes the numbers and brackets from the first entry and not the others. Could someone point out my mistake please? Once I have all the TFBS (letters only) for each Gene, I would then count how often a TFBS occurs and use this data for a fisher.test testing for enrichment of TFBS in the list I have. I'm rather muddled here though and would appreciate advice on whether this is the right approach.
Thanks, Iain

> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base
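[Editorial sketch, not from the thread: the likely mistake is that the pattern '\\([0-9]\\)' matches exactly one digit, so 'HNF4(20)' and friends are left untouched; and strsplit drops only the matched separator, whereas removing text is gsub's job. With '[0-9]+' and gsub:]

```r
# One Gene's TFBS string from the posted data:
tfbs  <- "PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) T3R(167)"
parts <- unlist(strsplit(tfbs, " "))            # split on spaces first

# '+' matches one or more digits, not just one; gsub deletes every match:
names.only <- gsub("\\([0-9]+\\)", "", parts)
names.only
# [1] "PPARA"   "HNF4"    "HNF4"    "AHRARNT" "T3R"

table(names.only)  # per-TFBS counts, ready for the enrichment table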
Re: [R] Maximum number of tables combined with rbind
You may be able to get more space than what you currently have, but my understanding is that XP in a default configuration can only address 2.5 GB. You also need to consider that you will need open memory in order to do anything useful. I am not aware of limitations imposed by rbind per se. Tables (if you are using the R terminology) are actually special forms of arrays. I just ran this test, and it used about half of my 10 GB according to my system monitor:

mtx <- matrix(1:(2500*150), ncol=150)
for (i in 1:10) { mtx <- rbind(mtx, mtx); print(2^i) }
[1] 2
[1] 4
[1] 8
[1] 16
[1] 32
[1] 64
[1] 128
[1] 256
[1] 512
[1] 1024

Running it again only 8 times and looking at the fraction of occupied memory with Apple's Activity Monitor makes me think you could get 256 of such tables into your space without too much difficulty, but the number could easily be twice that, since my pointer overhead in a 64-bit system will be greater than yours. You will probably even have room to do something useful. The option of leaving the data in a database should not be overlooked if you need more space. sqldf is an available package: http://code.google.com/p/sqldf/

Other informative functions include object.size() and gc(). After doubling that 8-fold copy of mtx to get a nine-fold doubling, I see:

> str(mtx)
 int [1:1280000, 1:150] 1 2 3 4 5 6 7 8 9 10 ...
> object.size(mtx)
[1] 768000200
> gc()
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   114618   6.2     350000   18.7    350000   18.7
Vcells 96121989 733.4  309848649 2364.0 576124419 4395.5

Good luck; David

On Jun 20, 2009, at 8:49 AM, gug wrote:

Thanks David - you're right, "PC" is not very informative. I am using XP Home with 4 GB, though I don't think XP accesses more than 3 GB. From following through the FAQs and memory functions (e.g. memory.limit(size = NA)) it looks like R is getting about 1535 Mb at the moment.

David Winsemius wrote:

My question is: What is the maximum number of tables that can be combined with rbind? Is it driven by the number of rows in the tables, by a constraint in the syntax of rbind itself, by my PC's memory,

Yes, and the OS. "PC" is not a useful designation for an OS.

or by some other constraint? At this point the data files I am reading in are approx 2,500 rows x 150 columns.

That should not be any problem. On an Intel Mac w/ 8 GB there is no problem with dataframes that are 500 times that size. If this is on a Windows box, there is an RW-FAQ on the topic.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] How to import timestamps from emails into R
Here's what I get:

> head(tt)
[1] "2008-02-20 03:09:51 EST" "2008-02-20 12:12:57 EST"
[3] "2008-03-05 09:11:28 EST" "2008-03-05 17:59:40 EST"
[5] "2008-03-09 09:00:09 EDT" "2008-03-29 15:57:16 EDT"

But I can't figure out how to plot this now. plot(tt) does not appear to be univariate. I get the same plot with plot(as.Date(tt)), which would make sense if the time were used, because of the range of the dates and the insignificance of the times of day.

> head(as.Date(tt))
[1] "2008-02-20" "2008-02-20" "2008-03-05" "2008-03-05" "2008-03-09"
[6] "2008-03-29"

plot(tt) and plot(as.Date(tt)) give something like year as a function of the rest of the date. Here they are:

http://thomaslevine.org/time/tt.png
http://thomaslevine.org/time/as.Date.tt.png

Tom

On Fri, Jun 19, 2009 at 6:21 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Try this:

Lines <- "Sun, 14 Jun 2009 07:33:00 -0700
Sun, 14 Jun 2009 08:35:10 -0700
Sun, 14 Jun 2009 21:26:34 -0700
Mon, 15 Jun 2009 19:47:47 -0700
Wed, 17 Jun 2009 21:50:41 -0700"

# L <- readLines("myfile.txt")
L <- readLines(textConnection(Lines))
tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")

On Fri, Jun 19, 2009 at 6:06 PM, Thomas Levine thomas.lev...@gmail.com wrote:

I am analysing occurrences of a phenomenon by time, and each of these timestamps taken from email headers represents one occurrence. (The last number is the time zone.) I can easily change the format.

Sun, 14 Jun 2009 07:33:00 -0700
Sun, 14 Jun 2009 08:35:10 -0700
Sun, 14 Jun 2009 21:26:34 -0700
Mon, 15 Jun 2009 19:47:47 -0700
Wed, 17 Jun 2009 21:50:41 -0700

I've found documentation for a plethora of ways of importing time data, but I can't decide how to approach it. Any ideas on what may be the cleanest way? The only special concern is that I'll want to plot these data by date and time, meaning that I would rather not bin all of the occurrences from one day. The time zone isn't important, as these are all local times; the time zone only changes as a function of daylight saving time, so I probably shouldn't use it at all.

Tom
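[Editorial sketch, not from the thread: once the headers parse to POSIXct, a histogram of counts over time is perhaps closer to what Tom wants than plot(tt) on the raw values. Note that parsing "%a" day names is locale-dependent.]

```r
L <- c("Sun, 14 Jun 2009 07:33:00 -0700",
       "Sun, 14 Jun 2009 08:35:10 -0700",
       "Sun, 14 Jun 2009 21:26:34 -0700",
       "Mon, 15 Jun 2009 19:47:47 -0700",
       "Wed, 17 Jun 2009 21:50:41 -0700")

# The offset "-0700" is simply ignored by this format string,
# matching Tom's decision not to use the time zone at all.
tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")

# Counts of occurrences per day (hist has a POSIXt method):
hist(tt, breaks = "days", freq = TRUE, main = "Occurrences per day")
```

breaks = "hours" or "weeks" gives finer or coarser binning without collapsing the within-day times.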
[R] error ellipse
Dear All,

I have a data set with the following structure: [A], [a], [B], [b], where [A] and [B] are measurements and [a] and [b] are the associated uncertainties. I produce [B]/[A] vs. [A] plots in R and would like to show uncertainties as error ellipses (rather than error bars). Would this be relatively easy to do in R? I would appreciate any help on this.

Thanks a lot,
Tibi
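Base R has no built-in error-ellipse primitive, but for uncorrelated uncertainties an axis-aligned ellipse is easy to draw by hand. The sketch below assumes the half-axes should equal the stated uncertainties (the helper name, the toy data, and the error propagation for the ratio are all illustrative, not from the post); packages such as car or ellipse offer more general, correlation-aware versions.

```r
## Sketch: axis-aligned error ellipse with half-axes sx, sy centred at (x0, y0).
## The helper name and the 1-sigma interpretation are assumptions.
error_ellipse <- function(x0, y0, sx, sy, n = 100) {
  th <- seq(0, 2 * pi, length.out = n)
  cbind(x = x0 + sx * cos(th), y = y0 + sy * sin(th))
}

## Usage sketch: plot B/A vs A and outline each point's ellipse.
A <- c(1, 2, 3); a <- c(0.10, 0.15, 0.10)   # hypothetical measurements/errors
B <- c(2, 3, 5); b <- c(0.20, 0.20, 0.30)
ratio <- B / A
## crude propagation of the ratio's uncertainty, assuming independent errors
r_err <- ratio * sqrt((a / A)^2 + (b / B)^2)
plot(A, ratio,
     xlim = range(A) + c(-1, 1) * max(a),
     ylim = range(ratio) + c(-1, 1) * max(r_err))
for (i in seq_along(A))
  lines(error_ellipse(A[i], ratio[i], a[i], r_err[i]))
```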
Re: [R] a difficulty in boot package
Seunghee Baek wrote:

Hi, I have a problem programming a bootstrap. I don't know why it shows the error message below. Please see my code:

#'st' is my original dataset.
#the functions 'fml.mlogl', 'pcopula.fam4', 'ltd' and 'invltd' are already defined
boot.OR <- function(data, i) {
  E <- data[i, ]
  ml1 <- glm(c_VAsex90_bf ~ trt, family = binomial, data = E)
  ml2 <- glm(c_VAsex90_bm ~ trt, family = binomial, data = E)
  marg.covariates <- cbind(rep(1, length(E$trt)), E$trt)
  dep.covariates <- cbind(rep(1, length(E$age_avr)), E$age_avr)
  start <- c(ml1$coef, ml2$coef, 0, 0)
  fml1 <- optim(start, fml.mlogl, control = c(maxit = 1), hessian = F)
  x <- (1 + exp(fml1$par[1]))^(-1)
  y <- (1 + exp(fml1$par[3]))^(-1)
  b <- exp(fml1$par[5] + fml1$par[6]*43) + 1
  p00 <- ltd(-log(pcopula.fam4(exp(-invltd(x, b)), exp(-invltd(y, b)), b)), b)
  p1 <- exp(fml1$par[1])/(1 + exp(fml1$par[1]))
  p2 <- exp(fml1$par[3])/(1 + exp(fml1$par[3]))
  OR <- p00*(p00 + p1 + p2 - 1)/(1 - p2 - p00)/(1 - p1 - p00)
  OR
}
set.seed(101)
boot(st, boot.OR, R = 500)
## It gives the following error message:
## Error in fn(par, ...) : object "dep.covariates" not found

Since you have not specified all the functions, I can only guess that you need dep.covariates inside the function fml.mlogl() in your optim() call. In that case you need to pass it as an argument; otherwise it is not on the search path, given R's scoping rules. Hence, for example, define

fml.mlogl <- function(..., dep.covariates)

and call it with

optim(..., dep.covariates = dep.covariates)

Best,
Uwe Ligges

I hope you can help me with this problem.

Thanks,
Becky
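Uwe's scoping point can be sketched in isolation: objects used inside the objective function must either be arguments or be visible where the function was defined, and extra named arguments to optim() are forwarded to the objective. The toy criterion below is a stand-in, not Becky's actual fml.mlogl.

```r
## Toy objective taking extra data through a named argument.
## fml.mlogl here is a stand-in for the poster's real function.
set.seed(1)
fml.mlogl <- function(par, dep.covariates) {
  sum((dep.covariates %*% par)^2)   # arbitrary smooth criterion, minimum at par = 0
}
dep.covariates <- cbind(1, rnorm(10))

## Named arguments to optim() that are not its own are passed on to fn:
fit <- optim(c(0.5, 0.5), fml.mlogl, dep.covariates = dep.covariates)
fit$par   # close to c(0, 0) for this toy criterion
```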
Re: [R] string splitting and testing for enrichment
Try this. We read in the data and split TFBS on "(", ") " or ")", giving s, then reform s into a matrix, prepending the Gene name as column 1. Convert that to a data frame and make the third column numeric.

Lines <- "Gene,TFBS
NUDC,PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) CACBINDINGPROTEIN(149) T3R(167) HLF(191)
RPA2,STAT4(57) HEB(251)
TAF12,PAX3(53) YY1(92) BRCA(99) GLI(101)
EIF3I,NERF(10) P300(10)
TRAPPC3,HIC1(3) PAX5(17) PAX5(110) NRF1(119) HIC1(122)
TRAPPC3,EGR(26) ZNF219(27) SP3(32) EGR(32) NFKAPPAB65(89) NFKAPPAB(89) RFX(121) ZTA(168)
NDUFS5,WHN(14) ATF(57) EGR3(59) PAX5(99) SF1(108) NRSE(146)
TIE1,NRSE(129)"
DF <- read.csv(textConnection(Lines), as.is = TRUE)
s <- strsplit(DF$TFBS, "\\(|\\) |\\)")
f <- function(i) cbind(DF[i, "Gene"], matrix(s[[i]], nc = 2, byrow = TRUE))
DF2 <- as.data.frame(do.call(rbind, lapply(seq_along(s), f)))
DF2[[3]] <- as.numeric(DF2[[3]])
View(DF2)

On Sat, Jun 20, 2009 at 10:28 AM, Iain Gallagher iaingallag...@btopenworld.com wrote:

Hi List

I have data in the following form:

Gene TFBS
NUDC PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) CACBINDINGPROTEIN(149) T3R(167) HLF(191)
RPA2 STAT4(57) HEB(251)
TAF12 PAX3(53) YY1(92) BRCA(99) GLI(101)
EIF3I NERF(10) P300(10)
TRAPPC3 HIC1(3) PAX5(17) PAX5(110) NRF1(119) HIC1(122)
TRAPPC3 EGR(26) ZNF219(27) SP3(32) EGR(32) NFKAPPAB65(89) NFKAPPAB(89) RFX(121) ZTA(168)
NDUFS5 WHN(14) ATF(57) EGR3(59) PAX5(99) SF1(108) NRSE(146)
TIE1 NRSE(129)

I would like to test the 2nd column (each value has letters followed by numbers in brackets) here for enrichment via fisher.test. To that end I am trying to create two factors made up of column 1 (Gene) and column 2 (TFBS), where each Gene would have several entries matching each TFBS. My main problem just now is that I can't split the TFBS column into separate strings (at the moment that 2nd column is all one string for each Gene).
Here's where I am just now:

test <- as.character(dataIn[,2]) # convert the 2nd column from factor to character
test2 <- unlist(strsplit(test[1], ' ')) # split the first element into individual strings (only the first element just now because I'm just trying to get things working)
test3 <- unlist(strsplit(test2, '\\([0-9]\\)')) # get rid of numbers and brackets

Now this does not behave as I hoped. It gives me:

test3
[1] "PPARA"                  "HNF4(20)"  "HNF4(96)"
[4] "AHRARNT(104)" "CACBINDINGPROTEIN(149)" "T3R(167)"
[7] "HLF(191)"

i.e. it only removes the numbers and brackets from the first entry and not the others. Could someone point out my mistake, please? Once I have all the TFBS (letters only) for each Gene, I would then count how often a TFBS occurs and use this data in fisher.test, testing for enrichment of TFBS in the list I have. I'm rather muddled here, though, and would appreciate advice on whether this is the right approach.

Thanks
Iain

sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu
locale: LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
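The reason only PPARA(1) loses its suffix is that Iain's pattern matches exactly one digit inside the brackets; adding a + quantifier (one or more digits) fixes it:

```r
test2 <- c("PPARA(1)", "HNF4(20)", "AHRARNT(104)")

## '\\([0-9]\\)' matches exactly one digit in brackets -- only PPARA(1) matches
sub("\\([0-9]\\)", "", test2)
# "PPARA"  "HNF4(20)"  "AHRARNT(104)"

## '+' allows a run of one or more digits, so every suffix is stripped
sub("\\([0-9]+\\)", "", test2)
# "PPARA"  "HNF4"  "AHRARNT"
```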
Re: [R] modifying sound package function plot.Sample
Change as in:

plot.Sample <- function(..., xaxloc = 1) {

and in the body replace axis(1) by axis(xaxloc); then you can call plot(., xaxloc = 3) in order to plot the x axis at the top (while the bottom is still the default).

Uwe Ligges

David Winsemius wrote: On Jun 20, 2009, at 7:05 AM, rajesh j wrote:

Hi, I'm trying to modify this function. I want to remove the existing x axis (the tick marks and the values below each tick) and make it dynamic, so that I can choose whether I want the x axis at the top or bottom, but I can't seem to change that. Can somebody help me?

plot.Sample <- function(x, xlab = NULL, ylab = NULL, ...) {
  sampletest <- is.Sample(x, argname = 'x')
  if (!sampletest$test) stop(sampletest$error)
  s <- loadSample(x, filecheck = FALSE)
  s <- normalize(s)
  if (channels(s) == 1) {
    # if (is.null(ylab)) ylab <- "waveform"
    plot(sound(s)[1,], type = "l", col = "black", xlab = xlab, ylab = ylab, axes = FALSE, ...)

If I understand you correctly, then try (untested):

plot(sound(s)[1,], type = "l", col = "black", xlab = xlab, ylab = ylab, xaxt = "n", ...)

Although I'm not sure you can use that ellipsis in that context.

    # axis(1)
    # axis(2, at = c(-1, 0, 1), labels = as.character(c(-1, 0, 1)))
    # abline(h = 0)
    # abline(h = c(-1, 1))
    # lines(par()$usr[1:2], y = c(rep(par()$usr[3], 2)), xpd = TRUE)

The locator function would let you get user input that could be used to drive alternative specifications to the axis() function.

-- Rajesh.J

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
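Uwe's pattern generalizes to any plotting wrapper: suppress the default axes and let the caller pick the side. A minimal base-graphics sketch (myplot is illustrative, not the sound package's actual function):

```r
## Minimal sketch of a plot function with a configurable x-axis side.
## side 1 = bottom (the default), side 3 = top.
myplot <- function(y, xaxloc = 1, ...) {
  plot(y, type = "l", axes = FALSE, ...)
  axis(xaxloc)   # x axis on the requested side
  axis(2)        # y axis as usual
  box()
}

myplot(sin(seq(0, 10, by = 0.1)))              # x axis at the bottom
myplot(sin(seq(0, 10, by = 0.1)), xaxloc = 3)  # x axis at the top
```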
Re: [R] How to import timestamps from emails into R
This produces a plot where the x-axis is the index and the y-axis is time. It has all of the time information on the same axis, allowing me to plot cumulative occurrences by time (my original plan) if the times are sorted, which they should be. I think I'll end up using some variant of plot(tt, seq_along(tt)), putting the time axis along the bottom.

Thanks
Tom

On Sat, Jun 20, 2009 at 11:15 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Try this:

plot(seq_along(tt), tt)

On Sat, Jun 20, 2009 at 10:55 AM, Thomas Levine thomas.lev...@gmail.com wrote:

Here's what I get:

head(tt)
[1] "2008-02-20 03:09:51 EST" "2008-02-20 12:12:57 EST"
[3] "2008-03-05 09:11:28 EST" "2008-03-05 17:59:40 EST"
[5] "2008-03-09 09:00:09 EDT" "2008-03-29 15:57:16 EDT"

But I can't figure out how to plot this now. plot(tt) does not appear to be univariate. I get the same plot with plot(as.Date(tt)), which would make sense if time is used, because of the range of the dates and the insignificance of the times of day.

head(as.Date(tt))
[1] "2008-02-20" "2008-02-20" "2008-03-05" "2008-03-05" "2008-03-09"
[6] "2008-03-29"

plot(tt) and plot(as.Date(tt)) give something like year as a function of the rest of the date. Here they are:

http://thomaslevine.org/time/tt.png
http://thomaslevine.org/time/as.Date.tt.png

Tom

On Fri, Jun 19, 2009 at 6:21 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Try this:

Lines <- "Sun, 14 Jun 2009 07:33:00 -0700
Sun, 14 Jun 2009 08:35:10 -0700
Sun, 14 Jun 2009 21:26:34 -0700
Mon, 15 Jun 2009 19:47:47 -0700
Wed, 17 Jun 2009 21:50:41 -0700"
# L <- readLines("myfile.txt")
L <- readLines(textConnection(Lines))
tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")

On Fri, Jun 19, 2009 at 6:06 PM, Thomas Levine thomas.lev...@gmail.com wrote:

I am analysing occurrences of a phenomenon by time, and each of these timestamps taken from email headers represents one occurrence. (The last number is the time zone.) I can easily change the format.
Sun, 14 Jun 2009 07:33:00 -0700
Sun, 14 Jun 2009 08:35:10 -0700
Sun, 14 Jun 2009 21:26:34 -0700
Mon, 15 Jun 2009 19:47:47 -0700
Wed, 17 Jun 2009 21:50:41 -0700

I've found documentation for a plethora of ways of importing time data, but I can't decide how to approach it. Any ideas on what may be the cleanest way? The only special concern is that I'll want to plot these data by date and time, meaning that I would rather not bin all of the occurrences from one day. The time zone isn't important, as these are all local times; the time zone only changes as a function of daylight saving time, so I probably shouldn't use it at all.

Tom
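Tom's cumulative-occurrence idea can be sketched directly: sort the timestamps and plot the running count against time as a step plot (the sample timestamps below are taken from his head(tt) output).

```r
tt <- as.POSIXct(c("2008-02-20 03:09:51", "2008-02-20 12:12:57",
                   "2008-03-05 09:11:28", "2008-03-05 17:59:40",
                   "2008-03-09 09:00:09", "2008-03-29 15:57:16"))
tt <- sort(tt)

## cumulative number of occurrences up to each timestamp, time on the bottom axis
plot(tt, seq_along(tt), type = "s",
     xlab = "time", ylab = "cumulative occurrences")
```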
Re: [R] typo in Lomb-Scargle periodogram implementation in spec.ls() from cts package?
Please report problems in the code to the package maintainer (CCed).

Best,
Uwe Ligges

Mikhail Titov wrote:

Hello! I tried to contact the author of the package, but I got no reply; that is why I write it here. This might be useful for those who were using cts for spectral analysis of non-uniformly spaced data. In the file spec.ls.R from cts_1.0-1.tar.gz, lines 59-60 read:

pgram[k, i, j] <- 0.5 * ((sum(x[1:length(ti)] * cos(2 * pi * freq.temp[k] * (ti - tao)))^2/sum((cos(2 * pi * freq.temp[k] * (ti - tao)))^2) + (sum(x[1:length(ti)] * sin(2 * pi * freq.temp[k] * (ti - tao)))^2 === ) === /sum((sin(2 * pi * freq.temp[k] * (ti - tao)))^2)

Is there a misplaced bracket (shown as === ) ===)? Should it be like the following?

pgram[k, i, j] <- 0.5 * ((sum(x[1:length(ti)] * cos(2 * pi * freq.temp[k] * (ti - tao)))^2/sum((cos(2 * pi * freq.temp[k] * (ti - tao)))^2) + (sum(x[1:length(ti)] * sin(2 * pi * freq.temp[k] * (ti - tao)))^2/sum((sin(2 * pi * freq.temp[k] * (ti - tao)))^2) === ) ===

Here is a quick reference: http://en.wikipedia.org/wiki/Least-squares_spectral_analysis#The_Lomb.E2.80.93Scargle_periodogram . The one-half coefficient was not applied to the entire expression. I also find the next lines (61-62) odd:

pgram[1, i, j] <- 0.5 * (pgram[2, i, j] + pgram[N, i, j])

First of all, such things should not be in the for loop. Second, I don't quite understand the meaning of it.

P.S. Should I use tapering on my data? If I just fit sine and cosine I may not need it; however, for FFT, windowing is a must. What about Lomb-Scargle?

Mikhail
Re: [R] How to import timestamps from emails into R
If that is the situation then plot(tt) in your post could not have been what you wanted in any case, e.g. plot(10:20).

On Sat, Jun 20, 2009 at 11:49 AM, Thomas Levine thomas.lev...@gmail.com wrote:

This produces a plot where the x-axis is the index and the y-axis is time. It has all of the time information on the same axis, allowing me to plot cumulative occurrences by time (my original plan) if the times are sorted, which they should be. I think I'll end up using some variant of plot(tt, seq_along(tt)), putting the time axis along the bottom.

Thanks
Tom

On Sat, Jun 20, 2009 at 11:15 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Try this:

plot(seq_along(tt), tt)

On Sat, Jun 20, 2009 at 10:55 AM, Thomas Levine thomas.lev...@gmail.com wrote:

Here's what I get:

head(tt)
[1] "2008-02-20 03:09:51 EST" "2008-02-20 12:12:57 EST"
[3] "2008-03-05 09:11:28 EST" "2008-03-05 17:59:40 EST"
[5] "2008-03-09 09:00:09 EDT" "2008-03-29 15:57:16 EDT"

But I can't figure out how to plot this now. plot(tt) does not appear to be univariate. I get the same plot with plot(as.Date(tt)), which would make sense if time is used, because of the range of the dates and the insignificance of the times of day.

head(as.Date(tt))
[1] "2008-02-20" "2008-02-20" "2008-03-05" "2008-03-05" "2008-03-09"
[6] "2008-03-29"

plot(tt) and plot(as.Date(tt)) give something like year as a function of the rest of the date.
Here they are:

http://thomaslevine.org/time/tt.png
http://thomaslevine.org/time/as.Date.tt.png

Tom

On Fri, Jun 19, 2009 at 6:21 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

Try this:

Lines <- "Sun, 14 Jun 2009 07:33:00 -0700
Sun, 14 Jun 2009 08:35:10 -0700
Sun, 14 Jun 2009 21:26:34 -0700
Mon, 15 Jun 2009 19:47:47 -0700
Wed, 17 Jun 2009 21:50:41 -0700"
# L <- readLines("myfile.txt")
L <- readLines(textConnection(Lines))
tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")

On Fri, Jun 19, 2009 at 6:06 PM, Thomas Levine thomas.lev...@gmail.com wrote:

I am analysing occurrences of a phenomenon by time, and each of these timestamps taken from email headers represents one occurrence. (The last number is the time zone.) I can easily change the format.

Sun, 14 Jun 2009 07:33:00 -0700
Sun, 14 Jun 2009 08:35:10 -0700
Sun, 14 Jun 2009 21:26:34 -0700
Mon, 15 Jun 2009 19:47:47 -0700
Wed, 17 Jun 2009 21:50:41 -0700

I've found documentation for a plethora of ways of importing time data, but I can't decide how to approach it. Any ideas on what may be the cleanest way? The only special concern is that I'll want to plot these data by date and time, meaning that I would rather not bin all of the occurrences from one day. The time zone isn't important, as these are all local times; the time zone only changes as a function of daylight saving time, so I probably shouldn't use it at all.

Tom
Re: [R] loading .Rdata files
Also use 'try' to capture the error and continue.

On Sat, Jun 20, 2009 at 12:57 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:

Dear R People:

I'm loading several thousand .Rdata files in sequence. If one of them is empty, the function crashes. I am thinking about using system("wc ") etc. and strsplit on the results, but was wondering if there is a more clever way via a file-type command, please.

Thanks,
Erin

-- Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

-- Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
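A sketch of the try-based approach Jim suggests, plus a cheap pre-check on file size so zero-byte files never reach load() at all (the helper and file names are hypothetical):

```r
## Load many .Rdata files, skipping missing, empty, or corrupt ones.
load_all <- function(files) {
  ok <- character(0)
  for (f in files) {
    if (!file.exists(f) || file.info(f)$size == 0) next   # skip absent/empty
    res <- try(load(f, envir = globalenv()), silent = TRUE)
    if (!inherits(res, "try-error")) ok <- c(ok, f)
  }
  ok   # names of the files that loaded cleanly
}

## Usage sketch with made-up files:
x <- 1:5
save(x, file = "good.Rdata")
file.create("empty.Rdata")     # a zero-byte file
load_all(c("good.Rdata", "empty.Rdata", "missing.Rdata"))
# -> "good.Rdata"
```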
Re: [R] how to apply the dummy coding rule in a dataframe with complete factor levels to another dataframe with incomplete factor levels?
Hi Sean,

The levels attribute of a factor can contain levels that are not represented in the data. So, in your example, we can get the desired result by adding the missing levels via the levels argument of the factor function:

dfB <- data.frame(f1 = factor(c('a','b','b'), levels = c('a','b','c')),
                  f2 = factor(c('aa','bb','bb'), levels = c('aa','bb','cc')))
model.matrix(~f1+f2, data = dfB)
  (Intercept) f1b f1c f2bb f2cc
1           1   0   0    0    0
2           1   1   0    1    0
3           1   1   0    1    0
attr(,"assign")
[1] 0 1 1 2 2
attr(,"contrasts")
attr(,"contrasts")$f1
[1] "contr.treatment"
attr(,"contrasts")$f2
[1] "contr.treatment"

hth,
Kingsford Jones

On Fri, Jun 19, 2009 at 10:13 PM, Sean Zhang seane...@gmail.com wrote:

Dear R helpers:

Sorry to bother you with a basic question about model.matrix. Basically, I want to apply the dummy-coding rule from a dataframe with complete factor levels to another dataframe with incomplete factor levels. I used model.matrix, but could not get what I want. The following is an example.

#Suppose I have two dataframes, A and B
dfA <- data.frame(f1 = factor(c('a','b','c')), f2 = factor(c('aa','bb','cc')))
dfB <- data.frame(f1 = factor(c('a','b','b')), f2 = factor(c('aa','bb','bb')))
#dfB's factor variables have fewer levels
#use model.matrix on dfA
(matA <- model.matrix(~f1+f2, data = dfA))
#use model.matrix on dfB
(matB <- model.matrix(~f1+f2, data = dfB))
#I actually would like to dummy-code dfB using the rule defined by model.matrix(~f1+f2, data = dfA)
#matB_wanted is below
(matB_wanted <- rbind(c(1,0,0,0,0), c(1,1,0,1,0), c(1,1,0,1,0)))
colnames(matB_wanted) <- colnames(matA)
matB_wanted

Can someone kindly show me how to get matB_wanted? Many thanks in advance!

-Sean
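Kingsford's fix can be automated: copy the level sets from the reference data frame onto the new one before calling model.matrix, so dfB is coded with dfA's rule. A sketch, assuming the two frames share factor column names:

```r
dfA <- data.frame(f1 = factor(c('a','b','c')), f2 = factor(c('aa','bb','cc')))
dfB <- data.frame(f1 = factor(c('a','b','b')), f2 = factor(c('aa','bb','bb')))

## impose dfA's level sets on the corresponding factor columns of dfB
for (nm in names(dfA))
  if (is.factor(dfA[[nm]]))
    dfB[[nm]] <- factor(dfB[[nm]], levels = levels(dfA[[nm]]))

matB <- model.matrix(~ f1 + f2, data = dfB)
colnames(matB)
# "(Intercept)" "f1b" "f1c" "f2bb" "f2cc"
```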
[R] png() resolution problem {was Silhouette ...}
Hallo Sebastian,

>>>>> "SP" == Sebastian Pölsterl s...@k-d-w.org
>>>>>     on Sun, 14 Jun 2009 14:04:52 +0200 writes:

SP> Hello Martin,
SP> I am plotting the silhouette of a clustering and storing it as png. When I try to store the image as png, the bars are missing. The bars are plotted when I use x11 or postscript as the device. In addition, it seems to work when I use a smaller matrix (e.g. ruspini).
SP> It would be great if you could have a look at this issue.

Hmm, I've been at a conference in Italy... The silhouette plot only uses standard R plotting functions, so any problem with it exposes problems in standard R graphics -- such a message should really go to R-help, to which I CC now.

library(cluster)
nmat <- matrix(rnorm(2500*300), ncol = 300, nrow = 2500)
rmat <- matrix(rchisq(1000, 300, 50), ncol = 300, nrow = 1000)
mat <- rbind(nmat, rmat)
pr <- pam(mat, 2)
sil <- silhouette(pr)
png("sil.png")
# postscript("sil.ps")
plot(sil)
dev.off()

Anyway, I can confirm the problem, but of course it has not much to do with the silhouette function, but rather with the png() device, which produces a bitmap; the lines you draw are too fine (at the bitmap resolution) and so are rounded to invisible. You can reproduce the problem much more simply:

set.seed(1); x <- rlnorm(5000)
png("bar.png"); barplot(x, col = "gray", border = 0, horiz = TRUE); dev.off()
system("eog bar.png")
## which is also empty; and the complete analogue, replacing
## png [bitmap] with pdf [vector graphics],
pdf("bar.pdf"); barplot(x, col = "gray", border = 0, horiz = TRUE); dev.off()
system("evince bar.pdf")
## gives a very nice plot, into which you can zoom and see all details.

Now in principle you should be able to use png() with a much higher resolution than the default one, but replacing the above png("bar.png") with png("bar.png", res = 1200) did not help, as we now get the infamous

Error in plot.new() : figure margins too large

Other R-help readers will be able to make the png() example work for such cases, where you need so many lines.
{but let's stick with barplot(*, border=0, *)}

Regards,
Martin Maechler, ETH Zurich
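The "figure margins too large" error at res = 1200 has a simple cause: png() sizes margins in inches, and at the default 480x480 pixels a high res value leaves a canvas well under an inch. Scaling the pixel dimensions along with the resolution (or giving the size in inches directly) is the usual remedy; a sketch, guarded for headless builds without png support:

```r
set.seed(1); x <- rlnorm(5000)
## At res = 1200 the default 480x480 canvas is under half an inch square,
## hence "figure margins too large". Give the size in inches instead:
if (capabilities("png")) {
  png("bar.png", width = 6, height = 6, units = "in", res = 300)
  barplot(x, col = "gray", border = 0, horiz = TRUE)
  dev.off()
}
```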
Re: [R] Leave One Out Cross Validation
Hi David,

Thanks, and I apologize for the lack of clarity.

## n is defined as the length of xdat
n <- length(xdat)

# I defined 'k' as the Gaussian kernel function
k <- function(v) {1/sqrt(2*pi)*exp(-v^2/2)} # Gaussian kernel

# I believe ypred, in my case, was the leave-one-out estimator (I think it is the variable x, in other words xdat, or the independent variable). I could be wrong, but from the text of Davidson and MacKinnon, that's what I got out of it.

cvhfunc <- function(y, x, h) {
  ypred <- 0
  for (i in 1:n) {
    for (j in 1:n) {
      if (j != i) {
        ypred <- ypred + (y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h)

# I totally hear you on the fact that ypred will just equal y[i]; however, that should not be the case, because for each x[i] I should be running over all x[j] except when x[i] = x[j], and I should be multiplying by the numerator of ypred, (y[i]*k((x[j]-x[i])/h)). However, it's not doing that.

# I believe CV should be the following: it is used to determine the optimal bandwidth. Steps: (1) The process involves running x, [j] times for each x[i], except for when x[j] = x[i]. This is similar to drawing a histogram and deciding how large/small the bins should be. ypred should be a vector of n x 1. (2) The second step is similar to a variance measure: take the difference of y and (1), square, and then sum over all n.

      }
    }
  }
  ypred # not sure what that will do inside the function. If it's there for debugging you may want print(ypred)
  cvh <- 0
  cvh <- cvh+((1/n)*(ydat-ypred)^2 # ypred is a scalar, while y is a vector, so cvh will be a vector. Is that what you want?
} # I was not sure with this
# As a test, run with h = .2; test values of h = 1.4, 15, 30, 50 show different and good results (i.e. no NaN, massively small numbers, etc.)
test2 <- cvhfunc(ydat, xdat, .2); test2

Thanks.
-Mudasir

David Winsemius wrote: On Jun 19, 2009, at 7:45 PM, muddz wrote:

Hi Uwe, My apologies. Please may I be guided on what I am doing wrong in the code? I started my code as such:

# ypred is my leave-one-out estimator of x

Estimator of x? Really?

cvhfunc <- function(y, x, h) {
  ypred <- 0
  for (i in 1:n) { # how did you define n? It's not in your parameter list
    for (j in 1:n) {
      if (j != i) {
        ypred <- ypred + (y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h)

At this point you are also using k as a function. Have you at any point defined k? Also, multiplication by the ratio of two identical numbers would generally give a result of y[i] for that second term. Unless, of course, any x[j] = x[i], in which case you will throw an error and everything will grind to a halt. It might help if you now explained what you think CV should be.

      }
    }
  }
  ypred # not sure what that will do inside the function. If it's there for debugging you may want print(ypred)
  # CVh is a

Yes? CVh is a what?

  cvh <- 0
  cvh <- cvh+((1/n)*(y-ypred)^2 # n again. R will still not know what that is.
  cvh # ypred is a scalar, while y is a vector, so cvh will be a vector. Is that what you want?
}

test2 <- cvhfunc(ydat, xdat, .2); test2

# I was experimenting with the following data:
library(datasets)
data(faithful)
ydat <- faithful$eruptions; ydat; plot(ydat); par(new = T)
xdat <- faithful$waiting; xdat; plot(xdat, col = "blue")

# I want to minimize the CV function with respect to h. Thanks.

Uwe Ligges-3 wrote:

See the posting guide: if you provide commented, minimal, self-contained, reproducible code some people may be willing to help on the list.

Best,
Uwe Ligges

muddz wrote:

Hi All, I have been trying to get this LOO cross-validation method to work in R for the past 3 weeks but have had no luck. I am hoping someone will kindly help me.
Essentially, I want to code the LOO cross-validation for the 'local constant' and 'local linear' estimators. I want to find the optimal bandwidth h. Thank you very much!

-M
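For reference, here is a compact leave-one-out CV score for the Nadaraya-Watson (local constant) estimator, close to the CV(h) criterion of Davidson and MacKinnon that muddz is after. This is a sketch of the standard formula under those assumptions, not a fix of the posted code:

```r
## Leave-one-out CV for Nadaraya-Watson kernel regression:
##   cv(h) = (1/n) * sum_i ( y_i - yhat_{-i}(x_i) )^2,
## where yhat_{-i}(x_i) is the kernel-weighted mean of all y_j, j != i.
k <- function(v) exp(-v^2 / 2) / sqrt(2 * pi)   # Gaussian kernel

cvh <- function(y, x, h) {
  n <- length(x)
  err <- numeric(n)
  for (i in 1:n) {
    w <- k((x[-i] - x[i]) / h)              # weights from all other points
    err[i] <- y[i] - sum(w * y[-i]) / sum(w)
  }
  mean(err^2)
}

ydat <- faithful$eruptions
xdat <- faithful$waiting
## evaluate on a grid of bandwidths and take the minimizer
hs <- seq(1, 10, by = 0.5)
scores <- sapply(hs, function(h) cvh(ydat, xdat, h))
hs[which.min(scores)]
```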
[R] Decision trees with rpart
Dear colleagues in R,

I am using rpart to build decision trees, but which decision-tree algorithm does rpart use? J48, ID3, C4.5?

Thanks,
Rafael Marconi Ramos
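For what it's worth, rpart implements CART-style recursive partitioning (Breiman, Friedman, Olshen and Stone, 1984); J48/C4.5-style trees are available through RWeka instead. A minimal run (rpart is a recommended package shipped with R):

```r
library(rpart)  # CART-style recursive partitioning

fit <- rpart(Species ~ ., data = iris)
## the fitted object records the splits chosen by the CART criterion
fit$method       # "class": a classification tree was grown
nrow(fit$frame)  # number of nodes in the fitted tree
```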
[R] Customize axis labels in xyplot
Hello,

I'm plotting an xyplot where a continuous variable recorded every minute is plotted on y, and time expressed as HH:MM:SS on x, as follows:

xaxis <- list(tick.number = 12, rot = 90)
lst <- list(x = xaxis)
xyplot(upt$LOAD_1 ~ upt$TIME, data = upt, type = c('g', 'p', 'r'), scales = lst)

On the x-axis, every time label is drawn, and the labels quickly become unreadable as they overlap each other. I wanted to limit the number of labels with 'tick.number = 12', but it does not work, as the x values are not treated as a numerical sequence. How can I limit the number of ticks and labels for a time series expressed as HH:MM:SS? One way might be to convert to minutes from an origin, though the result would be less expressive.

Thank you for any help.
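If TIME is converted to POSIXct, the tick positions can be computed explicitly and handed to the scales list, which caps the label count regardless of how many observations there are. A sketch with made-up minute-resolution data (the final xyplot call, shown as a comment, assumes upt as in the post):

```r
## One tick per chosen interval on a POSIXct axis.
tt <- as.POSIXct("2009-06-20 00:00:00") + seq(0, 6 * 3600, by = 60)  # 6 h of minutes
at  <- seq(min(tt), max(tt), length.out = 12)   # 12 evenly spaced tick positions
lab <- format(at, "%H:%M:%S")
length(lab)   # 12 labels instead of one per observation
## then, with lattice:
## xyplot(LOAD_1 ~ TIME, data = upt,
##        scales = list(x = list(at = at, labels = lab, rot = 90)))
```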
Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)
(I am replacing R-devel and r-bugs with r-help as addressees.)

On Sat, Jun 20, 2009 at 9:45 AM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

So if I request a calculation of 0.3-0.1-0.1-0.1 and I do not get 0, that is not an issue of rounding / underflow (or whatever the correct technical term would be for that behaviour)?

No. Let's start from the beginning. In binary floating-point arithmetic, all numbers are represented as a*2^b, where a and b have a fixed number of digits, so input conversion from decimal form to binary form inherently loses some precision -- that is, it rounds to the nearest binary fraction. For example, representation(0.3) is 5404319552844595 * 2^-54, about 1e-17 less than exactly 3/10, which is of course not representable in the form a*2^b. The EXACT difference (calculating with rationals -- no roundoff errors etc.) between representation(0.3) and 3*representation(0.1) is 2^-55 (about 1e-17); the EXACT difference between representation(0.3) and representation(3*representation(0.1)) is 2^-54. As it happens, in this case there is no rounding error at all -- the floating-point result of 0.3 - 3*0.1 is exactly -2^-54.

I thought that guard digits would mean that 0.3-0.1*3 should be calculated in higher precision than the final representation of the result, i.e., avoiding that this is not equal to 0?

Guard digits and sticky bits are techniques for more accurate rounding of individual arithmetic operations, and do not persist beyond each individual operation. They cannot create precise results out of imprecise inputs (except when they get lucky!). And even with precise inputs, they cannot create correctly rounded results across multiple operations. Consider for example (1.0 + 1.0e-15) - 1.0. The correctly rounded result of (1.0 + 1.0e-15) is 1.000000000000001110223..., and the correctly rounded result of (1.0 + 1.0e-15) - 1.0 is about 1.110223e-15, which is 11% larger than the mathematical result of 1.0e-15.
Perhaps you are thinking of the case where intermediate results are accumulated in higher-than-normal precision. This technique only applies in very specialized circumstances, and it is not available to user code in most programming languages (including R). I don't know whether R's sum function uses this technique or some other (e.g. Kahan summation), but it does manage to give higher precision than summation with individual arithmetic operators: sum(c(2^63, 1, -2^63)) gives 1, but Reduce(`+`, c(2^63, 1, -2^63)) gives 0.

I am sorry if I am not from the field... If you can suggest an online resource to help me use the right vocabulary and better understand the fundamental concepts, I am of course grateful.

I would suggest "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys 23:1 (March 1991), for the basics. Anything by Kahan (http://www.cs.berkeley.edu/~wkahan/) is interesting. Beyond elementary floating-point arithmetic there is, of course, the vast field of numerical analysis, which underlies many of the algorithms used by R and other statistical systems.

-s
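Simon's points are easy to verify at the R prompt; the exact -2^-54 residual and the loss of the 1 in naive pairwise addition both show up directly:

```r
0.3 - 3 * 0.1                       # -5.551115e-17, i.e. exactly -2^-54
identical(0.3 - 3 * 0.1, -2^-54)    # TRUE: the residual is exact, not noise

## comparisons of floating-point results should use a tolerance, not ==
isTRUE(all.equal(0.3, 3 * 0.1))     # TRUE

## naive left-to-right addition loses the 1 at double precision
Reduce(`+`, c(2^63, 1, -2^63))      # 0: 2^63 + 1 rounds back to 2^63
```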
Re: [R] Leave One Out Cross Validation
You don't seem to be making any corrections or updating your code. There remains a syntax error in the last line of cvhfunc because of mismatched parens.

On Jun 20, 2009, at 1:04 PM, muddz wrote:

Hi David, Thanks, and I apologize for the lack of clarity.

#n is defined as the length of xdat
n <- length(xdat)

Would be better to include that at the top of the function body.

#I defined 'k' as the Gaussian kernel function
k <- function(v) {1/sqrt(2*pi)*exp(-v^2/2)} # Gaussian kernel

#I believe ypred in my case was the leave-one-out estimator (I think it is the variable x, in other words xdat, or the independent variable).

It does not make sense to me that ypred or yhat would be an estimator of x.

I could be wrong, but from the text of Davidson and MacKinnon, that's what I got out of it.

I don't have that text. The Google hits suggest it might be "Econometric Theory and Methods". Amazon has one of those "look inside" options. Searching on cv(h) suggests you are trying to replicate the kernel regression on p. 686.

#I totally hear you on the fact that ypred will just equal y[i]; however, that should not be the case, because for each x[i] I should be running over all x[j] except when x[i] = x[j], and I should be multiplying by the numerator of ypred, (y[i]*k((x[j]-x[i])/h)). However, it's not doing that.

Not doing what? You need to decide what is wrong.

(y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h) = y[i] * 1 # by definition, unless x[j] = x[i], in which case it is NaN

... which is what is in the vector that your current function produces after fixing some syntax. Doing some debugging, it looks as though the first time through, with i = 1 and j = 2, the result of the calculation is NaN, which means that ypred will always be NaN. Just because it is done over a loop does not fix the error in numerical logic and the failure to check arguments. So you still need to decide how to trap situations where x[j] = x[i].
And you need to re-read DM and implement the weights that prevent using terms when k((x[j]-x[i])/h) is very small. (There is only a single summation index in that CV(h) formula. So maybe you could just run predict(lm(y ~ . , data=x[-i])) instead of that god-awful nested loop.) #I believe CV should be the following: It is used to determine the optimal bandwidth. Steps: If CV is a scalar, then the last line could not possibly give you what you want: (1 - 1:10) # scalar minus vector gives a vector. [1] 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 Don't you want a sum? (1) The process involves running over x [j] times for each x[i], except for when x[j]=x[i]. This is similar to drawing a histogram and finding how large/small the bins should be. ypred should be a vector of nx1. (2) The second step is similar to a variance measure, where you take the difference of y and (1), square it, and then sum over all n. You are going to need to define the second step. # I was not sure with this ypred That line looks useless to me at the moment. ypred is whatever it is at that point, and it won't return a value from the function because you later evaluate other expressions. Functions only return the last evaluated expression or what is wrapped in return(). # not sure what that will do inside the function. If it's there for debugging you may want to print(ypred). Thanks. -Mudasir } David Winsemius wrote: On Jun 19, 2009, at 7:45 PM, muddz wrote: Hi Uwe, My apologies. Please, if I can be guided on what I am doing wrong in the code. I started my code as such: # ypred is my leave-one-out estimator of x Estimator of x? Really? cvhfunc <- function(y,x,h){ ypred <- 0 for (i in 1:n){ # how did you define n? It's not in your parameter list for (j in 1:n){ if (j!=i){ ypred <- ypred + (y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h) At this point you are also using k as a function. Have you at any point defined k? Also, multiplication by the ratio of two identical numbers would generally give a result of y[i] for that second term.
Unless, of course, any x[j] = x[i], in which case you will throw an error and everything will grind to a halt. It might help if you now explained what you think CV should be. } } } ypred # not sure what that will do inside the function. If it's there for debugging you may want to print(ypred) #CVh is a # Yes? CVh is a what? cvh <- 0 cvh <- cvh+((1/n)*(y-ypred)^2 # n again. R will still not know what that is. cvh # ypred is a scalar, while y is a vector, so cvh will be a vector. Is that what you want? } test2 <- cvhfunc(ydat,xdat,.2); test2 #I was experimenting with the following data: library(datasets) data(faithful) ydat <- faithful$eruptions; ydat; plot(ydat); par(new=TRUE) xdat <- faithful$waiting; xdat; plot(xdat, col="blue") # I want to minimize the CV function with respect
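For reference, a minimal sketch (not from the thread; only my reading of the Davidson-MacKinnon-style criterion CV(h) = (1/n) * sum_i (y[i] - yhat_{-i}(x[i]))^2) of what the poster appears to be after: leave-one-out cross-validation for Nadaraya-Watson kernel regression, with a guard for the zero-weight case discussed above:

```r
# A sketch of leave-one-out CV for Nadaraya-Watson kernel regression.
# cvh(y, x, h) returns the mean squared leave-one-out prediction error,
# which can then be minimized over the bandwidth h.
cvh <- function(y, x, h) {
  n <- length(x)
  k <- function(v) exp(-v^2 / 2) / sqrt(2 * pi)  # Gaussian kernel
  resid2 <- numeric(n)
  for (i in 1:n) {
    w <- k((x[-i] - x[i]) / h)        # weights from every point except i
    if (sum(w) == 0) next             # trap the all-zero-weight case
    yhat <- sum(w * y[-i]) / sum(w)   # leave-one-out fit at x[i]
    resid2[i] <- (y[i] - yhat)^2
  }
  mean(resid2)
}

# Example with the faithful data the poster was using:
cvh(faithful$eruptions, faithful$waiting, h = 2)
```

Note there is only the single loop over i; the inner sum over j is vectorized, which also sidesteps the NaN-producing i = j term.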
Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)
On Sat, Jun 20, 2009 at 4:10 PM, Dr. D. P. Kreil dpkr...@gmail.com wrote: Ah, that's probably where I went wrong. I thought R would take the 0.1, the 0.3, the 3, convert them to extended precision binary representations, do its calculations, and the reduction to normal double precision binary floats would only happen when the result was stored or printed. This proposal is problematic in many ways. For example, it would *still* not guarantee that 0.3 - 3*0.1 == 0, since extended-precision floats have the same characteristics as normal-precision floats. Would you round to normal precision when passing arguments? Then sqrt could not produce extended-precision results. etc. etc. I suppose R could support an extended-precision floating-point type, but that would require that the *user* choose which operations were in extended precision and which in normal precision. (And of course it would be a lot of work to add in a complete and consistent way.) -s
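The 0.3 - 3*0.1 example is easy to demonstrate: 0.1 has no finite binary representation, so 3*0.1 and 0.3 round to slightly different doubles, and no amount of intermediate precision changes that. A short illustration, including the usual remedy of tolerance-based comparison:

```r
# 0.1 is not exactly representable in binary, so 3*0.1 != 0.3 exactly.
x <- 0.3 - 3 * 0.1
print(x == 0)    # FALSE
print(x)         # a tiny nonzero residue on the order of 1e-17
# The standard remedy in R is comparison up to a tolerance:
print(isTRUE(all.equal(0.3, 3 * 0.1)))  # TRUE
```

This is why ?"==" and the R FAQ both steer users toward all.equal() for floating-point comparisons.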
Re: [R] correlation between categorical data
On Jun 20, 2009, at 2:05 PM, Jason Morgan wrote: On 2009.06.19 14:04:59, Michael wrote: Hi all, In a data frame, I have two columns of data that are categorical. How do I form some sort of measure of correlation between these two columns? For numerical data, I just need to regress one on the other, or do some pairs plot. But for categorical data, how do I find and/or visualize correlation between the two columns of data? As Dylan mentioned, using crosstabs may be the easiest way. Also, a simple correlation between the two variables may be informative. If each variable is ordinal, you can use Kendall's tau-b (square table) or tau-c (rectangular table). The former you can calculate with ?cor (set method="kendall"); for the latter you may have to hack something together yourself; there is code on the Internet to do this. If the data are nominal, then a simple chi-squared test (large n) or Fisher's exact test (small n) may be more appropriate. There are rules about which to use when one variable is ordinal and one is nominal, but I don't have my notes in front of me. Maybe someone else can provide more assistance (and correct me if I'm wrong :). I would be cautious in recommending the Fisher Exact Test based upon small sample sizes, as the FET has been shown to be overly conservative. This also applies to the use of the continuity correction for the chi-square test (which replicates the behavior of the FET). For more information see: Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations, Ian Campbell, Stat in Med 26:3661-3675; 2007, http://www3.interscience.wiley.com/journal/114125487/abstract and: How conservative is Fisher's exact test? A quantitative evaluation of the two-sample comparative binomial trial, Gerald G. Crans, Jonathan J. Shuster, Stat Med. 2008 Aug 15;27(18):3598-611.
http://www3.interscience.wiley.com/journal/117929459/abstract Frank also has some comments here (bottom of the page): http://biostat.mc.vanderbilt.edu/wiki/Main/DataAnalysisDisc#Some_Important_Points_about_Cont More generally, Agresti's Categorical Data Analysis is typically the first reference in this domain to reach for. There is also a document written by Laura Thompson which provides a nice R companion to Agresti. It is available from: https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf HTH, Marc Schwartz
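A short illustration (not from the thread; the data are simulated for the example) of the measures mentioned above, starting from a crosstab of two categorical columns:

```r
# Simulated categorical columns: one ordinal, one nominal.
set.seed(1)
a <- factor(sample(c("low", "mid", "high"), 100, replace = TRUE),
            levels = c("low", "mid", "high"), ordered = TRUE)
b <- factor(sample(c("no", "yes"), 100, replace = TRUE))

tab <- table(a, b)          # the crosstab Dylan suggested
print(tab)
print(chisq.test(tab))      # chi-squared test of independence (large n)
print(fisher.test(tab))     # Fisher's exact test (small n; see caveats above)

# For two ordinal variables, Kendall's tau via cor() on the integer codes:
print(cor(as.numeric(a), as.numeric(b), method = "kendall"))
```

With simulated independent columns, all of these should report essentially no association; the interesting part is the pattern of usage, not the particular p-values.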
Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)
Dear Stavros, Thank you very much for your helpful email and your patience. Perhaps you are thinking about the case where intermediate results are accumulated in higher-than-normal precision. This technique only applies in very specialized circumstances, and it is not available to user code in most programming languages (including R). Ah, that's probably where I went wrong. I thought R would take the 0.1, the 0.3, the 3, convert them to extended precision binary representations, do its calculations, and the reduction to normal double precision binary floats would only happen when the result was stored or printed. Having read your explanations now, I suspect it was unreasonable to expect that. I don't know whether R's sum function uses this technique or some other (e.g. Kahan summation), but it does manage to give higher precision than summation with individual arithmetic operators: sum(c(2^63,1,-2^63)) gives 1, but Reduce(`+`, c(2^63,1,-2^63)) gives 0. That is very interesting! I would suggest "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys 23:1 (March 1991) for the basics. Anything by Kahan (http://www.cs.berkeley.edu/~wkahan/) is interesting. Beyond elementary floating-point arithmetic, there is of course the vast field of numerical analysis, which underlies many of the algorithms used by R and other statistical systems. Thank you very much for the pointers! Best regards, David.
Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)
Yes, I see the problem now! Thank you for bearing with me and for the helpful explanations and info. Best regards, David. 2009/6/20 Stavros Macrakis macra...@alum.mit.edu: On Sat, Jun 20, 2009 at 4:10 PM, Dr. D. P. Kreil dpkr...@gmail.com wrote: Ah, that's probably where I went wrong. I thought R would take the 0.1, the 0.3, the 3, convert them to extended precision binary representations, do its calculations, and the reduction to normal double precision binary floats would only happen when the result was stored or printed. This proposal is problematic in many ways. For example, it would *still* not guarantee that 0.3 - 3*0.1 == 0, since extended-precision floats have the same characteristics as normal-precision floats. Would you round to normal precision when passing arguments? Then sqrt could not produce extended-precision results. etc. etc. I suppose R could support an extended-precision floating-point type, but that would require that the *user* choose which operations were in extended precision and which in normal precision. (And of course it would be a lot of work to add in a complete and consistent way.) -s
[R] Need help installing xts package
Hi, I am trying to install the package xts. I am using openSUSE 11, 64-bit. When I invoke: install.packages("xts", dependencies = TRUE) I receive a lot of "ERROR: compilation failed for package" errors when R tries to install the xts dependencies. I do not understand this behavior. I notice one thing. It complains: ERROR: compilation failed for package 'Hmisc' ** Removing '/home/chibi/R/x86_64-unknown-linux-gnu-library/2.8/Hmisc' * Installing *source* package 'Design' ... ** libs WARNING: R include directory is empty -- perhaps need to install R-devel.rpm or similar Is it because I need R-devel.rpm? What is it? I have searched Google and cannot find any explanation. Thank you. adschai
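The warning points at missing C headers: compiling source packages requires R's development headers, which RPM-based distributions ship in a separate package (on openSUSE the usual name is R-base-devel; that package name is an assumption here and may vary by repository). Whether the headers are actually present can be checked from within R:

```r
# Source packages are compiled against headers in R's include directory.
# If that directory is empty, every package needing compilation will fail
# with "compilation failed" errors like the ones above.
inc <- R.home("include")
print(inc)                                  # e.g. "/usr/lib64/R/include"
print(file.exists(file.path(inc, "R.h")))   # TRUE on a complete installation
```

If R.h is missing, installing the distribution's R development package (and a compiler toolchain such as gcc/gfortran) should make install.packages() succeed for source packages.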
[R] Roxygen vs Sweave for S4 documentation
Hi, I have been using R for a while. Recently, I have begun converting my package into S4 classes. I was previously using Rdoc for documentation. Now, I am looking to use the best tool for S4 documentation. It seems that the best choices for me are Roxygen and Sweave (I am fine with TeX). Are there any users of Roxygen or Sweave who can comment on the strengths or weaknesses of one or the other? Thanks in advance. - Ken
Re: [R] correlation between categorical data
On Saturday 20 June 2009 04:36:55 pm Marc Schwartz wrote: On Jun 20, 2009, at 2:05 PM, Jason Morgan wrote: On 2009.06.19 14:04:59, Michael wrote: Hi all, In a data frame, I have two columns of data that are categorical. How do I form some sort of measure of correlation between these two columns? For numerical data, I just need to regress one on the other, or do some pairs plot. But for categorical data, how do I find and/or visualize correlation between the two columns of data? As Dylan mentioned, using crosstabs may be the easiest way. Also, a simple correlation between the two variables may be informative. If each variable is ordinal, you can use Kendall's tau-b (square table) or tau-c (rectangular table). The former you can calculate with ?cor (set method="kendall"); for the latter you may have to hack something together yourself; there is code on the Internet to do this. If the data are nominal, then a simple chi-squared test (large n) or Fisher's exact test (small n) may be more appropriate. There are rules about which to use when one variable is ordinal and one is nominal, but I don't have my notes in front of me. Maybe someone else can provide more assistance (and correct me if I'm wrong :). I would be cautious in recommending the Fisher Exact Test based upon small sample sizes, as the FET has been shown to be overly conservative. . . . There are other ways of regarding the FET. Since it is precisely what it says - an exact test - you can argue that you should avoid carrying over any conclusions drawn about the small population the test was applied to and employing them in a broader context. In so far as the test is concerned, the sample data and the contingency table it is arrayed in are the entire universe. In that sense, the FET can't be conservative or liberal. It isn't actually a hypothesis test and should not be thought of as one or used in the place of one.
JDougherty