Re: [R] Completely Off Topic: Link to IOM report on use of "-omics" tests in clinical trials
Thanks, I had totally missed this controversy, but from a quick read of the summary the impact on open source analysis was unclear. Can you explain the punchline? I think many users of R have concluded the biggest problem in most analyses is first getting the data and then verifying any results you derive, both issues that sound related to your post. (The jumble below is illustrative of what hotmail has been doing with plain text; getting plain data without all the formatting junk is a recurring problem LOL.)

> Date: Mon, 26 Mar 2012 22:38:56 +0100
> From: iaingallagher@btopenworld.com
> To: gunter.berton@gene.com; r-help@r-project.org
> Subject: Re: [R] Completely Off Topic: Link to IOM report on use of "-omics" tests in clinical trials
>
> I followed this case while it was ongoing.
>
> It was a very interesting example of basic mistakes but also (for me) of journal politicking.
>
> Keith Baggerly and Kevin Coombes wrote a great paper - "Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology" in The Annals of Applied Statistics (2009, Vol. 3, No. 4, 1309–1334) which explains some of the background and investigative work they had to do to bring those mistakes to light.
>
> Best
>
> iain
>
> ----- Original Message -----
> From: Bert Gunter <gunter.berton@gene.com>
> To: r-help@r-project.org
> Cc:
> Sent: Monday, 26 March 2012, 19:12
> Subject: [R] Completely Off Topic: Link to IOM report on use of "-omics" tests in clinical trials
>
> Warning: This has little directly to do with R, although R and related
> tools (e.g. Sweave and other reproducible research tools) have a
> natural role to play.
>
> The IOM report:
>
> http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx
>
> that arose out of the Duke Univ. genomics testing scandal has been
> released. My thanks to Keith Baggerly for forwarding this. I believe
> that many R users in the medical research community will find this
> interesting, and I hope I do not venture too far out of line by
> passing on the link to readers of this list. It **will** have an
> important impact on so-called Personalized Health Care (which I guess
> affects all of us), and open source analytical (statistical)
> methodology is a central issue.
>
> For those interested, try the summary first.
>
> Best to all,
> Bert
>
> --
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] The Future of R | API to Public Databases
LOL, I remember posting about this in the past. The US gov agencies vary but most are quite good. The big problem appears to be people who push proprietary or commercial standards for which only one effective source exists. Some formats, like Excel and PDF, come to mind, and there is a disturbing trend towards their adoption in some places where raw data is needed by many. The best thing to do is contact the information provider and let them know you want raw data, not images or stuff that works in limited commercial software packages. Often data sources are valuable and the revenue model impacts availability. If you are just arguing over different open formats, it is usually easy for someone to write some conversion code and publish it - CSV to JSON would not be a problem, for example (see the sketch after this thread). Data of course are quite variable and there is nothing wrong with giving the provider his choice.

Date: Sat, 14 Jan 2012 10:21:23 -0500
From: ja...@rampaginggeek.com
To: r-help@r-project.org
Subject: Re: [R] The Future of R | API to Public Databases

Web services are only part of the problem. In essence, there are at least two facets: 1. downloading the data using some protocol; 2. mapping the data to a common model. Having #1 makes the import/download easier, but it really becomes useful when both are included. I think #2 is the harder problem to address. Software can usually be written to handle #1 by making a useful abstraction layer. #2 means that data has consistent names and meanings, and this requires people to agree on common definitions and a common naming convention. RDF (Resource Description Framework) and its related technologies (SPARQL, OWL, etc.) are one of the many attempts to address this. While this effort would benefit R, I think it's best if it's part of a larger effort. Services such as DBpedia and Freebase are trying to unify many data sets using RDF. The task view and package ideas are great ideas. I'm just adding another perspective.

Jason

On 01/13/2012 05:18 PM, Roy Mendelssohn wrote:

Hi Benjamin: What would make this easier is if these sites used standardized web services, so it would only require writing once. data.gov is the worst example; they spun their own, weak service. There is a lot of environmental data available through OPeNDAP, and that is supported in the ncdf4 package. My own group has a service called ERDDAP that is entirely RESTful, see: http://coastwatch.pfel.noaa.gov/erddap and http://upwell.pfeg.noaa.gov/erddap We provide R (and Matlab) scripts that automate the extract for certain cases, see: http://coastwatch.pfeg.noaa.gov/xtracto/ We also have a tool called the Environmental Data Connector (EDC) that provides a GUI from within R (and ArcGIS, Matlab and Excel) that allows you to subset data that is served by OPeNDAP, ERDDAP, and certain Sensor Observation Service (SOS) servers, and have it read directly into R. It is freely available at: http://www.pfeg.noaa.gov/products/EDC/ We can write such tools because the service is either standardized (OPeNDAP, SOS) or is easy to implement (ERDDAP).

-Roy

On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote:

Dear R Users - R is a wonderful software package. CRAN provides a variety of tools to work on your data. But R is not apt to utilize all the public databases in an efficient manner. I observed the most tedious part with R is searching and downloading the data from public databases and putting it into the right format. I could not find a package on CRAN which offers exactly this fundamental capability.
Imagine R is the unified interface to access (and analyze) all public data in the easiest way possible. That would create a real impact, would put R a big leap forward and would enable us to see the world with different eyes. There is a lack of a direct connection to the API of these databases, to name a few:

- Eurostat
- OECD
- IMF
- Worldbank
- UN
- FAO
- data.gov
- ...

The ease of access to the data is the key to information processing with R. How can we handle the flow of information noise? R has to give an answer to that with an extensive API to public databases. I would love your comments and ideas as a contribution to a vital discussion.

Benjamin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

** The contents of this message do not reflect any position of the U.S. Government or NOAA. **
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
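[ editor's aside on the CSV-to-JSON remark above ] A minimal sketch of such a conversion, assuming the jsonlite package (rjson and RJSONIO expose similar toJSON functions); the file names are placeholders:

----------------------------------------------------------------------
# Convert a CSV file with a header row into a JSON array of records.
library(jsonlite)

df <- read.csv("data.csv", stringsAsFactors = FALSE)   # parse CSV into a data frame
json <- toJSON(df, pretty = TRUE)                      # one JSON object per row
writeLines(as.character(json), "data.json")
----------------------------------------------------------------------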
Re: [R] HELP!! - PHP calling R to execute a r-code file (*.r)
Date: Fri, 30 Dec 2011 16:04:08 -0600
From: xiuquan.w...@gmail.com
To: r-help@r-project.org
Subject: [R] HELP!! - PHP calling R to execute a r-code file (*.r)

Hi, I have met a tough problem when using PHP to call R to generate some plots. I tested it okay on my personal computer with WinXP. But when I was trying to update to my server (Win2003 server), I found it did not work. Below are the details:

[ my text ] I've run into lots of problems like this. Generally, first check the PHP error log file - I have no idea where it is on your machine - and see if you can get your script to dump output somewhere, possibly with an absolute path so you know where to look for it LOL (see the sketch after this thread). Often the change in user creates unexpected problems with file permissions and libraries and paths. You need to check the specific directories for permissions, not just the top level. I would also point out that there is Rapache available as well as Rserve. Curious if people are using R in any other unique situations server side. We have a java webserver which I use to invoke R via bash scripts and generate rather complicated files. These can take very long to generate, but if you have a flexible caching system it can be easy to reuse output files or even generate them ahead of time. Starting R or any other process is not instantaneous, and often image generation is quite time consuming. There are a lot of issues making it work well in a server setting in real time. Scale up has also been an issue: the Apache threading or process model is quite expensive if you care about performance. We were able to use a netty front end and so far that has worked very well. PHP AFAIK is not thread safe, however.

1. r-code file (E:/mycode.r):
--
jpeg("E:/mytest.jpg")
plot(1:10)
dev.off()
--

2. php code:
---
exec("R CMD BATCH --vanilla --slave --no-timing E:/mycode.r");
exit;
---

3. results:
for WinXP: the image can be generated successfully.
for Server (Win2003): it can not be generated.

BTW, I have added a user "everyone" with full control permission to the E disk. [[elided Hotmail spam]] Thanks.

All the best,
Xiuquan Wang

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
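[ editor's aside ] On the "dump output somewhere with an absolute path" advice above, a minimal sketch of a diagnostic preamble you could put at the top of mycode.r so the server-side R process reports what it can actually see; the log path is a placeholder:

----------------------------------------------------------------------
# Log the runtime environment to an absolute path so the web server's
# R process can tell you who it runs as and what it can reach.
log <- file("E:/r_diag.log", open = "wt")
writeLines(c(
  paste("user:", Sys.info()[["user"]]),
  paste("cwd:", getwd()),
  paste("R version:", R.version.string),
  paste("E:/ writable:", file.access("E:/", mode = 2) == 0),
  paste("libPaths:", paste(.libPaths(), collapse = "; "))
), log)
close(log)
----------------------------------------------------------------------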
Re: [R] Convert CSV file to FASTA
Date: Wed, 31 Aug 2011 01:36:51 -0700
From: oliviacree...@gmail.com
To: r-help@r-project.org
Subject: [R] Convert CSV file to FASTA

Hi there, I have large Excel files which I can save as CSV files. Each Excel file contains two columns. One contains the chromosome number and the second contains a DNA sequence. I need to convert this into a fasta file that looks like this:

>chromosomenumber
CGTCGAGCGTCGAGCGGAGCG

Can anyone show me an R script to do this?

[ my text ] If you can post a few lines of your csv, someone can probably give you a bash script to do it. It may be possible in R, but sed/awk probably work better (see the sketch after this thread for an R version). IIRC, fasta is just a name line followed by sequence. If your csv looks like "name,XX" it may be possible to change the comma to a space and use awk with something like print ">" $1 "\n" $2 etc.

Many thanks x

-- View this message in context: http://r.789695.n4.nabble.com/Convert-CSV-file-to-FASTA-tp3780498p3780498.html Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
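[ editor's aside ] For the R-only route, a minimal sketch assuming a two-column CSV with a header row; the file names are placeholders:

----------------------------------------------------------------------
# Read a two-column CSV (chromosome, sequence) and write FASTA records.
d <- read.csv("chromosomes.csv", stringsAsFactors = FALSE)
fasta <- paste0(">", d[[1]], "\n", d[[2]])   # ">name" line, then the sequence
writeLines(fasta, "chromosomes.fasta")
----------------------------------------------------------------------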
Re: [R] Is R the right choice for simulating first passage times of random walks?
Top posting cuz hotmail decided not to highlight...

Personally I would tend to use java or C++ for the inner loops, but you could of course later make an R package out of that. This is especially true if your code will be used elsewhere in a performance critical system. For example, I wrote some C++ code for dealing with graphs - nothing fancy, but it let me play with some data structure ideas and I could then build it into standalone programs or perhaps an R package. Many of these things slow down due to memory incoherence or IO long before you use up the processor. With C++, in principle anyway, you have a lot of control over these things. Once you have your results and want to analyze them, that's when I would use R. Dumping simulation samples to a text file is easy and also lets you use other things like sed/grep or vi to explore as needed. (That said, the inner loop here can be vectorized in R; see the sketch after this message.)

From: paulepan...@users.sourceforge.net
To: r-help@r-project.org
Date: Thu, 28 Jul 2011 02:00:13 +0200
Subject: Re: [R] Is R the right choice for simulating first passage times of random walks?

Dear R folks,

Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel:

I need to simulate first passage times for iterated partial sums. The related papers are for example [1][2]. As a start I want to simulate how long a simple random walk stays negative, which should show that it behaves like n^(-1/2). My code looks like this.

----- code -----
n = 10      # number of simulations
length = 10 # length of iterated sum
z = rep(0, times = length + 1)
for (i in 1:n) {
    x = c(0, sign(rnorm(length)))
    s = cumsum(x)
    for (i in 1:length) {
        if (s[i] < 0 && s[i + 1] >= 0) {
            z[i] = z[i] + 1
        }
    }
}
plot(1:length(z), z/n)
curve(x**(-0.5), add = TRUE)
----- code -----

Of course the program above is not complete, because it only checks for the first passage from negative to positive. `if (s[2] > 0) {}` should be added before the for loop. This code already runs for over half an hour on my system¹. Reading about the for loop [3], it says to try to avoid loops and I probably should use a matrix where every row is a sample. Now my first problem is that there is no matrix equivalent for `cumsum()`. Can I use matrices to avoid the for loop? I mean the inner for loop. Additionally I wonder if `cumsum` is really faster or if I should sum the elements by myself and check after every step if the walk gets non-negative/0. With a length of 100 this should save some cycles. On the other hand, adding numbers should be really fast, and adding checks in between could potentially be slower. My second question is, is R the right choice for such simulations? It would be great when R can also give me a confidence interval(?) and also try to fit a curve through the result and give me the rule of correspondence(?) [4]. Do you have pointers for those? I glanced at simFrame [5] and read `?simulate` but could not understand it right away and think that this might be overkill. Do you have any suggestions?

Thanks, Paul

¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz
[1] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[2] http://arxiv.org/abs/0911.5456
[3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution
[4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics)
[5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html
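[ editor's aside ] On the vectorization question above: there is a matrix route after all - apply cumsum over the columns of a matrix of increments and count the sign changes without any inner loop. A minimal sketch; the quantities mirror Paul's code, but the names are placeholders:

----------------------------------------------------------------------
# Each column of x is one simulated walk of 'len' steps.
n   <- 1000   # number of simulations
len <- 100    # length of each iterated sum

x <- matrix(sign(rnorm(n * len)), nrow = len, ncol = n)
s <- rbind(0, apply(x, 2, cumsum))     # cumulative sums, one walk per column

# Crossings from negative to non-negative at each step, counted across walks
cross <- (s[-nrow(s), ] < 0) & (s[-1, ] >= 0)
z <- rowSums(cross)

plot(1:len, z / n)
curve(x^(-0.5), add = TRUE)
----------------------------------------------------------------------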
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there an R program that produces optimal solution/mix of multiple samples' varying volumes and values
Date: Mon, 25 Jul 2011 11:39:22 -0700
From: lukescore...@gmail.com
To: r-help@r-project.org
Subject: [R] Is there an R program that produces optimal solution/mix of multiple samples' varying volumes and values

Sorry about the lengthy subject line. Does anyone know of an R program that can look at several sources' varying available volumes/amounts and their individual sets of values, compared to a target range/curve for these values, to find the optimal mixture(s) of these possible sources for the desired curve, and for a specified amount? I hope that makes sense to a reader.

[ my text ] Well, whatever you are talking about, you need to write some error as a function of your fit parameters and decide if you can minimize it analytically or not. If you can do it analytically, then there are probably matrix functions you want. If not, there are several optimizers that should come up on google (see the sketch after this thread). If you are trying to express some data in terms of a nice basis set, that may be different from fitting an unknown collection of things. For example, if you have sin/cos curves fft may work, for polynomials something else, etc. Often when people ask questions like this, they try to fit to a collection of things they think may work, and then the optimizer gets stuck since it can't optimize a and b for a*x + b*x etc., so make sure your error function has a unique best fit. Generally coding mistakes can be addressed on the list if you get that far.

Thanks for your time. Luke

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
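[ editor's aside ] A minimal sketch of the write-an-error-function-and-minimize-it advice, fitting non-negative mixture weights of several source profiles to a target curve with optim; all data and names here are made up:

----------------------------------------------------------------------
# Fit non-negative weights w so that sources %*% w approximates the target.
set.seed(1)
sources <- matrix(runif(50 * 3), nrow = 50)            # 3 candidate source profiles
target  <- sources %*% c(0.5, 0.3, 0.2) + rnorm(50, sd = 0.01)

err <- function(w) sum((target - sources %*% w)^2)     # squared-error objective

fit <- optim(par = rep(1/3, 3), fn = err,
             method = "L-BFGS-B", lower = 0)           # constrain weights >= 0
fit$par                                                # recovered mixture weights
----------------------------------------------------------------------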
Re: [R] Life Cycle Assessment with R.
Date: Mon, 25 Jul 2011 19:03:08 +0100
From: jbustosm...@yahoo.es
To: r-help@r-project.org
Subject: [R] Life Cycle Assessment with R.

Hello everyone, there's something really important about climate change and how many institutions around the globe are looking for software solutions in order to achieve their (and everyone's) needs to improve life conditions across the planet. Currently there are many commercial software packages working with this important topic, named Life Cycle Assessment, monitoring carbon emissions, but as many of you may know, commercial software controlling or managing our planet could be another big mistake. To sum up briefly, it could be a good idea to create an R package doing Life Cycle Assessment (if one has not been created) in order to gain a better understanding and to make these important decisions about global warming and how we as humanity control how the Carbon Footprint is measured, for commercial or non-commercial purposes. Does anyone know of people working on Life Cycle Assessment (carbon emissions) with R? Or if there's someone interested in doing a package about it, please let me know! I explain this here because of the R philosophy.

Best wishes!
José Bustos
Chilean Biostatistician
www.aespro.cl

[ my text below ] Well, generally R packages are more general purpose tools than specific applications such as this - although there may be an iphone save-the-world app LOL. I have no idea, but usually the issue here is getting the required data in an open form. Many govt agencies think excel is open and that you would not want to do an analysis that wasn't supported in such popular software. This comes up with financial data all the time; I even asked for account information in csv and I got a reply, "thank you for asking about exporting to excel," and a detailed explanation that was completely irrelevant. Modelling of complicated systems is often, well, complicated, and predictions about the future involve assumptions that are often made to suit the needs of the immediate analyst. On topics which involve money or emotion, getting unbiased analysis is impossible and getting data can be very difficult. This list is not designed for advocacy or even discussion of analysis results, but the ability to get data in a form usable by R may be of more general interest to those seeking help on R.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] xml2-config issues
Date: Fri, 22 Jul 2011 20:06:34 -0600
From: abmathe...@gmail.com
To: r-help@r-project.org
Subject: [R] xml2-config issues

I'm trying to install the XML package on Ubuntu 10.10, and I keep getting a warning message that XML could not be installed and had non-zero exit status. How can I fix this problem?

install.packages()
Loading Tcl/Tk interface ... done
--- Please select a CRAN mirror for use in this session ---
Installing package(s) into '/home/amathew/R/i686-pc-linux-gnu-library/2.13' (as 'lib' is unspecified)
trying URL 'http://streaming.stat.iastate.edu/CRAN/src/contrib/XML_3.4-0.tar.gz'
Content type 'application/x-gzip' length 896195 bytes (875 Kb)
opened URL
==
downloaded 875 Kb

* installing *source* package 'XML' ...
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
No ability to remove finalizers on externalptr objects in this verison of R
checking for sed... /bin/sed
checking for pkg-config... /usr/bin/pkg-config
checking for xml2-config... no
Cannot find xml2-config
ERROR: configuration failed for package 'XML'
* removing '/home/amathew/R/i686-pc-linux-gnu-library/2.13/XML'

[ my text, hotmail won't highlight original ] You probably need to get libxml2, and then you should be able to run xml2-config from the command line to verify it is there. This is a specific pkg-config (man pkg-config for details). For some non-R things yum or apt-get have not had the most recent versions, but generally your package manager should be just fine (see the sketch after this thread).

The downloaded packages are in '/tmp/Rtmp2V4huR/downloaded_packages'
Warning message:
In install.packages() : installation of package 'XML' had non-zero exit status

Thank You, Abraham

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
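[ editor's aside ] A minimal sketch of that check run from within R; the apt package name is from memory for Debian/Ubuntu systems:

----------------------------------------------------------------------
# Verify xml2-config is on the PATH before retrying the install.
if (Sys.which("xml2-config") == "") {
  message("xml2-config not found - install the libxml2 development headers,")
  message("e.g. on Ubuntu: sudo apt-get install libxml2-dev")
} else {
  install.packages("XML")
}
----------------------------------------------------------------------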
Re: [R] Cox model approximations (was comparing SAS and R survival....)
From: thern...@mayo.edu
To: abouesl...@gmail.com
Date: Fri, 22 Jul 2011 07:04:15 -0500
CC: r-help@r-project.org
Subject: Re: [R] Cox model approximations (was comparing SAS and R survival)

For time scales that are truly discrete, Cox proposed the exact partial likelihood. I call that the exact method and SAS calls it the discrete method. What we compute is precisely the same; however they use a clever algorithm which is faster. To make things even more confusing, Prentice introduced an exact marginal likelihood which is not implemented in R, but which SAS calls the exact method.

Data is usually not truly discrete, however. More often ties are the result of imprecise measurement or grouping. The Efron approximation assumes that the data are actually continuous but we see ties because of this; it also introduces an approximation at one point in the calculation which greatly speeds up the computation; numerically the approximation is very good. In spite of the irrational love that our profession has for anything branded with the word exact, I currently see no reason to ever use that particular computation in a Cox model. I'm not quite ready to remove the option from coxph, but certainly am not going to devote any effort toward improving that part of the code.

The Breslow approximation is less accurate, but is the easiest to program and therefore was the only method in early Cox model programs; it persists as the default in many software packages because of history. Truth be told, unless the number of tied deaths is quite large, the difference in results between it and the Efron approx will be trivial.

The worst approximation, and the one that can sometimes give seriously strange results, is to artificially remove ties from the data set by adding a random value to each subject's time.

[ my text ] Care to elaborate on this at all? First of course I would agree that doing anything to the data, or making up data, and then handing it to an analysis tool that doesn't know you manipulated it can be a problem (often called interpolation or something with a legitimate name LOL). However, it is not unreasonable to do a sensitivity analysis by adding noise and checking the results. Presumably adding noise to remove things the algorithm doesn't happen to like would work, but you would need to take many samples and examine stats of how you broke the ties (see the sketch after this thread). Now if the model is bad to begin with, or the data is so coarsely binned that you can't get much out of it, then ok. I guess in this case, having not thought about it too much, ties would be most common either with lots of data, or if hazards spiked over time scales similar to your measurement precision, or if the measurement resolution is not comparable to the hazard rate. In the latter 2 cases of course the approach is probably quite limited. Consider turning exponential curves into step functions, for example.

Terry T

--- begin quote ---
I didn't know precisely the specificities of each approximation method. I thus came back to section 3.3 of Therneau and Grambsch, Extending the Cox Model. I think I now see things more clearly. If I have understood correctly, both the discrete option and the exact functions assume truly discrete event times in a model approximating the Cox model. Cox partial likelihood cannot be exactly maximized, or even written, when there are some ties, am I right? In my sample, many of the ties (those within a single observation of the process) are due to the fact that continuous event times are grouped into intervals.
So I think the logistic approximation may not be the best for my problem, despite the estimates on my real data set (shown in my previous post) do give [[elided Hotmail spam]] I was thinking about distributing the events uniformly in each interval. What do you think about this option? Can I expect a better approximation than directly applying the Breslow or Efron method to the grouped event data? Finally, it becomes a model problem more than a computational or algorithmic one, I guess.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
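[ editor's aside ] To make the ties discussion concrete, a minimal sketch with survival::coxph comparing the tie-handling options on coarsely grouped data, with the jitter-and-refit sensitivity check described above; all data are simulated:

----------------------------------------------------------------------
library(survival)
set.seed(42)

n <- 200
x <- rnorm(n)
time <- ceiling(rexp(n, rate = exp(0.5 * x)) * 4) / 4   # grouping creates ties
status <- rep(1, n)                                     # all events observed

# Tie-handling methods discussed above (true coefficient is 0.5)
coef(coxph(Surv(time, status) ~ x, ties = "efron"))
coef(coxph(Surv(time, status) ~ x, ties = "breslow"))

# Sensitivity check: jitter the ties away, refit many times, look at the spread
coefs <- replicate(100, {
  tj <- time + runif(n, 0, 1e-4)
  coef(coxph(Surv(tj, status) ~ x))
})
quantile(coefs, c(0.025, 0.5, 0.975))
----------------------------------------------------------------------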
Re: [R] Different result of multiple regression in R and SPSS
From: dwinsem...@comcast.net
To: seoulseoulse...@gmail.com
Date: Tue, 19 Jul 2011 18:45:47 -0400
CC: r-help@r-project.org
Subject: Re: [R] Different result of multiple regression in R and SPSS

On Jul 19, 2011, at 6:29 PM, J. wrote:

Thanks for the answer. However, I am still curious about which result I should use? The result from R or the one from SPSS?

It is becoming apparent that you do not know how to use the results from either system. The progress of science would be safer if you got some advice from a person who knows what they are doing.

Why are the results from the two programs different?

Different parameterizations. If I had to guess, I would bet that the gender coefficient in R is exactly twice that of the one from SPSS. They are probably both correct in the context of their respective codings.

[ my text ] I guess I would also suggest, again: run some samples with known data sets and see what you get (RSSWKDSASWYG). You would want to do this anyway if you want to ensure your real data is being used reasonably. You still need to have some way to check your opinion from the expert mentioned above, and known data will help there too (see the sketch after this thread). A factor of 2 often shows up from just looking at pictures once you have some intuition. I've often been wrong on intuition, but chasing it down and proving it helps you learn a lot :)

--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
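[ editor's aside ] A minimal sketch of the factor-of-two-from-parameterization guess: with treatment coding the gender coefficient is the full between-group difference; with sum (effect) coding it is half of that. Data are simulated, and which coding SPSS uses depends on the procedure:

----------------------------------------------------------------------
set.seed(1)
gender <- factor(rep(c("F", "M"), each = 50))
y <- 2 + 1.0 * (gender == "M") + rnorm(100)   # true group difference = 1.0

# Treatment coding: coefficient is the full difference (about 1.0)
coef(lm(y ~ gender, contrasts = list(gender = contr.treatment)))

# Sum (effect) coding: coefficient is half the difference (about 0.5)
coef(lm(y ~ gender, contrasts = list(gender = contr.sum)))
----------------------------------------------------------------------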
Re: [R] Very slow optim(): solved
Date: Thu, 14 Jul 2011 12:44:18 -0800
From: toshihide.hamaz...@alaska.gov
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Very slow optim(): solved

After Googling and trial and error, the major cause of the slow optimization was not the functions, but the data setup. Originally, I was using a data.frame for the likelihood calculation. Then I changed the data.frame to vector and matrix for the same likelihood calculation. Now convergence takes ~14 sec instead of 25 min. Certainly, I didn't know this simple change makes such a huge computational difference.

[ my text ] Thanks. Can you pass along any additional details, like the google links you found, or comment on the resulting limitation (were you CPU limited converting data formats, or did this cause memory problems leading to VM thrashing)? I've often had C++ code that turns out to be IO limited when I expected I was doing real complicated computations; it never hurts to go beyond the usual suspects LOL. (A small demonstration of the data.frame-versus-matrix effect follows this thread.)

Toshihide Hamachan Hamazaki, 濱崎俊秀 PhD
Alaska Department of Fish and Game: アラスカ州漁業野生動物課
Division of Commercial Fisheries: 商業漁業部
333 Raspberry Rd. Anchorage, AK 99518
Phone: (907)267-2158 Cell: (907)440-9934

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker
Sent: Wednesday, July 13, 2011 12:21 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] Very slow optim()

Hamazaki, Hamachan (DFG toshihide.hamazaki at alaska.gov writes:

Dear list, I am using the optim() function to MLE ~55 parameters, but it is very slow to converge (~25 min), whereas I can do the same in ~1 sec using ADMB, and ~10 sec using MS EXCEL Solver. Are there any tricks to speed up? Are there better optimization functions?

There's absolutely no way to tell without knowing more about your code. You might try method="CG":

Method 'CG' is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

If ADMB works better, why not use it? You can use the R2admb package (on R-Forge) to wrap your ADMB calls in R code, if you prefer that workflow.

Ben

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
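[ editor's aside ] The data.frame-versus-matrix effect is easy to demonstrate. A minimal sketch timing repeated element access of the kind a likelihood loop does; the timings will vary by machine:

----------------------------------------------------------------------
n <- 1e5
df <- data.frame(a = rnorm(n), b = rnorm(n))
m  <- as.matrix(df)

# Repeated element access, likelihood-loop style:
system.time(for (i in 1:1e4) df[i, "a"] + df[i, "b"])  # data.frame: slow
system.time(for (i in 1:1e4) m[i, 1] + m[i, 2])        # matrix: much faster
----------------------------------------------------------------------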
Re: [R] Very slow optim()
Hamazaki, Hamachan (DFG toshihide.hamazaki at alaska.gov writes:

Dear list, I am using the optim() function to MLE ~55 parameters, but it is very slow to converge (~25 min), whereas I can do the same in ~1 sec using ADMB, and ~10 sec using MS EXCEL Solver. Are there any tricks to speed up? Are there better optimization functions?

There's absolutely no way to tell without knowing more about your code. You might try method="CG":

[ my text ] I guess the first thing to do is look at the task manager and see if it is memory or CPU limited. In the absence of gross algorithmic differences, the other stuff that you may be doing could be a big deal. I saw similar performance issues in my own C++ code where the fast result was obtained just by sorting the input data to stop VM thrashing. A sort on large data is normally considered slow and wasteful if the later algorithm doesn't require it, but memory coherence can be a big deal.

Method 'CG' is a conjugate gradients method based on that by Fletcher and Reeves (1964) (but with the option of Polak-Ribiere or Beale-Sorenson updates). Conjugate gradient methods will generally be more fragile than the BFGS method, but as they do not store a matrix they may be successful in much larger optimization problems.

If ADMB works better, why not use it? You can use the R2admb package (on R-Forge) to wrap your ADMB calls in R code, if you prefer that workflow.

Ben

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using t tests
( after getting confirmation of lack of posts, try again, LOL )

From: marchy...@hotmail.com
To: r-help@r-project.org
Subject: RE: [R] Using t tests
Date: Sun, 10 Jul 2011 10:13:51 -0400

( sorry if this is a repost, but I meant to post to the list and never received any indication it was sent to the list; thanks for asking for comments about the approach to data analysis )

From: marchy...@hotmail.com
To: j...@bitwrit.com.au; gwanme...@aol.com
CC: r-help@r-project.org
Subject: RE: [R] Using t tests
Date: Sun, 10 Jul 2011 07:35:32 -0400

Date: Sat, 9 Jul 2011 18:40:43 +1000
From: j...@bitwrit.com.au
To: gwanme...@aol.com
CC: r-help@r-project.org
Subject: Re: [R] Using t tests

On 07/08/2011 07:22 PM, gwanme...@aol.com wrote:

Dear Sir, I am doing some work on a population of patients. About half of them are admitted into hospital with albumin levels less than 33. The other half have albumin levels greater than 33, so I stratify them into 2 groups, x and y respectively. I suspect that the average length of stay in hospital for the group of patients (x) with albumin levels less than 33 is greater than those with albumin levels greater than 33 (y). What command function do I use (assuming that I will be using the chi square test) to show that the length of stay in hospital of those in group x is statistically significantly different from those in group y?

Hi Ivo, just to make things even more complicated for you, Mark's suggestion that the length_of_stay measure is unlikely to be normally distributed might lead you to look into a non-parametric test like the Wilcoxon (aka

( please correct any of the following which is wrong, but note that the discussion is more interesting and useful with details of your goals ) I'm curious why people still jump to setting arbitrary cutoff points, in this case based on what you happen to have sampled, rather than try to find a functional relationship between the two parametric variables (see the sketch after this thread). Generally the thing that separates likely cause from noise is smoothness, or something you can at least rationalize in terms of physical mechanisms. If your question relates to the reproducibility of a given result ("well, this experiment showed hi and low were significantly different on hospital stays, maybe the next experiment will show the same"), you'd probably like to consider the data in relation to possible causes. I'm not sure your disease process would know about your median test results when patients walk in. BTW, what is terminating the hospital stay: cure, death, or insurance exhaustion? This sounds like you are just trying to reproduce something that is already in the literature: the cutoff is on the low side of normal and often hypoprotein is suspected of being bad, so the higher group would usually be expected to do better, no? Although I suppose this could have something to do with dehydration etc., but the point of course is that data interpretation is difficult to do in a vacuum.

Mann-Whitney in your case) test. You will have to split your length_of_stay measure into two like this (assume your data frame is named losdf):

albumin_hilo <- albumin < 33
wilcox.test(losdf$length_of_stay[albumin_hilo], losdf$length_of_stay[!albumin_hilo])

or if you use wilcox_test in the coin package:

albumin_hilo <- albumin < 33
wilcox_test(length_of_stay ~ albumin_hilo, losdf)

Do remember that the chi-square test is used for categorical variables, for instance if you dichotomized your length_of_stay into less than 10 days or 10 days and over.
Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
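[ editor's aside ] On the functional-relationship point in the interjection above, a minimal sketch treating albumin as continuous rather than dichotomized; the data are simulated, and losdf and the variable names follow Jim's example:

----------------------------------------------------------------------
set.seed(1)
losdf <- data.frame(albumin = runif(200, 20, 45))
losdf$length_of_stay <- rpois(200, lambda = exp(4 - 0.06 * losdf$albumin))

# Length of stay as a smooth function of albumin, not hi/lo groups
fit <- glm(length_of_stay ~ albumin, family = poisson, data = losdf)
summary(fit)

# Visual check of the fitted relationship
ord <- order(losdf$albumin)
plot(length_of_stay ~ albumin, data = losdf)
lines(losdf$albumin[ord], fitted(fit)[ord])
----------------------------------------------------------------------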
Re: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.
Well, I'm not sure it is really an rgdal problem as much as having a stale .so, and I didn't want to go wreck the thing to capture the exact error message; the point was that the apache build seems to have come with an incompatible expat .so. In the apache/lib dir, I apparently had a much different lib, now renamed with an xxx suffix, than that which was in /lib64. If I dump the symbols with objdump, the one rgdal complained about, IIRC XML_StopParser, does not appear in either output, but they do grep differently as shown below:

-rw-r--r-- 1 root root 605314 May  1 07:09 libexpat.a
-rwxr-xr-x 1 root root    805 May  1 07:09 libexpat.la
lrwxrwxrwx 1 root root     17 May  1 07:09 libexpat.so -> libexpat.so.0.5.0
lrwxrwxrwx 1 root root     17 May  1 07:09 libexpat.so.0 -> libexpat.so.0.5.0
-rwxr-xr-x 1 root root 143144 Jul  6 16:29 libexpat.so.0.5.0
-rwxr-xr-x 1 root root 398422 Jul  6 16:27 libexpat.soxxx.0.5.0

[marchywka@351915-www1 lib]$ grep XML_StopParser libexpat*
Binary file libexpat.so matches
Binary file libexpat.so.0 matches
Binary file libexpat.so.0.5.0 matches
[marchywka@351915-www1 lib]$

libexpat.so.0.5.0:
    linux-vdso.so.1 => (0x2b4a1fdd2000)
    libc.so.6 => /lib64/libc.so.6 (0x0035aca0)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)
libexpat.soxxx.0.5.0:
    linux-vdso.so.1 => (0x7fff8bdfc000)
    libc.so.6 => /lib64/libc.so.6 (0x2b143b19e000)
    /lib64/ld-linux-x86-64.so.2 (0x0035ac60)

To: r-help@r-project.org
Subject: Re: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.

You should provide at least the output of sessionInfo() and the messages issued by rgdal on load. rgdal has recently been updated to try to address issues of interference between applications, and it would help very much to know your intentions, platform, and the versions of the software you are using.

Roger

Mike Marchywka wrote:

Has anyone had problems with Rapache that don't show up on command line execution of R? I just ran into this loading rgdal in an Rapache page and having a problem with loading a shared object. The final complaint was that XML_StopParser was undefined - this was surprising since grep -l showed it in the expat lib and it worked fine from the command line. Finally, I noted the search path had apache/lib ahead of the other places where expat exists. The libexpat there, although apparently having the same version, so.0.5.0, did not grep for this symbol. Anyway, copying one into the other fixed the problem and the page works fine, but I am curious if anyone has thoughts on what could have caused this. Sorry I don't have specific output, but I thought you may remember if you ran into it and it is not worth trying to replicate. Thanks.

---
Roger Bivand
Department of Economics
NHH Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

-- View this message in context: http://r.789695.n4.nabble.com/problem-loading-rgdal-with-Rapache-problem-solved-due-to-libexpat-with-apache-tp3650119p3652639.html Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] superimposing network graphs
Date: Wed, 6 Jul 2011 13:10:14 +0200
To: r-help@r-project.org
CC: but...@uci.edu
Subject: [R] superimposing network graphs

Dear all, I have an undirected network (g), representing all the sexual relationships that ever existed in a model community. I also have a directed edgelist (e) which is a subset of the edgelist of g. e represents the transmission pathway of HIV. Now I would like to superimpose the picture of the sexual relationships with arrows in a different colour, to indicate where in the network HIV was transmitted.

[ my text ] If you can't find an R answer, I've found that dot from graphviz is really helpful. It should not be too hard to generate the dot source from your raw data file or from R, but others may have a more R-centric answer. Package sna may be helpful for an R-only approach (see also the sketch after this thread).

Any ideas on how to do this? Many thanks, Wim

Wim Delva MD, PhD
International Centre for Reproductive Health, Ghent University, Belgium - www.icrh.org
South African Centre for Epidemiological Modelling and Analysis, Stellenbosch University, South Africa - www.sacema.com
epi update: www.sacemaquarterly.com
Tel: +27 21 808 27 79 (work) Cell: +27 72 842 82 33

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
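[ editor's aside ] A sketch of one R-only route with the igraph package. The graph here is a toy stand-in, and the superimposing is done with edge colour and width; for true arrowheads you would plot the transmission subset as a directed graph on the same layout:

----------------------------------------------------------------------
library(igraph)
set.seed(1)

g <- erdos.renyi.game(30, 0.08)      # toy stand-in for the relationship network
E(g)$color <- "grey"                 # all relationships drawn in grey

trans <- sample(ecount(g), 5)        # pretend these edges carried transmission
E(g)$color[trans] <- "red"           # highlight the transmission pathway
E(g)$width <- ifelse(E(g)$color == "red", 3, 1)

plot(g)
----------------------------------------------------------------------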
[R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.
Has anyone had problems with Rapache that don't show up on command line execution of R? I just ran into this loading rgdal in an Rapache page and having a problem with loading a shared object. The final complaint was that XML_StopParser was undefined - this was surprising since grep -l showed it in the expat lib and it worked fine from the command line. Finally, I noted the search path had apache/lib ahead of the other places where expat exists. The libexpat there, although apparently having the same version, so.0.5.0, did not grep for this symbol. Anyway, copying one into the other fixed the problem and the page works fine, but I am curious if anyone has thoughts on what could have caused this. Sorry I don't have specific output, but I thought you may remember if you ran into it and it is not worth trying to replicate. Thanks.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] wavelets
From: jdnew...@dcn.davis.ca.us
Date: Mon, 4 Jul 2011 00:45:41 -0700
To: tyagi...@gmail.com; r-help@r-project.org
Subject: Re: [R] wavelets

Study the topic more carefully, I suppose. My understanding is that wavelets do not in themselves compress anything, but because they sort out the interesting data from the uninteresting data, it can be easy to toss the uninteresting data (lossy data compression). Perhaps you should understand better what your Matlab library is doing.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

user123 tyagi...@gmail.com wrote:

I'm new to the topic of wavelets. When I tried to use the mra function in the wavelets package, the data is not getting compressed. E.g., if the original data has 500 values, the output data also has the same. However in MATLAB, depending on the level of decomposition, the data gets compressed. How do I implement this in R?

[ my text ] Can you post some code? You can always compress into one value of course by turning bytes into a single char string; what you want is entropy. I posted some example code before and I remember it took effort to not get the subsampling. mra is probably multi-resolution analysis and I'd suppose you want all the samples. You probably need paper and pencil however at this point. (A hand-rolled illustration of thresholding follows this thread.)

-- View this message in context: http://r.789695.n4.nabble.com/wavelets-tp3642973p3642973.html Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
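[ editor's aside ] To make the sort-then-toss point concrete, a minimal one-level Haar transform written out by hand, so no package API is assumed; zeroing the small detail coefficients is where the lossy compression actually happens:

----------------------------------------------------------------------
set.seed(1)
x <- sin(seq(0, 4 * pi, length.out = 512)) + rnorm(512, sd = 0.05)

# One Haar level: pairwise averages (approximation) and differences (detail)
odd  <- x[seq(1, 512, by = 2)]
even <- x[seq(2, 512, by = 2)]
a <- (odd + even) / sqrt(2)   # approximation coefficients
d <- (odd - even) / sqrt(2)   # detail coefficients

d[abs(d) < 0.05] <- 0         # "compression": discard small detail (lossy)
cat("nonzero coefficients kept:", sum(a != 0) + sum(d != 0), "of 512\n")

# Invert the transform to see what was lost
x_rec <- numeric(512)
x_rec[seq(1, 512, by = 2)] <- (a + d) / sqrt(2)
x_rec[seq(2, 512, by = 2)] <- (a - d) / sqrt(2)
plot(x, type = "l"); lines(x_rec, col = "red")
----------------------------------------------------------------------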
Re: [R] Protecting R code
Put it on Rapache or another server, but this seems like a waste depending on what you are doing. Server side is the only good way, but making C++ may be an interesting test.
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Vaishali Sadaphal vaishali.sadap...@tcs.com
Date: Mon, 4 Jul 2011 16:48:13
To: spencer.gra...@prodsyse.com
Cc: r-help@r-project.org; b.rowling...@lancaster.ac.uk
Subject: Re: [R] Protecting R code

Hey All, thank you so much for the quick replies. Looks like translation to C/C++ is the only robust option. Do you think there exists any ready-made R to C translator?

Thanks -- Vaishali
Vaishali Paithankar Sadaphal
Tata Consultancy Services
Mailto: vaishali.sadap...@tcs.com
Website: http://www.tcs.com
Experience certainty. IT Services Business Solutions Outsourcing

From: Spencer Graves spencer.gra...@prodsyse.com
To: Barry Rowlingson b.rowling...@lancaster.ac.uk
Cc: Vaishali Sadaphal vaishali.sadap...@tcs.com, r-help@r-project.org
Date: 07/04/2011 08:42 PM
Subject: Re: [R] Protecting R code

Hello:

On 7/4/2011 7:41 AM, Barry Rowlingson wrote:

On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal vaishali.sadap...@tcs.com wrote:

Hi All, I need to give my R code to my client to use. I would like to protect the logic/algorithms that have been coded in R. This means that I would not like anyone to be able to read the code.

At some point the R code has to be run, which means it has to be read by an interpreter that can handle R code. Which means, unless you rewrite the interpreter, the R code must exist as such. Even if you could compile R into C code into machine code and distribute a .exe file, it's still possible in theory to reverse-engineer it and get something like the original back - the original logic, if not the original names of the variables and functions. You could rewrite the interpreter to only run encrypted, signed code that requires a decryption key, but you still have to give the user the decryption key at some point in order to get the plaintext code. Again, it's an obfuscation problem of hiding the key somewhere, and hence is going to fail. It all depends on how much expense you want to go to in order to make the expense of circumventing your solution more than it's worth. Tell me how much that is, and I will tell you the solution. For total security[1], you need to run the code on servers YOU control, and only give access via a network API. You can do this with Rserve or any of the HTTP-based systems like Rapache.

An organization I know that encrypted R code started with making it available only on their servers. This was maybe four years ago. I'm not sure what they do now, but I think they have since lost their major proponents of R internally and have probably translated all the code they wanted to sell into a compiled language in a way that didn't require R at all.

Spencer

Barry

[1] Except of course servers can be hacked or socially-engineered into. For total security, disconnect your machine from the network and from any power supply.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to fit ARMA model
From: bbol...@gmail.com
Date: Fri, 1 Jul 2011 13:16:45 +0000
Subject: Re: [R] How to fit ARMA model

UnitRoot akhussanov at gmail.com writes:

Hello, I am having some problems with fitting an ARMA model to my time series data (randomly generated numbers). The thing is I have tried many packages [tseries, fseries, FitARMA etc.] and all of them give very different results. I would appreciate it if someone could post here what the best package is for my purpose. Also, after having done the fitting, I would like to check the model's adequacy. How can I do this? Thanks.

It's hard to say without more detail -- we don't know what your purpose is (beyond the general one of fitting an ARMA model to data). I would say that when in doubt, if there is functionality in 'core' R -- the base and recommended packages -- it is likely to be the most stable and well tested. (Not always true -- there are some very good contributed packages, and sometimes the functions in R are missing some advanced features -- but a good rule of thumb.) So try ?arima. Another rule of thumb is that Venables and Ripley 2002 is a good starting point (although again not necessarily as extensive as specialized topics) for "how do I do xxx in R?" See Chapter 14.

[ my text ] I had to check my memory, but IIRC wiki or some other non-authoritative source suggested that this is not much different than trying to design or analyze an IIR (infinite impulse response) filter that is presumed excited with white noise. In such a case, the power spectrum can be interpreted in terms of the zeros and poles (zeros of the denominator) of the transfer function, giving you some graphical info to help develop intuition. It is not hard to make your own tests and sanity checks if you are just getting started. (Hand typed, as my 'dohs machine is busy running malware scans and this one R install has a few issues.)

x <- runif(8192)
x1 <- c(x[-1], x[1])
x2 <- c(x1[-1], x1[1])
y <- x + x1 + x2
plot(log(abs(fft(y))))

This makes a Bode plot (if I have the name right) with zeros at the roots. Extension to the IIR case should be obvious. As suggested in the help for ?arima, ?tsdiag is a generic function to plot time series diagnostics (a short example follows this thread).

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
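[ editor's aside ] Following the ?arima pointer above, a minimal sketch using only base R: simulate an ARMA series, fit it, and run the residual diagnostics; the order is chosen to match the simulation:

----------------------------------------------------------------------
set.seed(1)
y <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 1000)

fit <- arima(y, order = c(1, 0, 1))   # ARMA(1,1): one AR term, one MA term
fit                                   # coefficients should be near 0.6 / 0.4
tsdiag(fit)                           # residual ACF and Ljung-Box checks
----------------------------------------------------------------------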
Re: [R] Running R from windows command prompt
To: lig...@statistik.tu-dortmund.de
CC: r-help@r-project.org
Subject: Re: [R] Running R from windows command prompt

Thanks for your help. I tried the way you mentioned for my first question, but I am not getting any results. Can you please explain in detail the process through which I can run R code from the windows command prompt?

[ my text ] While your problem likely has nothing to do with windohs, I would suggest you go get cygwin (see google) and use that. I, and probably others, have lots of scripts that work there and on linux. Bash scripts are a better way to proceed if you want to do more than a test case, and they integrate with lots of other existing things. (A minimal sketch of both answers follows this thread.)

2011/6/28 Uwe Ligges lig...@statistik.tu-dortmund.de

On 28.06.2011 11:54, siddharth arun wrote:

1. I have an R program in a file, say functions.R. I load the functions.R file into R using source("functions.R") and then call functions f1(), f2() etc. which are declared and defined within the functions.R file. I also need to load a couple of R libraries using library() before I can use f1(), f2() etc. My question is: can I achieve all this (i.e. calling functions f1() and f2()) from the windows prompt without opening the R environment? If yes, then how?

Put all the code into a file, e.g. foo.R, and run "R CMD BATCH foo.R" from the windows command shell.

2. Also, is there any way to scan strings directly? It seems like the scan() function only scans numerical values. Is there any way to scan strings?

Yes, it is called scan(), which is for arbitrary data, including character(). See ?scan. Otherwise, you may want to look into ?readLines.

Uwe Ligges

--
Siddharth Arun,
4th Year Undergraduate student
Industrial Engineering and Management, IIT Kharagpur

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
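[ editor's aside ] A minimal sketch of both of Uwe's answers; the file names are placeholders:

----------------------------------------------------------------------
# Contents of a file run.R, executed non-interactively from the Windows
# command prompt with:  R CMD BATCH --vanilla run.R
source("functions.R")            # loads f1(), f2() defined there
f1()
f2()

# Reading strings: scan() handles character data via the 'what' argument
words <- scan("names.txt", what = character())
lines <- readLines("names.txt")  # or read whole lines instead
----------------------------------------------------------------------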
Re: [R] Time-series analysis with treatment effects - statistical approach
Date: Thu, 23 Jun 2011 15:41:25 -0700 From: jmo...@student.canterbury.ac.nz To: r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach Mike Marchywka wrote: I discovered a way to do repetitive tasks that can be concisely specified using something called a computer. Now that's funny :) well, there is a point to that, which is that with cheap computation you can do different analyses than you did in the past. These were not controlled tests. It was a field experiment testing the effects that various pavement designs have on underlying soil moisture. Two designs incorporated a porous pavement surface course, while two others were based on standard impervious concrete pavement... the control was just bare, exposed soil. As you can see from the graph, the control responds quickly to rainfall events, but dries out quickly as well due to evaporation. The porous pavement allows for quick infiltration of precipitation, while the impervious pavement eventually allows infiltration of rainfall, but it's delayed. My objective is to be able to differentiate between the pavement treatments, such that I can state with statistical confidence that porous pavement affects underlying soil moisture differently than impervious pavement. I think this is obvious just looking at it, but I wanted to be able to back it up with stats. What I'd done previously is to average by week. But as I mentioned, I thought that an anova table with 104 rows relating to each week was a poor way of analyzing the data. That being said, it effectively allows me to check for treatment-related differences. I don't think we've mentioned R in the past few posts, but I guess pointing people to useful things that R can do is not too big a problem, and if you have ever dealt with analysis for the sake of rationalization you can appreciate that it is a huge problem :) Generally you'd like to have reproducible results, and if you don't have IID data (stationary parameters of the population you wish to characterize) you are not even asking a good question about the system. It may be helpful as a quick check of something, but otherwise difficult to interpret -- do your results mean these things are different in the desert? You appear to have data points from a bunch of different situations. After-the-fact selection is often helpful, but generally stats people frown on that as backing up anything (unless it supports the sponsor's opinion LOL). http://www.itl.nist.gov/div898/handbook/prc/section4/prc432.htm I guess I'd either go with a dynamic model, or convert into dollars and then see if you have clinically and statistically significant differences in things of relevance. Thanks for the suggestions to date. Maybe the more I explain what I'm trying to achieve, the more focussed the suggestions will be. The vaguer the question, the broader the response, right? Thanks again, Justin -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3621179.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Access R functions from web
From: orvaq...@gmail.com To: r-help@r-project.org Subject: [R] Access R functions from web I need a way to send R objects and call R functions from the web. Is there any project close or similar to that? I want to be able to send an HTTP request from an existing application with some data, and obtain a plot from R. This should be a faq, but it can take a while to find: see Rserve and Rapache. I have been using Rapache on red hat and debian and it works nicely. Also, the goog visualization API works in some limited testing, but apparently a lot of that requires flash (which I either did not have or had turned off LOL). Thanks in advance Caveman
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
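For the Rserve route, a minimal sketch (the plot file name is made up; RS.connect/RS.eval are from the RSclient package, and any other Rserve client -- Java, PHP, C++ -- talks to the same daemon):

# server side: start the Rserve daemon (default port 6311)
library(Rserve)
Rserve()

# client side: here another R session, via the RSclient package
library(RSclient)
con <- RS.connect()                                       # localhost:6311 by default
RS.eval(con, { png("plot.png"); plot(1:10); dev.off() })  # plot rendered server-side
RS.close(con)

The HTTP-request-in, plot-out pattern maps more directly onto Rapache, where an R script runs per request and writes the image into the response.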
Re: [R] Is R a solution to my problem?
From: jdnew...@dcn.davis.ca.us Date: Fri, 24 Jun 2011 02:13:29 -0700 To: ikoro...@gmail.com; R-help@r-project.org Subject: Re: [R] Is R a solution to my problem? Almost any tool can be bent to perform almost any task, but that combination sounds like a particularly difficult way to get started. There are a variety of purpose-built software development environments for serving dynamically-constructed web pages, including C++ libraries. I guess Rapache and Rserve would be worth looking at if you think it is a match with R. However, discussions of such topics do not belong on this mailing list. Give Google a chance. --- Jeff Newmiller DCN: jdnew...@dcn.davis.ca.us Research Engineer (Solar/Batteries/Software/Embedded Controllers) --- Sent from my phone. Please excuse my brevity. Igor Korot ikoro...@gmail.com wrote: Hi, ALL, My name is Igor and I am a Software Developer. Recently I wrote a program for a company using C++. The problem I am facing right now is that this program (demo) has to run using a web interface. I.e. the potential user goes to the website, clicks on a link, and the program executes somewhere on the server. Would that be possible with an R extension / add-on? Thank you.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R-help
I'm surprised the pdf made it through the mailing list, but many will not open these, and I have just cleaned malware off my 'dohs system. Can you explain what you want to do in simple text? Subject: [R] R-help Hi, Please assist me to code the attached pdf in R. Your help will be greatly appreciated. Edward Actuarial Science Student
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time-series analysis with treatment effects - statistical approach
From: rvarad...@jhmi.edu To: marchy...@hotmail.com; jmo...@student.canterbury.ac.nz; r-help@r-project.org Subject: RE: [R] Time-series analysis with treatment effects - statistical approach Date: Thu, 23 Jun 2011 02:59:19 +0000 If you have any specific features of the time series of soil moisture in mind, you could either model them or directly estimate them and test for differences among the 4 treatments. If you do not have any such specific considerations, you might want to consider some nonparametric approaches such as functional data analysis; in particular, functional principal components analysis (fPCA) might be relevant. You could also consider semiparametric methods. For example, take a look at the SemiPar package. Ravi. I guess, just playing with it while waiting for other code to finish, I'd be curious if you had any controlled tests such as impulse response -- what did each treatment do when you held it at constant temperature, humidity and illumination in still air after a single burst of rain? If you were pursuing the model approach, a quick look suggests qualitative rather than just quantitative effects -- in one case it looks like linear or biphasic dry-out dynamics, while others seem to just fall off a cliff. Objective of course matters too: if you are trying to sell this to farmers, maybe a plot of moisture for each treatment against control would help. I just did that after averaging over sensors, and it may be a reasonable analysis for cost effectiveness if you can translate moisture into dollars. Now you would still need to put error bars on comparisons and use words carefully etc., but that approach may be more important than getting at dynamics. I dunno. Consider that in fact maybe all you care about is peaks; if one day too dry kills the crop, then that is what you want to focus the analysis on, etc. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Mike Marchywka [marchy...@hotmail.com] Sent: Wednesday, June 22, 2011 9:31 PM To: jmo...@student.canterbury.ac.nz; r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach [...] - - - - - - Mike Marchywka | V.P. Technology 415-264-8477 marchy...@phluant.com Online Advertising and Analytics for Mobile http://www.phluant.com
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Time-series analysis with treatment effects - statistical approach
Subject: [R] Time-series analysis with treatment effects - statistical approach Hello all R listers, I'm struggling to select an appropriate statistical method for my data set. I have collected soil moisture measurements every hour for 2 years. There are 75 sensors taking these automated measurements, spread evenly across 4 treatments and a control. I'm not interested in being able to predict future soil moisture trends, but rather in knowing whether the treatment affected the soil moisture response overall. In particular, it would be interesting to inspect treatment-related response within defined periods. For example, a visual inspection of my data suggests that soil moisture is equivalent across treatments during the wet winter months, but during the dry summer months a treatment effect appears. Any help on this topic would be very much appreciated. I've looked far and wide through the academic literature for similar experimental designs, but have not had any success as yet. Can you post your data? This makes it more interesting for potential responders and makes it more likely you get a relevant answer or observation. I guess the stats people would just suggest something like anova, and you could just compare moisture means among the treatments and controls (see the sketch after this message), but presumably things are not stationary and it may be easy to get confusing results. http://en.wikipedia.org/wiki/Analysis_of_variance Usually there is some analysis plan made up a priori, I would imagine, or at least there is prior literature in your field. I guess citeseer or google scholar/books may be useful if you have some key words or author names. The data you have sounds to an engineer just like any other discrete-time signal, and maybe the digital signal processing literature would be of interest. Presumably you have some models to test, and more details depend on the specifics of your competing hypotheses. Kalman filtering maybe? Cheers, Justin Dr. Justin Morgenroth New Zealand School of Forestry Christchurch, New Zealand -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3615856.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
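For reference, a minimal sketch of the one-way ANOVA idea (the column names moisture and treatment are hypothetical, and this deliberately ignores the time structure the rest of the thread worries about):

# hypothetical long format: one row per sensor, averaged over some window
df <- data.frame(moisture  = runif(75),
                 treatment = rep(c("control", "IP", "IP+", "PP", "PP+"),
                                 each = 15))
fit <- aov(moisture ~ treatment, data = df)
summary(fit)    # F-test: any treatment effect at all?
TukeyHSD(fit)   # pairwise treatment comparisons

The non-stationarity objection raised in the thread is exactly that this layout treats every reading as exchangeable, which hourly soil moisture is not.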
Re: [R] Time-series analysis with treatment effects - statistical approach
Date: Wed, 22 Jun 2011 17:21:52 -0700 From: jmo...@student.canterbury.ac.nz To: r-help@r-project.org Subject: Re: [R] Time-series analysis with treatment effects - statistical approach Hi Mike, here's a sample of my data so that you get an idea what I'm working with. Thanks, data helps make statements easier to test :) I'm quite busy at the moment, but I will try to look during dead time. http://r.789695.n4.nabble.com/file/n3618615/SampleDataSet.txt SampleDataSet.txt Also, I've uploaded an image showing a sample graph of daily soil moisture by treatment. The legend shows IP, IP+, PP, PP+, which are the 4 treatments. Also, I've included precipitation to show the soil moisture response to precip. Personally I'd try to write a simple physical model or two and see which one(s) fit best. It shouldn't be too hard to find sources and sinks of water and write a differential equation with a few parameters. There are probably online lecture notes that cover this or related examples. You probably suspect a mode of action for the treatments; see if that is consistent with the observed dynamics. You may need to go get temperature and cloud data, but it may or may not be worth it. http://r.789695.n4.nabble.com/file/n3618615/MeanWaterPrecipColour2ndSeasonOnly.jpeg I have used ANOVA previously, but I don't like it for 2 reasons. The first is that I have to average away all of the interesting variation. But mainly, there are a number of assumptions that go into ANOVA to make it useful. If you are just drawing samples from populations of identical independent things, great, but here I would look at things related to non-stationary statistics of time series. It becomes quite cumbersome to do a separate ANOVA for each day (700+ days) or even each week (104 weeks). I discovered a way to do repetitive tasks that can be concisely specified using something called a computer. Writing loops is pretty easy; don't give up due to cumbersomeness. Also, you could try a few simple things like plotting difference charts (plot treatment minus control, for example; see the sketch after this message). If you approach this purely empirically, there are time series packages, and maybe the econ/quant financial analysts would have some thoughts that wouldn't be well known in your field. Thanks for your help, -Justin -- View this message in context: http://r.789695.n4.nabble.com/Time-series-analysis-with-treatment-effects-statistical-approach-tp3615856p3618615.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
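The difference-chart suggestion as a minimal sketch (made-up daily means; real data would first average the 75 sensors within each treatment):

set.seed(1)
day     <- 1:100
control <- 20 + cumsum(rnorm(100))   # toy daily mean moisture, control plot
PP      <- 22 + cumsum(rnorm(100))   # toy daily mean, one porous treatment
plot(day, PP - control, type = "l",
     xlab = "day", ylab = "PP minus control moisture")
abline(h = 0, lty = 2)               # zero line = no treatment effect

Subtracting the control removes much of the shared rainfall-driven variation, which is why the difference is often easier to look at than the raw series.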
Re: [R] different results from nls in 2.10.1 and 2.11.1
Date: Mon, 20 Jun 2011 11:25:54 +0200 From: lig...@statistik.tu-dortmund.de To: p.e.b...@dunelm.org.uk CC: r-help@r-project.org; pb...@astro.uni-bonn.de Subject: Re: [R] different results from nls in 2.10.1 and 2.11.1 Since one is a 32-bit and the other a 64-bit platform, and therefore the compiler is also different, you can get different numerical results easily, even with identical versions of R (and most of us do not have outdated R installations around). I just tried your example on 3 different systems with R-2.13.0 and all told me singular convergence... Uwe Ligges On 18.06.2011 15:44, Philip Bett wrote: Hi, I've noticed I get different results fitting a function to some data on my laptop than when I do it on my computer at work. I guess the best all-around approach is to dump the results from your FisherAvgdPdf function and get some idea what trajectory the fit takes in the different cases. This is presumably not just an issue with R and could have something to tell you about your data vs the model. Even if it converged without incident, you'd probably want to look at more details. You apparently are trying to fit a histogram to a pdf, and it is not too hard to just plot the fit over the histogram and spot-check parameter space. Consider for example,

plot(h$mids, h$density, type = "b")
points(h$mids, FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.198), pch = 20)
points(h$mids, FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.298), pch = 13)
points(h$mids, FisherAvgdPdf(acos(h$mids), 4.527e-8, 3.198), pch = 14)

and then find various measures of how good/bad these are, etc.

e <- h$density - FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.198)
sum(e * e)
e <- h$density - FisherAvgdPdf(acos(h$mids), 3.527e-8, 3.298)
sum(e * e)

The error surface as a function of the parameters seems smooth, etc.,

foo <- function(x, y) {
  e <- h$density - FisherAvgdPdf(acos(h$mids), x, y)
  sum(e * e)
}
x <- (2 + (1:100) * .02) / 1e8
y <- 3 + ((1:100) / 100)
df <- data.frame(x = 0, y = 0, z = 0)
for (i in x) {
  for (j in y) {
    gg <- data.frame(i, j, foo(i, j))
    colnames(gg) <- colnames(df)
    df <- rbind(df, gg)
  }
}
sel <- which(df$x != 0)
library(scatterplot3d)
scatterplot3d(df$x[sel], df$y[sel], df$z[sel])

Here's a code snippet of what I do:

##--
require(circular) ## for the Bessel function I.0
## Data:
dd <- c(0.9975948929787, 0.9093316197395, 0.7838819026947, 0.9096108675003,
  0.8901804089546, 0.2995955049992, 0.9461286067963, 0.8248071670532,
  0.2442084848881, 0.2836948633194, 0.7353935241699, 0.5812761187553,
  0.8705610632896, 0.8744471669197, 0.7490273118019, 0.9947383403778,
  0.9154829382896, 0.8659985661507, 0.6448246836662, 0.8588128685951,
  0.7347437739372, -0.1645197421312, 0.970999121666, 0.8038327097893,
  0.9558997154236, 0.6846113204956, 0.6286814808846, 0.9201356172562,
  0.9422197341919, 0.3470877110958, 0.4154576957226, 0.0721184238791,
  0.14151956141, -0.6142936348915, -0.4688512086868, 0.6805665493011,
  0.3594025671482, 0.8991097211838, 0.7656877636909, 0.9282909035683,
  0.9454715847969, 0.9766132831573, 0.4316343963146, 0.62679708004,
  0.2093886137009, 0.3937581181526, 0.4254160523415, 0.8684504628181,
  0.3844584524632, 0.9578431844711, 0.956972181797, 0.4456568360329,
  0.9793710708618, 0.5825698971748, 0.929228246212, 0.9211971759796,
  0.9407976865768, 0.821156680584, 0.2048042863607, 0.6473184227943,
  0.9456319212914, 0.7021154165268, 0.9761978387833, 0.1485801786184,
  0.2195029109716, 0.5378785729408, 0.8304615020752, 0.8596342802048,
  0.950027525425, 0.9102076888084, 0.5108731985092, 0.7200184464455,
  0.3571084141731, 0.9765330553055, -0.143017962575, 0.8576183915138,
  0.1283493340015, -0.3226098418236, 0.7031792402267, 0.8708582520485,
  0.56754809618, 0.060470353812, 0.8015220761299, 0.7363410592079,
  0.671902179718, 0.8082517385483, 0.9468197822571, 0.9729647636414,
  0.7919752597809, 0.9539568424225, 0.4840737581253, 0.850653231144,
  0.5909016132355, 0.8414449691772, 0.9699150323868)
xlims <- c(-1, 1)
bw <- 0.05
b <- seq(xlims[1], xlims[2], by = bw); nb <- length(b)
h <- hist(dd, breaks = b, plot = FALSE)
FisherAvgdPdf <- function(theta, theta0, kappa) {
  A <- kappa / (2 * sinh(kappa))
  A * I.0(kappa * sin(theta) * sin(theta0)) * exp(kappa * cos(theta) * cos(theta0))
}
nls(dens ~ FisherAvgdPdf(theta, theta0, kappa),
    data = data.frame(theta = acos(h$mids), dens = h$density),
    start = c(theta0 = 0.5, kappa = 4.0),
    algorithm = "port",
    lower = c(0, 0), upper = c(acos(xlims[1]), 500),
    control = list(warnOnly = TRUE))
##--

On one machine, nls converges, and on the other it doesn't. Any ideas why, and which is right? I can't see anything in R News that could be relevant. The different R versions (and computers) are:

> R.version
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          11.1
year           2010
month          05
day            31
svn rev        52157
language       R
version.string R version 2.11.1
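Following the dump-the-trajectory advice above, nls() itself will print its iterations; a minimal sketch reusing the objects already defined (trace = TRUE is a standard nls argument):

fit <- nls(dens ~ FisherAvgdPdf(theta, theta0, kappa),
           data = data.frame(theta = acos(h$mids), dens = h$density),
           start = c(theta0 = 0.5, kappa = 4.0),
           algorithm = "port", lower = c(0, 0),
           upper = c(acos(xlims[1]), 500),
           control = list(warnOnly = TRUE),
           trace = TRUE)   # prints objective and parameters at each iteration
summary(fit)

Running this on both machines and comparing the printed trajectories should show whether they diverge from the first step (a data or starting-value issue) or only near the end (numerical precision).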
Re: [R] Multivariate HPD credible volume -- is it computable in R?
Date: Sun, 19 Jun 2011 19:06:20 +1200 From: agw1...@gmail.com To: r-help@r-project.org Subject: [R] Multivariate HPD credible volume -- is it computable in R? Hi all, I'm new to the list and am hoping to get some advice. I have a set of multivariate data and would like to find the densest part of the data cloud containing 95% of the data, like a 95% HPD credible volume. Is there any R code available to compute that? It looks like the LaplacesDemon pkg was just updated FWIW, http://www.google.com/search?hl=en&q=hpd+credible+volume+cran If you just want to find the density of your data in some n-dim space, that sounds like multi-dim binning or a histogram would work (you can check google, but IIRC there is no general n-dim binning package. I also mentioned a version based on a density field being associated with each point, which could be summed over all points to get the density at an arbitrary point, but you need to implement that in some intelligent way for it to be fast). I guess you could bin using aggregate; see if this example would work,

df <- data.frame(z = runif(100), x = runif(100) * 10, y = runif(100) * 5, c = rep(1, 100))
d <- aggregate(c ~ floor(x) + floor(y), df, FUN = sum)  # counts per unit cell
d
which(d$c == max(d$c))                                  # densest cell

and at this point you can imagine you may be able to find a surface that encloses a certain amount of your data. Thank you very much! Your help and patience are much appreciated. G.S.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
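For the 2-D case there is a fairly standard recipe: kernel density estimate, then the density threshold below which 5% of the points fall; the corresponding contour bounds the ~95% highest-density region. A hedged sketch with MASS::kde2d (grid size and toy data are arbitrary choices):

library(MASS)
set.seed(1)
x <- rnorm(1000); y <- rnorm(1000)
dens <- kde2d(x, y, n = 100)          # density on a 100x100 grid
ix <- findInterval(x, dens$x)         # nearest grid cell per point
iy <- findInterval(y, dens$y)
dvals  <- dens$z[cbind(ix, iy)]       # approximate density at each data point
thresh <- quantile(dvals, 0.05)       # 5% of points lie below this density
contour(dens, levels = thresh)        # boundary of the ~95% HPD-like region
points(x, y, cex = 0.3, col = ifelse(dvals >= thresh, "grey", "red"))

Generalizing past 2-3 dimensions is where it gets hard, which is why the n-dim binning discussion above matters.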
Re: [R] Server question
From: oliver.jo...@digred.com To: r-help@r-project.org Date: Fri, 17 Jun 2011 15:35:38 +0100 CC: rebeccahard...@deltaeconomics.com Subject: [R] Server question Hi, A client of mine has asked me to investigate the installation of R software. Could anyone tell me whether the software works only on a client machine, or whether it sits on a server with clients attaching to it? Did anyone answer this yet? See Rserve and Rapache on google. Both could be useful to you or get you started. It is not immediately clear from the docs. Best Oliver -- Oliver Jones T: 01845 595911 M: 07977 122089 DIG*RED Web Production | http://www.digred.com/ www.digred.com 2 Vyner's Yard Rainton North Yorkshire YO7 3PH
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] About 'hazard ratio', urgent~~~
Date: Mon, 13 Jun 2011 19:44:15 -0700 From: dr.jz...@gmail.com To: r-help@r-project.org Subject: [R] About 'hazard ratio', urgent~~~ Hi, I am new to R. My question is: how to get the 'hazard ratio' using the 'coxph' function in the 'survival' package? You can probably search the docs for hazard terms, for example, http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf and try running known test data through to verify. For example, it does seem that the exp(coef) column contains a decent estimate of the hazard ratio in simple cases (not quite right to 3 sig figs, but I haven't given this a lot of thought and it is still early here; hoping for input from someone who can explain better; a sketch for extracting it directly follows this message),

?rexp
library(survival)
ns <- 10
df <- data.frame(gr = c(rep(0, ns), rep(1, ns)), t = c(rexp(ns, 1), rexp(ns, 3)))
coxph(Surv(t) ~ gr, df)

Call: coxph(formula = Surv(t) ~ gr, data = df)
     coef exp(coef) se(coef)   z p
gr   1.09      2.98  0.00503 217 0
Likelihood ratio test=47382 on 1 df, p=0 n= 20, number of events= 2e+05

df <- data.frame(gr = c(rep(0, 100), rep(1, 100)), t = c(rexp(100, 1), rexp(100, 2)))
coxph(Surv(t) ~ gr, df)

Call: coxph(formula = Surv(t) ~ gr, data = df)
      coef exp(coef) se(coef)    z       p
gr   0.658      1.93    0.148 4.44 8.8e-06
Likelihood ratio test=19.6 on 1 df, p=9.5e-06 n= 200, number of events= 200

df <- data.frame(gr = c(rep(0, 100), rep(1, 100)), t = c(rexp(100, 1), rexp(100, 1)))
coxph(Surv(t) ~ gr, df)

Call: coxph(formula = Surv(t) ~ gr, data = df)
        coef exp(coef) se(coef)      z    p
gr   -0.0266     0.974    0.142 -0.187 0.85
Likelihood ratio test=0.03 on 1 df, p=0.852 n= 200, number of events= 200

thanks, karena -- View this message in context: http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3595527.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
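To pull the hazard ratio out programmatically rather than reading it off the printout, a minimal sketch (coxph and summary.coxph are from the survival package; data simulated as above):

library(survival)
set.seed(1)
df <- data.frame(gr = c(rep(0, 100), rep(1, 100)),
                 t  = c(rexp(100, 1), rexp(100, 2)))
fit <- coxph(Surv(t) ~ gr, data = df)
exp(coef(fit))           # the hazard ratio itself
summary(fit)$conf.int    # hazard ratio with its 95% confidence interval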
Re: [R] Need script to create new waypoint
Now I need to change this data set into one with waypoints at regular intervals: for example 2 minutes. I guess your question is about algorithm ideas for making up data due to an unspecified perceived need. Anything you can specify completely should be easy to code in R, maybe a bash script, or c++ or java; I'm not sure Excel was intended for arbitrary code. Generally the choice of how you manufacture data would be dictated by the perceived need. If you can elaborate on the need, maybe someone can help you pick an approach to making things up. There is a sig list for geo problems that may know your requirement better, https://stat.ethz.ch/mailman/listinfo/r-sig-geo and of course if your need is to replicate prior work in the field, you can contact authors directly with questions about what they did. If your question is about how R can handle time series with pre-packaged things, there is quite a bit, notably zoo and ts, which I think should come up with ?zoo and ?ts http://www.google.com/search?hl=en&safe=off&q=R+ts+regular+time+intervals also try a search on interpolation (a sketch follows below the question).

library(zoo)
?zoo
?ts

From: anti...@hotmail.com To: r-help@r-project.org Date: Tue, 14 Jun 2011 11:55:57 +0000 Subject: [R] Need script to create new waypoint Dear help-list members, I am a student at Durham University (UK) conducting a PhD on spatial representation in baboons. Currently, I'm analysing the effect of sampling interval on home range calculations. I have followed the baboons for 234 days in the field; each day is represented by about 1000 waypoints (x,y coordinates) recorded at irregular time intervals. Consecutive waypoints in this raw data set have an average distance interval of 5 meters and an average time interval of 23 seconds (but when baboons were stationary the time interval could be much larger - e.g. waypoint 7 below). This raw data set needs to become a data set with waypoints at regular intervals, and thus 'new' waypoints have to be 'created'. Eventually, I want to use seven different time intervals: 2, 5, 10, 15, 30, 45 and 60 minute intervals. I have tried in Excel, but I am not managing it. I have some experience with R, and although I can 'read' quite complicated scripts, I am unable to write them, so I would very much appreciate any help anybody would be willing to offer me. My current data set has 9 columns (in csv / excel file): x coordinate, y coordinate, year, month, day, record, time interval (duration between this waypoint and the previous) (hh:mm:ss), summed time intervals, distance interval (m) EXAMPLE (24th of april 2007) (wp1) x1, y1, 2007, 7, 24, 1, 00:00:00, 00:00:00, 0 (wp2) x2, y2, 2007, 7, 24, 2, 00:00:23, 00:00:23, 2 (wp3) x3, y3, 2007, 7, 24, 3, 00:00:50, 00:00:73, 3 (wp4) x4, y4, 2007, 7, 24, 4, 00:01:20, 00:02:33, 5 (wp5) x5, y5, 2007, 7, 24, 5, 00:00:03, 00:02:36, 1 (wp6) x6, y6, 2007, 7, 24, 6, 00:00:12, 00:02:48, 2 (wp7) x7, y7, 2007, 7, 24, 7, 00:05:45, 00:08:33, 2 Now I need to change this data set into one with waypoints at regular intervals: for example 2 minutes = 120 seconds. 2 minutes after the first waypoint (x1,y1) the baboons would be somewhere between WP3 and WP4 (at WP3 the summed duration is 73 seconds and after WP4 the summed duration is 153 seconds), and so this is where I would like a new waypoint created. Note that there are time intervals which will be so large that multiple 'new' waypoints have to be made / copied (e.g. WP7 for a 2 minute interval).
Three ways of calculating the new coordinates for this new waypoint, from very precise to not so precise (in order of preference), are: 1) Basing the new waypoint coordinates on the relative 'time distance' to each of the two surrounding waypoints (therewith assuming constant movement during the time interval). The 'time distance' between WP3 and WPnew is (120-73) 47 seconds and the 'time distance' between WPnew and WP4 is (153-120) 33 seconds (whereas the total time interval between WP3 and WP4 is 80 seconds). WPnew (with coordinates Xnew, Ynew) should then be located at 47/80 = 58.75% of the way from WP3 to WP4: Xnew = X3 + (X4-X3)*58.75% and Ynew = Y3 + (Y4-Y3)*58.75% 2) Calculate the average location (average of x3,y3 and x4,y4), at which to create a new waypoint at 2 minutes. 3) A simpler alternative is that the location of the 'closer waypoint in time', in this example WP4 (WP4: 153-120=33 versus WP3: 120-73=47), could be copied as being the new location. I hope I explained my query so that it makes sense to you. I realize I am asking a 'big' question and apologize for this - the software programs I have thorough knowledge of (ArcGIS, excel, Mapsource) are all unable to solve this problem. I would be very grateful for any advice or suggestions. Best wishes, Louise
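Since the preferred method (1) is plain linear interpolation in time, base R's approx() does it directly; a minimal sketch (the coordinates are made-up toy values, and t is the summed time interval converted to seconds):

# toy version of the table above: cumulative seconds plus x,y coordinates
wp <- data.frame(t = c(0, 23, 73, 153, 156, 168, 513),
                 x = c(0, 2, 5, 9, 10, 11, 12),
                 y = c(0, 1, 2, 6, 6, 7, 7))
tt <- seq(0, max(wp$t), by = 120)          # regular 2-minute grid
newx <- approx(wp$t, wp$x, xout = tt)$y    # linear interpolation = method 1
newy <- approx(wp$t, wp$y, xout = tt)$y
data.frame(t = tt, x = newx, y = newy)     # 'new' waypoints at regular intervals

Long stationary gaps (like WP7) are handled automatically: the interpolation just produces several points along the same segment. method = "constant" in approx() gives a carry-forward variant close to option 3.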
Re: [R] About 'hazard ratio', urgent~~~
Date: Tue, 14 Jun 2011 09:11:25 -0700 From: dr.jz...@gmail.com To: r-help@r-project.org Subject: Re: [R] About 'hazard ratio', urgent~~~ Thanks a lot for the great help~ Well, if you are referring to my post, as I indicated, I made a lot of blunders to get that out the door and posted the first thing that made sense. Note also that by making the one hazard rate 1, it wasn't clear that the column I cited was divided by 1, etc. Caveat emptor. I usually just answer questions like this when I meant to look at the topic anyway and I have a chance of putting forth an answer which I and others can verify. In any case, if you are using some new code, it always helps to run known test cases through it, even if it is just new to you LOL. -- View this message in context: http://r.789695.n4.nabble.com/About-hazard-ratio-urgent-tp3595527p3597025.html Sent from the R help mailing list archive at Nabble.com.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] smoothScatter function (color density question) and adding a legend
Date: Sat, 11 Jun 2011 21:52:21 -0400 From: ccoll...@purdue.edu To: r-help@r-project.org Subject: [R] smoothScatter function (color density question) and adding a legend Dear R experts, I am resending my questions below one more time just in case someone out there could help but missed my email. Thanks, I was curious about this, and so I installed the geneplotter package and that went fine, but I had another issue and need to take care of that. First, see if ?legend helps at all (no idea, I have not used this stuff much). Also, source code should be available for the package you care about. On a quick read, it sounds like this uses a binning system for colors. If you want something slightly different and much slower, I have some examples (IIRC) that calculate densities in R using something similar to an electric field calculation around points (again, I'm in a hurry and pulling this from an archive, so caveat emptor: this plot may not correspond exactly to the example script below etc.), http://98.129.232.232/coloumb1.pdf

mydensity <- function(x1, x2) {
  len <- length(x1)
  z <- 1:len
  for (y in 1:len) {
    nx <- c(1:(y - 1), (y + 1):len)   # all points except the current one
    if (y == 1) { nx <- 2:len }
    if (y == len) { nx <- 1:(len - 1) }
    # coulomb is a bit much with overlapping data points, so limit it a bit
    z[y] <- sum(1.0 / (.001 + (x1[nx] - x1[y])^2 + (x2[nx] - x2[y])^2))
  }
  print(max(z)); print(min(z))
  z <- z - min(z)
  z <- 100 * z / max(z)
  hist(z)
  color <- rainbow(100)
  # color <- heat.colors(10)
  tmap <- color[pmin(floor(z) + 1, 100)]  # clamp so the max does not index past 100
  scatterplot3d(x1, x2, z, color = tmap)  # needs library(scatterplot3d)
  plot(x2, x1, col = tmap, cex = .5)
  # library(VecStatGraphs2D)
  # DrawDensityMap(x1, x2, PaintPoint = TRUE)
  # color <- terrain.colors(26)
  # color <- heat.colors(26)
}

I don't think my questions are too hard. I am most concerned about the transformation function. See below. Thanks, Clayton Hello, I have a few questions regarding the smoothScatter function. I have a scatter plot with more than 500,000 data points for two samples. So, I am wanting to display the density in colors to convince people that my good correlation coefficient is not due to an influential point effect, and plus, I also want to make my scatter plot look pretty. Anyway ... I have been able to make the plot, but I have a couple of questions about it. I also want to make a legend for it, which I don't know how to do. I have only been playing around with R for a few weeks now, off and on, so I am definitely a beginner. 1. I have 10 colors for my plot: white representing zero density and dark red representing the maximum density (I presume). According to the R documentation, the transformation argument represents the function mapping the density scale to the color scale. Note that in my R code below, transformation = function(x) x^0.14. I was wondering how exactly or mathematically this function relates the density to color. I believe the colorRampPalette ramps colors between 0 and 1. I am not sure if x represents the color or the density. Since I have 10 colors, I suppose the colorRampPalette would assign values of 0, 0.11, 0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, and 1 for white to dark red. I am not sure though. Does anyone know how this works? I am sure it's not too too complicated. 2. In a related issue, I also would like to make a legend for this plot. Then I would be able to see the transformation function's effects on the color-density relationship. Could someone help me in making a legend for my smoothScatter plot?
I would like to place it immediately to the right of my plot as a vertical bar, matching the vertical length of the plot, as is often convention. I really like the smoothScatter function. It is easy to use, and I believe it's relatively new. Thanks in advance. -Clayton

clayton <- c("white", "lightblue", "blue", "darkgreen", "green",
             "yellow", "orange", "darkorange", "red", "darkred")
x <- read.table("c:/users/ccolling/log_u1m1.txt", sep = "\t")
smoothScatter(x$V8 ~ x$V7, nbin = 1000,
              colramp = colorRampPalette(clayton),
              nrpoints = Inf, pch = "", cex = .7,
              transformation = function(x) x^.14,
              col = "black", main = "M1 vs. M3 (r = 0.92)",
              xlab = "Normalized M1", ylab = "Normalized M2",
              cex.axis = 1.75, cex.lab = 1.25, cex.main = 2)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
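For question 2, there is no built-in smoothScatter legend, but a vertical color bar can be faked with image(); a hedged sketch (toy data; the same palette is used for the bar so it matches the plot):

pal <- colorRampPalette(c("white", "lightblue", "blue", "darkgreen", "green",
                          "yellow", "orange", "darkorange", "red", "darkred"))
set.seed(1)
x <- rnorm(10000); y <- x + rnorm(10000)
layout(matrix(1:2, ncol = 2), widths = c(4, 1))   # plot + narrow legend strip
smoothScatter(x, y, colramp = pal, transformation = function(x) x^0.14)
par(mar = c(5, 1, 4, 3))
z <- seq(0, 1, length = 100)
image(1, z, t(matrix(z)), col = pal(100), axes = FALSE, xlab = "", ylab = "")
axis(4)
mtext("relative density^0.14", side = 4, line = 2, cex = 0.8)

On question 1: smoothScatter applies the transformation to the kernel density estimate, and image() then maps the transformed values linearly onto the color ramp, so x^0.14 compresses the high-density end and stretches the low-density end of the color scale.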
Re: [R] RES: Linear multivariate regression with Robust error
Date: Fri, 10 Jun 2011 16:50:24 -0300 From: filipe.bote...@vpar.com.br To: frien...@yorku.ca; bkkoc...@gmail.com CC: r-help@r-project.org Subject: RES: [R] Linear multivariate regression with Robust error Hi Barjesh, I am not sure which data you are analyzing, but once I had a similar situation and it was a multicollinearity issue. I realized this after finding a huge correlation between the independent variables; then I dropped one of them and the signs of the slopes made sense. Beforehand, have a glance at the correlation matrix of your independent variables to see the relationship between them. I guess "look at the data" (LATFD) seems to be the recurring solution. Certainly with polynomial regression there's a tendency to think that the fit parameter is a pure property of the system being examined. With something like a Taylor series, where you have a slope at a specific point, maybe you could think about it that way, but here the coefficient is just whatever optimizes your (arbitrary) error function for the data you have. A linear approximation to a non-line could be made at one point, and maybe that property should remain constant, but you get people claiming "past authors sampled some curve near x=a and got a different slope than my work, which largely sampled f(x) around x=b; what is wrong with my result?" It may be interesting to try to write a Taylor series around some point and see how those coefficients vary with data sets, for example (you still need an arbitrary way to estimate slopes, and simply differencing two points may be a bit noisy LOL, but you could play with some wavelet families maybe). If you try to describe a fruit as a linear combination of vegetables, you may get confusing but possibly useful results, even if they don't correspond to properties of the fruits so much as to a specific need you have. For example, if you are compressing images of fruits and your decoder already has a dictionary of vegetables, it may make sense to do this. This is not much different from trying to compress non-vocal music with an ACELP codec that attempts to fit the sounds to models of the human vocal tract. Sometimes this may even be informative about how a given sound was produced, even if it sounds silly. Not sure how helpful my advice will be, but it did the trick for me back then ;) Cheers, Filipe Botelho -Original message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Michael Friendly Sent: Friday, 10 June 2011 12:51 To: Barjesh Kochar Cc: r-help@r-project.org Subject: Re: [R] Linear multivariate regression with Robust error On 6/10/2011 12:23 AM, Barjesh Kochar wrote: Dear all, I am doing linear regression with robust errors to learn the effect of an (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of (x: x1, x2, x3) variables on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation. Barjesh Kochar Research scholar You don't give any data or provide any code (as the posting guide requests), so I have to guess that you have just rediscovered Simpson's paradox -- that the coefficient of a variable in a marginal regression can have an opposite sign to that in a joint model with other predictors.
I have no idea what you mean by 'robust error'. One remedy is an added-variable plot, which will show you the partial contributions of each predictor in the joint model, as well as whether there are any influential observations that are driving the estimated coefficients. -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street, Toronto, ONT M3J 1P3 CANADA Web: http://www.datavis.ca
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
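As a footnote to the added-variable plot suggestion, a minimal sketch (uses the car package's avPlots; the nearly collinear simulated predictors are made up for illustration):

library(car)                      # for avPlots
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)   # nearly collinear with x1
y  <- x1 - x2 + rnorm(100)
fit <- lm(y ~ x1 + x2)
coef(summary(fit))                # joint-model coefficients
avPlots(fit)   # partial contribution of each predictor, given the others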
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 13:03:10 +0200 From: lui.r.proj...@googlemail.com To: r-help@r-project.org Subject: [R] Amazon AWS, RGenoud, Parallel Computing Dear R group, [...] I am a little bit puzzled now about what I could do... It seems like there are only very limited options for me to increase the performance. Does anybody have experience with parallel computations with rGenoud or parallelized sorting algorithms? I think one major problem is that the sorting happens rather quickly (only a few hundred entries to sort), but needs to be done very frequently (population size 2000, iterations 500), so I guess the housekeeping of the parallel computation diminishes all the benefits. Is your sort part of the algorithm, or do you have to sort results after getting them back out of order from async processes? One of my favorite anecdotes is how I used a bash sort on a huge data file to make a program run faster (from impractical zero percent CPU to very fast with full CPU usage -- and you complain about exactly a lack of CPU saturation). I guess a couple of comments. First, if you have specialized apps you need optimized, you may want to write dedicated c++ code. However, this won't help if you don't find the bottleneck. Lack of CPU saturation could easily be due to waiting for stuff like disk IO or VM swap. You really ought to find the bottleneck first; it could be anything (except the CPU, maybe LOL). The sort that I used prevented VM thrashing with no change to the app code -- the app got sorted data, and so VM paging became infrequent. If you can specify the problem precisely, you may be able to find a simple solution.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 19:57:47 +0200 Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing From: lui.r.proj...@googlemail.com To: marchy...@hotmail.com CC: r-help@r-project.org Hello Mike, [[elided Hotmail spam]] To the best of my knowledge, the sort algorithm implemented in R is already backed by C++ code and not natively written in R. Writing the code in C++ is not really an option either (I think rGenoud is also written in C++). I am not sure whether there really is a bottleneck with respect to the computer -- I/O is pretty low, plenty of RAM left etc. It really seems to me as if parallelizing is not easily possible, or only at such high cost that the benefits diminish through all the coordination and handling needed... Did anybody use rGenoud in cluster mode and experience something similar? Are quicksort packages available that use multiple processors efficiently (I didn't find any... :-( ). I'm no expert, but these don't seem to be terribly subtle problems in most cases. Sure, if the task is not suited to parallelism and you force it to be parallel, and it spends all its time syncing up, that can be a problem. Just making more tasks to fight over the bottleneck -- memory, CPU, locks -- can easily make things worse. I think I posted my link earlier on the IEEE blurb showing how easy it is for many cores to make things worse on non-contrived benchmarks. I am by no means an expert on parallel processing, but is it possible that the benefits from parallelizing a process greatly diminish if a large set of variables/functions needs to be made available and the actual function (in this case sorting a few hundred entries) is quite short, whereas the number of times the function is called is very high!? It was quite striking that the first run usually took several hours (instead of half an hour) and the subsequent runs were much, much faster... There is so much happening behind the scenes that it is a little hard for me to tell what might help -- and what will not... Help appreciated :-) Thank you Lui On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka wrote: [...]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
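Before parallelizing anything, it may be worth confirming where the time actually goes; base R's profiler can do that. A minimal sketch (the toy objective function is a made-up placeholder -- substitute the real genoud() call):

library(rgenoud)
Rprof("genoud.prof")                        # start profiling to a file
res <- genoud(fn = function(x) sum(x^2),    # placeholder objective
              nvars = 5, pop.size = 2000,
              max.generations = 50)
Rprof(NULL)                                 # stop profiling
summaryRprof("genoud.prof")$by.self         # where the time was really spent

If the sort barely shows up in by.self, parallelizing it cannot help much (Amdahl's law); if the top entries are I/O or copying, more cores will not help at all.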
Re: [R] error with geomap in googleVis
Subject: RE: [R] error with geomap in googleVis Date: Fri, 10 Jun 2011 11:06:47 +0100 From: markus.gesm...@lloyds.com To: marchy...@hotmail.com; r-h...@stat.math.ethz.ch Hi Mike, I believe you are trying to put two charts on the same page. The demo(googleVis) provides you with an example of this. So applying the same approach to your data sets would require: Thanks, I thought however that I tried two different examples, the first being my own, which failed, and then the Fruits example as a second test. I was lazy and used install package from a probably wrong mirror, as in the past on 'dohs I had problems with CMD INSTALL. I reloaded as per your suggestion and tried the first example. The output did change, but it is still lacking a chart. Do I need an API key or something? I've never used google visualization before and was just reacting to the OP, as I had wanted to try it. The process is below; the final result may be here, http://98.129.232.232/xxx.html , but I will be varying it as I get time. Thanks again.

--2011-06-10 05:17:47-- http://cran.r-project.org/src/contrib/googleVis_0.2.5.tar.gz
Resolving cran.r-project.org... 137.208.57.37
Connecting to cran.r-project.org|137.208.57.37|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 10:17:47 GMT
Server: Apache/2.2.9 (Debian)
Last-Modified: Tue, 07 Jun 2011 06:35:56 GMT
ETag: a83451-b36e9-4a5196ea64b00
Accept-Ranges: bytes
Content-Length: 734953
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip
Length: 734953 (718K) [application/x-gzip]
Saving to: `googleVis_0.2.5.tar.gz'
100%[==========>] 734,953 527K/s in 1.4s
2011-06-10 05:17:49 (527 KB/s) - `googleVis_0.2.5.tar.gz' saved [734953/734953]

[marchywka@351915-www1 downloads]$ R CMD INSTALL googleVis_0.2.5.tar.gz
* installing to library `/usr/local/lib64/R/library'
* installing *source* package `googleVis' ...
** R
** data
** moving datasets to lazyload DB
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
* DONE (googleVis)

[marchywka@351915-www1 downloads]$ R
R version 2.13.0 (2011-04-13) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-unknown-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

> library(googleVis)
Loading required package: RJSONIO
To suppress the following message use the statement: suppressPackageStartupMessages(library(googleVis))
Welcome to googleVis version 0.2.5
Type ?googleVis to access the overall documentation and vignette('googleVis') for the package vignette. You can execute the demo of the package via: demo(googleVis) More information is available on the googleVis project web-site: http://code.google.com/p/google-motion-charts-with-r/ Please read also the Google Visualisation API Terms of Use: http://code.google.com/apis/visualization/terms.html Feel free to send us an email rvisualisat...@gmail.com if you would like to be kept informed of new versions, or if you have any feedback, ideas, suggestions or would like to collaborate.

> df <- data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
> map1 <- gvisGeoMap(df, locationvar='foo', numvar='bar')
> cat(map1$html)
Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat'
> cat(map1$html$header)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>GeoMapID23b504e5</title>
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<style type="text/css">
body { color: #444444; font-family: Arial,Helvetica,sans-serif; font-size: 75%; }
a { color: #4D87C7; text-decoration: none; }
</style>
</head>
<body>
> cat(map1$html$header, file="xxx.html", appeand=F)
> cat(map1$html$chart, file="xxx.html", appeand=t)
Error in cat(list(...), file, sep, fill, labels, append) : argument 2 (type 'closure') cannot be handled by 'cat'

(Note the misspelled appeand=: it falls into cat()'s '...', so F and t are treated as things to print -- t is the transpose function, hence the 'closure' error -- and the intended append never happens.)

> str(map1$html)
List of 4
 $ header : chr "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<ht"| __truncated__
 $ chart : Named chr [1:7] "<!-- GeoMap generated in R 2.13.0 by googleVis
Re: [R] Linear multivariate regression with Robust error
Date: Fri, 10 Jun 2011 09:53:20 +0530 From: bkkoc...@gmail.com To: r-help@r-project.org Subject: [R] Linear multivariate regression with Robust error Dear all, I am doing linear regression with robust errors to learn the effect of an (x) variable on another (y). If I execute the command I find a positive trend. But if I check the effect of a number of (x: x1, x2, x3) variables on the same (y) variable, then the positive effect shown by the x variable turns negative. So please help me in this situation. Take y as goodness, and x and x1 have something to do with a product. The first analysis is from company A, the second is from company B, and the underlying relationship is given with some noise LOL (I'm still on my first cup of coffee; this was the first example to come to mind, as these questions keep coming up here every day):

x <- 1:100
x1 <- x * x
y <- x - x1 + runif(100)
lm(y ~ x)

Call: lm(formula = y ~ x)
Coefficients:
(Intercept)     x
       1718  -100

lm(y ~ x + x1)

Call: lm(formula = y ~ x + x1)
Coefficients:
(Intercept)       x      x1
     0.5253  1.0024 -1.0000

Barjesh Kochar Research scholar
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Resources for utilizing multiple processors
From: rjeffr...@ucla.edu Date: Wed, 8 Jun 2011 20:54:45 -0700 To: r-help@r-project.org Subject: [R] Resources for utilizing multiple processors Hello, I know of some various methods out there to utilize multiple processors but am not sure what the best solution would be. First some things to note: I'm running dependent simulations, so direct parallel coding is out (multicore, doSnow, etc). the *nix languages. Well, for the situation below you seem to want a function server. You could consider Rapache and just write this like a big web application. A web server, like a DB, is not the first thing you think of for high performance computing, but if your computationally intensive tasks are in native code this could be a reasonable overhead that requires little learning. If you literally mean cores instead of machines, keep in mind that cores can end up fighting over resources, like memory (this cites an IEEE article with cores making things worse in a non-contrived case): http://lists.boost.org/boost-users/2008/11/42263.php I think people have mentioned some packages like bigmemory, I forget the names exactly, that let you handle larger things. Launching a bunch of threads and letting the VM thrash can easily make things slower quickly. I guess a better approach would be to get an implementation that is block oriented, and you can do the memory/file stuff in R until they get a data frame that uses disk transparently and with hints on expected access patterns (prefetch etc). My main concern deals with multiple analyses on large data sets. By large I mean that when I'm done running 2 simulations R is using ~3G of RAM; the remaining ~3G is chewed up when I try to create the Gelman-Rubin statistic to compare the two resulting samples, grinding the process to a halt. I'd like to have separate cores simultaneously run each analysis. That will save on time, and I'll have to ponder the BGR calculation problem another way. Can R temporarily use HD space to write calculations to instead of RAM? The second concern boils down to whether or not there is a way to split up dependent simulations. For example at iteration (t) I feed a(t-2) into FUN1 to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2, and [[elided Hotmail spam]] So if anyone has any suggestions as to a direction I can look into, it would be appreciated. Robin Jeffries MS, DrPH Candidate Department of Biostatistics UCLA 530-633-STAT(7828)
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
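For the 'separate cores run each analysis' part, a hedged sketch using the parallel package (fork-based, so not on Windows; around 2011 the same functions lived in the multicore package, and run_chain here is a made-up placeholder for one complete simulation):

library(parallel)
run_chain <- function(seed) {      # placeholder for one dependent simulation
  set.seed(seed)
  cumsum(rnorm(1e6))               # the steps inside stay strictly sequential
}
j1 <- mcparallel(run_chain(1))     # each analysis gets its own forked process
j2 <- mcparallel(run_chain(2))
res <- mccollect(list(j1, j2))     # block until both chains finish
str(res)

The dependence within a chain is untouched; only whole, mutually independent analyses are farmed out, which is why this sidesteps the 'dependent simulations' objection.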
Re: [R] error with geomap in googleVis
To: r-h...@stat.math.ethz.ch From: mjphi...@tpg.com.au Date: Wed, 8 Jun 2011 10:14:01 + Subject: Re: [R] error with geomap in googleVis

SNV Krishna <pri...@mps.com.sg> writes:

Hi All, I am unable to get the geomap plot in the googleVis package. The data is as follows:

head(index.ret)
    country    ytd
1 Argentina -10.18
2 Australia  -3.42
3   Austria  -2.70
4   Belgium   1.94
5    Brazil  -7.16
6    Canada   0.56

map1 = gvisGeoMap(index.ret, locationvar = 'country', numvar = 'ytd')
plot(map1)

But it just displays a blank page, showing an error symbol at the right bottom corner. I tried demo(googleVis); it also had a similar problem. The demo showed all the other plots/maps except for those geomaps. Could anyone please hint me what/where the problem could be? Many thanks for the idea and support. Regards, SNV Krishna

I had never used this until yesterday but it seems to generate html. I didn't manage to get a chart to display, but if you are familiar with this package and html, perhaps you could look at map1$html and see if anything is obvious. One great thing about html/js is that it is human readable and you can integrate it well with other page material without much in the way of special tools.

Hi All, I have also encountered this problem. I have tested the problem in Windows XP. I have the latest Java and Flash, and I have tried both Firefox and IE (both latest); everything else works just fine. I too would like to know how to solve this problem. Kind regards, Michael Phipps

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] any documents
Date: Thu, 9 Jun 2011 02:21:21 -0700 From: wa7@gmail.com To: r-help@r-project.org Subject: [R] any documents

Hi, I'm doing a textual analysis of several articles discussing the evolution of prices in order to give a forecast. If someone can give me a clear approach to this, knowing that I work with the package tm.

LOL, are you talking about the computer-generated analysis, such as the thin text platitudes around bandwagon stats like "is trading xx above 30 day moving average" etc etc.? This sounds funny but is actually an interesting test case, as the hidden structured nature of the documents should be easier to analyse than, say, poetry. The field is very much researchy AFAIK, and you will need to define an algorithm or do a literature search to get much in the way of helpful response beyond ?tm. Absent that, you are almost asking for someone to invent an algorithm. I've referred many posters to terms like "computational linguistics", but people who have used these kinds of things don't seem to post here much. If you can give us more details or a source article, maybe someone can point you in a useful direction.

Thank you very much -- View this message in context: http://r.789695.n4.nabble.com/any-documents-tp3584961p3584961.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
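Since ?tm is where the reply above leaves off, a minimal sketch of its starting point, with two made-up documents standing in for the price articles:

library(tm)
docs <- c("prices rose sharply this quarter", "analysts forecast falling prices")
corp <- Corpus(VectorSource(docs))
tdm  <- TermDocumentMatrix(corp, control = list(tolower = TRUE))
inspect(tdm)   # term frequencies, the usual input to any downstream model

This only gets you a bag-of-words matrix; the forecasting step on top of it is exactly the "invent an algorithm" part the reply warns about.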
Re: [R] error with geomap in googleVis
I still got blanks with Firefox with the two examples below. I put the html up here if you want to look at it, http://98.129.232.232/xxx.html I just downloaded googleVis from mirror 68 and it claimed it was 0.2.5 ( I thought, but maybe I should check again).

install.packages("googleVis", dep=TRUE)
library(googleVis)
df <- data.frame(foo=c("Brazil","Canada"), bar=c(123,456))
map1 <- gvisGeoMap(df, locationvar='foo', numvar='bar')
cat(map1$html$header, file="xxx.html", append=FALSE)
cat(map1$html$chart, file="xxx.html", append=TRUE)
cat(map1$html$caption, file="xxx.html", append=TRUE)
cat(map1$html$footer, file="xxx.html", append=TRUE)

m <- gvisMotionChart(Fruits, idvar="Fruit", timevar="Year")
str(m)
cat(m$html$header, file="xxx.html", append=FALSE)
cat(m$html$chart, file="xxx.html", append=TRUE)
cat(m$html$caption, file="xxx.html", append=TRUE)
cat(m$html$footer, file="xxx.html", append=TRUE)

Subject: RE: [R] error with geomap in googleVis Date: Thu, 9 Jun 2011 14:06:22 +0100 From: markus.gesm...@lloyds.com To: marchy...@hotmail.com; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch

Hi all, This issue occurs with googleVis 0.2.4 and RJSONIO 0.7.1. Version 0.2.5 of the googleVis package was uploaded to CRAN two days ago and should have fixed this issue. Can you please try to update to that version, e.g. from http://cran.r-project.org/web/packages/googleVis/ Further, version 0.2.5 provides new interfaces to more interactive Google charts:
- gvisLineChart
- gvisBarChart
- gvisColumnChart
- gvisAreaChart
- gvisScatterChart
- gvisPieChart
- gvisGauge
- gvisOrgChart
- gvisIntensityMap
Additionally, a new demo 'AnimatedGeoMap' has been added which shows how a Geo Map can be animated with additional JavaScript. Thanks to Manoj Ananthapadmanabhan and Anand Ramalingam, who provided the idea and initial code. For more information and examples see: http://code.google.com/p/google-motion-charts-with-r/ I hope this helps Markus

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Marchywka Sent: 09 June 2011 11:19 To: mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch Subject: Re: [R] error with geomap in googleVis [snip]
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can we prepare a questionnaire in R
Date: Wed, 8 Jun 2011 12:37:33 +0530 From: ammasamri...@gmail.com To: r-help@r-project.org Subject: [R] Can we prepare a questionnaire in R

Is there a way to prepare a questionnaire in R, like html forms whose data can be directly populated into R?

I've started to use Rapache, although Rserve would also be an option. When installed on a server, you can point your html form to an rhtml page and get the form variables in R just as with other web languages. For writing html output, I had been using R2HTML ( see the code excerpt below from an rhtml page). I had also found Cairo works ok if you don't need X11 for anything. For your specific situation, however, it may be easier to just use whatever you already have and just use R for the data analysis. When a request for a results report is made, send that to Rapache, for example. I would mention that I have gone to running two versions of Apache, one with R and one with PHP, to allow for security and easier development ( I can write the php to fail nicely if the R apache server is not up, and no new security issues are exposed).

library(Cairo)
library(R2HTML)
library(RColorBrewer)

You can of course also generate html from normal R commands, or for that matter bash scripts etc.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
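A minimal sketch of the R2HTML side mentioned above, assuming the package is installed; the file name and the summarized object are placeholders:

library(R2HTML)
target <- HTMLInitFile(getwd(), filename = "report")   # creates report.html
HTML(as.title("Survey results"), file = target)
HTML(summary(cars), file = target)                     # any R object can be written
HTMLEndFile()

This only covers writing results back out as html; getting form variables into R is the Rapache/Rserve part of the answer above.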
Re: [R] RgoogleMaps Axes
Date: Tue, 7 Jun 2011 09:50:10 -0700 From: egregory2...@yahoo.com To: r-help@r-project.org Subject: [R] RgoogleMaps Axes

R Help, I posted a question on StackOverflow yesterday regarding an issue I've been having with the RgoogleMaps package's display of axes. Here is the text of that submission: http://stackoverflow.com/questions/6258408/rgooglemaps-axes I can't find any documentation of the following problem I'm having with the axis labels in RGoogleMaps:

library(RgoogleMaps)
datas <- structure(list(LAT = c(37.875, 37.925, 37.775, 37.875, 37.875),
                        LON = c(-122.225, -122.225, -122.075, -122.075, -122.025)),
                   .Names = c("LAT", "LON"), class = "data.frame",
                   row.names = c(1418L, 1419L, 1536L, 1538L, 1578L))
# Get bounding box.
boxt <- qbbox(lat = datas$LAT, lon = datas$LON)
MyMap <- GetMap.bbox(boxt$lonR, boxt$latR, destfile = "Arvin12Map.png", maptype = "mobile")
PlotOnStaticMap(MyMap, lat = datas$LAT, lon = datas$LON, axes = TRUE, mar = rep(4, 4))

I haven't gotten too far, as I had to download and install proj4 and rgdal, but I did get a transform-related warning. Did you get warnings?

MyMap <- GetMap.bbox(boxt$lonR, boxt$latR, destfile = "Arvin12Map.png", maptype = "mobile")
[1] "http://maps.google.com/maps/api/staticmap?center=37.85,-122.125&zoom=12&size=640x640&maptype=mobile&format=png32&sensor=true"
Loading required package: rgdal
Loading required package: sp
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.8.0, released 2011/01/12
Path to GDAL shared files: /usr/local/share/gdal
Loaded PROJ.4 runtime: Rel. 4.7.1, 23 September 2009
Path to PROJ.4 shared files: (autodetected)
Warning message:
In readGDAL(destfile, silent = TRUE) : GeoTransform values not available
PlotOnStaticMap(MyMap, lat = datas$LAT, lon = datas$LON, axes = TRUE, mar = rep(4, 4))
List of 6
 $ lat.center: num 37.8
 $ lon.center: num -122
 $ zoom      : num 12
 $ myTile    : int [1:640, 1:640] 968 853 855 969 1033 888 855 884 888 995 ...
  ..- attr(*, "COL")= chr [1:1132] "#000000" "#020201" "#020202" "#030302" ...
  ..- attr(*, "type")= chr "rgb"
 $ BBOX      :List of 2
  ..$ ll: num [1, 1:2] 37.8 -122.2
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
  ..$ ur: num [1, 1:2] 37.9 -122
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr "Y"
  .. .. ..$ : chr [1:2] "lat" "lon"
 $ url       : chr "google"
NULL
[1] -291.2711  291.2711
[1] -276.5158  276.7972

When I run this on my computer the horizontal axis ranges from 300W to 60E, but the ticks in between aren't linearly spaced (300W, 200W, 100W, 0, 100E, 160W, 60W). Also, the vertical axis moves linearly from 300S to 300N. It seems that no matter what data I supply for datas, the axes are always labeled this way. My questions are: 1. Does this problem occur on other machines using this code? 2. Does anyone have an explanation for it? and 3. Can anybody suggest a way to get the correct axis labels (assuming these are incorrect, but maybe I'm somehow misinterpreting the plot!)? Thank you for your time. There has been no answer other than that I ought to contact the package maintainer. Since it would be nice to have a publicly displayed solution, I opted to post here first before doing so. Does anyone have any insight to share? Thank you very much for your time, -Erik Gregory CSU Sacramento, Mathematics Student Assistant, California Environmental Protection Agency

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Logistic Regression
Date: Tue, 7 Jun 2011 01:38:32 -0700 From: farah.farid@student.aku.edu To: r-help@r-project.org Subject: [R] Logistic Regression

I am working on my thesis, in which I have a couple of independent variables that are categorical in nature and the dependent variable is dichotomous. Initially I ran univariate analyses and added the variables with significant p-values (p<0.25) to my full model. I have three confusions. Firstly, I am looking for confounding variables by

I'm not sure what your thesis is about, some system that you are studying via statistics or maybe the thesis is about statistics itself, but according to this disputed wikipedia entry, http://en.wikipedia.org/wiki/Confounding whether a variable is confounding or extraneous is determined by the reality of your system. It may help to consider factors related to that and use the statistics to avoid fooling yourself. Look at the pictures ( a non-pompous way of saying look at graphs and scatter plots for some ideas to test ) and then test various ideas. You see bad cause/effect inferences all the time in many fields, from econ to biotech ( although anecdotes suggest these mistakes usually favour the sponsors LOL). Consider some mundane known examples of what your data would look like and see if that relates to what you have. If you were naively measuring car velocity at a single point in front of a traffic light, together with the color of the light, what might you observe? ( much like with an earlier example on iron in patients, there are a number of more precisely defined measurements you could take on a given thing ). If your concern is "I ran test A and it said B, but test C said D, and D seems inconsistent with B", it generally helps to look at the assumptions and detailed equations for each model and explore what those mean with your data. With continuous variables anyway, non-monotonic relationships can easily destroy a correlation even with strong causality.

using the formula (crude beta-coefficient - adjusted beta-coefficient) / crude beta-coefficient x 100. As per the rule, if the percentage for any variable is >10% then I have considered it a confounder. I wanted to know: from the initial model I removed one variable with an insignificant p-value to form the adjusted model. Now how will I know whether the variable that I removed from the initial model was a confounder or not? Secondly, I wanted to know: if the percentage comes out negative, like -17.84%, will it be considered a confounder or not? I also wanted to know whether confounders should be removed from the model or kept in the model? Lastly, I am running a likelihood ratio test to identify whether the value falls in the critical region or not. So if the value does not fall in the critical region, what does that show? What should I do in this case? In my final reduced model all p-values are significant, but still the value identified via the likelihood ratio test does not fall in the critical region. So what does that show? -- View this message in context: http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
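A minimal sketch of the two mechanical pieces of the question, the likelihood ratio test between nested logistic models and the 10% change-in-estimate check; all variable names here (outcome, a, b, c, dat) are made up:

full    <- glm(outcome ~ a + b + c, family = binomial, data = dat)
reduced <- glm(outcome ~ a + b,     family = binomial, data = dat)
anova(reduced, full, test = "Chisq")   # LRT p-value for dropping c
# change-in-estimate on a's coefficient when the candidate confounder c is removed:
100 * (coef(reduced)["a"] - coef(full)["a"]) / coef(reduced)["a"]

The sign of the percentage just records the direction of the shift; the usual rule compares its absolute value against the 10% threshold.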
Re: [R] Populating values from html
Date: Tue, 7 Jun 2011 03:35:46 -0700 From: ammasamri...@gmail.com To: r-help@r-project.org Subject: [R] Populating values from html

Can we populate values into an excel sheet from html forms, to be used in R for data analysis? Can we directly retrieve the values from html forms into R for analysis?

Judging from the way many websites offer data, you'd think that jpg is the best means for getting it LOL. html, pdf, and image formats are designed for human consumption of limited aspects of a data set. Normally you would prefer something closer to raw data, like csv. After having visited yet another public website that offers data in pdf ( YAPWTODIP ), I would suggest you first contact the site and ask them to present data in a form which allows it to be easily examined in an open source environment ( note this criterion does not make Excel a preferred choice either). If you have to scrape web pages, apparently there are some R facilities, but depending on the page you may need to do a lot of work, and it will not survive if the page is redone for artistic reasons etc. See any of these results for example, http://www.google.com/search?hl=en&q=cran+page+scraping -- View this message in context: http://r.789695.n4.nabble.com/Populating-values-from-html-tp3579215p3579215.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
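A minimal sketch of the scraping facilities alluded to above, using the XML package, which was the usual tool at the time; the URL is a placeholder:

library(XML)
tabs <- readHTMLTable("http://example.com/data-page.html")
str(tabs)   # a list of data.frames, one per <table> element on the page
# when the site offers csv directly, that remains the better route:
# df <- read.csv("http://example.com/data.csv")

As the reply warns, any selector-based scraping breaks the moment the page layout changes, so treat this as a last resort after asking for raw data.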
Re: [R] Identifying sequences
Date: Wed, 1 Jun 2011 17:12:29 +0200 From: cjp...@gmail.com To: r-help@r-project.org Subject: Re: [R] Identifying sequences

Thanks to David, Thierry and Jonathan for your help. I have been able to put this function together:

a=1:10
b=20:30
c=40:50
x=c(a,b,c)
seq.matrix <- function(x){
  lower <- x[which(diff(x) != 1)]
  upper <- x[which(diff(x) != 1)+1]
  extremities <- c(1, lower, upper, x[length(x)])
  m <- data.frame(matrix(extremities[order(extremities)], ncol=2, byrow=TRUE,
         dimnames=list(rows=paste("group", 1:(length(lower)+1), sep=""),
                       cols=c("lower","upper"))))
  m$length=m$upper-m$lower+1
  m
}
s.m=seq.matrix(x)
s.m
       lower upper length
group1     1    10     10
group2    20    30     11
group3    40    50     11

One can then make a test to see if a certain value (say 9) falls within one of the groups, and use that to find the group name or the lower or upper border.

As I understand it, you are looking for large derivatives, or approximate discontinuities against a smooth signal. This seems like a natural application for wavelets; try the haar wavelet and use package wavelets,

library(wavelets)
f=wt.filter(c(-1,1), modwt=T)
z <- modwt(X=as.numeric(x), filter=f, n.levels=1)
z@W
$W1
      [,1]
 [1,]   49
 [2,]   -1
 [3,]   -1
 [4,]   -1
 [5,]   -1
 [6,]   -1
 [7,]   -1
 [8,]   -1
 [9,]   -1
[10,]   -1
[11,]  -10
[12,]   -1
[13,]   -1
[14,]   -1
[15,]   -1
[16,]   -1
[17,]   -1
[18,]   -1
[19,]   -1
[20,]   -1
[21,]   -1
[22,]  -10
[23,]   -1
[24,]   -1
[25,]   -1
[26,]   -1
[27,]   -1
[28,]   -1
[29,]   -1
[30,]   -1
[31,]   -1

s.m.test=function(s.m,i){ which(s.m[,1] <= i & s.m[,2] >= i) }
s.m.test(s.m, i=9)
[1] 1
e.g.
row.names(s.m)[s.m.test(s.m, i=9)]
[1] "group1"

Cheers Christiaan

On 1 June 2011 14:31, Jonathan Daily wrote: I am assuming in this case that you are looking for continuity along integers, so if you expect non-integer values this will not work. You can get the index of where breaks can be found in your example using which(diff(x) > 1). On Wed, Jun 1, 2011 at 6:27 AM, christiaan pauw wrote: Hallo Everybody. Consider the following vector:

a=1:10
b=20:30
c=40:50
x=c(a,b,c)

I need a function that can tell me that there are three sets of continuous sequences and that the first is from 1:10, the second from 20:30 and the third from 40:50. In other words: a, b, and c. regards Christiaan

-- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
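For cross-checking seq.matrix, a shorter base-R take on the same grouping, built from the which(diff(x) > 1) idea in Jonathan's reply:

x <- c(1:10, 20:30, 40:50)
groups <- split(x, cumsum(c(1, diff(x) != 1)))   # start a new group wherever the step isn't 1
sapply(groups, range)                            # lower and upper border per group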
Re: [R] Compiling C-code in Windows
Date: Tue, 31 May 2011 11:37:56 -0400 From: murdoch.dun...@gmail.com To: tom.osb...@iinet.net.au CC: r-h...@stat.math.ethz.ch; pmi...@ff.uns.ac.rs

On 31/05/2011 7:50 AM, Tom Osborn wrote: You could use cygwin's cc/gcc, or the Watcom opensource compiler.

Neither of those is supported. Use what the R Admin manual suggests, or you're on your own.

I have had fairly good luck with cygwin or mingw, but you may need to have a few conditionals etc. Not sure what R suggests, but cygwin should work.

Duncan Murdoch

[Watcom used to be a commercial compiler which ceased and has become an open project]. But listen to people who've experienced the options. [Not I]. Cheers, Tom.

- Original Message - From: Petar Milin To: R-HELP Sent: Tuesday, May 31, 2011 9:43 PM

Hello ALL! I am a Linux user (Debian testing i386), with very dusty Win-experience. Nevertheless, my colleagues and I are making a package in R, and I built C-routines to speed things up. I followed instructions on how to compile C for R (very useful link: http://www.stat.lsa.umich.edu/~yizwang/software/maxLinear/noteonR.html, and links listed there). Everything works like a charm on Linux. I have *.so and the wrapper function from R is making the right call. However, I wanted to make a *.dll library for Win-users. Now, I used my colleague's computer with Win XP on it, and with the latest R. In the MS-DOS console, I positioned the prompt in 'C:\Program Files\R\R-2.13.0\bin\i386\', and then I ran: 'R CMD SHLIB C:\trial.c'. However, nothing happened, no trial.dll, nothing. Then I tried with: 'R CMD SHLIB --output=trial.dll C:\trial.c', but no luck, again. Please, can anyone help me with this? Can I use 'R CMD SHLIB --output=trial.dll C:\trial.c' under Linux and expect a working DLL? Best, PM

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
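A small sketch of the check-and-load step once compilation succeeds, on the assumption that Rtools (the toolchain the R Admin manual points to on Windows) is installed and "R CMD SHLIB trial.c" is run from the directory containing trial.c; trial_fn is a hypothetical symbol name from trial.c:

dyn.load(paste("trial", .Platform$dynlib.ext, sep = ""))   # trial.dll on Windows, trial.so on Linux
is.loaded("trial_fn")   # TRUE if the hypothetical routine is visible to R

Note a DLL built under Linux will not work on Windows; the shared library has to be built on (or cross-compiled for) the target platform.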
Re: [R] newbie: fourier series for time series data
( hotmail won't mark text so I'm top posting... ) Can you post the data? Personally I'd just plot abs(fft(x)) and see what you see, as well as looking at Im(fft(x))/Re(fft(x)), the phase spectrum. Now, presumably you are nominally looking for something with a period of 1 year; that part could be expressed in a harmonic spectrum as suggested below, but you'd also be looking for trends and noises of various types: additive gaussian, amplitude modulation, maybe even frequency modulation, etc. I guess you could remove a few power terms ( average, linear, etc) just to simplify ( often you get a spectrum with a huge uninformative DC, or zero frequency, component that just gets in the way). You can probably find a book online dealing with signal processing and RF ( this is where I'm used to seeing these things). It would of course be helpful to then examine known simple cases and see if you can tell them apart. Create fft(sin(t)*(1+a*sin(epsilon*t))+b*t), for example. I guess if you want to look at writing a model, you could look at the phase portrait ( plot derivative versus value ) to again get some idea of what you may have that makes sense as a model to fit.

Date: Tue, 31 May 2011 10:35:16 -0700 From: spencer.gra...@structuremonitoring.com To: eddie...@gmail.com CC: r-help@r-project.org Subject: Re: [R] newbie: fourier series for time series data

On 5/31/2011 5:12 AM, eddie smith wrote: Hi Guys, I had a monthly time series of land temperature from 1980 to 2008. After plotting a scatter diagram, it seems that annually there is a semi-sinusoidal cycle. How do I fit a Fourier series to the data so that I can fit a model to it?

There are several methods.

1. The simplest would be to select the number of terms you want, put the data into a data.frame, and use lm(y ~ sin(t/period) + cos(t/period) + sin(2*t/period) + cos(2*t/period) + ..., data), including as many terms as you want in the series. This is not recommended, because it ignores the time series effects and does not apply a smoothness penalty to the Fourier approximation.

2. A second is to use the 'fda' package. Examples are provided (even indexed) in Ramsay, Hooker and Graves (2009) Functional Data Analysis with R and Matlab (Springer). This is probably what Ramsay and Hooker would do, but I wouldn't, because it doesn't treat the time series as a time series. It also requires more work on your part.

3. A third general class of approaches uses Kalman filtering, also called dynamic linear models or state space models. This would allow you to estimate a differential equation model, whose solution could be a damped sinusoid. It would also allow you to estimate regression coefficients of a finite Fourier series, but without the smoothness penalty you would get with 'fda'. For this, I recommend the 'dlm' package with its vignette and companion book, Petris, Petrone and Campagnoli (2009) Dynamic Linear Models with R (Springer).

If you want something quick and dirty, you might want option 1. For that, I might use option 2, because I know and understand it moderately well (being third author on the book). However, if you really want to understand time series, I recommend option 3. That has the additional advantage that, I think, it would have the greatest chance of acceptance in a refereed academic journal of the three approaches.

I am really sorry if my question sounds stupid, but I just don't know where to start.

There also are specialized email lists that you might consider for a future post. Go to www.r-project.org -> Mailing Lists.
In particular, you might be most interested in R-sig-ecology. Hope this helps. Spencer Graves

I am desperately looking for help from you guys. Thanks in advance. Eddie

-- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
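A minimal sketch of the abs(fft(x)) inspection suggested at the top of this thread, on a stand-in for the monthly temperature series (the real data was never posted):

x <- sin(2*pi*(1:348)/12) + 0.1*rnorm(348)   # 29 years of fake monthly data with an annual cycle
spec <- abs(fft(x - mean(x)))                # subtracting the mean removes the uninformative DC term
plot(spec[1:(length(x)/2)], type = "h", xlab = "frequency index", ylab = "|FFT|")
# the annual cycle shows up as a spike near index 30 (29 cycles over 29 years)

Trends and modulation of the kind described above smear or split that spike, which is exactly why the advice is to try known simple cases first and see whether you can tell them apart.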
Re: [R] Value of 'pi'
Date: Sun, 29 May 2011 23:09:47 -0700 From: jwiley.ps...@gmail.com To: vincy_p...@yahoo.ca CC: r-help@r-project.org Subject: Re: [R] Value of 'pi'

Dear Vincy, I hope that in school you also learned that 22/7 is an approximation. Please consult your local mathematician for a proof that pi != 22/7. A quick search will provide you with volumes of information on what pi is, how it may be calculated, and calculations out to thousands of digits. Cheers, Josh

On Sun, May 29, 2011 at 11:01 PM, Vincy Pyne wrote: Dear R helpers, I have one basic doubt about the value of pi. In school, we have learned that pi = 22/7 (which is = 3.142857). However, if I type pi in R, I get pi = 3.141593. So which value of pi should be considered?

You could do this if you trust your trig functions, since that is presumably what you want a value of pi for:

atan(1)
[1] 0.7853982
atan(1)*4
[1] 3.141593
atan(1)*4*7-22
[1] -0.008851425
atan(1)*4-pi
[1] 0

Regards Vincy

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nested design
Date: Sat, 28 May 2011 09:33:03 -0700 From: jwiley.ps...@gmail.com To: bjorn.robr...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Nested design

Hi, If you are not asking for stats help, then do you understand the model and are just confused by how R labels it? We can help match R's labels to the ones you are used to, if you tell us what you are used to.

I would not suggest, as a rule, using a tool to validate itself, but you can use R to make sure your interpretation of other R output is right by giving contrived datasets to the analysis package and seeing what you get back. Comparison can be to examples from a text book or your own paper-and-pencil analysis. This is also a good way to learn things, from basic terms to things like sign or unit conventions in different fields etc. You can generate samples from a normal distro and feed them to the questionable package to see what comes back.

Cheers, Josh

On Sat, May 28, 2011 at 6:54 AM, unpeatable wrote: Dear Dennis, In my opinion I am not at all asking for any stats help, just a question on how to read this output. Thanks, Bjorn - Dr. Bjorn JM Robroek Ecology and Biodiversity Group Institute of Environmental Biology, Utrecht University Padualaan 8, 3584 CH Utrecht, The Netherlands Email address: b.j.m.robr...@uu.nl http://www.researcherid.com/rid/C-4379-2008 Tel: +31-(0)30-253 6091 -- View this message in context: http://r.789695.n4.nabble.com/Nested-design-tp3557404p3557472.html Sent from the R help mailing list archive at Nabble.com.

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
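A minimal sketch of the contrived-dataset check described above: simulate a nested layout with a known site-level effect and see how aov labels the terms (all names here are made up):

set.seed(1)
d <- data.frame(site = factor(rep(1:3, each = 20)),
                plot = factor(rep(1:6, each = 10)))   # plots nested in sites
d$y <- 2 + as.numeric(d$site) + rnorm(60)
summary(aov(y ~ site / plot, data = d))   # R prints the nested term as site:plot

Since the only real effect was built in at the site level, the labelled output can be matched against what you already know is there.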
Re: [R] predictive accuracy
Date: Thu, 26 May 2011 13:50:15 -0700 From: gunter.ber...@gene.com To: ahmed.el-taht...@pfizer.com CC: r-help@r-project.org Subject: Re: [R] predictive accuracy

1. This is not about R, and should be taken off list.

Well, depending on what the mods think, a little bit of generic "how do I REALLY use this tool" discussion may be of benefit for all here; a mailing list for a certain brand of hammer may discuss various uses and types of nails etc. Personally I have an interest in this: if the OP will post the data, it may be possible to explore some analysis options.

2. You are wading in an alligator-infested swamp. Get help from (other) statisticians at Pfizer (there are many good ones there).

I thought that is what statisticians do? LOL. We don't know the situation: intern, looking for outside ideas after exhausting internal ones, specific issues with internal peers, summer student not wishing to bother everyone there for details, etc.

Best, Bert

P.S. The answer to all your questions is no (imho).

On Thu, May 26, 2011 at 1:35 PM, El-Tahtawy, Ahmed wrote: The strong predictor is the country/region where the study was conducted. So it is not important/useful for a clinician to use it (as long as he/she is in the USA or Europe). Excluding that predictor will make another 2 insignificant predictors become significant!! Can the new model have reliable predictive accuracy? I thought of excluding all patients from other countries and developing the model accordingly - is the exclusion of a lot of patients and the compromise of power more acceptable??

LOL, quite the contrary, post hoc selection increases power to find whatever you or the sponsor desire... Presuming your general interest is in finding out attributes of a given drug under various conditions, you would probably want to combine the observations with tentative thoughts on causality and see what makes the best story. Statistical significance in isolation is a function of the data and analysis method; it doesn't really have anything specific to do with the underlying systems. In this case, if you have other continuous prognostic factors, say age, LDH, hemoglobin come to mind, you may be able to find that you have non-monotonic relations between prognostic factor and outcome. But further, say you have enough patients that you could in fact map dose-response curves. It may turn out that this curve is in fact non-monotonic, with parameters non-monotonic in the prognostic factor. Consider avg_survival = a + b*d - c*d^2, where d is the dose. For small d, the drug seems to help, but for larger doses it makes things worse. Now consider that c is a complicated function of hematocrit; it may not be hard to imagine that anemics and siderositics ( is that a word LOL?) have some underlying problems dealing with your drug. These may be distributed geographically etc etc etc. This is all stuff you can simulate in R or even on paper. It sounds like you are already trying to write a label, which may be a bit premature ( although I defer to the guy from DNA for that LOL): "indicated for use in patients in the Western Hemisphere with ..." You may have decent luck looking at FDA panel discussion transcripts; search for related general stats terms confined to site:fda.gov

Thanks for your help... Al

-Original Message- From: Marc Schwartz [mailto:marc_schwa...@me.com] Sent: Thursday, May 26, 2011 10:54 AM To: El-Tahtawy, Ahmed Cc: r-help@r-project.org Subject: Re: [R] predictive accuracy

On May 26, 2011, at 7:42 AM, El-Tahtawy, Ahmed wrote: I am trying to develop a prognostic model using logistic regression.
I built full and approximate models with the use of penalization (Design package). Also, I tried chi-square criteria and step-down techniques, and used the bootstrap for model validation. The main purpose is to develop a predictive model for a future patient population. One of the strong predictors pertains to the study design, would not mean much to a clinician/investigator in a real clinical situation, and I have been asked to remove it. Can I propose a model and nomogram without that strong, irrelevant predictor?? If yes, do I need to redo model calibration, discrimination, validation, etc...?? Or just have 5 predictors instead of 6 in the prognostic model?? Thanks for your help Al

Is it that the study design characteristic would not make sense to a clinician but is relevant to future samples, or that the study design characteristic is unique to the sample upon which the model was developed and is not relevant to future samples because they will not be in the same or a similar study? Is the study design characteristic a surrogate for other factors that would be relevant to future samples? If so, you might engage in a conversation with the clinicians to gain some insights into other variables to consider for inclusion in
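A quick sketch of the non-monotonic dose-response shape invoked above, with made-up coefficients (cc stands in for c to avoid masking R's c() function):

d <- seq(0, 10, by = 0.1)
a <- 50; b <- 8; cc <- 1                # hypothetical values
avg_survival <- a + b*d - cc*d^2        # helps at low dose, harms at high dose
plot(d, avg_survival, type = "l", xlab = "dose", ylab = "avg survival")
abline(v = b/(2*cc), lty = 2)           # turnover point at d = b/(2c)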
Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.
Date: Wed, 25 May 2011 09:49:00 -0400 From: ro...@bestroman.com To: biomathjda...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Processing large datasets

Thanks Jonathan. I'm already using RMySQL to load data for a couple of days. I wanted to know what the relevant R capabilities are if I want to process much bigger tables. R always reads the whole set into memory, and this might be a limitation in the case of big tables, correct?

ok, now I ask: perhaps for my first R effort I will try to find the source code for data frame and make a paging or streaming derivative. That is, at least for fixed-size things, it can supply things like the total number of rows but has facilities for paging in and out of memory. Presumably all users of data frame have to work through a limited interface, which I guess could be expanded with various hints such as "prefetch this", for example. I haven't looked at this idea in a while, but the issue keeps coming up; dev list maybe? Anyway, for your immediate issues with a few statistics, you could probably write a simple c++ program that ultimately becomes part of an R package. It is a good idea to see what is available, but these questions come up here a lot and the normal suggestion is a DB, which is exactly the opposite of what you want if you have predictable access patterns ( although even here prefetch could probably be implemented).

Doesn't it use temporary files or something similar to deal with such amounts of data? As an example, I know that SAS handles sas7bdat files up to 1TB on a box with 76GB memory, without noticeable issues. --Roman

- Original Message - In cases where I have to parse through large datasets that will not fit into R's memory, I will grab relevant data using SQL and then analyze said data using R. There are several packages designed to do this, like [1] and [2] below, that allow you to query a database using SQL and end up with that data in an R data.frame. [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko wrote: Hi R list, I'm new to R software, so I'd like to ask about its capabilities. What I'm looking to do is to run some statistical tests on quite big tables which are aggregated quotes from a market feed. This is a typical set of data. Each day contains millions of records (up to 10 non filtered).

2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0
2011-05-24 904 Bid DELL 14130772 300 15.4800 BATS 35482391 Y 1 0 0 0
2011-05-24 904 Bid DELL 14130773 135 15.4800 BATS 35482391 Y 1 0 0 0

I'll need to filter it first based on some criteria. Since I keep it in a MySQL database, this can be done through a query. Not super efficient; checked it already. Then I need to aggregate the dataset into different time frames (time is represented in ms from midnight, like 35482391). Again, this can be done through a database query; not sure what's going to be faster. Aggregated tables are going to be much smaller, like thousands of rows per observation day. Then calculate basic statistics: mean, standard deviation, sums etc. After the stats are calculated, I need to perform some statistical hypothesis tests. So, my question is: which tool is faster for data aggregation and filtration on big datasets, mysql or R? Thanks, --Roman N.
-- === Jon Daily Technician === #!/usr/bin/env outside # It's great, trust me.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
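A minimal sketch of the query-then-analyze pattern described above, pushing the aggregation into SQL so only the small result lands in an R data.frame (the file and table names are hypothetical):

library(RSQLite)
con <- dbConnect(dbDriver("SQLite"), "quotes.db")
agg <- dbGetQuery(con, "SELECT symbol, COUNT(*) AS n, AVG(price) AS mean_price
                        FROM quotes GROUP BY symbol")
dbDisconnect(con)
str(agg)   # thousands of rows at most, not millions

For standard deviations, note that plain SQLite has no STDEV() aggregate, so either compute it from SUM(price), SUM(price*price) and COUNT(*), or pull the filtered column into R and call sd().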
Re: [R] Processing large datasets
Date: Wed, 25 May 2011 10:18:48 -0400 From: ro...@bestroman.com To: mailinglist.honey...@gmail.com CC: r-help@r-project.org Subject: Re: [R] Processing large datasets

Hi, If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it does paging and all the needed buffering for arbitrary-size data. Does it work with everything? I seem to recall bigmemory came up before in this context and there was some problem. Thanks.

Also, if you find yourself needing to do lots of grouping/summarizing types of calculations over large data frame-like objects, you might want to check out the data.table package: http://cran.r-project.org/web/packages/data.table/index.html -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

I don't think data.table is fundamentally different from the data.frame type, but thanks for the suggestion. http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf "Just like data.frames, data.tables must fit inside RAM" The ff package by Adler, listed in "Large memory and out-of-memory data", is probably most interesting. --Roman Naumenko

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
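A minimal sketch of what the ff package singled out above looks like in use; the vector lives in a disk file and only the slices you index come into RAM:

library(ff)
x <- ff(vmode = "double", length = 1e7)   # ~80 MB backing file on disk, not in RAM
x[1:5] <- rnorm(5)
mean(x[1:1000])                           # only the indexed chunk is materialized
filename(x)                               # where the backing file actually lives

Whole-object operations still need chunking by hand (or the chunk utilities that ship with ff), which is the "port more advanced methods" caveat raised in the next message.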
Re: [R] Processing large datasets
Date: Wed, 25 May 2011 12:32:37 -0400 Subject: Re: [R] Processing large datasets From: mailinglist.honey...@gmail.com To: marchy...@hotmail.com CC: ro...@bestroman.com; r-help@r-project.org

Hi, On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka wrote: [snip]

If your datasets are *really* huge, check out some packages listed under the "Large memory and out-of-memory data" section of the HighPerformanceComputing task view at CRAN: http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it does paging and all the needed buffering for arbitrary-size data. Does it work with everything?

I'm not sure what limitations ... I know the bigmemory (and ff) packages try hard to make using out-of-memory datasets as transparent as possible. That having been said, I guess you will have to port more advanced methods to use such packages, hence the existence of the biglm, biganalytics, and bigtabulate packages.

I seem to recall bigmemory came up before in this context and there was some problem.

Well -- I don't often see emails on this list complaining about their functionality. That doesn't mean they're flawless (I also don't scrutinize the list traffic too closely). It could be that not too many people use them, or that people give up before they come knocking when there is a problem. Has something specifically failed for you in the past, or?

No, I haven't tried. I may have it confused with something else. But this question does come up a bit, usually related to "I tried to read a huge file into a data frame and wanted to pass it to something with predictable memory access patterns, and it ran out of memory. What can I do?" I guess I also stopped reading anything after "use a DB", as a DB is generally not a replacement for a data structure. I'll take a look when I have a big dataset that I can't condense easily.

-steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] so current status of interactive PDF creation is...? trying to explore econometric convergence thread
I think I just saw a thread go by on "how do I make interactive PDF from rgl output" but I ignored it at the time and can't seem to find a consensus result on google. Essentially I wanted to create pdf or other output to share a 3D plot and let viewers interact with it, but it still wasn't clear what the easiest way to do this is. Is this 2009 page the best reference? http://cran.r-project.org/web/views/Graphics.html Thanks.

The specific case of interest is below; I found rgl to be quite useful in regard to this, http://r.789695.n4.nabble.com/maximum-likelihood-convergence-reproducing-Anderson-Blundell-1982-Econometrica-R-vs-Stata-td3502516.html#a3512807

I generated some data points using this script ( which itself uses this data file http://98.129.232.234/temp/share/em.txt ) http://98.129.232.234/temp/share/em.R.txt After some editing with sed, the output of the above script made this data file showing some optimization's trajectory according to my variables of interest, http://98.129.232.234/temp/share/emx.dat.txt

df <- read.table("emx.dat", header=FALSE, sep=" ")

I subsequently found a few ways to plot the data, notably,

png("em.png")
plot(log(df$V1), df$V2)
dev.off()

http://98.129.232.234/temp/share/em.png

png("em2.png")
plot(log(df$V1), df$V2, cex=1+.2*(log(df$V3)-min(log(df$V3))))
dev.off()

http://98.129.232.234/temp/share/em2.png

But what I'd like to do is publish an interactive 3D plot of this data, similar to the rgl output of this:

df <- read.table("emx.dat", header=FALSE, sep=" ")
library(rgl)
rgl.points(log(df$V1), df$V2, log(df$V3))

A quick google search and ?rgl didn't seem to provide an immediate answer. Is there a way to publish interactive plots? Thanks.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RGL package installation problem on Centos
Date: Mon, 23 May 2011 14:43:59 +0100 From: arraystrugg...@gmail.com To: r-help@r-project.org Subject: [R] RGL package installation problem on Centos

Dear R users, I have installed the latest version of R from source on Centos (using configure and make install). This seemed to work fine, with no errors reported, and R at the command line starts R. However, if I try to install the package rgl using install.packages("rgl") I get the following error:

installing to /usr/local/lib64/R/library/rgl/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
*** caught segfault ***
address (nil), cause 'memory not mapped'

I just did an install of R from source, built with various options to support Rapache, and tried to load rgl. First it complained about no display, so I went back to bash, did export DISPLAY=:0, and it seemed to load ok. Do you have X running and a display set? Not sure what happens if you have R without X11 support, for example. I probably installed with dep=TRUE, and only on cygwin do I recall some issues with missing dependencies. Try setting dependencies to true and see if that helps.

aborting ...
sh: line 1: 23732 Segmentation fault '/usr/local/lib64/R/bin/R' --no-save --slave /tmp/RtmpkvIjOb/file6d97876
ERROR: loading failed
* removing '/usr/local/lib64/R/library/rgl'
The downloaded packages are in '/tmp/Rtmp5OaGuQ/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html ... done
Warning message:
In install.packages("rgl") : installation of package 'rgl' had non-zero exit status

I read that the OpenGL header files have to be present, and they are in /usr/include/GL. I also read about different graphics cards causing problems, but I don't know how to find this info out. Any help appreciated; the full error message is included below. Thanks,

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base

full error ##

install.packages("rgl")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://cran.ma.imperial.ac.uk/src/contrib/rgl_0.92.798.tar.gz'
Content type 'application/x-gzip' length 162 bytes (1.6 Mb)
opened URL
==
downloaded 1.6 Mb
* installing *source* package 'rgl' ...
checking for gcc... gcc -std=gnu99
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -std=gnu99 accepts -g... yes
checking for gcc -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for gcc... (cached) gcc -std=gnu99
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc -std=gnu99 accepts -g... (cached) yes
checking for gcc -std=gnu99 option to accept ISO C89... (cached) none needed
checking for libpng-config... yes
configure: using libpng-config
configure: using libpng dynamic linkage
checking for X...
libraries , headers
checking GL/gl.h usability... yes
checking GL/gl.h presence... yes
checking for GL/gl.h... yes
checking GL/glu.h usability... yes
checking GL/glu.h presence... yes
checking for GL/glu.h... yes
checking for glEnd in -lGL... yes
checking for gluProject in -lGLU... yes
checking for freetype-config... yes
configure: using Freetype and FTGL
configure: creating ./config.status
config.status: creating src/Makevars
** libs
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c BBoxDeco.cpp -o BBoxDeco.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c Background.cpp -o Background.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c Color.cpp -o Color.o
g++ -I/usr/local/lib64/R/include -DHAVE_PNG_H -I/usr/include/libpng12 -DHAVE_FREETYPE -Iext/ftgl -I/usr/include/freetype2 -Iext -I/usr/local/include -g -O2 -fpic -g -O2 -c
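A couple of quick checks worth running before reinstalling on a headless box, tying together the display advice above (a sketch; nothing here is rgl-specific except the last line):

capabilities("X11")       # FALSE means this R build has no X11 support at all
Sys.getenv("DISPLAY")     # rgl needs a reachable X display when its shared library loads
install.packages("rgl", dependencies = TRUE)

If the machine has no physical display, running R under a virtual framebuffer (e.g. xvfb-run) is a common workaround at install/test time.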
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
# Of course, derivative-free methods newuoa and Nelder-Mead are improved by scaling, but not by the availability of exact gradients. I don't know what is wrong with bobyqa in this example. In short, even with scaling and exact gradients, this optimization problem is recalcitrant. Best, Ravi.

From: Mike Marchywka [marchy...@hotmail.com] Sent: Thursday, May 12, 2011 8:30 AM To: Ravi Varadhan; pda...@gmail.com; alex.ols...@gmail.com Cc: r-help@r-project.org Subject: RE: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

So what was the final verdict on this discussion? I kind of lost track, if anyone has a minute to summarize and critique my summary below. Apparently there were two issues: the comparison between R and Stata was one issue, and the optimum solution another. As I understand it, there was some question about R's numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as the OP stated, some function of the residuals of two ( unrelated ) fits. The residual vectors e1 and e2 are dotted in various combinations, creating a matrix whose determinant is (e1.e1)(e2.e2)-(e1.e2)^2, which is the result to be minimized by choice of theta. Theta, it seems, is an 8-component vector; 4 components determine e1 and the other 4 determine e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions, or else both could become arbitrarily large while keeping the error signal at zero. For fixed magnitudes, collinearity would reduce the error. The intent would appear to be to keep the residuals distributed similarly in the two ( unrelated ) fits. I guess my question is, did anyone determine that there is a unique solution? Or am I totally wrong here ( I haven't used these myself to any extent and just try to run some simple teaching examples; asking for my own clarification as much as anything). Thanks.

From: rvarad...@jhmi.edu To: pda...@gmail.com; alex.ols...@gmail.com Date: Sat, 7 May 2011 11:51:56 -0400 CC: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from optimx. You can get much larger log-likelihood values than for the exact solution that Peter provided.

## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is something wrong here
  return(-logl)
}
data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
# the warnings can be safely ignored in the optimx calls
p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

Ravi.
From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of peter dalgaard [pda...@gmail.com] Sent: Saturday, May 07, 2011 4:46 AM To: Alex Olssen Cc: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

On May 6, 2011, at 14:29 , Alex Olssen wrote:

Dear R-help, I am trying to reproduce some results presented in a paper by Anderson and Blundell in 1982 in Econometrica using R. The estimation I want to reproduce concerns maximum likelihood estimation of a singular equation system. I can estimate the static model successfully in Stata, but for the dynamic models I have difficulty getting convergence. My R program, which uses the same likelihood function as in Stata, has poor convergence properties even for the static case. I have copied my R program and the data below. I realise the code could be made more elegant - but it is short enough. Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution.
Re: [R] text mining analysis and word visualization of pdfs
Date: Wed, 18 May 2011 15:24:49 +0530 From: ashimkap...@gmail.com To: k...@huftis.org CC: r-h...@stat.math.ethz.ch Subject: Re: [R] text mining analysis and word visualization of pdfs

On Wed, May 18, 2011 at 1:44 PM, Karl Ove Hufthammer wrote: Ajay Ohri wrote: What is the appropriate software package for dumping say 20 PDFs in a folder, then creating data visualization with frequency counts of certain words, as well as measuring correlation within each file for certain key relationships or key words? pdftotext + Unix™ for Poets + R (ggplot2)

What about the tm package? I am a beginner and I don't know much about this, but I recall that it does have the ability to handle PDFs. A few words from the experts would be nice.

I don't know if I'm an expert - I can't even get a browser that echoes keystrokes in a reasonable time with a 4-core CPU on 'dohs - but PDF could mean just about anything in terms of how text is represented. Whatever R packages do, they will not be able to read the mind of the author. Even with pdftotext there are many options, and even simple things like US IRS instruction forms can be almost impossible to extract in a coherent manner. Many authors couldn't care less about the information as long as the thing looks like paper copy. If you are stuck with PDF, I'd be looking for more tools first, as you will probably want to know how the files are constructed. I would just reiterate that the best approach for many data analysts would be to contact the data source explaining the problems with improperly authored PDF or other specialized file formats that are only supported by limited proprietary tools or that obfuscate the information of interest. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
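A minimal sketch of the pdftotext-then-count workflow suggested above, assuming pdftotext has already been run outside R and its output sits in a ./txt/ directory (the directory name is hypothetical):

# count word frequencies across the converted files with base R only
files <- list.files("txt", pattern = "\\.txt$", full.names = TRUE)
words <- unlist(lapply(files, function(f) {
  txt <- tolower(readLines(f, warn = FALSE))
  unlist(strsplit(txt, "[^a-z]+"))   # crude tokenizer: letters only
}))
freq <- sort(table(words[nchar(words) > 0]), decreasing = TRUE)
head(freq, 20)   # the counts one would feed to ggplot2 for visualization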
Re: [R] Help, please
From: dwinsem...@comcast.net To: julio.flo...@spss.com.mx Date: Thu, 19 May 2011 10:40:08 -0400 CC: r-help@r-project.org Subject: Re: [R] Help, please

On May 18, 2011, at 6:29 PM, Julio César Flores Castro wrote: Hi, I am using R 2.10.1 and I have a question. Do you know how many cases R can handle?

I was able to handle (meaning do Cox proportional hazards work with the 'rms' package, which adds extra memory overhead with a datadist object) a 5.5 million row by 100 column dataframe without difficulty, using 24 GB on a Mac (BSD UNIX kernel). I was running into performance slowdowns related to paging out to virtual memory at 150 columns, but after expanding to 32 GB I can now handle 5.5 million records with 200 columns without paging.

I want to use the library npmc, but if I have more than 4,500 cases I get an error message. If I use fewer than 4,500 cases I don't have problems with this library. Is there any way to increase the number of cases in order to use this library?

64-bit OS, 64-bit R, and more memory. The longer term solution is an implementation and algorithm designed to increase coherence of memory accesses (firefox is doing this to me now, dropping every few chars and getting many behind as it thrashes with a memory leak, LOL). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
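As a rough sanity check on the numbers above (a sketch; 8 bytes per double, ignoring the copies R makes during fitting):

rows <- 5.5e6; cols <- 100
rows * cols * 8 / 2^30   # ~4.1 GiB for the raw numeric data alone
# with 200 columns this doubles to ~8.2 GiB, so 24-32 GB of RAM leaves
# headroom for the temporary copies made while fitting models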
Re: [R] Power Spectrum from STFT (e1071)?
Date: Tue, 10 May 2011 11:55:34 -0700 From: ajf...@psu.edu To: r-help@r-project.org Subject: [R] Power Spectrum from STFT (e1071)?

Hello. Does anyone know how to generate a power spectrum from the STFT function in package e1071? The output for this function supplies the Fourier coefficients, but I do not know how to relate these back to the power spectrum. Alternatively, if anyone is aware of other packages/functions that perform short time Fourier transforms, that would also be helpful.

What exactly are you trying to do? From what I can recall - a quick look on wikipedia, and I've done this in the past for audio envelope/carrier separation - the short-time modifier just means you need to window the input signal. I did just do something like this in R, to deconvolve histograms with an impulse response, and it is not hard to create your own window function and multiply it with the signal. The power spectrum is just the magnitude squared, but you may want to also invert and check that the output is real (Re and Im) to verify nothing went wrong in processing. Probably this returns complex numbers; I think you can use abs() or Re() and Im() on them, and also str(), but post some code and try ?fft for example. Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
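A minimal base-R sketch of the window-then-transform idea above (no e1071; a Hann window and fft(), with the power spectrum as the squared magnitude - the test signal is invented):

x <- sin(2 * pi * 5 * seq(0, 1, length.out = 256))   # 5 Hz test tone
w <- 0.5 - 0.5 * cos(2 * pi * (seq_along(x) - 1) / (length(x) - 1))  # Hann window
X <- fft(x * w)                  # complex Fourier coefficients
power <- Mod(X)^2 / length(x)    # power spectrum = squared magnitude
plot(power[1:128], type = "h")   # first half = non-negative frequencies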
Re: [R] Powerful PC to run R
Date: Fri, 13 May 2011 12:38:51 +0200 From: haenl...@escpeurope.eu To: r-help@r-project.org Subject: [R] Powerful PC to run R

Dear all, I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 GHz, 8 GB RAM). The problem is that some of my calculations run for several days, sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating. I'm now thinking about buying a more powerful desktop PC or laptop. Can anybody advise me on the best configuration to run R as fast as possible? I will use this PC exclusively for R, so any other factors are of limited importance.

(I think my laptop is overheating with firefox trying to execute whatever stupid code hotmail is using; ssh to a remote server echoes keys faster LOL.) The point of the above is that it really depends what you are doing. Heat can come from the disk drive as well as the silicon. Generally you'd want to consider the algorithm and implementation and get profiling info before just buying a bigger hammer. If you are thrashing VM, sorting some data may help, for example. If the task is suited to parallelism, you could even try to distribute it over several cheaper computers - hard to know. Thanks, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
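A sketch of the "profile before buying a bigger hammer" advice above (the replicate() call is only a stand-in for the real simulation):

Rprof("sim.prof")                          # start the sampling profiler
res <- replicate(100, mean(rnorm(1e5)))    # stand-in workload
Rprof(NULL)                                # stop profiling
head(summaryRprof("sim.prof")$by.self)     # where the time actually went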
Re: [R] How to extract information from the following dataset?
Date: Thu, 12 May 2011 10:43:59 +0200 From: jose-marcio.mart...@mines-paristech.fr To: xzhan...@ucr.edu CC: r-help@r-project.org Subject: Re: [R] How to extract information from the following dataset?

Xin Zhang wrote: Hi all, I have never worked with this kind of data before, so please help me out with it. I have the following data set, in a csv file, which looks like the following:

Jan 27, 2010 16:01:24,000 125 - - -
Jan 27, 2010 16:06:24,000 125 - - -
Jan 27, 2010 16:11:24,000 176 - - -
Jan 27, 2010 16:16:25,000 159 - - -
Jan 27, 2010 16:21:25,000 142 - - -
Jan 27, 2010 16:26:24,000 142 - - -
Jan 27, 2010 16:31:24,000 125 - - -
Jan 27, 2010 16:36:24,000 125 - - -
Jan 27, 2010 16:41:24,000 125 - - -
Jan 27, 2010 16:46:24,000 125 - - -
Jan 27, 2010 16:51:24,000 125 - - -
Jan 27, 2010 16:56:24,000 125 - - -
Jan 27, 2010 17:01:24,000 157 - - -
Jan 27, 2010 17:06:24,000 172 - - -
Jan 27, 2010 17:11:25,000 142 - - -
Jan 27, 2010 17:16:24,000 125 - - -
Jan 27, 2010 17:21:24,000 125 - - -
Jan 27, 2010 17:26:24,000 125 - - -
Jan 27, 2010 17:31:24,000 125 - - -
Jan 27, 2010 17:36:24,000 125 - - -
Jan 27, 2010 17:41:24,000 125 - - -
Jan 27, 2010 17:46:24,000 125 - - -
Jan 27, 2010 17:51:24,000 125 - - -
..

The first few columns are month, day, year, and time with millisecond accuracy. And the last number is the measurement I need to extract. I wonder if there is an easy way to just take out the measurements from a specific day and hour, i.e. if I want measurements from Jan 27 2010 16:--:-- then I get 125,125,176,159,142,142,125,125,125,125,125,125. Many thanks!!

The easiest is in the shell, if you're using some flavour of unix:

grep "Jan 27, 2010 16" filein.txt | awk '{print $5}' > fileout.txt

and use fileout.txt, which will contain only the column of data you want.

Normally that is what I do, but the R POSIXct features work pretty easily. I guess I'd use bash text processing commands to put the data into a form you like, perhaps y-mo-day time, and then read it in as a data frame. Usually I convert everything to time since the epoch began because I like integers, but there are some facilities here like round() that work well with date-times:

dx <- as.POSIXct("2011-04-03 13:14:15")
dx
[1] "2011-04-03 13:14:15 CDT"
round(dx, "hours")
[1] "2011-04-03 13:00:00 CDT"
as.integer(dx)
[1] 1301854455

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
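For a pure-R variant of the same extraction, a sketch (the file name is hypothetical, column positions taken from the sample, and %b assumes an English locale):

raw <- read.table("filein.txt", stringsAsFactors = FALSE)       # 8 whitespace fields per row
stamp <- paste(raw$V1, raw$V2, raw$V3, sub(",.*", "", raw$V4))  # drop the ,000 milliseconds
ts <- as.POSIXct(stamp, format = "%b %d, %Y %H:%M:%S")
raw$V5[format(ts, "%Y-%m-%d %H") == "2010-01-27 16"]            # the 16:--:-- measurements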
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
So what was the final verdict on this discussion? I kind of lost track; if anyone has a minute, please summarize and critique my summary below. Apparently there were two issues: the comparison between R and Stata was one issue, and the optimum solution another. As I understand it, there was some question about R's numerical gradient calculation. This would suggest some features of the function may be of interest to consider. The function to be optimized appears to be, as the OP stated, some function of the residuals of two (unrelated) fits. The residual vectors e1 and e2 are dotted in various combinations, creating a matrix whose determinant is (e1.e1)(e2.e2) - (e1.e2)^2, which is the quantity to be minimized by choice of theta. Theta, it seems, is an 8-component vector: 4 components determine e1 and the other 4 determine e2. Presumably a unique solution would require that e1 and e2, both n-component vectors, point in different directions, or else both could become arbitrarily large while keeping the error signal at zero. For fixed magnitudes, collinearity would reduce the error. The intent would appear to be to keep the residuals distributed similarly in the two (unrelated) fits. I guess my question is: did anyone determine that there is a unique solution? Or am I totally wrong here? (I haven't used these myself to any extent and just try to run some simple teaching examples; asking for my own clarification as much as anything.) Thanks.

From: rvarad...@jhmi.edu To: pda...@gmail.com; alex.ols...@gmail.com Date: Sat, 7 May 2011 11:51:56 -0400 CC: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

There is something strange in this problem. I think the log-likelihood is incorrect. See the results below from optimx. You can get much larger log-likelihood values than for the exact solution that Peter provided.

## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))  # it looks like there is something wrong here
  return(-logl)
}
data <- read.table("e:/computing/optimx_example.dat", header=TRUE, sep=",")
attach(data)
require(optimx)
start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))
# the warnings can be safely ignored in the optimx calls
p1 <- optimx(start, lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p2 <- optimx(rep(0,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))
p3 <- optimx(rep(0.5,8), lnl, hessian=TRUE, y1=y1, y2=y2,
             x1=x1, x2=x2, x3=x3, control=list(all.methods=TRUE, maxit=1500))

Ravi.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of peter dalgaard [pda...@gmail.com] Sent: Saturday, May 07, 2011 4:46 AM To: Alex Olssen Cc: r-help@r-project.org Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

On May 6, 2011, at 14:29 , Alex Olssen wrote:

Dear R-help, I am trying to reproduce some results presented in a paper by Anderson and Blundell in 1982 in Econometrica using R. The estimation I want to reproduce concerns maximum likelihood estimation of a singular equation system. I can estimate the static model successfully in Stata, but for the dynamic models I have difficulty getting convergence.
My R program, which uses the same likelihood function as in Stata, has poor convergence properties even for the static case. I have copied my R program and the data below. I realise the code could be made more elegant - but it is short enough. Any ideas would be highly appreciated.

Better starting values would help. In this case, almost too good values are available:

start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3)))

which appears to be the _exact_ solution. Apart from that, it seems that the conjugate gradient methods have difficulties with this likelihood, for some less than obvious reason. Increasing maxit gets you closer but still not satisfactory. I would suggest trying out the experimental optimx package. Apparently, some of the algorithms in there are much better at handling this likelihood, notably nlm and nlminb.

## model 18
lnl <- function(theta, y1, y2, x1, x2, x3) {
  n <- length(y1)
  beta <- theta[1:8]
  e1 <- y1 - theta[1] - theta[2]*x1 - theta[3]*x2 - theta[4]*x3
  e2 <- y2 - theta[5] - theta[6]*x1 - theta[7]*x2 - theta[8]*x3
  e <- cbind(e1, e2)
  sigma <- t(e) %*% e
  logl <- -1*n/2*(2*(1+log(2*pi)) + log(det(sigma)))
  return(-logl)
}
p <- optim(0*c(1:8), lnl, method="BFGS", hessian=TRUE, y1=y1, y2=y2, x1=x1, x2=x2, x3=x3)
Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata
Date: Mon, 9 May 2011 22:06:38 +1200 From: alex.ols...@gmail.com To: pda...@gmail.com CC: r-help@r-project.org; da...@otter-rsch.com Subject: Re: [R] maximum likelihood convergence reproducing Anderson Blundell 1982 Econometrica R vs Stata

Peter said "Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific paper you are talking about. Please cite the results and explain the differences."

Apologies Peter, will do. The results which I can emulate in Stata but not (yet) in R are reported below.

did you actually cut/paste code anywhere, and is your first coefficient -.19 or -.019? Presumably typos would be one possible problem.

They come from Econometrica Vol. 50, No. 6 (Nov., 1982), pp. 1569

is this it, page 1559? http://www.jstor.org/pss/1913396 Generally it helps if we could at least see the equations to check your code against typos (note page number?) in lnl; that may fix part of the mystery. Is full text available on the author's site? It doesn't come up on citeseer AFAICT: http://citeseerx.ist.psu.edu/search?q=blundell+1982sort=ascdate I guess one question would be what beta in lnl is supposed to be - it isn't used anywhere. But I will also mention I'm not that familiar with the R code (I'm trying to work through this to learn R and the optimizers). Maybe some words would help: is sigma supposed to be 2x2 or 8x8, and what are e1 and e2 supposed to be?

TABLE II - model 18
        coef    std err
p10    -0.19    0.078
p11     0.220   0.019
p12    -0.148   0.021
p13    -0.072
p20     0.893   0.072
p21    -0.148
p22     0.050   0.035
p23     0.098

The results which I produced in Stata are reported below. I spent the last hour rewriting the code to reproduce this - since I am now at home and not at work :( My results are identical to those published. The estimates are for a 3 equation symmetrical singular system. I have not bothered to report symmetrical results, and have backed out an extra estimate using adding up constraints. I have also backed out all standard errors using the delta method.

. ereturn display
------------------------------------------------------------------------
     |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-----+------------------------------------------------------------------
a    |
  a1 | -.0188115   .0767759    -0.25   0.806   -.1692895    .1316664
  a2 |  .8926598   .0704068    12.68   0.000    .7546651    1.030655
  a3 |  .1261517   .0590193     2.14   0.033     .010476    .2418275
-----+------------------------------------------------------------------
g    |
 g11 |  .2199442   .0184075    11.95   0.000     .183866    .2560223
 g12 | -.1476856   .0211982    -6.97   0.000   -.1892334   -.1061378
 g13 | -.0722586   .0145154    -4.98   0.000   -.1007082   -.0438089
 g22 |  .0496865   .0348052     1.43   0.153   -.0185305    .1179034
 g23 |  .0979991   .0174397     5.62   0.000    .0638179    .1321803
 g33 | -.0257405   .0113869    -2.26   0.024   -.0480584   -.0034226
------------------------------------------------------------------------

In R I cannot get results like this - I think it is probably to do with my inability at using the optimisers well. Any pointers would be appreciated.

Peter said "Are we maximizing over the same parameter space? You say that the estimates from the paper give a log-likelihood of 54.04, but the exact solution clocked in at 76.74, which in my book is rather larger."

I meant +54.04 > -76.74. It is quite common to get positive log-likelihoods in these system estimations. Kind regards, Alex

On 9 May 2011 19:04, peter dalgaard wrote: On May 9, 2011, at 06:07 , Alex Olssen wrote: Thank you all for your input. Unfortunately my problem is not yet resolved.
Before I respond to individual comments, I make a clarification: in Stata, using the same likelihood function as above, I can reproduce EXACTLY (to 3 decimal places or more, which is exact considering I am using different software) the results from model 8 of the paper. I take this as an indication that I am using the same likelihood function as the authors, and that it does indeed work. The reason I am trying to estimate the model in R is because, while Stata reproduces model 8 perfectly, it has convergence difficulties for some of the other models.

Peter Dalgaard: "Better starting values would help. In this case, almost too good values are available: start <- c(coef(lm(y1~x1+x2+x3)), coef(lm(y2~x1+x2+x3))) which appears to be the _exact_ solution."

Thanks for the suggestion. Using these starting values produces the exact estimate that Dave Fournier emailed me. If these are the exact solution, then why did the authors publish different answers which are completely reproducible in Stata and Tsp?

Ahem! You might get us interested in your problem, but not to the level that we are going to install Stata and Tsp and actually dig out and study the scientific
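One hedged way to probe the uniqueness question raised in this thread (a sketch, not the thread's own code): with hessian=TRUE, optim() returns the Hessian at the reported optimum, and near-zero eigenvalues flag flat directions where the solution is not locally unique. This assumes lnl, start, and the data are set up as in the code earlier in the thread.

fit <- optim(start, lnl, method = "BFGS", hessian = TRUE,
             y1 = y1, y2 = y2, x1 = x1, x2 = x2, x3 = x3)
ev <- eigen(fit$hessian, symmetric = TRUE)$values
ev / max(abs(ev))   # ratios near zero indicate an ill-conditioned,
                    # possibly non-unique, optimum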
Re: [R] Confidence intervals and polynomial fits
From: pda...@gmail.com Date: Sun, 8 May 2011 09:33:23 +0200 To: rh...@sticksoftware.com CC: r-help@r-project.org Subject: Re: [R] Confidence intervals and polynomial fits

On May 7, 2011, at 16:15 , Ben Haller wrote: On May 6, 2011, at 4:27 PM, David Winsemius wrote: On May 6, 2011, at 4:16 PM, Ben Haller wrote: As for correlated coefficients: x, x^2, x^3, etc. would obviously be highly correlated, for values close to zero.

Not just for x close to zero:

cor( (10:20)^2, (10:20)^3 )
[1] 0.9961938
cor( (100:200)^2, (100:200)^3 )
[1] 0.9966219

Wow, that's very interesting. Quite unexpected, for me. Food for thought. Thanks!

Notice that because of the high correlations between the x^k, their parameter estimates will be correlated too. In practice, this means that the c.i. for the quartic term contains values for which you can compensate with the other coefficients and still have an acceptable fit to the data. (Nothing strange about that; already in simple linear regression, you allow the intercept to change while varying the slope.)

I was trying to compose a longer message, but at least for even/odd powers it isn't hard to find a set of values for which cor is zero, or to find a set of points that make sines of different frequencies have non-zero correlation - this highlights the fact that the computer isn't magic and it needs data to make basis functions different from each other. For background, you probably want to look up Taylor Series and Orthogonal Basis. I would also suggest using R to add noise to your input and see what that does to your predictions, or for that matter take simple known data and add noise, although I think in principle you can do this analytically. You can always project a signal onto some subspace and get an estimate of how good your estimate is, but that is different from asking how well you can reconstruct your signal from a bunch of projections. If you want to know "what can I infer about the slope of my thing at x=a", that is a specific question about one coefficient, but at this point statisticians can elaborate on various issues with the other things you ignore. Also, I think you said something about correlated at x=0, but you can change your origin, (x-a)^n, and expand this in a finite series in x^m to see what happens here. Also, if you are using hotmail, don't think that a dot product is not html LOL, since hotmail knows you must mean html when you use less-than even in text email...

-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
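A quick illustration of the even/odd remark above (a sketch: centering a symmetric grid makes even and odd powers exactly uncorrelated, and poly() builds an orthogonal basis):

x <- 100:200
cor(x^2, x^3)                 # ~0.997 on the raw scale
xc <- x - mean(x)
cor(xc^2, xc^3)               # 0: even vs odd powers of a centered, symmetric grid
round(cor(poly(x, 3)), 10)    # poly() orthogonalizes: identity matrix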
Re: [R] nls problem with R
Date: Thu, 5 May 2011 01:20:33 -0700 From: sterles...@hotmail.com To: r-help@r-project.org Subject: Re: [R] nls problem with R

ID1 ID2        t         V(t)
  1   1   0        6.053078443
  2   1   0.3403   5.56937391
  3   1   0.4181   5.45484486
  4   1   0.4986   5.193124598
  5   1   0.7451   4.31386722
  6   1   1.0069   3.645422269
  7   1   1.5535   3.587710965
  8   1   1.8049   3.740362689
  9   1   2.4979   3.699837726
 10   1   6.4903   2.908485019
 11   1  13.5049   1.888179494
 12   1  27.5049   1.176091259
 13   1  41.5049   1.176091259

The model (1): V(t) = V0[1 - epi + epi*exp(-c(t-t0))]

With A = V0, B = V0*epi, C = exp(-c*t0), this becomes V(t) = A - B + B*C*exp(-ct), or further, with D = A-B and F = B*C, V(t) = D + F*exp(-ct).

this model only really has 3 attributes: initial value, final value, and decay constant, yet you ask for 4 parameters. There is no way to get a unique answer. For some reason this same form comes up a lot here; I think this is about the third time I've seen it in the last few weeks. I guess when fishing or shopping for forms to fit, it is tempting to throw a bunch of parameters into your model, but this can create intractable ambiguities. Indeed, if I just remove t0 and use your first 8 points I get this (random starting values, but it converged easily; you still need to plot etc.):

[1] 1 v= 8.77181162126362 epi= 0.672516376478598 cl= 1.90973175223917 t0= 0.643481321167201

summary(nls2)
Formula: V2 ~ v0 * (1 - epi + epi * exp(-cl * (T2)))
Parameters:
     Estimate Std. Error t value Pr(>|t|)
v0     6.2901     0.3384  18.585  8.3e-06 ***
epi    0.5430     0.1373   3.955   0.0108 *
cl     0.9684     0.5491   1.763   0.1381
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3579 on 5 degrees of freedom
Number of iterations to convergence: 11
Achieved convergence tolerance: 4.057e-06

(2) V(t) = V0{A*exp[-lambda1(t-t0)] + (1-A)*exp[-lambda2(t-t0)]}

in formula (2):
lambda1 = 0.5*{(c+delta) + [(c-delta)^2 + 4*(1-epi)*c*delta]^0.5}
lambda2 = 0.5*{(c+delta) - [(c-delta)^2 + 4*(1-epi)*c*delta]^0.5}
A = (epi*c - lambda2)/(lambda1 - lambda2)

The regression rule: for formula (1) (t=2, that is) the first 8 rows are used for non-linear regression, and the epi, c, t0, V0 parameters are obtained. For formula (2) all 13 rows of results are used for non-linear regression of lambda1, lambda2, A (with these parameters, delta can be calculated from them). Thanks for help, Ster Lesser

-- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls problem with R
Date: Wed, 4 May 2011 07:07:44 -0700 From: sterles...@hotmail.com To: r-help@r-project.org Subject: Re: [R] nls problem with R

Thanks Andrew. I am sorry for some typos - I omitted some numbers of T2. Based on your suggestion, I think the problem is in the initial values. And I will read more theory about non-linear regression.

there is unlikely to be any magic involved, unlike getting hotmail to work. As a tool for understanding your data, you should have some idea of the qualitative properties of the model and data, and of the error function you use to reconcile the two. If you can post your full data set, I may post an R example of some things to try. I was looking for an excuse to play with nls - I'm not expert here - and curious to see what I can do with your example for critique by others. If you want to fully automate this for N continuous parameters, you can take a shotgun approach, but I'm not sure it helps other than to find gross problems in the model or data. I actually wrote a loop to keep picking random parameter values and calculating an SSE between predicted and real data. What you soon find is that this is like trying to decode a good crypto algorithm by guessing - you can do the math to see the problem LOL.

-- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
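A sketch of the random-restart loop described above, using the first 8 points of the data posted elsewhere in this thread (values rounded; the model form and parameter bounds are assumptions):

t <- c(0, 0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.5535, 1.8049)
V <- c(6.053, 5.569, 5.455, 5.193, 4.314, 3.645, 3.588, 3.740)
# V(t) = v0 * (1 - epi + epi * exp(-cl * t)); score candidates by SSE
sse <- function(p) sum((V - p[1] * (1 - p[2] + p[2] * exp(-p[3] * t)))^2)
best <- Inf
for (i in 1:5000) {
  p <- c(runif(1, 1, 10), runif(1, 0, 1), runif(1, 0, 5))
  s <- sse(p)
  if (s < best) { best <- s; pbest <- p }
}
pbest   # rough starting values to hand to nls()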
Re: [R] Speed up plotting to MSWindows graphics window
Date: Wed, 27 Apr 2011 11:16:26 +0200 From: jonat...@k-m-p.nl To: r-help@r-project.org Subject: [R] Speed up plotting to MSWindows graphics window

Hello, I am working on a project analysing the performance of motor-vehicles through messages logged over a CAN bus. I am using R 2.12 on Windows XP and 7. I am currently plotting the data in R, overlaying 5 or more plots of data, logged at 1kHz (using plot.ts() and par(new = TRUE)). The aim is to be able to pan, zoom in and out, and get values from the plotted graph using a custom Qt interface that is used as a front end to R.exe (all this works). The plot is drawn by R directly to the windows graphics device. The data is imported from a .csv file (typically around 100MB) to a matrix (timestamp, message ID, byte0, byte1, ..., byte7). I then separate this matrix into several by message ID (dimensions are in the order of 8 cols, 10^6 rows). The panning is done by redrawing the plots, shifted by a small amount, so as to view a window of data from a second to a minute long that can travel the length of the logged data. My problem is that the redrawing of the plots whilst panning is too slow when dealing with this much data, i.e. I can see the last graphs being drawn to the screen in the half-second following the view change. I need a fluid change from one view to the next. My question is this: Are there ways to speed up the plotting on the MSWindows display? By reducing plotted point densities to *sensible* values?

Well, hard to know, but it would help to know where all the time is going. Usually people start complaining when VM thrashing is common, but if you are CPU limited you could try restricting the range of data you want to plot rather than relying on the plot to just clip the largely irrelevant points when you are zoomed in. It should not be too expensive to find the limits either incrementally or with binary search on an ordered time series. Presumably subsetting is fast using foo[a:b,]. One thing you may want to try for change of scale is wavelet or multi-resolution analysis. You can make a tree (increasing memory usage, but even VM here may not be a big penalty if coherence is high) and display the resolution appropriate for the current scale.

Using something other than plot.ts() - is the lattice package faster? I don't need publication quality plots, they can be rougher... I have tried: using matrices instead of dataframes (works for calculations but not enough for plots); increasing the max usable memory (max-mem-size) (no change); increasing the size of the pointer protection stack (max-ppsize) (no change); deleting the unnecessary leftover matrices (no change). I can't use lines() instead of plot() because of the very different scales (rpm -1, flags -1 to 3). I am going to do some resampling of the logged data to reduce the vector sizes (removal of *less* important data and use of window.ts()). But I am currently running out of ideas, so if somebody could point out something, I would be grateful. Thanks, Jonathan Gabris __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Speed up plotting to MSWindows graphics window
Date: Wed, 27 Apr 2011 14:40:23 +0200 From: jonat...@k-m-p.nl To: r-help@r-project.org Subject: Re: [R] Speed up plotting to MSWindows graphics window

On 27/04/2011 13:18, Mike Marchywka wrote: Date: Wed, 27 Apr 2011 11:16:26 +0200 From: jonat...@k-m-p.nl To: r-help@r-project.org Subject: [R] Speed up plotting to MSWindows graphics window

Hello, I am working on a project analysing the performance of motor-vehicles through messages logged over a CAN bus. I am currently plotting the data in R, overlaying 5 or more plots of data, logged at 1kHz (using plot.ts() and par(new = TRUE)). The aim is to be able to pan, zoom in and out, and get values from the plotted graph using a custom Qt interface that is used as a front end to R.exe (all this works). The plot is drawn by R directly to the windows graphics device. The data is imported from a .csv file (typically around 100MB) to a matrix (timestamp, message ID, byte0, byte1, ..., byte7). I then separate this matrix into several by message ID (dimensions are in the order of 8 cols, 10^6 rows). The panning is done by redrawing the plots, shifted by a small amount, so as to view a window of data from a second to a minute long that can travel the length of the logged data. My problem is that the redrawing of the plots whilst panning is too slow when dealing with this much data, i.e. I can see the last graphs being drawn to the screen in the half-second following the view change. I need a fluid change from one view to the next. My question is this: Are there ways to speed up the plotting on the MSWindows display? By reducing plotted point densities to *sensible* values?

Well, hard to know, but it would help to know where all the time is going. Usually people start complaining when VM thrashing is common, but if you are CPU limited you could try restricting the range of data you want to plot rather than relying on the plot to just clip the largely irrelevant points when you are zoomed in. It should not be too expensive to find the limits either incrementally or with binary search on an ordered time series. Presumably subsetting is fast using foo[a:b,]. One thing you may want to try for change of scale is wavelet or multi-resolution analysis. You can make a tree (increasing memory usage, but even VM here may not be a big penalty if coherence is high) and display the resolution appropriate for the current scale.

I forgot to add: for plotting I use a command similar to

plot.ts(timestampVector, dataVector, xlim=c(a,b))

where a and b are timestamps from timestampVector. Is the xlim parameter sufficient for limiting the scope of the plots? Or should I subset the time series each time I do a plot?

well, maybe the time series knows the data to be ordered - I never use that - but in general it has to go check each point and clip the out-of-range ones. It could, I suppose, binary search for start/end points, but I don't know. Based on what you said, it sounds like it does not.

The multi-resolution analysis looks interesting. I shall spend some time finding out how to use the wavelets package. Cheers!

Using something other than plot.ts() - is the lattice package faster? I don't need publication quality plots, they can be rougher...
I have tried: using matrices instead of dataframes (works for calculations but not enough for plots); increasing the max usable memory (max-mem-size) (no change); increasing the size of the pointer protection stack (max-ppsize) (no change); deleting the unnecessary leftover matrices (no change). I can't use lines() instead of plot() because of the very different scales (rpm -1, flags -1 to 3). I am going to do some resampling of the logged data to reduce the vector sizes (removal of *less* important data and use of window.ts()). But I am currently running out of ideas, so if somebody could point out something, I would be grateful. Thanks, Jonathan Gabris __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
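A sketch of the subsetting idea discussed in this thread (all names made up): on an ordered timestamp vector, findInterval() does the binary search for the window edges, so only the visible points are handed to plot().

n <- 1e6
tstamp <- seq(0, 1000, length.out = n)   # ordered timestamps, seconds
sig <- cumsum(rnorm(n))                  # stand-in for one logged channel
a <- 500; b <- 560                       # currently visible window
i <- findInterval(c(a, b), tstamp)       # binary search for the edges
plot(tstamp[i[1]:i[2]], sig[i[1]:i[2]], type = "l",
     xlab = "time (s)", ylab = "signal")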
Re: [R] Survival analysis: same subject with multiple treatments and experience multiple events
Date: Fri, 22 Apr 2011 10:00:31 -0400 From: littleduc...@gmail.com To: r-help@r-project.org Subject: [R] Survival analysis: same subject with multiple treatments and experience multiple events

Hi there, I need some help to figure out what is the proper model in survival analysis for my data. Subjects were randomized to 3 treatments in trial 1; some of them experienced the event during the trial. After a period of time those subjects were randomized to 3 treatments again in trial 2, but different from what they got in the 1st trial; some of them experienced the event during the 2nd trial. (I think the carryover effect can be ignored since the time between the two trials is long enough.) What I am interested in is whether the survival functions differ among treatments. How should I deal with the correlation between the observations, since the same subject was treated with two different drugs in the two trials? Should I add TRIAL, whether the event happened before, or the number of times the event happened before, as covariate(s)? Any input will be appreciated. Thank you. Qian

No one else replied, so I would just suggest a web search using the term "crossover design" http://www.google.com/#q=cran+crossover+design+survival and refer you to any FDA panel discussions regarding drugs that have been debated with similar trial designs as part of the debate. http://www.google.com/#sclient=psyhl=ensite=source=hpq=site:fda.gov+briefing+crossover The point of the above is to get some idea what can happen, as no battle plan survives first contact with data. Usually the objective in these designs is to infer something about causality in some system, and you just use the statistics to avoid fooling yourself. Personally it seems to me that understanding of disease and host dynamics is improving to the point where you can do more with the carryover effect that you mention, as well as with parametric putative prognostic factors, but you can also see opinions vary. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using Java methods in R
Date: Sat, 23 Apr 2011 05:32:59 -0700 From: hill0...@umn.edu To: r-help@r-project.org Subject: Re: [R] Using Java methods in R

No answer to my post, so let's try a simpler question. Am I doing this correctly? I have the RGui with R Console on the screen. On the top pull-downs: Packages > Install Packages > USA(IA) > rJava

library(rJava)
.jinit()
qsLin <- .jnew("C:/ad/j/CalqsLin")
Error in .jnew("C:/ad/j/CalqsLin") : java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin

I haven't used rJava yet - I think I installed it on linux for testing but no real usage - but this message appears to come from the JVM, probably because you specified an absolute path. You can try this from the command line:

$ ls ../geo/dayte*
../geo/dayte.class
mmarchywka@phlulap01 /cygdrive/c/d/phluant/duphus
$ java ../geo/dayte
Exception in thread "main" java.lang.NoClassDefFoundError: ///geo/dayte
Caused by: java.lang.ClassNotFoundException: ...geo.dayte
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
  at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
  at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: ../geo/dayte. Program will exit.

you want to search for the classpath and set that up, then reference the class name, not the path (the class loader will follow the path looking for it). Not sure how you do that in rJava under 'dohs, but if you search for that term it should be apparent. And of course 'dohs and pdf aren't always friendly for automated work, so get something like cygwin and reduce pdf's to text etc.

So I got this error, which means I don't understand very much. I go to C:/ad/j and get:

C:\ad\j>dir CalqsLin.class
Volume in drive C has no label.
Volume Serial Number is 9A35-67A2
Directory of C:\ad\j
04/23/2011 07:11 AM 14,651 CalqsLin.class
1 File(s) 14,651 bytes
0 Dir(s) 104,257,716,224 bytes free

Just to show my intentions, I had next wanted to call this java method, linTimOfCalqsStgIsLev(20110405235959,-4), using:

dblTim <- .jcall(qsLin, "D", "linTimOfCalqsStgIsLev", 20110405235959, -4)

but that will probably also be wrong? Obviously I don't understand.

-- View this message in context: http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using Java methods in R
From: marchy...@hotmail.com To: hill0...@umn.edu; r-help@r-project.org Date: Sat, 23 Apr 2011 15:12:30 -0400 Subject: Re: [R] Using Java methods in R

Date: Sat, 23 Apr 2011 05:32:59 -0700 From: hill0...@umn.edu To: r-help@r-project.org Subject: Re: [R] Using Java methods in R

No answer to my post, so let's try a simpler question. Am I doing this correctly? I have the RGui with R Console on the screen. On the top pull-downs: Packages > Install Packages > USA(IA) > rJava

library(rJava)
.jinit()
qsLin <- .jnew("C:/ad/j/CalqsLin")
Error in .jnew("C:/ad/j/CalqsLin") : java.lang.NoClassDefFoundError: C:/ad/j/CalqsLin

I haven't used rJava yet, but this message appears to come from the JVM, probably because you specified an absolute path. You want to search for the classpath and set that up, then reference the class name, not the path (the class loader will follow the path looking for it).

It appears that by default the current directory is on the classpath. I must have installed this on 'dohs before, and if I copy the class file into the R directory it can find it:

library(rJava)
.jinit()
.jnew("foo")
Error in .jnew("foo") : java.lang.NoClassDefFoundError: foo
.jnew("dayte")
[1] "Java-Object{dayte@1de3f2d}"

that's unlikely to fix all your problems; you want to set the classpath, but if you know the term and can load cygwin you should be able to find the way to set that up.

So I got this error, which means I don't understand very much. I go to C:/ad/j and get:

C:\ad\j>dir CalqsLin.class
Volume in drive C has no label.
Volume Serial Number is 9A35-67A2
Directory of C:\ad\j
04/23/2011 07:11 AM 14,651 CalqsLin.class
1 File(s) 14,651 bytes
0 Dir(s) 104,257,716,224 bytes free

Just to show my intentions, I had next wanted to call this java method, linTimOfCalqsStgIsLev(20110405235959,-4), using:

dblTim <- .jcall(qsLin, "D", "linTimOfCalqsStgIsLev", 20110405235959, -4)

but that will probably also be wrong? Obviously I don't understand.

-- View this message in context: http://r.789695.n4.nabble.com/Using-Java-methods-in-R-tp3469299p3469848.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
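Pulling the thread's advice together, a hedged sketch (class and method names taken from the post; the method's argument types are a guess, and it assumes CalqsLin is in the default package):

library(rJava)
.jinit()
.jaddClassPath("C:/ad/j")     # put the directory on the classpath...
qsLin <- .jnew("CalqsLin")    # ...then load the class by name, not by path
# "D" = returns a Java double; -4L forces an integer argument
dblTim <- .jcall(qsLin, "D", "linTimOfCalqsStgIsLev", 20110405235959, -4L)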
Re: [R] How to answer the question about transitive correlation?
Date: Fri, 22 Apr 2011 11:42:59 +0800 From: mailzhu...@gmail.com To: r-help@r-project.org Subject: [R] How to answer the question about transitive correlation?

Hi, everyone. I know it may be a basic statistical question, but I can't find a good answer. I have a question raised by one of the reviewers: "Factor A expression was strongly correlated with B expression (chi-square) in this series. Prior reports by the same authors showed that B expression strongly correlated with survival (Log-rank). Please provide an explanation why then were the results not transitive."

The only explanation that would have any value would require you to post the data and let everyone look at it; any R output would be a benefit too. So you compared A and B with one test, B vs C with another, cut off the result at some arbitrary criterion, and then wonder why some unspecified test and criterion applied to A vs C doesn't vote in the majority with the same answer? Changing acceptance levels or applying a correction after careful shopping can always fix that logical inconsistency LOL. It's not hard to plot 3 normal curves on the same graph and see one example of how their overlaps can relate. I'm not sure what to think, but perhaps you could look at stuff like this: http://en.wikipedia.org/wiki/Path_analysis_%28statistics%29 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
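A minimal version of the "three normal curves" picture suggested above (means and SD are arbitrary):

curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 8, ylab = "density")
curve(dnorm(x, mean = 2, sd = 1), add = TRUE, lty = 2)
curve(dnorm(x, mean = 4, sd = 1), add = TRUE, lty = 3)
# adjacent pairs overlap heavily while the outer pair barely does -
# one way association can fail to carry over from A~B and B~C to A~C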
Re: [R] Extrapolating data points for individuals who lack them
From: de...@exeter.ac.uk To: r-help@r-project.org Date: Wed, 20 Apr 2011 12:41:29 +0100 Subject: [R] Extrapolating data points for individuals who lack them

Hi, We have an experiment where individuals' responses were measured over 5 days. Some responses were not obtained because we only allowed individuals to respond within a limited time-frame. These individuals are given the maximum response time as they did not respond, yet we feel they may have done so if given time (and looking at the rest of their responses over time, the non-response days stand out). We therefore want to extrapolate data points for individuals, on days when they didn't respond, using a regression of days when they did. Does anyone know how we could do this quickly and easily in R?

You are probably talking about right censoring. See things like this (you may have good luck just with R rather than CRAN): http://www.google.com/#sclient=psyhl=ensource=hpq=CRAN+informative+%22right+censoring%22 If you post data, maybe someone can try a few things. It isn't hard to take data subsets, fit models, and replace data with model predictions, but it is easier and more interesting to illustrate with your data. Personally I would avoid making up data, and of course extrapolation tends to be the most error prone way of doing that. Thanks very much Dave __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
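A hedged sketch of the "fit on observed days, predict the censored day" idea (toy data invented here; a real analysis should prefer proper censoring methods such as survival::survreg):

# one individual: response time over 5 days, day 4 capped at the
# maximum allowed time (10) because no response was given
d <- data.frame(day = 1:5,
                rt = c(3.1, 3.9, 5.2, 10, 6.8),
                censored = c(FALSE, FALSE, FALSE, TRUE, FALSE))
fit <- lm(rt ~ day, data = d, subset = !censored)
predict(fit, newdata = d[d$censored, ])   # model-based value for day 4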
Re: [R] regression and lmer
Date: Mon, 18 Apr 2011 03:27:40 -0700 From: lampria...@yahoo.com To: r-help@r-project.org Subject: Re: [R] regression and lmer

Dear all, I hope this is the right place to ask this question.

( hotmail not marking your text, sorry I can't find the option to change that ) opinions vary, but the mods seem to let this by, and personally it seems appropriate to discuss general questions about what R can do. ( end my text )

I am reviewing a research project where the analyst(s) are using a linear regression model. The dependent variable (DV) is a continuous measure. The independent variables (IVs) are a mixture of linear and categorical variables.

( my text ) No one wants to do homework or do your job for free, but open free peer review should not be a problem philosophically. ( /my text, afraid to use less-than for inciting hotmail )

The author investigates whether performance (DV - continuous linear) is a function of age (continuous IV1 - measured in years), previous performance (continuous IV2), country (categorical IV3 - six countries), the percentage of PhD graduates in each country (continuous IV4 - country level data - apparently only six different percentages since we have only six countries) and population of country (continuous IV5 - country level data - again only six numbers here, one for each country population). My own opinion is that the lm function cannot be used with country level data as IVs (for example IV4 and IV5 cannot be entered into the model because they are country level data). If IV4 and IV5 are included in the model, it is possible that the model will not be able to be defined, because we only have six countries and it is very likely that the levels of countries (IV3) may be confounded with IV4 and IV5. This also raises multicollinearity issues, right? I would like to suggest to the analyst to use lmer with IV3 as a random variable and IV4 and IV5 as IVs at the second level of the two-level model. The questions are: (a) Is it true that IV4 and IV5 cannot be entered in a one-level regression if we also have IV3? (b) Can I use an lm function to check for multicollinearity between IV3, IV4 and IV5? and (c) If we use a two-level regression model, does lmer cope well with only six countries as a random effect?

( my txt ) So you have presumably a large number of test subjects per country and a small number (n ~ 6) of countries. You could ask a number of questions, such as: do the mean performances change from country to country by more than expected given the observed distributions of performances within country? You could also ask a question like: if I try to describe performance as a function of country attributes, what fitting parameters minimize an error between fit and observation? Apparently the author tried to write an expression like

average_performance = a[country_index] + m1*some_attribute_of_country + m2*some_other_attribute_of_country + b

and then expected the fitting algorithm to pick a[i], b, m1, and m2 in such a way as to minimize the resulting error. The reported fits hopefully minimize the error function, but then you need to examine the second derivative in various directions, so you have to ask how the error varies as you change a[i], b, m1, and m2. (Ignore b right now and assume it is included in a[i].) I guess if you can find a direction in which the error cannot change due to these constraints, then it would seem to be impossible for the fit to come up with unique values.
If you change each a[i] and m1 by some amounts, for example, can you pick those amounts so that nothing changes? ( /my text ) Thank you for your help Jason __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
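The confounding described above is easy to demonstrate: with only six countries, a country factor already spans every country-level quantity, so lm() marks the country-level covariates as aliased. A sketch with made-up numbers:

set.seed(1)
country <- factor(rep(1:6, each = 50))       # 50 subjects per country
iv4 <- c(5, 9, 12, 20, 31, 44)[country]      # % PhD graduates (invented)
iv5 <- c(1, 3, 8, 16, 60, 82)[country]       # population (invented)
y <- rnorm(300)                              # placeholder response
coef(lm(y ~ country + iv4 + iv5))            # iv4 and iv5 come back NA:
                                             # aliased with the factor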
Re: [R] Rsquared for anova
( did this msg make it through the lists as rich text? hotmail didn't seem to think it was plain text?) Anyway, having come in in the middle of this, it isn't clear if your issues are with R or stats or both. Usually the hard core stats people punt the stats questions to other places, but both can be addressed somewhat. In any case, exploratory work is a good way to learn both, and I always like looking at new data. If you have one or a few dependent variables and many independent variables, it would probably help if you could visualize a surface with the response as a function of the input variables; then, maybe with the input of prior information or anecdotes, you have some idea what tests or analyses would make sense. Just some thoughts, for illustration only:

df <- read.table("results_processedCP.txt", header=TRUE)

first it helps to make sure everything went ok and do quick checks, for example,

str(df)
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)

now personally lots of binary variables confuse me, and I can munge them all together since I expect I can later identify issues in the following plots. So, with this data you can create a composite variable like this (now, I have not checked any of this for accuracy, and typos and other problems may render the results useless):

x <- df$nh1 + 2*df$nh2 + 4*df$nh3 + 2*df$randsize + 32*df$aweights
df2 <- cbind(df, x)
str(df2)

not sure if time was an input or output, but you could see if there is any obvious trend or periodicity of time with your new made-up variable,

plot(df2$time, df2$x)

Apparently x is a num rather than an int; it can be changed for illustration but is probably of no consequence,

xi <- as.integer(x)
str(xi)

and then you can add color based on this variable,

min(xi)
cols <- rainbow(56)
cx <- cols[xi+1]
str(cx)

and make color coded scatter plots. Now, if you got lucky and guessed right, you may see some patterns that you want to test,

plot(df2$tos, df2$tws, col=cx)

in this case, I get a cool red-yellow-green line along the bottom (a very compelling linear fit question) and scattered magenta (pink red? LOL) and blue points everywhere, with a cluster near the origin and nothing in the top right quadrant. Also note a few blue lines above the red-green-yellow line, but much shorter. And in fact, presumably you already knew this, as it looks like it was designed in; if you just plot the red and green points, the fit looks perfect for linear,

good <- which(df2$x < 20)
plot(df2$tos[good], df2$tws[good], col=cx[good])

now if you look at the results of a fit of the good points vs all points, it isn't clear that anything like this would emerge from just looking at summaries of a linear fit,

td <- df2$tos[good]
ti <- df2$tws[good]
lm(td~ti)
lm(df2$tos ~ df2$tws)
summary(lm(td~ti))
summary(lm(df2$tos ~ df2$tws))

Now of course tests need to be considered ahead of time, or else it is easy to go shopping for the answer you want. Anything post hoc needs to be very complete, and you should at least try to rationalize test results you don't happen to like (assuming you are trying to understand the system from which the data was measured, rather than justify some particular outcome).

Date: Sun, 17 Apr 2011 11:34:14 +0200 From: dorien.herrem...@ua.ac.be To: dieter.me...@menne-biomed.de CC: r-help@r-project.org Subject: Re: [R] Rsquared for anova

Thanks for your remarks. I've been reading about R for the last two days, but I don't really get when I should use lm or aov. I have attached the dataset, feel free to take a look at it.
So far, running it with all the combinations did not take too long, and there seem to be some effects between the parameters. However, 2x2 combinations might suffice. Thanks for any help, or a pointer to some good documentation, Dorien

On 16 April 2011 10:13, Dieter Menne dieter.me...@menne-biomed.de wrote:

dorien wrote:

fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length, data=expdata))
Error: unexpected ',' in fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,

Peter's point is the important one: too many interactions, and even with + instead of * you might be running into problems. But anyway: if you don't let us access /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP you cannot expect a better answer, which will depend on the structure of the data set. Dieter

--
Dorien Herremans
Department of Environment, Technology and Technology Management
Faculty of Applied
Re: [R] Identify period length of time series automatically?
Date: Thu, 14 Apr 2011 11:29:23 +0200
From: r.m.k...@gmail.com
To: r-help@r-project.org
Subject: [R] Identify period length of time series automatically?

Hi, I have 10,000 simulations for a sensitivity analysis. I have done a few sensitivity analyses for different response variables already, but now, as most of the simulations (if not all) show some cyclic behaviour, I want to see how the independent input parameters influence the frequency of the cyclic changes and how cyclic they actually are. So effectively, I have 39 values, and I want to identify automatically the frequency / period length of the series and some kind of measure of how cyclic the series is.

Probably google Digital Signal Processing or Fourier transform. From this, you resolve your time series into sinusoids of various components, and you can separate peaks in line spectra from background noise. Depending on what you consider to be cyclic, the analysis details will vary. If you look at things like amplitude and frequency modulation of one sine wave with another and various relationships between carrier and modulation frequency, you can get some ideas of what to look for in spectra. Alternatively, you can try to define exactly what you mean by cyclic and maybe make a better transform that discriminates that from acyclic, but offhand I would suggest FFT and various tests on the spectra. Just offhand I'm not sure that 39 points is a lot to go on, but you can simulate some examples in R quite easily if you know what the data looks like in the various cases you think may exist.

How can I do that automatically without individual checking? I do not want to do an eyeball assessment of 10,000 time series. Thanks, Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
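To make the FFT suggestion concrete, here is a minimal sketch, not from the original thread: find_period is a made-up helper that returns the dominant period (in sampling intervals) and the share of spectral power in that peak as a rough "how cyclic" score.

# Untested sketch; assumes x is a numeric series sampled at regular intervals.
find_period <- function(x) {
  x <- x - mean(x)                  # drop the DC component
  n <- length(x)
  pwr <- Mod(fft(x))^2              # power spectrum
  half <- pwr[2:floor(n/2)]         # positive frequencies only
  k <- which.max(half)              # bin of the dominant peak
  c(period = n / k,                 # period in sampling intervals
    strength = half[k] / sum(half)) # fraction of power in the peak
}

# Quick check on a noisy sine of period 13:
x <- sin(2*pi*(1:39)/13) + rnorm(39, sd = 0.2)
find_period(x)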
Re: [R] Incremental ReadLines
Date: Wed, 13 Apr 2011 10:57:58 -0700
From: frederikl...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Incremental ReadLines

Hi there, I am having a similar problem with reading in a large text file with around 550,000 observations, each with 10 to 100 lines of description. I am trying to parse it in R, but I have trouble with the size of the file. It seems like it is slowing down dramatically at some point. I would be happy for any suggestions.

This probably occurs when you run out of physical memory, but you can probably verify that by looking at the task manager. A line-at-a-time readLines() loop doesn't fit real well with R, where you want to hand over blocks of data so that inner loops, implemented largely in native code, can operate efficiently. The thing you want is a data structure that can use disk more effectively and hide these details from you and your algorithm. This works best if the algorithm works with the data structure to avoid lots of disk thrashing. You could imagine that your read would do nothing until each item is needed, but often people want the whole file validated before processing; lots of details come up with exception handling as you get fancy here. Note of course that your parse output could be stored in a hash or something representing a DOM, and this could get arbitrarily large. Since it is designed for random access, this may cause lots of thrashing if partially on disk. Anything you can do to make access patterns more regular, for example sorting your data, would help.

Here is my code, which works fine when I am doing a subsample of my dataset.

# Defining datasource
file <- "filename.txt"
# Creating placeholder for data and assigning column names
data <- data.frame(Id = NA)
# Starting by case = 0
case <- 0
# Opening a connection to data
input <- file(file, "rt")
# Going through cases
repeat {
  line <- readLines(input, n = 1)
  if (length(line) == 0) break
  if (length(grep("Id:", line)) != 0) {
    case <- case + 1
    data[case, ] <- NA
    split_line <- strsplit(line, "Id:")
    data[case, 1] <- as.numeric(split_line[[1]][2])
  }
}
# Closing connection
close(input)
# Saving dataframe
write.csv(data, "data.csv")

Kind regards, Frederik
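As a concrete version of the blocks-of-data point above, an untested sketch (the file name is hypothetical): read large chunks and let vectorized grep/sub do the per-line work instead of R-level code.

# Untested sketch: chunked reading with vectorized matching.
input <- file("filename.txt", "rt")
ids <- numeric(0)
repeat {
  lines <- readLines(input, n = 10000)        # one big block per pass
  if (length(lines) == 0) break
  hits <- grep("Id:", lines, value = TRUE)    # vectorized match
  ids <- c(ids, as.numeric(sub(".*Id:", "", hits)))
}
close(input)
write.csv(data.frame(Id = ids), "data.csv", row.names = FALSE)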
Re: [R] Identify period length of time series automatically?
Date: Thu, 14 Apr 2011 12:42:28 +0200
From: r.m.k...@gmail.com
To: marchy...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] Identify period length of time series automatically?

On 14/04/11 11:57, Mike Marchywka wrote: [...]

Hi Mike, thanks for your answer - it confirms my fears ...

Well, if it is just that you wanted an alternative to FFT, I have not provided one, so it shouldn't confirm that fear, as others could exist. Indeed, if you have an idea of what you want, you may be able to create your own. It wasn't clear in the OP that you had looked at FFT.

Probably google Digital Signal Processing or Fourier transform. [...] you can get some ideas of what to look for in spectra.

That is what I thought as well. As I have no idea about Fourier analysis, could you give me a small example in R which gives me the frequencies of the resulting sine waves after a Fourier transformation? I only see large matrices as return values when using e.g. fft().

Alternatively, you can try to define exactly what you mean by cyclic [...]

The shape of the fluctuations can be quite different - so no common pattern there.

Just offhand I'm not sure that 39 points is a lot to go on [...]

Well - the data is summed over a year from daily data points, so I could easily go to daily data, which would be 365*39. But that would probably make the analysis more difficult, as I have seasonal fluctuations, and fluctuations over several years (1, 2, 3, 4, ...?; depending on the parameters used for the simulation).

plot(abs(fft(x))), as presumably your confusion is due to the complex result, and the spectrum is probably all you care about at first. If that works, there are various R facilities for peak detection that may be of use. There may be programs to fit an FFT assuming discrete lines + continuum noise, but I have not looked; there are apparently things like that for fitting mass spectra.

Any ideas on how to do this in R? I have the feeling that the question is more difficult than I thought... Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa
Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
email: rai...@krugs.de
Skype: RMkrug
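For the small example requested above, a minimal sketch, not from the original thread: base R's spectrum() handles the detrending and scaling that raw fft() leaves to you, and its output makes the dominant period easy to read off.

# Untested sketch on a made-up daily series with a ~91-day cycle:
x <- sin(2*pi*(1:365)/91) + rnorm(365, sd = 0.3)
sp <- spectrum(x, plot = FALSE)   # periodogram
1 / sp$freq[which.max(sp$spec)]   # dominant period, in sampling intervals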
Re: [R] Incremental ReadLines
Date: Thu, 14 Apr 2011 11:57:40 -0400
Subject: Re: [R] Incremental ReadLines
From: frederikl...@gmail.com
To: marchy...@hotmail.com
CC: r-help@r-project.org

Hi Mike, thanks for your comment. I must admit that I am very new to R, and although what you write sounds interesting, I have no idea where to start. Can you give some functions or examples where I can see how it can be done?

I'm not sure I have a good R answer; I was simply pointing out the likely issue, and maybe the rest belongs on the r-devel list or something. If you can determine you are running out of physical memory, then you either need to partition something or make accesses more regular. My favorite example from personal experience is sorting a data set prior to piping it into a c++ program, which changed the execution time substantially by avoiding VM thrashing. R either needs a swapping buffer or has an equivalent that someone else could mention.

I was under the impression that I had to do a loop since my blocks of observations are of varying length. Thanks again, Frederik

On Thu, Apr 14, 2011 at 6:19 AM, Mike Marchywka wrote: [...]
Re: [R] Converting edgelist to symmetric matrix/ plotting sparse network with lots of nodes
Date: Sat, 9 Apr 2011 14:34:28 -0700
From: kmshafi...@yahoo.com
To: r-help@r-project.org
Subject: [R] Converting edgelist to symmetric matrix

Hi, I have network data in the form of a couple of edgelists containing weights in the format x,y,weight, whereby x represents the row header and y represents the column header. All edgelists are based on links among 634 nodes, and I need to convert them into a 634*634 weighted matrix. I searched for online help using possible keywords I know, but I could not find a clue how to do this in R. Any help will be appreciated.

I'd replied earlier suggesting the ncol format; I'd like to follow up on that, as I have tried it with some success, but maybe someone else can comment on alternatives and suggest ideas for plotting. I have a set of nodes or states specified by two parameters ( these are isotopes, specified by proton and mass number, connected by decay paths, with the probability of each path being its weight ). This seems to almost work for your needs ( note that I have taken out a lot of extraneous stuff and may have dropped something important LOL; also, setup for Rgraphviz is not simple on 'dohs, as I had to manually edit env variables etc. ):

library(Rgraphviz)
library(QuACN)
nxg <- read.graph("ncol.txt", format = "ncol")
nn <- igraph.to.graphNEL(nxg)
aasd <- adjacencyMatrix(nn)
str(aasd)
 num [1:2561, 1:2561] 0 100 95.8 2.7 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2561] "17_10" "17_9" "16_9" "15_7" ...
  ..$ : chr [1:2561] "17_10" "17_9" "16_9" "15_7" ...

$ head ncol.txt
17_10 17_9 100
17_10 16_9 95.8
17_10 15_7 2.7
18_10 18_9 100
19_10 19_9 100
23_10 23_11 100
243_100 239_98 100
245_100 241_98 100
246_100 242_98 92
246_100 246_99 1

However, for my needs plotting has been a big problem. I apparently have 2561 isotopes ( none of this has been validated yet LOL ) that are sparsely connected by a few decay modes ( presumably an acyclic directed graph, but DAG in searches didn't help much ). Any thoughts on which R classes to try to visualize this, or even what I should be thinking about artistically? This is largely just a way to learn R for some other things I want to do for analyzing data on wireless devices, but I am curious about this result too. Some of the things I did try are below:

library(Rgraphviz)
library(QuACN)
nxg <- read.graph("ncol.txt", format = "ncol")
foo <- adjacencyMatrix(nxg)
?graphNEL
?NELgraph
df <- data.frame(nxg)
plot.igraph(nxg, layout = layout.svd)
rglplot.igraph(nxg, layout = layout.svd)
rglplot.igraph(nxg)
tkplot.igraph(nxg)
library(tcltk)
tkplot.igraph(nxg)
tkplot(nxg)
dx <- decompose.graph(nxg)
nn <- igraph.to.graphNEL(nxg)
igraph.plotting(nxg)
library(sna)
gplot(nxg)
dx <- get.adjacency(nxg)
gplot(dx)
gplot3d(dx)
plot(nxg)
library(ElectroGraph)
eg <- electrograph(nxg)
eg <- electrograph(aasd)
plot(eg)

Thanks. Best regards, Shafique
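On the plotting question, a hedged sketch with igraph's own plotting (function names as in the igraph of that era; untested on this data): for a couple of thousand sparsely connected nodes, shrinking vertices and dropping labels usually makes a force-directed layout legible.

# Untested sketch for a large sparse igraph object such as nxg:
library(igraph)
l <- layout.fruchterman.reingold(nxg)
plot(nxg, layout = l, vertex.size = 2, vertex.label = NA,
     edge.arrow.size = 0.3)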
Re: [R] Converting edgelist to symmetric matrix
Date: Sat, 9 Apr 2011 14:34:28 -0700
From: kmshafi...@yahoo.com
To: r-help@r-project.org
Subject: [R] Converting edgelist to symmetric matrix

Hi, I have network data in the form of a couple of edgelists containing weights in the format x,y,weight, whereby x represents the row header and y represents the column header. All edgelists are based on links among 634 nodes, and I need to convert them into a 634*634 weighted matrix. I could not find a clue how to do this in R. Any help will be appreciated.

I'm trying to do something related and found ?read.graph - will format="ncol" do what you need? This apparently creates a graph object that likely has the capabilities you need. Again, I haven't actually used any of this; I just found it while trying to solve a different problem. 'It is a simple text file with one edge per line. An edge is defined by two symbolic vertex names separated by whitespace. (The symbolic vertex names themselves cannot contain whitespace.) They might be followed by an optional number; this will be the weight of the edge; the number can be negative and can be in scientific notation. If there is no weight specified to an edge it is assumed to be zero.'

Best regards, Shafique
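For the conversion itself, a minimal base-R sketch, assuming the edgelist is already a three-column data frame el with integer node ids 1..634 in columns x and y (the file name is hypothetical):

# Untested sketch: fill a 634 x 634 matrix by matrix indexing.
el <- read.csv("edges.csv")          # columns x, y, weight
m  <- matrix(0, 634, 634)
m[cbind(el$x, el$y)] <- el$weight    # listed direction
m[cbind(el$y, el$x)] <- el$weight    # mirror to keep it symmetric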
Re: [R] Scrap java scripts and styles from an html document
Date: Thu, 7 Apr 2011 04:15:50 -0700
From: antuj...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] Scrap java scripts and styles from an html document

Hi, I am working on developing a web crawler.

Comments like this come up on the list every few weeks or so, and I keep suggesting that someone ( other than me of course LOL ) investigate an R interface to webkit for any efforts that require mimicking large parts of browser function. Perhaps just make a debug build or custom build of webkit to dump whatever it is you want into a structured text file ( I've actually done this for what would amount to a crawler; I modified maybe one or two classes to output the links being fetched to stdout, but I think there are ways to dump a DOM or other stuff in a format usable by R ). For valid pages, you can just parse html as xml and get what you want in this case, but usually people are looking for information only apparent after large pieces of js are executed. If you want comments only, these may be easy to isolate yourself. If you google CRAN HTML parser some hits do come up, for example http://cran.r-project.org/web/packages/scrapeR/scrapeR.pdf http://r.789695.n4.nabble.com/How-to-import-HTML-and-SQL-files-td879480.html

Removing javascripts and styles is part of the cleaning of the html document. What I want is a cleaned html document with only the html tags and textual information, so that I can figure out the pattern of the web page. This is being done to extract relevant information from the webpage, like comments for a particular product. E.g. amazon.com has all such comments within the and tags, with regular occurring for breaks. So the tags which appear the most help us in locating the required information. Different websites have different patterns, but it's more likely that the tags that occur the most will have the relevant information enclosed in them. So, once the html page is cleaned, it would be easy to roll up the tags and, knowing their frequency of occurrence, target the information. Should there be any suggestions to help, please let me know. I would be more than pleased. Regards, Antuj
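For the parse-html-as-xml route mentioned above, a hedged sketch with the XML package (file name hypothetical; untested): drop script/style nodes, then tally the remaining tag names to find the dominant markup pattern.

# Untested sketch with the XML package:
library(XML)
doc <- htmlTreeParse("page.html", useInternalNodes = TRUE)
removeNodes(getNodeSet(doc, "//script | //style"))  # strip js and css
tags <- xpathSApply(doc, "//*", xmlName)            # remaining tag names
sort(table(tags), decreasing = TRUE)                # most frequent first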
Re: [R] system() command in R
Date: Tue, 5 Apr 2011 13:37:12 +0530
From: nandan.a...@gmail.com
To: rasanpreet.k...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] system() command in R

On 4 April 2011 16:54, rasanpreet kaur suri wrote:

Hi all, I have a local server installed on my system and have to start it from within my R function. Here is how I start it:

cmd <- "sh start-server.sh"
system(cmd, wait = FALSE)

My function has to start the server and proceed with further steps. The server starts, but the further steps of the program are not executed. The cursor keeps waiting after the server is started.

How are you executing the further steps after starting the server from R?

I tried removing the wait=FALSE, but it still keeps waiting. I also tried putting the start-server in a separate function and my further script in a separate function and then running them together, but it still waits. The transition from the start of the server to the next step is not happening. Please help. I have been stuck on this for quite some time now.

I hadn't done this in R but expect to do so soon. I just got done with some java code to do something similar, and you can expect that in any implementation these things will be system dependent. It often helps to have simple test cases to isolate the problem. Here I made a test script called foo that takes a minute or so to execute and generates some output. If I type

system("./foo", wait = FALSE)

the prompt comes back right away, but stdout seems to still go to my console, and maybe stdin is not redirected either, so it could eat your input ( no idea, but this is probably not what you want ). I did try this, which could fix your problem; on debian anyway it seems to work:

system("nohup ./foo &")

You can man nohup for details.

Rasanpreet Kaur

--
Amar Kumar Nandan
Karnataka, India, 560100
http://aknandan.co.nr
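Putting the two suggestions together, an untested sketch for a Unix-like system: fully detach the server so R neither waits nor receives its output.

# Untested sketch: start the server detached, logging to a file.
system("nohup sh start-server.sh > server.log 2>&1 &", wait = FALSE)
Sys.sleep(5)   # crude: give the server a moment to come up
# ... further steps of the function ...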
Re: [R] Time series example in Koop
Date: Tue, 5 Apr 2011 07:35:04 -0500
From: ravi.k...@gmail.com
To: r-help@r-project.org
Subject: [R] Time series example in Koop

I am trying to reproduce the output of a time series example in Koop's book Analysis of Financial Data. Koop does the example in Excel, and I used the ts function followed by the lm function. I am unable to get the exact coefficients that Koop gives - my coefficients are slightly different. After loading the data file and attaching the frame, my code reads:

y = ts(m.cap)
x = ts(oil.price)
d = ts.union(y, x, x1 = lag(x,-1), x2 = lag(x,-2), x3 = lag(x,-3), x4 = lag(x,-4))
mod1 = lm(y ~ x + x1 + x2 + x3 + x4, data = d)
summary(mod1)

Koop gives an intercept of 92001.51, while the code above gives 91173.32. The other coefficients are also slightly off.

The differences here seem to be of order 1 percent. You could suspect a number of things, including the data file being published to less precision than that used for the book's numbers ( also look at the number of points and see if any were added or dropped etc. ). However, you may want to judge these based on what they do to your error, which both fits are presumably supposed to minimize, but the calculation of which could be subject to various roundoff errors etc. Unless minimization is done analytically, it is of course subject to limitations of convergence or iteration count. Plotting both fits over the data and looking at residuals may help too. Depending on what you are really trying to do, you may want to change your error calculation etc. Details of numerical results often depend on details of implementation. This is why stats packages that are not open source have limitations in applicability. With real models of course things get even more confusing ( take a look at the credit rating agencies' results for example LOL ).

This is the example in Table 8.3 of Koop. I also attach a plain text version of the tab separated file badnews.txt. http://r.789695.n4.nabble.com/file/n3427897/badnews.txt Any light on why I do not get Koop's coefficients is most welcome... Ravi
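One quick check along the lines suggested above (a sketch using the objects from the question): the lagged ts.union pads the ends with NAs, which lm() silently drops, so the two fits may simply not use the same rows.

# Untested sketch: confirm how many observations each fit actually used.
sum(complete.cases(d))              # rows with all lags available
length(residuals(mod1))             # rows lm() actually used
plot(residuals(mod1), type = "h")   # look for structure in the errors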
Re: [R] help
Date: Sun, 3 Apr 2011 01:35:16 +0530
From: nandan.a...@gmail.com
To: padmanabhan.vija...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] help

One way that you might have thought of is to create the plot as a PDF in R and then use pdftools. Additionally, one can also think of running the R script using R CMD and then using pdftools in a .sh script file, if you are on Linux. I am not aware of a pdftools capability in R.

On 2 April 2011 23:01, Vijayan Padmanabhan wrote:

Dear R Help group, I need to run a command line script from within an R session. I am not clear how I can achieve this. I tried the shell and system functions, but I am missing something critical. Can someone provide help? My intention is to create a pdf file of a plot in R and then attach existing files from my system as attachments to the newly created pdf file. Any help would be greatly appreciated. Here is the command line script I want to execute from within R:

pdftools -S attachfiles=C:\test1.pdf -i C:\test2.pdf -o C:\test4.pdf

Regards, Vijayan Padmanabhan

I just tried system("pdftk --help") and it appeared to work, as I have pdftk from cygwin. I routinely do this the other way, however, and invoke R from a bash script and then use external tools like this from the bash script after R is done. If I'm generating various pieces, it seems to make sense to get them all first and release any resources R has accumulated, as pdf manipulation itself can often require lots of memory etc.
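For the original question, an untested sketch: run the poster's own command line via system(). The pdftools flags are copied verbatim from the question; note the doubled backslashes R requires in Windows paths.

cmd <- paste("pdftools -S attachfiles=C:\\test1.pdf",
             "-i C:\\test2.pdf -o C:\\test4.pdf")
system(cmd)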
Re: [R] Asking Favor For the Script of Median Filter
( sorry if this is a duplicate, I am not sure if hotmail is dropping some of my posts. Thanks )

You obviously want to delegate inner loops to R packages that execute as native, hopefully optimized, code. Generally a google search that starts with R CRAN will help. In this case it looks like there are a few packages available:

http://www.google.com/search?sclient=psy&hl=en&q=R+cran+median+filter

Date: Sun, 27 Mar 2011 07:56:11 -0700
From: chuan...@hotmail.com
To: r-help@r-project.org
Subject: [R] Asking Favor For the Script of Median Filter

Hello, everybody. My name is Chuan Zun Liang. I come from Malaysia. I am just a beginner in R. I would kindly like to ask a favor about a median filter. The problem I am facing is as below:

x <- matrix(sample(1:30, 25), 5, 5)
x
     [,1] [,2] [,3] [,4] [,5]
[1,]    7    8   30   29   13
[2,]    4    6   12    5    9
[3,]   25    3   22   14   24
[4,]    2   15   26   23   19
[5,]   28   18   10   11   20

This is an example original matrix of an image. I want to apply a median filter with window size 3x3 to remove salt and pepper noise from my matrix. Here is the script I attempted to write; the script and output are shown below:

MedFilter <- function(mat, sz) {
  out <- matrix(0, nrow(mat), ncol(mat))
  for (p in 1:(nrow(mat) - (sz - 1))) {
    for (q in 1:(ncol(mat) - (sz - 1))) {
      outrow <- median(as.vector(mat[p:(p + (sz - 1)), q:(q + (sz - 1))]))
      out[(p + p + (sz - 1))/2, (q + q + (sz - 1))/2] <- outrow
    }
  }
  out
}
MedFilter(x, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    8   12   14    0
[3,]    0   12   14   19    0
[4,]    0   18   15   20    0
[5,]    0    0    0    0    0

Example of getting the values 8 and 12:

 7  8 30                 8 30 29
 4  6 12 (median=8)      6 12  5 (median=12)
25  3 22                 3 22 14

Even though the script gives output, it is too slow. My image size is 364*364, and it is time consuming. Are there other ways of improving it?

Best Wishes, Chuan
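One package option such a search turns up, as a hedged, untested sketch: the raster package's focal() applies a function over a moving window in compiled code, which should beat the double R-level loop.

# Untested sketch with the raster package; x is the image matrix.
library(raster)
r   <- raster(x)
med <- focal(r, w = matrix(1, 3, 3), fun = median)   # 3x3 median window
as.matrix(med)                                       # back to a plain matrix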
Re: [R] Asking Favor For the Script of Median Filter
CC: chuan...@hotmail.com; r-help@r-project.org
From: dwinsem...@comcast.net
To: marchy...@hotmail.com
Subject: Re: [R] Asking Favor For the Script of Median Filter
Date: Sun, 27 Mar 2011 18:30:48 -0400

On Mar 27, 2011, at 1:07 PM, Mike Marchywka wrote:

You obviously want to delegate inner loops to R packages that execute as native, hopefully optimized, code. Generally a google search that starts with R CRAN will help. In this case it looks like there are a few packages available: http://www.google.com/search?sclient=psy&hl=en&q=R+cran+median+filter

Did you find any that include a 2D median filter? All the ones I looked at were for univariate data.

I put almost zero thought or effort into that, but an interested party could modify the words a bit and add, for example, "image", and one of the first interesting hits is this:

http://cran.r-project.org/web/packages/biOps/biOps.pdf

x <- readJpeg(system.file("samples", "violet.jpg", package = "biOps"))
y <- imgBlockMedianFilter(x, 5)

-- David.
Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated
Date: Fri, 25 Mar 2011 09:40:39 +
From: all...@cybaea.com
To: muenchen@gmail.com
CC: frien...@yorku.ca; had...@rice.edu; r-h...@stat.math.ethz.ch
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata, Statistica, S-PLUS updated

Not R, but just to get the data (format is month year,week,count) to compare with your students' output: [...] Hope this helps a little. Allan (Who thinks it is very sad that he can remember that $c=()=$a=~$b construct...)

On 22/03/11 23:26, Bob Muenchen wrote:

On 3/22/2011 5:15 PM, Hadley Wickham wrote:

I don't doubt that R may be the most popular in terms of discussion group traffic, but you should be aware that the traffic for SAS comprises two separate lists that used to be mirrored, but are no longer linked.

I think this discussion highlights the need for more structured document formats on the internet, so you can separate out mirrored or copied text ( see for example all the ad-supported sites that simply copy wikipedia content ). I've sometimes done things like this with pubmed citations, but they provide something called the eutils api, http://eutils.ncbi.nlm.nih.gov/, so you don't need to scrape html or other human-readable content. It is then easy to plot paper or author count as a function of year for some keyword criteria, and it can be interesting to see how fads come and go. You see a lot of questions here on how do I use R to scrape html, and it is a big problem in doing many kinds of analysis. Yahoo is doing a nice service by making downloads of historical data available, but this is not common; even places like census often only offer Excel format downloads ( while this is fine for R users, csv files would be just as good and reach a wider audience ), and some places do require you to make complicated POST or other request types.

Usenet -- news://comp.soft-sys.sas (what you counted)
listserve -- SAS-L http://www.listserv.uga.edu/archives/sas-l.html

R programming challenge: create a script that parses those html pages to compute the total number of messages per week! (Maybe I'll use this in class) Hadley

That would be nice! I'd love to have all the sources, which includes various company forums. Sounds like students could be kept busy for [[elided Hotmail spam]] Cheers, Bob
Re: [R] Problem with Snowball RWeka
Date: Thu, 24 Mar 2011 03:35:31 -0700
From: kont...@alexanderbachmann.de
To: r-help@r-project.org
Subject: [R] Problem with Snowball RWeka

Dear Forum, when I try to use SnowballStemmer() I get the following error message: Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException. It seems to have something to do with either Snowball or RWeka; however, I can't figure out what to do myself. If you could spend 5 minutes of your valuable time to help me or give me a hint where to look, it would be very much appreciated. Thank you very much.

If you only want answers from people who have encountered this exact problem before, then that's great, but you are more likely to get a useful response if you include reproducible code and some data to produce the error you have seen. Sometimes I investigate these things because they involve a package or objective I wanted to look at anyway. It could be that the only problem is that the OP missed something in the documentation or had a typo etc. In this case, to pursue it from the perspective of debugging the code, you probably want to find some way to get a stack trace, then find out which java variable was null and relate it back to how you invoked it. This likely points to a missing object in your call, or maybe the installation lacked a dependency, as this occurred during init - but it is hard to speculate with what you have provided. You could try reinstalling and check for errors.
Re: [R] Rapache ( was Developing a web crawler )
Subject: Re: [R] Rapache ( was Developing a web crawler )
From: m...@biostatmatt.com
To: marchy...@hotmail.com
CC: r-help@r-project.org
Date: Sun, 6 Mar 2011 13:51:53 -0500

On Sun, 2011-03-06 at 08:06 -0500, Mike Marchywka wrote:

Date: Thu, 3 Mar 2011 13:04:11 -0600
From: matt.shotw...@vanderbilt.edu
To: r-help@r-project.org
Subject: Re: [R] Developing a web crawler / R webkit or something similar? [off topic]

On 03/03/2011 08:07 AM, Mike Marchywka wrote:

Date: Thu, 3 Mar 2011 01:22:44 -0800
From: antuj...@gmail.com
To: r-help@r-project.org

Hi Mike, if you've built and configured RApache, then the difficult plowing is over :). RApache operates at the top (HTTP) layer of the OSI stack, whereas Rserve works at the lower transport/network layer. Hence, the scope of Rserve applications is far more general. Extending Rserve to operate at the HTTP layer (via PHP) will mean more work.

I finally got back to this and started from scratch on a clean machine. It took most of the day, on and off, but downloading and building R, apache, and rapache was relatively easy, and the info page worked, though I had to go get Cairo and various packages to get the graphics demo pages to work. I'll probably have to play with it a bit to see if I can use it for anything useful, but getting it to run was not too difficult ( I think before I didn't bother to build Apache from source and the failure mode wasn't real clear ).

RApache offers high-level functionality, for example, to replace PHP with R in web pages. No interface code is necessary. Here's a simple "What's The Time?" webpage using RApache and yarr [1] to handle the code:

setContentType("text/html\n\n") [...] Message body [...]

Here's a live version: [2]. Interfacing PHP with Rserve in this context would be useful if installation of R and/or RApache on the web host were prohibited. A PHP/Rserve framework might also be useful in other contexts, for example, to extend PHP applications (e.g. WordPress, MediaWiki).

Best, Matt
[1] http://biostatmatt.com/archives/1000
[2] http://biostatmatt.com/yarr/time.yarr

-Matt
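Since the example above was mangled in transit, here is a guess at a minimal plain-RApache handler, not Matt's original yarr version: RApache provides setContentType(), and anything written with cat() goes to the client.

# A hedged sketch of a "What's The Time?" handler under RApache:
setContentType("text/html\n\n")
cat("<html><body><p>The time is ",
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    "</p></body></html>", sep = "")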
Re: [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?
Date: Tue, 22 Mar 2011 09:31:01 -0700
From: crossp...@hotmail.com
To: r-help@r-project.org
Subject: [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?

Dear all, I want to improve my adjusted R-sq. I've checked some established models, and they introduce the same variable twice, once transformed and once not. It also improves my adjusted R-sq. But isn't this bad for collinearity? Do I interpret the coefficients as usual?

I'm not sure how many replies you got or if your question was answered, but just offhand let me see if I understand your concern. If your data is only over a limited range of v1, where you can Taylor expand to the linear term only, then sure, it can be hard to tell a linear from a log dependence or quantify a mixture of the two. If you try to find a and b to fit y = a*f(x) + b*g(x) minimizing some error, you should be able to see the issues on paper. Presumably log is not linear over a larger range, and any error function, like SSE, would have a reasonably peaked minimum for some values of the two coefficients, but you could do a sensitivity analysis to check - find the second derivatives of your error function, or just perturb the coefficients a bit. I guess if there is some direction where the error does not change as a and b vary, then you have the case you are worried about. I'm not sure what you consider to be usual, but when I'm doing something like this, I usually have some physical interpretation in mind. Most uninformatively, you could interpret these coefficients as those which minimize your error given the data you have :) What you do from there depends on a lot of specifics. To tell if a given function seems to be appropriate for the data, it is always good to look at a plot of residuals. Note that the ability to find a unique set of coefficients that minimizes a given error has nothing to do with independence of the two terms attached to the coefficients - indeed, polynomial fits are a common example ( log having a Taylor series just constrains a lot of coefficient relationships LOL ). P-values and confidence intervals are another matter with post hoc exploratory work, but I'll let a statistician comment on that, as well as the meaning of the R output. Usually the final decision on a putative model improvement comes from your ability to infer something about the underlying system, although you may just want a simple empirical approximation and be more worried about meeting a given error with a limited number of computations etc. Apparently you found on a retrospective literature search that everyone else is using the log term. Sometimes you see people ask questions like: given that 4 of 10 papers on the subject used the log term, and those authors have historically been right 50 percent of the time while the other 6 are right 40 percent of the time, what are the chances that the log term should be included? I will avoid commenting on that question too, except to say it illustrates a number of ways people do approach these problems; consider what is relevant to your situation.

             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.73140    7.22477   0.240  0.81086
v1           -0.33886    0.20321  -1.668  0.09705 .
log(v1)       2.63194    3.74556   0.703  0.48311
v2           -0.01517    0.01089  -1.394  0.16507
log(v3)      -0.45719    0.27656  -1.653  0.09995 .
factor1      -1.81517    0.62155  -2.920  0.00392 **
factor2      -1.87330    0.84375  -2.220  0.02759 *

Analysis of Variance Table
Response: height rise
           Df Sum Sq Mean Sq F value    Pr(>F)
v1          1  51.25  51.246 21.4128 6.842e-06 ***
log(v1)     1  13.62  13.617  5.6897  0.018048 *
v2          1   2.84   2.836  1.1850  0.277713
log(v3)     1   3.02   3.024  1.2638  0.262357
factor1     1  17.62  17.616  7.3608  0.007279 **
factor2     1  11.80  11.797  4.9294  0.027586 *
Residuals 190 454.71   2.393

Thanks, u...@host.com
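The limited-range concern above is easy to see in a small simulation (a sketch with made-up data, not from the thread): over a narrow range of v1, the v1 and log(v1) columns are nearly collinear and the design matrix is ill-conditioned.

# Untested sketch: condition number of the design matrix as a collinearity check.
set.seed(1)
v1_narrow <- runif(100, 10, 11)    # narrow range: log is nearly linear here
v1_wide   <- runif(100, 0.1, 100)  # wide range: the two terms separate
kappa(model.matrix(~ v1_narrow + log(v1_narrow)))  # very large
kappa(model.matrix(~ v1_wide + log(v1_wide)))      # much smaller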
Re: [R] How to draw a map of Europe?
Date: Sat, 19 Mar 2011 19:32:43 -0500
From: shali...@gmail.com
To: r-help@r-project.org
Subject: [R] How to draw a map of Europe?

Hi R users, I need to draw a map of selected European countries with the country names shown on the map. Does anyone know how to do this in R? Also, is it possible to draw a historical map of European countries using R?

This came up on this list a while ago, and I was just using that example for some work I'm doing, but I have only copied what was posted and added minor things to it. I can't give you an actual answer, but you are probably just a few key words away from a google search. The term you are probably looking for is shapefile. Try that on google. For example,

http://www.google.com/#sclient=psy&hl=en&q=shapefile+cran+europe

turns up things that may help.

Thanks a lot for your help. Maomao
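As a quick alternative to the shapefile route, a hedged sketch with the maps package (untested; the country list is just an example):

# Untested sketch: selected countries with name labels via the maps package.
library(maps)
cc <- c("France", "Germany", "Italy", "Spain", "Poland")
map("world", regions = cc, fill = TRUE, col = "grey90")
map.text("world", regions = cc, add = TRUE)   # country names as labels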
Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package
Date: Wed, 16 Mar 2011 13:50:37 -0700
From: solomon.mess...@gmail.com
To: r-help@r-project.org
Subject: Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package

Looks like the problem may be that R is automatically passing this flag to the g++ compiler: -arch x86_64, which appears to be causing trouble for opencv. Does anyone know how to suppress this flag?

Are you building a 32 or 64 bit app? I haven't looked, but offhand that may be what this flag is for ( you can man g++ and check, I suppose ). I was going to reply after doing some checking, as I have never actually made a complete R package, but I did make a test one and verified I could call the c++ code from R. I seem to remember that it was easy to just invoke g++ explicitly, and R would use whatever binaries you happened to have left there. That is, if you just compile the code for your R modules yourself, there is no real reason to worry about how R does it. Mixing some flags will prevent linking or, worse, create mysterious run time errors. Also, IIRC, your parameters were the output of foreign scripts ( opencv apparently ) and not too helpful here ( and in any case you should echo these to see what they are, if they are an issue ).
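For the record, the usual hooks for overriding compiler flags are make variables, shown here as a sketch (flag values illustrative, not verified against this opencv setup; `R CMD config CXXFLAGS` shows what your R would pass by default):

# per-user overrides, in ~/.R/Makevars (replaces R's default C++ flags):
CXXFLAGS = -g -O2
# per-package flags, in the package's src/Makevars:
PKG_CXXFLAGS = -I/usr/local/include/opencv
PKG_LIBS = -lopencv_core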
Re: [R] Spatial cluster analysis of continous outcome variable
Did you post your data or hypothetical data? Usually that helps make your problem more clear and more interesting ( and more likely to get a useful response to your post ).

From: tintin...@hotmail.com
To: r-help@r-project.org
Date: Thu, 17 Mar 2011 17:38:14 +0100
Subject: [R] Spatial cluster analysis of continuous outcome variable

Dear R Users, R Core Team, I have a two dimensional space where I measure a numerical value in two situations at different points. I have measured the change, and I would like to test if there are areas in this 2D space where there is a different amount of change (no change, increase, decrease). I don't know if it's better to analyse the data just with the coordinates, or if it's better to group them into pixels (and obtain the mean value for each pixel) and then run the cluster analysis. I would like to know if there is a package/function that allows me to do these calculations. I would also like to know if it could be done in a 3D space (I have collapsed the data to 2D because I don't have many points). Thanks in advance, J Toledo
Re: [R] minimum distance between line segments
Date: Wed, 9 Mar 2011 10:55:46 +1300
From: darcy.web...@gmail.com
To: r-help@r-project.org
Subject: [R] minimum distance between line segments

Dear R helpers, I think that this may be a bit of a math question, as the more I consider it, the harder it seems. I am trying to come up with a way to work out the minimum distance between line segments. For instance, consider 20 random line segments:

x1 <- runif(20)
y1 <- runif(20)
x2 <- runif(20)
y2 <- runif(20)
plot(x1, y1, type = "n")
segments(x1, y1, x2, y2)

Initially I thought the solution to this problem was to work out the distance between midpoints (it quickly became apparent that this is totally wrong when looking at the plot). So, I thought that perhaps finding the minimum distance between each of the lines' endpoints AND their midpoints would be a good proxy for this, so I set up a loop that uses Pythagoras to work out these 9 distances and find the minimum. But this solution is obviously flawed as well (sometimes lines actually intersect, sometimes the minimum distances are less, etc). Any help/direction on this one would be much appreciated.

I wasn't too happy before, since I thought there was a good and simple way to approach this and no one came up with it, including me. On further thought, I seem to recall that a vector approach should be quite general without special cases. For example, describe the segments as R = a*n^ + B, where n^ is a unit vector in the direction of the line, B an initial point, a a scalar coordinate along the line, and R the point on the line corresponding to a. This should avoid issues with infinite slopes and other problems. You should be able to derive, for each segment, a set of n^, B, amin, and amax. Calculating d = |R1 - R2| as a vector expression should be trivial, and then you can minimize d while constraining each a. Presumably d as a function of a1 and a2 will let you verify that you can limit to amin and amax, and let you see how to generalize to 3D etc. Since the segment is described originally by 4 numbers and this representation uses 6, you have some freedom in how to do this. The first thought is taking the initial point as B; then amin is zero, and amax could be 1 if you don't actually bother to normalize n^ and just take it as (dx,dy). And again, in 2D non-parallel lines intersect, so the distance between them is likely to be zero until you limit their extents. I haven't done this in so long I forgot about vectors, but I hate special cases when there is a more concise and general way to approach a problem :) As always with free advice on the internet, caveat emptor, but I'd be curious to see if anyone knows better. I think someone else suggested checking if they intersect first, but again this is motivated by an attempt to avoid special cases and get a better understanding of what is going on.

Thanks in advance, Darcy.
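A sketch of the parameterization just described (not from the thread, untested): with P(s) = A1 + s*(B1-A1) and Q(t) = A2 + t*(B2-A2), s,t in [0,1], the squared distance is a convex quadratic in (s,t), so box-constrained optimization finds the true minimum.

# Untested sketch; endpoints are length-2 numeric vectors.
seg_dist <- function(A1, B1, A2, B2) {
  f <- function(p) sum((A1 + p[1]*(B1 - A1) - A2 - p[2]*(B2 - A2))^2)
  o <- optim(c(0.5, 0.5), f, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(1, 1))
  sqrt(o$value)
}
seg_dist(c(0, 0), c(1, 0), c(0, 1), c(1, 2))  # 1: closest points (0,0) and (0,1)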
Re: [R] minimum distance between line segments
To: r-h...@stat.math.ethz.ch
From: hwborch...@googlemail.com
Date: Wed, 9 Mar 2011 17:45:53 +
Subject: Re: [R] minimum distance between line segments

Darcy Webber gmail.com writes: [...]

A correct approach could proceed as follows:
(1) Find out whether two segments intersect (e.g., compute the intersection point of the extended lines and compare its x-coords with those of the segment endpoints); if they do, the distance is 0, otherwise set it to Inf.
(2) For each endpoint, compute the intersection point of the perpendicular line through this point with the other segment's line; if this point lies on the other segment, take the distance, otherwise compute the distance to the other two endpoints.
(3) The minimum of all those distances is what you seek.

I have done a fast implementation, but the code is so crude that I would not like to post it here. If you are really in need, I could send it to you.

LOL, I sent a private reply suggesting essentially the opposite approach, since I discovered pmax and pmin. That is, parameterize the location along lines of infinite length and minimize the distance WRT the two locations ( one for each line ). You can do this by hand, finding them to be perpendicular, or probably find an R routine to minimize the distance. Then, with your array of positions along the lines, limit them with pmin or pmax. Infinite lines always cross unless parallel, so you will probably do a lot of clipping, but stuff like that would probably become apparent as you work through it. If things fail or you want to extend to 3D, you have some starting point for improvement. This is probably a common issue in some fields, like graphics, though there may be something packaged - no idea.

--Hans Werner (I am not aware of a pure plane geometry package in R --- is there any?)
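For completeness, a hedged 2D implementation along the lines Hans Werner describes (untested, and not his code): if the segments properly cross the distance is 0; otherwise the minimum is attained at one of the four endpoints, so take the minimum of the point-to-segment distances.

pt_seg <- function(P, A, B) {        # distance from point P to segment AB
  v <- B - A
  t <- if (all(v == 0)) 0 else sum((P - A) * v) / sum(v * v)
  t <- max(0, min(1, t))             # clamp the projection onto the segment
  sqrt(sum((A + t * v - P)^2))
}
cross2 <- function(u, v) u[1]*v[2] - u[2]*v[1]
crosses <- function(A, B, C, D) {    # strict crossing test via orientations
  d1 <- cross2(B - A, C - A); d2 <- cross2(B - A, D - A)
  d3 <- cross2(D - C, A - C); d4 <- cross2(D - C, B - C)
  (d1 * d2 < 0) && (d3 * d4 < 0)
}
seg_seg <- function(A, B, C, D) {
  if (crosses(A, B, C, D)) return(0)
  min(pt_seg(A, C, D), pt_seg(B, C, D), pt_seg(C, A, B), pt_seg(D, A, B))
}
seg_seg(c(0, 0), c(1, 0), c(0, 1), c(1, 2))  # 1, agreeing with the sketch above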