Re: [R] subsetting from a vector or matrix
Both of the following will probably do the trick:

    ?subset
    ?"["

Basically, for the second one you want to end up with something that looks like x[L], where x is a matrix or vector and L is a logical vector with the same dimensions as x that is TRUE at the values of x you want to select. For instance,

    x <- rnorm(10)
    L <- x > 3
    x[L]

will return all values of x that are greater than 3. Or you can just do

    x[x > 3]

On Sep 25, 9:45 am, Jim Bouldin jrboul...@ucdavis.edu wrote:

I realize this should be simple, but I'm having trouble subsetting vectors and matrices, for example extracting all values that meet a certain criterion from a vector. I cannot seem to figure out the correct syntax, and the help page is not very helpful. Or should I be using some function other than subset? Thanks for any help.

Jim Bouldin

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
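A minimal sketch of the logical-indexing idea described above. The vector and matrix values are made up for illustration (they are not from the original post), chosen so that the result is predictable; with rnorm(10) a value above 3 is rare, so x[x > 3] will usually come back empty.

```r
# Made-up data (an assumption for illustration only)
x <- c(2.5, 3.7, 1.2, 4.1, 3.0, 5.6)

L <- x > 3            # logical vector, TRUE where the condition holds
big <- x[L]           # same as x[x > 3]
big_subset <- subset(x, x > 3)

# The same idea works on a matrix: the result is a plain vector,
# taken in column-major order
m <- matrix(1:6, nrow = 2)
m_big <- m[m > 3]

# Selecting whole rows by a condition on one column
rows <- m[m[, 1] > 1, , drop = FALSE]
```

Here big and big_subset hold the same three values, and drop = FALSE keeps rows as a one-row matrix instead of collapsing it to a vector.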
Re: [R] graphics mailing list?
OK, it makes sense. Let's try that.

Best,

baptiste

2009/9/25 Paul Murrell p.murr...@auckland.ac.nz:

Hi

baptiste.auguie wrote:

(Sorry about the double post earlier, googlemail is having hiccups today)

2009/9/24 Romain Francois romain.franc...@dbmail.com:

Why just grid? Why not a list for all kinds of graphics?

I figured that a good share of the traffic on r-help might be considered graphics-related, while I was aiming at discussing less documented areas. But I agree that the distinction shouldn't be made on a particular package or system.

I'm not wildly keen on a SIG (it could only mean fewer eyes seeing the discussion). I think r-devel should serve quite well for these discussions, at least until people start complaining that it is being overrun with graphics questions ...

Paul

Best,

baptiste

On 09/24/2009 04:34 PM, baptiste.auguie wrote:

Dear all,

Would it make sense to have a separate mailing list (special interest group*) for Grid graphics? (Or is there one already?) I don't feel comfortable asking questions about the design of a new grid class on R-help, where I'm guessing most people won't be interested. Of course, having yet another mailing list would only make sense if it is followed by the people who work with Grid (lattice, vcd, ggplot2, latticeExtra, Rgraphics, etc.). Having read a bit of code from these packages recently, I get the feeling that several people may have been facing similar problems or reinventing the same things.

Just a thought,

Best regards,

baptiste

*: http://www.r-project.org/mail.html

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc
|- http://tr.im/yw8E : New R package : sos
`- http://tr.im/y8y0 : search the graph gallery from R

--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/
Re: [R] superimposing xyplots on same scale
2009/9/25 Felix Andrews fe...@nfrac.org:

Sorry, doubleYScale is not appropriate, since you specifically want a common y scale. I think Baptiste was suggesting to use layer(), rather than as.layer():

Truth be told, I wasn't quite sure what the initial request meant. I took it quite literally, as superimposing two existing xyplots. Clearly the other options that were given are much better.

Best,

baptiste
[R] differing behaviour between xts (0.6-7) and zoo (1.5-8)
Folks,

I have some weekly data series that I convert to monthly xts (with yearmon indices), and obtain the two following extracts:

    > str(sig)
    An 'xts' object from Apr 1998 to Sep 1998 containing:
      Data: num [1:6, 1] 0.0083 0.2799 -0.2524 -0.0119 0.18 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr "e1"
      Indexed by objects of class: [yearmon] TZ: GMT
      xts Attributes:
     NULL

    > str(ret)
    An 'xts' object from Mar 1998 to Aug 1998 containing:
      Data: num [1:6, 1] -0.007829 0.006452 -0.000276 -0.000644 0.002572 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr "twi.Close"
      Indexed by objects of class: [yearmon] TZ: GMT
      xts Attributes:
     NULL

I understand that mathematical operations on xts objects will be performed only on the data points with common indices, in this case Apr 1998 to Aug 1998. So I do:

    > sig * ret
    Data:
    numeric(0)

    Index:
    NULL

which doesn't give me what I expect. However, if I do:

    > as.zoo(sig) * as.zoo(ret)
                        e1
    Apr 1998  5.351189e-05
    May 1998 -7.716467e-05
    Jun 1998  1.624531e-04
    Jul 1998 -3.055679e-05
    Aug 1998  4.122321e-04

which is as I expect. I took a look at the structure of the two objects:

    > dput(sig)
    structure(c(0.00829354917358671, 0.279914830605598, -0.252440486192738,
    -0.0118822201758384, 0.179972233000564, -0.209066714293924),
    index = c(891388800, 893980800, 896659200, 899251200, 901929600, 904608000),
    .Dim = c(6L, 1L), .Dimnames = list(NULL, "e1"), class = c("xts", "zoo"),
    .indexTZ = structure("GMT", .Names = "TZ"), .indexCLASS = "yearmon")

    > dput(ret)
    structure(c(-0.00782945094736132, 0.00645222996118644, -0.000275671952124412,
    -0.000643530245146628, 0.00257163991836062, 0.00229053194651918),
    index = c(890784000, 893808000, 896227200, 898646400, 901670400, 904089600),
    .Dim = c(6L, 1L), .Dimnames = list(NULL, "twi.Close"), .indexCLASS = "yearmon",
    .indexTZ = structure("GMT", .Names = "TZ"), class = c("xts", "zoo"))

So clearly the internal values of the supposedly overlapping parts of the indices are different, although they are both 'yearmon' and seem to represent the same months.

If I do

    > dput(as.zoo(ret))
    structure(c(-0.00782945094736132, 0.00645222996118644, -0.000275671952124412,
    -0.000643530245146628, 0.00257163991836062, 0.00229053194651918),
    .Dim = c(6L, 1L), .Dimnames = list(NULL, "twi.Close"),
    index = structure(c(1998.167, 1998.25, 1998.333, 1998.417, 1998.5, 1998.583),
    class = "yearmon"), class = "zoo")

    > dput(as.zoo(sig))
    structure(c(0.00829354917358671, 0.279914830605598, -0.252440486192738,
    -0.0118822201758384, 0.179972233000564, -0.209066714293924),
    .Dim = c(6L, 1L), .Dimnames = list(NULL, "e1"),
    index = structure(c(1998.25, 1998.333, 1998.417, 1998.5, 1998.583, 1998.667),
    class = "yearmon"), class = "zoo")

Now the indices have the expected overlaps. I'm not sure if this is a bug in xts?

    > sessionInfo()
    R version 2.9.2 (2009-08-24)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices datasets tcltk utils methods base

    other attached packages:
    [1] xts_0.6-7 zoo_1.5-8 svSocket_0.9-43 svMisc_0.9-48 TinnR_1.0.3 R2HTML_1.59-1 Hmisc_3.6-1 rcom_2.2-1 rscproxy_1.3-1

    loaded via a namespace (and not attached):
    [1] cluster_1.12.0 grid_2.9.2 lattice_0.17-25 tools_2.9.2

Please advise.

Thanks,

Murali
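The index values in the dput() output above can be decoded with base R alone, which may make the mismatch easier to see. This is only a sketch of one reading of the posted numbers, not a statement about xts internals: it assumes the raw index values are seconds since 1970-01-01 in GMT, and the helper name to_month is made up for illustration.

```r
# Raw index values and data copied from the dput() output in the post
sig_sec <- c(891388800, 893980800, 896659200, 899251200, 901929600, 904608000)
ret_sec <- c(890784000, 893808000, 896227200, 898646400, 901670400, 904089600)
sig_val <- c(0.00829354917358671, 0.279914830605598, -0.252440486192738,
             -0.0118822201758384, 0.179972233000564, -0.209066714293924)
ret_val <- c(-0.00782945094736132, 0.00645222996118644, -0.000275671952124412,
             -0.000643530245146628, 0.00257163991836062, 0.00229053194651918)

# Hypothetical helper: seconds since the epoch -> "YYYY-MM" label in GMT
to_month <- function(sec) {
  format(as.POSIXct(sec, origin = "1970-01-01", tz = "GMT"), "%Y-%m")
}

sig_month <- to_month(sig_sec)
ret_month <- to_month(ret_sec)

# No raw time stamp is shared between the two series ...
raw_common <- intersect(sig_sec, ret_sec)

# ... but five calendar months are
common <- intersect(sig_month, ret_month)

# Aligning by month label reproduces the as.zoo() product
prod_by_month <- sig_val[match(common, sig_month)] *
  ret_val[match(common, ret_month)]
names(prod_by_month) <- common
prod_by_month
```

Under that assumption the sig time stamps fall on the first day of each month while the ret time stamps fall late in the month, so matching on the raw values finds nothing in common, whereas matching on the month finds April to August 1998.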
Re: [R] basic cubic spline smoothing
hm567 wrote:

I am unsure about spar being the smoothness parameter, about where to put the standard errors of the points, and about the return value of the smooth.spline function:

    Smoothing Parameter spar= 0.5 lambda= 0.006833112

best regards,

Basically, in my implementation, which is based on the attached paper, with a standard error of 1.0 for the points, the smoothing is too insensitive to the lambda smoothness parameter. From 1 down to almost 0.01 there is almost no smoothing. Only from 0.01 to 0 does one start to see smoothing in action, with the limit at 0 being a straight line. Note that this implementation's parameter is (1 - parameter).

With R's smooth.spline, 'spar' reflects the smoothness well, in that:

. at 0%, the spline interpolates
. at 40% already, its shape is very different from the 0% one (in my implementation, they are still the same)
. at 90% it is almost a straight line
. at 100% it is definitely a straight line

This is the behavior that I wish to have. It seems I need to apply to my lambda some transformation similar to the one in the documentation of smooth.spline (spar to lambda), perhaps the reverse one. But I can't see how to do it.

The other question is the standard errors. What do they correspond to in the documentation of smooth.spline?

Regards,
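A small sketch of how spar, lambda and per-point weights can be inspected with smooth.spline itself. The data are simulated and the standard errors are hypothetical. Passing w = 1 / se^2 is a common weighting convention and is an assumption here, not something the original post or the help page is quoted as prescribing. The help page ?smooth.spline describes lambda as a monotone function of spar, of the form r * 256^(3 * spar - 1) with r depending on the data; checking it against your R version is advisable.

```r
set.seed(1)

# Simulated example data (not from the original post)
x  <- seq(0, 1, length.out = 50)
se <- rep(0.2, length(x))          # hypothetical per-point standard errors
y  <- sin(2 * pi * x) + rnorm(length(x), sd = se)

# Same data, two values of spar; weights taken as 1 / se^2 (an assumption)
fit_rough  <- smooth.spline(x, y, w = 1 / se^2, spar = 0.2)
fit_smooth <- smooth.spline(x, y, w = 1 / se^2, spar = 0.9)

# lambda grows with spar, and the effective degrees of freedom shrink
c(lambda_rough = fit_rough$lambda, lambda_smooth = fit_smooth$lambda)
c(df_rough = fit_rough$df, df_smooth = fit_smooth$df)
```

Comparing the fitted lambda values for several spar settings on your own data may be the quickest way to see which transformation your implementation would need to mimic.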
[R] R 2.10.0 is scheduled for October 26
This is to announce that we plan to release R version 2.10.0 on Monday, October 26, 2009. Release procedures start today.

The detailed schedule can be found on http://developer.r-project.org

The source tarballs will be made available daily (barring build troubles), starting September 28, and the tarballs can be picked up at http://cran.r-project.org/src/base-prerelease/ a little later. Binary builds are expected to appear soon thereafter.

For the Core Team

Peter Dalgaard

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

___ r-annou...@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-announce
Re: [R] Downloading data from from internet
Thank you so much for the help. However, I need a little more. On the site http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php, if I scroll down there is an option "Historical CPI Index For USA". If I then click on "Get Data", another table pops up, without any significant change in the address bar. This table holds more data, starting from 1999. Can you please help me get the values of this table?

Thanks

Duncan Temple Lang wrote:

Thanks for explaining this, Charlie. Just for completeness, and to make things a little easier, the XML package has a function named readHTMLTable(). You can call it with a URL and it will attempt to read all the tables in the page.

    tbls = readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')

yields a list with 10 elements, and the table of interest with the data is the 10th one.

    tbls[[10]]

The function does the XPath voodoo and sapply() work for you and uses some heuristics. There are various controls one can specify, and also various methods for working with sub-parts of the HTML document directly.

D.

cls59 wrote:

Bogaso wrote:

Hi all,

I want to download data from these two different sources directly into R:

    http://www.rateinflation.com/consumer-price-index/usa-cpi.php
    http://eaindustry.nic.in/asp2/list_d.asp

The first one is the CPI of the US and the second one is the WPI of India. Can anyone please give me a clue how to download them directly into R? I want to make them zoo objects for further analysis.

Thanks,

The following site did not load for me:

    http://eaindustry.nic.in/asp2/list_d.asp

But I was able to extract the table from the US CPI site using Duncan Temple Lang's XML package:

    library(XML)

First, download the website into R:

    html.raw <- readLines('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')

Then, convert to an HTML object using the XML package:

    html.data <- htmlTreeParse(html.raw, asText = TRUE, useInternalNodes = TRUE)

A quick scan of the page source in the browser reveals that the table you want is encased in a div with a class of dynamicContent-- we will use an XPath specification[1] to retrieve all rows in that table:

    table.html <- getNodeSet(html.data, '//div[@class="dynamicContent"]/table/tr')

Now, the data values can be extracted from the cells in the rows using a little sapply and xpathSApply voodoo:

    table.data <- t(sapply(table.html, function(row) {
      row.data <- xpathSApply(row, './td', xmlValue)
      return(row.data)
    }))

Good luck!

-Charlie

[1]: http://www.w3schools.com/XPath/xpath_syntax.asp

-
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
Re: [R] problem on SUSE Linux Enterprise Server 10 (ia64)
Hi,

You need to install the headers/libs for readline. Use your package manager and look for something like readline-devel.

cheers,

Paul

Yuan Zhidong wrote:

Dear Sir,

When I install R on SUSE Linux Enterprise Server 10 (ia64) (Linux a450 2.6.16.21-0.8-default #1 SMP Mon Jul 3 18:25:39 UTC 2006 ia64 ia64 ia64 GNU/Linux), it reports the following error messages at the end:

    # ./configure
    checking build system type... ia64-unknown-linux-gnu
    checking host system type... ia64-unknown-linux-gnu
    loading site script './config.site'
    loading build specific script './config.site'
    checking for pwd... /bin/pwd
    checking whether builddir is srcdir... yes
    checking for working aclocal... found
    checking for working autoconf... found
    .
    checking for readline/readline.h... no
    checking for rl_callback_read_char in -lreadline... no
    checking for main in -lncurses... yes
    checking for rl_callback_read_char in -lreadline... no
    checking for history_truncate_file... no
    configure: error: --with-readline=yes (default) and headers/libs are not available

Could you tell me how to fix the problem? Thank you!

Best wishes,

Yuan Zhidong

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
[R] variation in one variable
Hello,

Could you please tell me whether there is any function in R that tells me how many subgroups I have in one variable? For example, if my data are

    x <- c(rnorm(50,50,3), rgamma(50,2,1), runif(50,0,1))

I want to know how many groups I have.

Many thanks in advance,

Samuel

--- On Thu, 9/17/09, Samuel Okoye samu...@yahoo.com wrote:

From: Samuel Okoye samu...@yahoo.com
Subject: SVM
To: r-h...@stat.math.ethz.ch
Date: Thursday, September 17, 2009, 4:39 AM

Hello,

I have 12 samples, and each sample has 1000 observations, i.e. I have a matrix X with 1000 rows and 12 columns!

    m <- svm(t(X))
    p <- predict(m)

Can anyone tell me how to use svmtrain() in R?

Many thanks,

Samuel
[R] Binomial
Dear R-users,

Suppose I have the following sample of data:

    0 1 2 4 3
    1 2 1 3 1
    1 3 3 4 1
    0 1 2 1 2
    1 4 1 4 2
    1 2 2 1 1

The first variable is the response variable, where 0 is defective and 1 is normal. The other four columns are factors (x1, x2, x3, x4) that influence the outcome. I want to fit a binomial model. How do I do that? I am guessing the response variable should be transformed, but I am not sure which family of transformations to use. It is easy to do in SAS, but I want to learn how to do it in R.

Any help is highly appreciated.

Ashta
[R] R CMD INSTALL --build: Folders /inst and /etc not in zip-file and WindowsXP locks /library/[package]/etc/
Dear R users,

My set-up: OS = Windows XP, R-2.9.2, Rtools210

I faced the following problem with package compilation: there is no /inst or /etc subdirectory in the package zip file, and the content of the /etc subdirectory is lost, too.

I tried a simplified test package. The test package has the following structure (see also the attachment: test package as source file):

    /test
    |---/inst
    |   |---/etc
    |       |---menus.txt
    |
    |---/man
    |   |---mymean.Rd
    |   |---test-package.Rd
    |
    |---/R
    |   |---mymean.R
    |
    |---NAMESPACE
    |---DESCRIPTION

The file menus.txt (inspired by the Rcmdr menu structure) contains one single comment line. The file mymean.R contains a simple function that computes the mean.

Situation A) [R CMD BUILD test] works fine, and the tarball contains the subdirectory /inst/etc/ and the file menus.txt.

Situation B) [R CMD INSTALL --build test] generates the test_1.0.zip file without any error message. But:

1) This zip file contains neither the /etc folder nor the menus.txt file.

2) The installation created the folder /test in the library path /R-2.9.2/library/test/, but this folder is locked by Windows XP. That is, access is denied, and this can only be resolved by redefining the owner and the permission rights.

3) Having removed the package from the library, installing the zip package using "install from local zip files..." does perform the installation. But the /etc folder is empty; the file menus.txt is missing.

Is there a known problem/bug in the R CMD INSTALL --build process dealing with the subdirectories /inst and /etc and the contents of these folders? How can I resolve the problem?

Thanks

Tobias

http://www.nabble.com/file/p25609569/source_test.zip source_test.zip
Re: [R] synchronisation of time series data using interpolation
Create the series as zoo series from the data, then merge them and fill in the NAs with interpolated values using na.approx. Finally, use window to pick off the times that were in z1, and plot. See the three vignettes that come with zoo; for times and dates, see the article in R News 4/1 and its references.

    Lines1 <- "time,datum
    01:00:00,500
    01:00:15,600
    01:00:30,750
    01:00:45,720
    01:01:00,700
    01:01:15,725
    01:01:30,640
    01:01:45,710"

    Lines2 <- "time,datum
    01:00:12,20
    01:01:01,55
    01:01:55,22"

    library(zoo)
    library(chron)
    z1 <- read.zoo(textConnection(Lines1), header = TRUE, sep = ",", FUN = times)
    z2 <- read.zoo(textConnection(Lines2), header = TRUE, sep = ",", FUN = times)
    z3 <- window(na.approx(merge(z1, z2)), time(z1))
    plot(z3$z1, z3$z2)

On Fri, Sep 25, 2009 at 1:41 AM, e-letter inp...@gmail.com wrote:

Readers,

I have data with different time stamps that I wish to plot (for example):

    data set 1
    time(hh:mm:ss),datum
    01:00:00,500
    01:00:15,600
    01:00:30,750
    01:00:45,720
    01:01:00,700
    01:01:15,725
    01:01:30,640
    01:01:45,710

    data set 2
    time,datum
    01:00:12,20
    01:01:01,55
    01:01:55,22

The time interval in data set 1 does not change, but the time interval in data set 2 does change, such that for a specific total time range (e.g. 60 minutes) there will be more data in data set 1 than in data set 2. I thought I could solve this problem using interpolation, to create a new data set using data from data set 2, interpolated to the time stamps in data set 1:

    data set 3
    time,datum
    01:00:00,18
    01:00:15,23
    01:00:30,30
    01:00:45,41
    01:01:00,53
    01:01:15,46
    01:01:30,38
    01:01:45,29

I would then be able to plot the data in data set 1 against the interpolated data in data set 3, because there would be equal quantities of data in both data sets. I've looked at the interp function in the help manual, but I don't understand whether this function can perform the task I want. Any advice please?

Yours,

rhelp at conference.jabber.org
r 251 (27-06-07)
mandriva 2008
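For comparison, the same interpolation idea can be sketched in base R with approx(), without zoo or chron. This is a different technique from the zoo-based answer above, shown only to make the linear-interpolation step explicit. The helper name to_seconds is made up for illustration, and the data are the two example sets from the post.

```r
# Hypothetical helper: "hh:mm:ss" -> seconds since midnight
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Data set 1 (regular 15 s spacing) and data set 2 (irregular), from the post
t1 <- c("01:00:00", "01:00:15", "01:00:30", "01:00:45",
        "01:01:00", "01:01:15", "01:01:30", "01:01:45")
y1 <- c(500, 600, 750, 720, 700, 725, 640, 710)
t2 <- c("01:00:12", "01:01:01", "01:01:55")
y2 <- c(20, 55, 22)

# Linearly interpolate data set 2 at the time stamps of data set 1.
# With the default rule = 1, times outside the range of t2 give NA.
y2_at_t1 <- approx(x = to_seconds(t2), y = y2, xout = to_seconds(t1))$y

data.frame(time = t1, set1 = y1, set2_interp = y2_at_t1)
```

The first time stamp (01:00:00) lies before the first observation in data set 2, so it is left as NA rather than extrapolated; approx(..., rule = 2) would instead carry the nearest observed value outward, if that is preferred.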
[R] Problem on plotting TS using GGPLOT
Hi,

I have the following code:

    library(zoo); library(ggplot2); library(plyr)
    dat <- rnorm(306); vv <- letters[1:6]; dat1 <- data.frame(dat, vv)
    dat2 = zooreg(rnorm(51), as.yearmon(as.Date("2000-01-01")), frequency=12)
    ggplot(dat1) + geom_line(aes(y=dat, x=index(dat2), colour=vv), group=vv, size = 1.3)

However, I get an error when plotting:

    Error in data.frame(x = c(2000, 2000.083, 2000.167, 2000.25, :
      arguments imply differing number of rows: 51, 306

I could not find out why this error occurs. Any idea, please?

Thanks,
[R] Java to R interface.
I want to call R functions from Java. I read in a couple of forums that I should install the package rJava in R. However, I am not able to install the rJava package on Ubuntu Linux. I tried two approaches. One is install.packages("rJava"); for the other, I downloaded the rJava_0.7-0.tar.gz file from the R site and ran the command R CMD INSTALL rJava_0.7-0.tar.gz. I got the following errors:

    Warning in install.packages("rJava") :
      argument 'lib' is missing: using '/home/vikrant/R/i486-pc-linux-gnu-library/2.9'
    trying URL 'http://cran.uk.r-project.org/src/contrib/rJava_0.7-0.tar.gz'
    Content type 'application/x-gzip' length 249486 bytes (243 Kb)
    opened URL
    ==
    downloaded 243 Kb

    * Installing *source* package ‘rJava’ ...
    mv: cannot move `/home/vikrant/R/i486-pc-linux-gnu-library/2.9/rJava' to `/home/vikrant/R/i486-pc-linux-gnu-library/2.9/00LOCK/rJava': Permission denied
    checking for gcc... gcc -std=gnu99
    checking for C compiler default output file name... a.out
    checking whether the C compiler works... yes
    checking whether we are cross compiling... no
    checking for suffix of executables...
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc -std=gnu99 accepts -g... yes
    checking for gcc -std=gnu99 option to accept ISO C89... none needed
    checking how to run the C preprocessor... gcc -std=gnu99 -E
    checking for grep that handles long lines and -e... /bin/grep
    checking for egrep... /bin/grep -E
    checking for ANSI C header files... yes
    checking for sys/wait.h that is POSIX.1 compatible... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking for string.h... (cached) yes
    checking sys/time.h usability... yes
    checking sys/time.h presence... yes
    checking for sys/time.h... yes
    checking for unistd.h... (cached) yes
    checking for an ANSI C-conforming const... yes
    checking whether time.h and sys/time.h may both be included... yes
    configure: checking whether gcc -std=gnu99 supports static inline...
    yes
    checking Java support in R... present:
    interpreter : '/usr/bin/java'
    archiver    : '/usr/bin/jar'
    compiler    : '/usr/bin/javac'
    header prep.: '/usr/bin/javah'
    cpp flags   : '-I/usr/lib/jvm/java-6-openjdk/jre/../include'
    java libs   : '-L/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client -L/usr/lib/jvm/java-6-openjdk/jre/lib/i386 -L/usr/lib/jvm/java-6-openjdk/jre/../lib/i386 -L -L/usr/java/packages/lib/i386 -L/lib -L/usr/lib -L/usr/lib/jni -ljvm'
    checking whether JNI programs can be compiled... yes
    checking JNI data types... configure: error: One or more JNI types differ from the corresponding native type. You may need to use non-standard compiler flags or a different compiler in order to fix this.
    ERROR: configuration failed for package ‘rJava’

Please help me install rJava. Also, could anyone suggest whether there is a better way to call R from Java, and point me to a tutorial for it?

Thanks in advance
[R] Data import from .csv-file with numeric header
Hello everybody out there using R,

How can I import data with a numeric header from a .csv file? My file example.csv has the following content (a duplicate measurement of potentials for three different currents):

    1; 2; 6
    1.0; 2.1; 5.9
    1.1; 2.0; 6.0

I try to import the data by using:

    measurement <- read.table("example.csv", sep=";", header=T)

However, the values in the header are renamed to the column names X1, X2 and X3. When I try to plot the data, I don't get the right x-values (the three different currents 1, 2 and 6), but 1.0, 2.0 and 3.0:

    plot(mean(measurement))

Thanks in advance.
[R] Nested select
My data:

    library(doBy)
    Lines <- "lo ptcl5 ptcl99 variable
    430. 8787a 430 3422343 m 430. 89mr 4314564774a 431 299 2777m 4319996 mr 432333 3433 a 432 .7377m 432. 676 mr"
    DF <- read.table(con <- textConnection(Lines), skip = 1)
    close(con)

What I want is to select lo when ptcl5 is missing and variable is either a or m. I tried the following query:

    sqldf("select lo from DF where lo=(select lo where ptcl5='.' and variable='m') or lo=(select lo where ptcl5='.' and variable='a')")

But I get the entire data set instead of only the rows that meet the condition. Is my query right? Please help me with this.
Re: [R] repeated measures
Hi,

Thank you. It was that.

Julien.

Tal Galili wrote:

Check for missing values.

Tal

On Wed, Sep 23, 2009 at 3:27 PM, pompon julien.pom...@agr.gc.ca wrote:

Hi,

I am performing a repeated measures 2-way ANOVA to assess the influence of plant and leaf on aphid fecundity. Fecundity is measured for each aphid on a single leaf. Here is what I typed:

    wingless <- reshape(Wingless,
        varying = list(c("d0","d1","d2","d3","d4","d5","d6","d7","d8","d9","d10","d11","d12","d13","d14","d15","d16")),
        v.names = c("fecundity"), timevar = "time", direction = "long")
    wingless.aov <- aov(fecundity ~ factor(time) * clip.cage * plant + Error(factor(id)), data = wingless)
    summary(wingless.aov)

and I obtained

    Error: factor(id)
                           Df Sum Sq Mean Sq F value  Pr(>F)
    factor(time)            4 56.789  14.197  3.0613 0.05925 .
    clip.cage               1 14.149  14.149  3.0509 0.10621
    plant                   1  3.251   3.251  0.7010 0.41880
    factor(time):clip.cage  1  0.304   0.304  0.0655 0.80240
    clip.cage:plant         1 17.114  17.114  3.6903 0.07880 .
    Residuals              12 55.652   4.638
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Error: Within
                                  Df Sum Sq Mean Sq F value  Pr(>F)
    factor(time)                  16 340.83   21.30 11.5222 <2e-16 ***
    factor(time):clip.cage        16  27.34    1.71  0.9242 0.54195
    factor(time):plant            16  46.36    2.90  1.5673 0.07783 .
    factor(time):clip.cage:plant  16  24.50    1.53  0.8281 0.65304
    Residuals                    255 471.44    1.85
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I don't understand why I have factor(time) in my between-subject results, whereas with a similar set of data I don't.

Thank you very much,

Julien Pompon.

--
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs: http://www.r-statistics.com/ http://www.talgalili.com http://www.biostatistics.co.il
[R] if else and loop for code in R
I am using if/else and a loop to sort out the data set, that is, to pick out the values less than 0 or more than 100. I could not get outTable from the loop. Please help me correct the code.

I USED:

    # Read
    a_data <- read.table("D:/SNP/copy.sas", header=T, sep="\t")
    tr <- a_data$truck
    ca <- a_data$cars
    length <- nrow(a_data)
    outTable <- matrix(nrow=length, ncol=3)
    stat <- for (i in 1:length) {
      if (tr<0) {0} else if (ca>100) {0} else {ca}
      outTable <- c(i, stat, tr)
    }
    # Writing the output file
    colnames(outTable) <- c("number", "stat", "tr")
    write.table(outTable, "D:/SNP/mixed.txt", append=FALSE, quote=FALSE, sep='\t', row.names=F)
    # Graph
    plot(stat, type="o", col="red", axes=FALSE, ann=FALSE)
    # Create a title with a red, bold/italic font
    title(main="Autos", col.main="red", font.main=4)
    # Start PNG device driver to save output to figure.png
    png(filename="D:/SNP/figure.png", height=295, width=300, bg="white")
    .

COMPLAIN

    Error: object 'stat' not found
    In addition: Warning message:
    In if (tr < 0) { :
      the condition has length > 1 and only the first element will be used
    ...

Thanks a lot
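One possible way to get the intended table, sketched under assumptions: the data frame below is a made-up stand-in for the poster's file (the column names truck and cars come from the post, the values do not), and the rule "0 when truck is below 0 or cars is above 100, otherwise the cars value" is read off the posted if/else. The key points are that if() tests a single value, so the loop has to index with [i], and that a for loop returns nothing that can be assigned to stat.

```r
# Hypothetical stand-in for the data read from the poster's file
a_data <- data.frame(truck = c(5, -2, 12, 7), cars = c(40, 60, 150, 90))
tr <- a_data$truck
ca <- a_data$cars
n  <- nrow(a_data)

# Loop version: index one element at a time and fill a pre-allocated vector
stat_loop <- numeric(n)
for (i in seq_len(n)) {
  if (tr[i] < 0) {
    stat_loop[i] <- 0
  } else if (ca[i] > 100) {
    stat_loop[i] <- 0
  } else {
    stat_loop[i] <- ca[i]
  }
}

# Vectorised version: ifelse() applies the same rule to whole vectors
stat_vec <- ifelse(tr < 0 | ca > 100, 0, ca)

# Three-column table with the names used in the post
outTable <- cbind(number = seq_len(n), stat = stat_vec, tr = tr)
outTable
```

Both versions give the same stat values, and outTable can then be passed to write.table() as in the original code.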
Re: [R] Data import from .csv-file with numeric header
Try this:

    measurement <- read.table("example.csv", sep = ";", header = TRUE, check.names = FALSE)
    plot(mean(measurement), names(measurement), xaxt = 'n')
    axis(1, names(measurement))

On Fri, Sep 25, 2009 at 3:53 AM, Tobias Ruff lisem...@ymail.com wrote:
> Hello everybody out there using R,
>
> How can I import data with a numeric header from a .csv-file? My file example.csv has the following content (a duplicate measurement of potentials for three different currents):
>
>     1; 2; 6
>     1.0; 2.1; 5.9
>     1.1; 2.0; 6.0
>
> I try to import the data by using:
>
>     measurement <- read.table("example.csv", sep=";", header=T)
>
> However, the values in the header are renamed to the column names X1, X2 and X3. When I try to plot the data, I don't get the right x-values (the three different currents 1, 2 and 6), but 1.0, 2.0 and 3.0:
>
>     plot(mean(measurement))
>
> Thanks in advance.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Data import from .csv-file with numeric header
Tobias Ruff wrote:
> Hello everybody out there using R,
>
> How can I import data with a numeric header from a .csv-file? My file example.csv has the following content (a duplicate measurement of potentials for three different currents):
>
>     1; 2; 6
>     1.0; 2.1; 5.9
>     1.1; 2.0; 6.0
>
> I try to import the data by using:
>
>     measurement <- read.table("example.csv", sep=";", header=T)
>
> However, the values in the header are renamed to the column names X1, X2 and X3. When I try to plot the data, I don't get the right x-values (the three different currents 1, 2 and 6), but 1.0, 2.0 and 3.0:
>
>     plot(mean(measurement))

I got X1, X2 and X6, because 1, 2, and 6 aren't legal variable names. If you want to use them as names anyway, use the check.names=FALSE argument.

I don't know how you tried to plot them, so I can't help you with that.

Duncan Murdoch
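Editor's sketch of the two replies above, in case a runnable version helps. It is not from the thread: the file content is passed through the `text=` argument of read.table() instead of a real example.csv, `strip.white = TRUE` is added as a precaution against the blanks after each semicolon, and colMeans() stands in for the older mean() call on a data frame. The variable name `currents` is made up for illustration.

```r
# The three lines of example.csv from the question, supplied inline
csv_text <- "1; 2; 6\n1.0; 2.1; 5.9\n1.1; 2.0; 6.0"

# check.names = FALSE keeps the header values "1", "2", "6" instead of X1, X2, X6
measurement <- read.table(text = csv_text, sep = ";", header = TRUE,
                          check.names = FALSE, strip.white = TRUE)

# The header values are character strings; convert them to get the currents
currents <- as.numeric(names(measurement))

# Mean potential per current (column-wise mean of the duplicate measurement)
potentials <- colMeans(measurement)

# Plot the means against the real currents rather than against 1, 2, 3
plot(currents, potentials, type = "b", xlab = "current", ylab = "mean potential")
```

The point of the sketch is only that the x-values have to be recovered from names(measurement) explicitly; plot() will not do that on its own.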
Re: [R] Bug
Thank you very much for the help. Here is a sample of my Sale Date values:

    > test[1:100, 76]
    1  1989-08-01
    2  1900-01-01
    3  2003-11-18
    4  2003-05-30
    5  2005-08-18
    6  1990-04-01
    7  1989-01-01
    8  1900-01-01
    9  1996-03-12
    10 1900-01-01
    11 2005-11-14
    12 2002-05-08
    13 2000-10-10
    14 1900-01-01
    15 2007-03-27

Gabor Grothendieck wrote:
> You have found a bug. It would be best to use dput(test1) to show unambiguously what is in test1, but in the absence of that I will assume that it is as in test1 shown below.
>
>     > library(sqldf)
>     > test1 <- data.frame(sale_date = as.Date(c("2008-08-01", "2031-01-09",
>     + "1990-01-03", "2007-02-03", "1997-01-03", "2004-02-04")))
>     > sqldf("select max(sale_date) from test1")
>       max(sale_date)
>     1         9864.0
>
> Evidently it is taking the internal numeric representation, storing it in the database as characters, and then taking the maximum of those characters. As the fifth entry starts with 9, it is the maximum when sorted alphabetically:
>
>     > as.numeric(test1[[1]])
>     [1] 14092 22288  7307 13547  9864 12452
>
> I will have to investigate whether the problem is in sqldf or the underlying software. In the meantime, if you represent the Date data as character you should be ok:
>
>     > test2 <- transform(test1, sale_date = as.character(sale_date))
>     > sqldf("select max(sale_date) from test2")
>       max(sale_date)
>     1     2031-01-09
>
>     > packageDescription("sqldf")$Version
>     [1] "0-1.7"
>     > R.version.string
>     [1] "R version 2.9.2 Patched (2009-09-08 r49647)"
>
> Please provide the output of dput(test1) so that we know unambiguously what your data looks like.
>
> On Thu, Sep 24, 2009 at 9:07 AM, dhanasekaran dhana...@gmail.com wrote:
>> The data looks like
>>
>>     2008-08-01
>>     2031-01-09
>>     1990-01-03
>>     2007-02-03
>>     1997-01-03
>>     2004-02-04
>>
>> Thanks.
>>
>> On Thu, Sep 24, 2009 at 5:20 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
>>> Please read and follow the last line to every message on r-help.
>>>
>>> On Thu, Sep 24, 2009 at 5:32 AM, dhansekaran dhana...@gmail.com wrote:
>>>> Hello R users,
>>>>
>>>> I tried to get the maximum sale date from my data frame using sqldf in R. The first time I executed the following code
>>>>
>>>>     sqldf("select max(sale_date) from test1")
>>>>
>>>> I got the result 9997.0, BUT when I ran the same code a second time, the result was 2031-04-09 (which is the correct one!). Why did this happen?
>>>>
>>>> Thanks.
>>
>> --
>> Best
>> Dhanasekaran
>> "Without trust, words become the hollow sound of a wooden gong. With trust, words become life itself."
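Editor's sketch of the mechanism Gabor describes, using base R only (sqldf is not called here, so this illustrates the string-comparison effect and makes no claim about where inside sqldf or its backend the conversion happens). The names `sale_date`, `as_number` and `as_text` are illustrative.

```r
# The six dates from the thread
sale_date <- as.Date(c("2008-08-01", "2031-01-09", "1990-01-03",
                       "2007-02-03", "1997-01-03", "2004-02-04"))

# A Date is stored as the number of days since 1970-01-01
as_number <- as.numeric(sale_date)
as_number

# If those numbers are stored as text, "max" compares them character by character,
# so the value starting with "9" wins even though it is numerically small
as_text <- as.character(as_number)
max(as_text)

# ISO-formatted date strings (yyyy-mm-dd) sort in chronological order,
# which is why the as.character() workaround gives the right answer
max(format(sale_date))
```

The first max() picks the 1997 date (internal value 9864) and the second picks 2031-01-09, matching the two outputs shown in the reply.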
Re: [R] Binomial
On Sep 25, 2009, at 6:29 AM, Ashta wrote:
> Dear R-users,
>
> Suppose I have the following sample of data:
>
>     0 1 2 4 3
>     1 2 1 3 1
>     1 3 3 4 1
>     0 1 2 1 2
>     1 4 1 4 2
>     1 2 2 1 1
>
> The first variable is the response variable, where 0 is defective and 1 is normal. The other four are factors (x1, x2, x3, x4) that influence the outcome.
>
> I want to fit a binomial model. How do I do that? I am guessing the response variable should be transformed, but I am not sure which family of transformation to use. It is easy to do in SAS, but I just want to learn using R.
>
> Any help is highly appreciated.
>
> Ashta

Presuming that your reference to SAS is to PROC LOGISTIC, then in R you would use glm() with 'family = binomial'.

Using:

    help.search("logistic regression")

would get you a lot of hints. See ?glm for more information, or alternatively, the lrm() function in Frank's 'rms' package on CRAN.

I would however hope that your actual working data set is much larger, as you don't have enough data above to support a single covariate, much less 4.

HTH,

Marc Schwartz
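Editor's sketch of the glm() call Marc suggests. The data are simulated, not Ashta's: the sample size, the coefficients used to generate the response, and treating x1..x4 as numeric covariates are all assumptions made for illustration (wrap them in factor() if they are categorical).

```r
set.seed(42)

# Simulated stand-in data: four covariates taking the values 1..4
n <- 200
d <- data.frame(
  x1 = sample(1:4, n, replace = TRUE),
  x2 = sample(1:4, n, replace = TRUE),
  x3 = sample(1:4, n, replace = TRUE),
  x4 = sample(1:4, n, replace = TRUE)
)

# Hypothetical "true" model, used only to generate a 0/1 response
eta <- -1 + 0.5 * d$x1 - 0.3 * d$x2
d$y <- rbinom(n, size = 1, prob = plogis(eta))

# Logistic regression: no transformation of y is needed,
# the binomial family with its default logit link handles the 0/1 response
fit <- glm(y ~ x1 + x2 + x3 + x4, family = binomial, data = d)
summary(fit)

# Fitted probabilities that y == 1 (i.e. "normal" in the coding of the question)
p_hat <- predict(fit, type = "response")
head(p_hat)
```

With a 0/1 response, glm() models the probability of the value 1, so the coefficients describe the log-odds of "normal" versus "defective" under the coding in the question.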
Re: [R] Java to R interface.
You could also try Rserve: http://www.rforge.net/Rserve/

-cj

vikrant S wrote:
> I want to call R functions from Java. I read a couple of forums that said to install the package rJava in R. However, I am not able to install the rJava package on Ubuntu Linux. I tried two commands. One is install.packages("rJava"); for the other, I downloaded the rJava_0.7-0.tar.gz file from the R site and gave the command R CMD INSTALL rJava_0.7-0.tar.gz. I got the following errors:
>
>     Warning in install.packages("rJava") :
>       argument 'lib' is missing: using '/home/vikrant/R/i486-pc-linux-gnu-library/2.9'
>     trying URL 'http://cran.uk.r-project.org/src/contrib/rJava_0.7-0.tar.gz'
>     Content type 'application/x-gzip' length 249486 bytes (243 Kb)
>     opened URL
>     ==
>     downloaded 243 Kb
>
>     * Installing *source* package ‘rJava’ ...
>     mv: cannot move `/home/vikrant/R/i486-pc-linux-gnu-library/2.9/rJava' to `/home/vikrant/R/i486-pc-linux-gnu-library/2.9/00LOCK/rJava': Permission denied
>     checking for gcc... gcc -std=gnu99
>     checking for C compiler default output file name... a.out
>     checking whether the C compiler works... yes
>     checking whether we are cross compiling... no
>     checking for suffix of executables...
>     checking for suffix of object files... o
>     checking whether we are using the GNU C compiler... yes
>     checking whether gcc -std=gnu99 accepts -g... yes
>     checking for gcc -std=gnu99 option to accept ISO C89... none needed
>     checking how to run the C preprocessor... gcc -std=gnu99 -E
>     checking for grep that handles long lines and -e... /bin/grep
>     checking for egrep... /bin/grep -E
>     checking for ANSI C header files... yes
>     checking for sys/wait.h that is POSIX.1 compatible... yes
>     checking for sys/types.h... yes
>     checking for sys/stat.h... yes
>     checking for stdlib.h... yes
>     checking for string.h... yes
>     checking for memory.h... yes
>     checking for strings.h... yes
>     checking for inttypes.h... yes
>     checking for stdint.h... yes
>     checking for unistd.h... yes
>     checking for string.h... (cached) yes
>     checking sys/time.h usability... yes
>     checking sys/time.h presence... yes
>     checking for sys/time.h... yes
>     checking for unistd.h... (cached) yes
>     checking for an ANSI C-conforming const... yes
>     checking whether time.h and sys/time.h may both be included... yes
>     configure: checking whether gcc -std=gnu99 supports static inline...
>     yes
>     checking Java support in R... present:
>     interpreter : '/usr/bin/java'
>     archiver    : '/usr/bin/jar'
>     compiler    : '/usr/bin/javac'
>     header prep.: '/usr/bin/javah'
>     cpp flags   : '-I/usr/lib/jvm/java-6-openjdk/jre/../include'
>     java libs   : '-L/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client -L/usr/lib/jvm/java-6-openjdk/jre/lib/i386 -L/usr/lib/jvm/java-6-openjdk/jre/../lib/i386 -L -L/usr/java/packages/lib/i386 -L/lib -L/usr/lib -L/usr/lib/jni -ljvm'
>     checking whether JNI programs can be compiled... yes
>     checking JNI data types... configure: error: One or more JNI types differ from the corresponding native type. You may need to use non-standard compiler flags or a different compiler in order to fix this.
>     ERROR: configuration failed for package ‘rJava’
>
> Please help me to install rJava. Also, could anyone suggest whether there is a better way to call R from Java, and point me to a tutorial for it?
>
> Thanks in advance
Re: [R] packGrob and dynamic resizing
Thank you Paul, I was convinced I tried this option but I obviously didn't!

In ?packGrob, the user is warned that packing grobs can be slow. In order to quantify this, I made the following comparison of 3 functions:

- table1 uses frameGrob and packGrob
- table2 uses frameGrob but calculates the sizes manually and uses placeGrob
- table3 creates a grid.layout and draws the grobs in the different viewports.

The three functions have (almost) the same output, but the timing does differ quite substantially!

    system.time(table1(content))
    #    user  system elapsed
    # 126.733   2.414 135.450
    system.time(table2(content))
    #    user  system elapsed
    #  22.387   0.508  24.457
    system.time(table3(content))
    #    user  system elapsed
    #   4.868   0.124   5.695

A few questions:

- why should the placeGrob approach of table2 be 5 times slower than table3 (pushing viewports)?
- if so, what are the merits of using a frameGrob over creating a layout manually?
- can one add some padding to the content placed with a placeGrob approach?

Best regards,

baptiste

The code follows below.

    sessionInfo()
    R version 2.9.2 (2009-08-24)
    i386-apple-darwin8.11.1

    locale:
    en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  grid      methods
    [8] base

    ### code starts ###
    library(grid)

    # a few helping functions
    rowMax.units <- function(u, nrow){ # rowMax with a fake matrix of units
      matrix.indices <- matrix(seq_along(u), nrow=nrow)
      do.call(unit.c, lapply(seq(1, nrow), function(ii) {
        max(u[matrix.indices[ii, ]])
      }))
    }

    colMax.units <- function(u, ncol){ # colMax with a fake matrix of units
      matrix.indices <- matrix(seq_along(u), ncol=ncol)
      do.call(unit.c, lapply(seq(1, ncol), function(ii) {
        max(u[matrix.indices[, ii]])
      }))
    }

    textii <- function(d, gp=gpar(), name="row-label-"){
      function(ii) textGrob(label=d[ii], gp=gp, name=paste(name, ii, sep=""))
    }

    # create a list of text grobs from a data.frame
    makeContent <- function(d){
      content <- as.character(unlist(c(d)))
      makeOneLabel <- textii(d=content, gp=gpar(col="blue"), name="content-label-")
      lg <- lapply(seq_along(content), makeOneLabel)
      list(lg=lg, nrow=nrow(d), ncol=ncol(d))
    }

    ## the comparison starts here

    ## table1 uses grid.pack
    table1 <- function(content){
      gcells = frameGrob(name="table.cells",
        layout = grid.layout(content$nrow, content$ncol))
      label.ind <- 1 # index running across labels
      for (ii in seq(1, content$ncol, 1)) {
        for (jj in seq(1, content$nrow, 1)) {
          gcells = packGrob(gcells, content$lg[[label.ind]],
            row=jj, col=ii, dynamic=TRUE)
          label.ind <- label.ind + 1
        }
      }
      grid.draw(gTree(children=gList(gcells)))
    }

    ## table2 uses grid.place
    table2 <- function(content){
      padding <- unit(4, "mm")
      lg <- content$lg
      ## retrieve the widths and heights of all textGrobs (including some zeroGrobs)
      wg <- lapply(lg, grobWidth)  # list of grob widths
      hg <- lapply(lg, grobHeight) # list of grob heights
      ## concatenate these units
      widths.all <- do.call(unit.c, wg)  # all grob widths
      heights.all <- do.call(unit.c, hg) # all grob heights
      ## matrix-like operations on units to define the table layout
      widths <- colMax.units(widths.all, content$ncol)   # all column widths
      heights <- rowMax.units(heights.all, content$nrow) # all row heights
      gcells = frameGrob(name="table.cells",
        layout = grid.layout(content$nrow, content$ncol,
          width=widths+padding, height=heights+padding))
      label.ind <- 1 # index running across labels
      for (ii in seq(1, content$ncol, 1)) {
        for (jj in seq(1, content$nrow, 1)) {
          gcells = placeGrob(gcells, content$lg[[label.ind]], row=jj, col=ii)
          label.ind <- label.ind + 1
        }
      }
      grid.draw(gTree(children=gList(gcells)))
    }

    ## table3 uses grid.layout
    table3 <- function(content){
      padding <- unit(4, "mm")
      lg <- content$lg
      ## retrieve the widths and heights of all textGrobs (including some zeroGrobs)
      wg <- lapply(lg, grobWidth)  # list of grob widths
      hg <- lapply(lg, grobHeight) # list of grob heights
      ## concatenate these units
      widths.all <- do.call(unit.c, wg)  # all grob widths
      heights.all <- do.call(unit.c, hg) # all grob heights
      ## matrix-like operations on units to define the table layout
      widths <- colMax.units(widths.all, content$ncol)   # all column widths
      heights <- rowMax.units(heights.all, content$nrow) # all row heights
      cells = viewport(name="table.cells",
        layout = grid.layout(content$nrow, content$ncol,
          width=widths+padding, height=heights+padding))
      pushViewport(cells)
      label.ind <- 1 # index running across labels
      ## loop over columns and rows
      for (ii in seq(1, content$ncol, 1)) {
        for (jj in seq(1, content$nrow, 1)) {
          ##
Re: [R] keeping all rows with the same values, and not only unique ones
Thank you so much, everyone! Very helpful!

Dimitri

On Thu, Sep 24, 2009 at 7:46 PM, Moshe Olshansky m_olshan...@yahoo.com wrote:
>     test[which(test[,"total"] %in% needed),]
>
> --- On Fri, 25/9/09, Dimitri Liakhovitski ld7...@gmail.com wrote:
>> From: Dimitri Liakhovitski ld7...@gmail.com
>> Subject: [R] keeping all rows with the same values, and not only unique ones
>> To: R-Help List r-h...@stat.math.ethz.ch
>> Received: Friday, 25 September, 2009, 8:52 AM
>>
>> Dear R-ers,
>>
>> I have a data frame test:
>>
>>     test <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(2,3,4,5,6,7,8,9), total=c(7,7,8,8,9,9,10,10))
>>     test
>>
>> I have a vector needed:
>>
>>     needed <- c(7,9)
>>     needed
>>
>> I need the result to look like this:
>>
>>     1 2 7
>>     2 3 7
>>     5 6 9
>>     6 7 9
>>
>> When I do the following:
>>
>>     result <- test[test["total"]==needed,]
>>     result
>>
>> I only get unique rows that have 7 or 9 in total:
>>
>>     1 2 7
>>     6 7 9
>>
>> How could I keep ALL rows that have 7 or 9 in total?
>>
>> Thanks a million!
>>
>> --
>> Dimitri Liakhovitski
>> Ninah.com
>> dimitri.liakhovit...@ninah.com
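Editor's sketch of why `==` returned only two rows while `%in%` returns four, using the data frame from the question. The helper names `eq` and `isin` are illustrative. The explanation (recycling of the shorter vector) is standard R behaviour rather than something stated in the thread.

```r
test <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                   y = c(2, 3, 4, 5, 6, 7, 8, 9),
                   total = c(7, 7, 8, 8, 9, 9, 10, 10))
needed <- c(7, 9)

# '==' compares element by element and recycles the shorter vector:
#   total:  7 7 8 8 9 9 10 10
#   needed: 7 9 7 9 7 9  7  9   (recycled)
# so only positions 1 and 6 happen to line up
eq <- test$total == needed
test[eq, ]

# '%in%' asks, for every element of total, whether it occurs anywhere in needed
isin <- test$total %in% needed
test[isin, ]
```

The which() wrapper in the reply is optional here; a logical index works directly as long as there are no NA values in the column.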
Re: [R] if else and loop for code in R
It looks like you are trying to mimic the SAS data step. In R you can vectorise this.

    a_data <- read.table("D:/SNP/copy.sas", header=T, sep="\t")
    a_data$stat <- with(a_data, ifelse(truck < 0, 0, ifelse(cars > 100, 0, cars)))
    a_data$i <- seq_len(nrow(a_data))
    outTable <- a_data[, c("i", "stat", "truck")]

HTH,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Minh Duy Mai
Sent: Friday 25 September 2009 13:53
To: r-help@R-project.org
CC: minhmai...@yahoo.com
Subject: [R] if else and loop for code in R

> I am using if/else and a loop to sort out a data set: values less than 0 or more than 100 should be picked out. I could not get outTable from the loop. Please help me to correct the code.
>
> I used:
>
>     # Read
>     a_data <- read.table("D:/SNP/copy.sas", header=T, sep="\t")
>     tr <- a_data$truck
>     ca <- a_data$cars
>     length <- nrow(a_data)
>     outTable <- matrix(nrow=length, ncol=3)
>     stat <- for (i in 1:length) {
>       if (tr < 0) {0} else if (ca > 100) {0} else {ca}
>       outTable <- c(i, stat, tr)
>     }
>     # Writing the output file
>     colnames(outTable) <- c("number", "stat", "tr")
>     write.table(outTable, "D:/SNP/mixed.txt", append=FALSE, quote=FALSE, sep='\t', row.names=F)
>     # Graph
>     plot(stat, type="o", col="red", axes=FALSE, ann=FALSE)
>     # Create a title with a red, bold/italic font
>     title(main="Autos", col.main="red", font.main=4)
>     # Start PNG device driver to save output to figure.png
>     png(filename="D:/SNP/figure.png", height=295, width=300, bg="white")
>
> R complains:
>
>     Error: object 'stat' not found
>     In addition: Warning message:
>     In if (tr < 0) { :
>       the condition has length > 1 and only the first element will be used
>     ...
>
> Thanks a lot

Please do not print this message unnecessarily.
The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
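Editor's sketch of the vectorised approach on a toy data frame, so it can be run without the file D:/SNP/copy.sas. The four rows are invented, and the comparison directions (truck < 0, cars > 100) are an assumption reconstructed from the wording "less than 0 or more than 100", since the operators did not survive in the archived message.

```r
# Toy stand-in for the data read from the file (values invented for illustration)
a_data <- data.frame(truck = c(-5, 10, 20, 3),
                     cars  = c(50, 150, 80, 100))

# ifelse() works on whole columns at once, so no loop and no scalar if() is needed.
# A row gets 0 when truck is negative or cars exceeds 100; otherwise it keeps cars.
a_data$stat <- with(a_data, ifelse(truck < 0, 0, ifelse(cars > 100, 0, cars)))

# Row counter, then pick the three output columns in the order of the question
a_data$i <- seq_len(nrow(a_data))
outTable <- a_data[, c("i", "stat", "truck")]
outTable
```

This also sidesteps both messages from the original post: `if()` needs a single TRUE/FALSE (hence the length warning), and a `for` loop does not return a value that can be assigned to `stat`.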
Re: [R] Problem on plotting TS using GGPLOT
You are mixing data from two datasets with different lengths. Your x variable has 51 elements, while the y variable has 306 elements. What did you expect to happen with that? Use only one dataset within a geom(). Otherwise you are likely to get into trouble.

HTH,

Thierry

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of bogaso.christofer
Sent: Friday 25 September 2009 14:47
To: r-h...@stat.math.ethz.ch
Subject: [R] Problem on plotting TS using GGPLOT

> Hi, I have the following code:
>
>     library(zoo); library(ggplot2); library(plyr)
>     dat <- rnorm(306); vv <- letters[1:6]; dat1 <- data.frame(dat, vv)
>     dat2 = zooreg(rnorm(51), as.yearmon(as.Date("2000-01-01")), frequency=12)
>     ggplot(dat1) + geom_line(aes(y=dat, x=index(dat2), colour=vv), group=vv, size = 1.3)
>
> However, I got an error while plotting:
>
>     Error in data.frame(x = c(2000, 2000.083, 2000.167, 2000.25, :
>       arguments imply differing number of rows: 51, 306
>
> I could not find why that error occurs. Any idea please?
>
> Thanks,
Re: [R] Does anybody know how to connect to SAS from within R?
Yeah, I would also like to know what synergy I can get from combining the power of R and SAS... Maybe there are some things that are particularly strong in R and some others that are particularly strong in SAS?

Thanks!

On Thu, Sep 24, 2009 at 10:26 PM, Indrajit Sengupta indra_cali...@yahoo.com wrote:
> Here's a good website on using R and SAS. I am not sure if this site or the book mentioned talks about connecting to SAS from R, but nevertheless it's worth going through.
>
> I have used both, and really can't see much benefit other than transferring datasets.
>
> Regards,
> Indrajit
>
> ----- Original Message -----
> From: Michael comtech@gmail.com
> To: r-help r-h...@stat.math.ethz.ch
> Sent: Friday, September 25, 2009 6:06:58 AM
> Subject: [R] Does anybody know how to connect to SAS from within R?
>
>> And what might be the benefit of doing that? Thanks a lot!
Re: [R] Problem on plotting TS using GGPLOT
Thanks for this reply. My goal here is to plot multiple time series in the same plotting window. The y variable has 306 elements, but each value is associated with a factor level, given by the vv variable. I want to plot 6 time series in total: for example, the first 51 values of y, labelled "a", should be treated as a single TS, with the index of dat2 as its index. Similarly, the second 51 values of y, labelled "b", should be treated as another single TS, again with the index of dat2, and so on.

What is the problem here?

Thanks,

ONKELINX, Thierry wrote:
> You are mixing data from two datasets with different lengths. Your x variable has 51 elements, while the y variable has 306 elements. What did you expect to happen with that? Use only one dataset within a geom(). Otherwise you are likely to get into trouble.
>
> HTH,
>
> Thierry
Re: [R] Problem on plotting TS using GGPLOT
Let me be more specific. My goal is to plot the following multiple TS, using ggplot2:

    dat1 <- zooreg(matrix(rnorm(306), 51), as.yearmon(as.Date("2000-01-01")), frequency=12)
    colnames(dat1) <- letters[1:6]
    dat1

I still cannot see what the problem is in my ggplot2 code. Please give me some idea.

Best,
Re: [R] Problem on plotting TS using GGPLOT
First get the correct representation, which here is a multivariate zoo series with 51 time points and 6 component series, and then plot it using zoo's plot function:

    z <- zoo(matrix(dat, 51), time(dat2))
    # all in one panel
    plot(z, pch = letters[1:6], screen = 1, type = "b", col = 1:6)
    # or in separate panels (same but omit screen = 1)
    plot(z, pch = letters[1:6], type = "b", col = 1:6)

There are many examples of plotting zoo series in the 3 vignettes that come with zoo, and also in ?plot.zoo and ?xyplot.zoo.

If you wish to use ggplot2, you can extract the data and times into a new data frame and use that data frame for further computation:

    DF <- cbind(tt = time(z), as.data.frame(z))

On Fri, Sep 25, 2009 at 9:32 AM, Bogaso bogaso.christo...@gmail.com wrote:

Thanks for this reply. My goal is to plot multiple time series in the same plotting window. The y variable has 306 elements, and each value is associated with a factor level given by the vv variable. I want to plot 6 time series in total: the first 51 values of y, labelled a, should be treated as a single time series indexed by the index of dat2; the second 51 values of y, labelled b, should be treated as another time series with the same index; and so on. What is the problem here? Thanks,

ONKELINX, Thierry wrote:

You are mixing data from two datasets with different lengths. Your x variable has 51 elements, while the y variable has 306 elements. What did you expect to happen with that? Use only one dataset within a geom(); otherwise you are likely to get into trouble. HTH, Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Section biometrics, methodology and quality assurance

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of bogaso.christofer
Sent: Friday 25 September 2009 14:47
To: r-h...@stat.math.ethz.ch
Subject: [R] Problem on plotting TS using GGPLOT

Hi, I have the following code:

    library(zoo); library(ggplot2); library(plyr)
    dat <- rnorm(306); vv <- letters[1:6]; dat1 <- data.frame(dat, vv)
    dat2 = zooreg(rnorm(51), as.yearmon(as.Date("2000-01-01")), frequency = 12)
    ggplot(dat1) + geom_line(aes(y = dat, x = index(dat2), colour = vv), group = vv, size = 1.3)

However, I got this error while plotting:

    Error in data.frame(x = c(2000, 2000.083, 2000.167, 2000.25, :
      arguments imply differing number of rows: 51, 306

I could not find why that error occurs. Any idea please? Thanks,
Re: [R] Problem on plotting TS using GGPLOT
Thanks, Gabor, for your input. I know that zoo has an option to plot multiple time series. However, I want to go with ggplot2 because it looks better. If anyone can point out where the problem is in my ggplot2 code, I would be truly grateful. Thanks,
[R] Spliting columns, strings or reg exp returning substrings
Currently, the first column of my data frame holds string values in the format xx_yy. I want to create a new column containing just the substring xx (for each row in turn). Three possible ways to do this might be:

(1) split the string by '_' using strsplit and paste the first of the resulting pieces into a new column, but I have been unable to do this for each row of my data frame in turn (trying to use apply);
(2) split the column into two based on '_', but I am not sure if this is possible;
(3) use a regular expression to return the substring up to the '_', but I am unsure how to make a regular expression return the substring it matches in R.

Any ideas on all three counts would be gratefully received.
Re: [R] Problem on plotting TS using GGPLOT
Once you have reduced it to a data frame as already discussed, it's just a ggplot2 problem, so you can take it to the ggplot2 group: http://groups.google.com/group/ggplot2
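A minimal sketch of the data frame route suggested in this thread, assuming the intent is six series of 51 monthly values each. One detail worth checking: data.frame(dat, vv) recycles letters[1:6] element by element (a, b, c, ...), so the first 51 values are not all labelled a; rep(..., each = 51) gives the blockwise labelling the question describes. The names long, tt, value and series are illustrative only, and the plotting step runs only if ggplot2 is installed.

```r
set.seed(1)
n_time   <- 51
n_series <- 6
dat <- rnorm(n_time * n_series)

# Monthly time index starting January 2000, as decimal years
# (the same kind of values that appear in the error message: 2000, 2000.083, ...).
tt <- 2000 + (seq_len(n_time) - 1) / 12

# One row per observation, so every column has 306 rows.
long <- data.frame(
  tt     = rep(tt, times = n_series),
  value  = dat,
  series = rep(letters[1:n_series], each = n_time)
)

# Plot only when ggplot2 is available; x, y and colour all come from `long`.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = tt, y = value,
                                          colour = series, group = series)) +
    ggplot2::geom_line()
  print(p)
}
```

Because every aesthetic is taken from the same 306-row data frame, the "differing number of rows: 51, 306" mismatch should not arise.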
[R] summarize-plyr package
Hi, I am using the amazing package 'plyr'. I have one problem. I would appreciate help to fix the following error. Thanks.

    library(plyr)
    data(baseball)
    summarise(baseball,
      duration = max(year) - min(year),
      nteams = length(unique(team)))
    Error: could not find function "summarise"

    ddply(baseball, "id", summarise,
      duration = max(year) - min(year),
      nteams = length(unique(team)))
    Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress) :
      object "summarise" not found

-- Professor of Family Medicine, Boston University
Re: [R] Spliting columns, strings or reg exp returning substrings
Try this:

    DF <- data.frame(A = c('11_12', '22_23', '33_34'), B = sample(3))

    # 1) Using strsplit
    transform(DF, C = sapply(strsplit(as.character(DF$A), "_"), '[', 1))

    # 2) Using substr
    transform(DF, C = substr(DF$A, 1, 2))

    # 3) Using regex
    transform(DF, C = gsub("_.*", "", DF$A))

-- Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
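On point (3) of the question, returning the substring that a regular expression matches, a minimal base R sketch follows. It uses regexpr() with regmatches(); note that regmatches() was added to base R after this thread was written, so it is an assumption that a recent R is available. The second part sketches point (2), splitting into two columns. The names prefix, parts, first and second are illustrative.

```r
A <- c("11_12", "22_23", "33_34")

# (3) Return the part the pattern matches: everything before the first "_".
m      <- regexpr("^[^_]+", A)
prefix <- regmatches(A, m)

# (2) Split into two columns, one row per input string.
parts <- do.call(rbind, strsplit(A, "_", fixed = TRUE))
DF <- data.frame(A = A, first = parts[, 1], second = parts[, 2])
```

regmatches() silently drops elements that do not match, so if some strings may lack a '_'-free prefix, the gsub() approach keeps the result aligned with the rows more safely.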
Re: [R] summarize-plyr package
Hi, it works for me with plyr version 0.1.9. Try upgrading to the latest version, or post your sessionInfo(). HTH, baptiste
Re: [R] summarize-plyr package
Works alright for me:

    summarise(baseball, duration = max(year) - min(year), nteams = length(unique(team)))
      duration nteams
    1      136    132

    ddply(baseball, "id", summarise, duration = max(year) - min(year), nteams = length(unique(team)))
             id duration nteams
    1 aaronha01       22      3
    2 abernte02       17      7
    3 adairje01       12      4
    4 adamsba01       20      2
    5 adamsbo03       13      4

cheers, -Girish

    sessionInfo()
    R version 2.9.2 (2009-08-24)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] grid stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] RWinEdt_1.8-1 gtools_2.6.1 gmodels_2.15.0 ggplot2_0.8.3 reshape_0.8.3
    [6] plyr_0.1.9 proto_0.3-8 doBy_4.0.2

    loaded via a namespace (and not attached):
    [1] cluster_1.12.0 Formula_0.1-3 gdata_2.6.1 Hmisc_3.7-0 kinship_1.1.0-23
    [6] lattice_0.17-25 MASS_7.2-48 nlme_3.1-94 plm_1.1-4 sandwich_2.2-1
    [11] splines_2.9.2 survival_2.35-7 tools_2.9.2
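Both replies point at the installed plyr version, although the original poster's version is not stated in the thread. A small sketch of how one might check this before upgrading: the helper has_export below is hypothetical (not part of plyr or base R) and only reports the installed version and whether the package exports the named function.

```r
# Report whether an installed package exports a given function,
# together with the installed version. Returns NULL if the package is missing.
has_export <- function(pkg, fun) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NULL)
  list(version  = as.character(utils::packageVersion(pkg)),
       exported = fun %in% getNamespaceExports(pkg))
}

# Is summarise() available in the plyr that is installed here?
has_export("plyr", "summarise")
```

If exported comes back FALSE, update.packages() or install.packages("plyr") would be the next thing to try.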
Re: [R] multicomp plotting
I have been trying the following:

    require(multcomp)
    tmp <- list(confint = sig.data)
    attr(tmp, "type") <- "none"
    old.oma <- par(oma = c(0, 1, 0, 0))
    multcomp:::plot.confint.glht(tmp)
    par(old.oma)

I have not been able to get it to work. I would greatly appreciate some suggestions. Thanks .../Murli

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Nair, Murlidharan T [mn...@iusb.edu]
Sent: Thursday, September 24, 2009 2:06 PM
To: r-help@r-project.org
Subject: [R] multicomp plotting

I am trying to plot my multiple comparison data. Can anyone give me some input on the error I am getting? The data and code are appended below. Thanks ../Murli

library(multcomp)
sig.data <- structure(list(X = 1:63, Cell.lines = structure(c(1L, 6L, 13L, 25L, 33L, 42L, 2L, 7L, 14L, 26L, 34L, 43L, 3L, 4L, 5L, 18L, 22L, 52L, 58L, 8L, 27L, 35L, 45L, 9L, 36L, 46L, 10L, 15L, 28L, 37L, 47L, 11L, 16L, 29L, 38L, 44L, 12L, 17L, 30L, 39L, 48L, 19L, 23L, 53L, 59L, 20L, 21L, 24L, 54L, 60L, 31L, 40L, 49L, 50L, 32L, 41L, 51L, 55L, 61L, 56L, 62L, 57L, 63L), .Label = c("DU145-Caki-2", "DU145-Calu1", "HCE-7-DU145", "HCT116-DU145", "HT29-DU145", "LAPC4-Caki-2", "LAPC4-Calu1", "LAPC4-EC-17", "LAPC4-Fet", "LAPC4-HCE-7", "LAPC4-HCT116", "LAPC4-HT29", "LNCaP-Caki-2", "LNCaP-Calu1", "LNCaP-HCE-7", "LNCaP-HCT116", "LNCaP-HT29", "LS174-DU145", "LS174-LAPC4", "LS174-LNCaP", "MCF7-LNCaP", "MDA-MB-468-DU145", "MDA-MB-468-LAPC4", "MDA-MB-468-LNCaP", "PC3-Caki-2", "PC3-Calu1", "PC3-EC-17", "PC3-HCE-7", "PC3-HCT116-2", "PC3-HT29", "PC3-LS174", "PC3-MDA-MB-468", "RWPE1-Caki-2", "RWPE1-Calu1", "RWPE1-EC-17", "RWPE1-Fet", "RWPE1-HCE-7", "RWPE1-HCT116", "RWPE1-HT29", "RWPE1-LS174", "RWPE1-MDA-MB-468", "RWPE2-Caki-2", "RWPE2-Calu1", "RWPE2-E-HCT116", "RWPE2-EC-17", "RWPE2-Fet", "RWPE2-HCE-7", "RWPE2-HT29", "RWPE2-LS174", "RWPE2-MCF7", "RWPE2-MDA-MB-468", "SW480-DU145", "SW480-LAPC4", "SW480-LNCaP", "SW480-PC3", "SW480-RWPE1", "SW480-RWPE2", "TE3-DU145", "TE3-LAPC4", "TE3-LNCaP", "TE3-PC3", "TE3-RWPE1", "TE3-RWPE2"), class = "factor"), estimate = c(-2759.302703, -3690.072718, -2607.150854, -3282.218985, -3635.312686, -3786.281227,
-1189.109264, -2119.879279, -1036.957415, -1712.025546, -2065.119246, -2216.087787, 1253.075395, 1009.183561, 808.413018, 2038.189972, 788.61518, 1453.525701, 1001.526663, -1135.02519, -727.171457, -1080.265157, -1231.233698, -682.040377, -627.280345, -778.248885, -2183.84541, -1100.923546, -1775.991677, -2129.085377, -2280.053918, -1939.953576, -857.031712, -1532.099843, -1885.193544, -2036.162085, -1739.183033, -656.261169, -1331.3293, -1684.423001, -1835.391542, 2968.959987, 1719.385195, 2384.295716, 1932.296678, 1886.038123, -578.466846, 636.463331, 1301.373852, 849.374814, -2561.106254, -2914.199954, -3065.168495, -600.663526, -1311.531462, -1664.625162, -1815.593703, 1976.441983, 1524.442945, 2329.535683, 1877.536646, 2480.504224, 2028.505187), lower = c(-3326.68652, -4257.45653, -3174.53467, -3849.6028, -4202.6965, -4353.66504, -1756.49308, -2687.26309, -1604.34123, -2279.40936, -2632.50306, -2783.4716, 685.69158, 441.79975, 241.02921, 1470.80616, 221.23137, 886.14189, 434.14285, -1702.409, -1294.55527, -1647.64897, -1798.61751, -1249.42419, -1194.66416, -1345.6327, -2751.22922, -1668.30736, -2343.37549, -2696.46919, -2847.43773, -2507.33739, -1424.41552, -2099.48366, -2452.57736, -2603.5459, -2306.56685, -1223.64498, -1898.71311, -2251.80681, -2402.77535, 2401.57617, 1152.00138, 1816.9119, 1364.91287, 1318.65431, -1145.85066, 69.07952, 733.99004, 281.991, -3128.49007, -3481.58377, -3632.55231, -1168.04734, -1878.91527, -2232.00897, -2382.97752, 1409.05817, 957.05913, 1762.15187, 1310.15283, 1913.12041, 1461.12137), upper = c(-2191.918891, -3122.688906, -2039.767042, -2714.835173, -3067.928873, -3218.897414, -621.725451, -1552.495466, -469.573602, -1144.641733, -1497.735434, -1648.703975, 1820.459207, 1576.567374, 1375.796831, 2605.573784, 1355.998992, 2020.909513, 1568.910476, -567.641377, -159.787644, -512.881345, -663.849886, -114.656565, -59.896532, -210.865073, -1616.461597, -533.539733, -1208.607864, -1561.701565, -1712.670106, -1372.569764, -289.6479, 
-964.716031, -1317.809731, -1468.778272, -1171.799221, -88.877357, -763.945488, -1117.039188, -1268.007729, 3536.343799, 2286.769007, 2951.679528, 2499.680491, 2453.421935, -11.083033, 1203.847143, 1868.757664, 1416.758627, -1993.722441, -2346.816142, -2497.784683, -33.279714, -744.147649, -1097.24135, -1248.209891, 2543.825795, 2091.826758, 2896.919496, 2444.920458, 3047.888037, 2595.888999), p.val.raw = c(2.22e-15, 0, 8.22e-15, 0, 0, 0, 6.2e-08, 7.41e-13, 6.07e-07, 6.36e-11, 1.29e-12, 2.85e-13, 2.47e-08, 9.33e-07, 2.3e-05, 1.71e-12, 3.18e-05, 1.59e-09, 1.05e-06, 1.37e-07, 8.74e-05, 3.13e-07, 3.37e-08, 0.000184, 0.000452, 3.77e-05, 3.91e-13, 2.29e-07, 3.02e-11, 6.75e-13, 1.54e-13, 4.84e-12, 1.05e-05, 5.77e-10, 8.81e-12, 1.75e-12, 4.62e-11, 0.000281, 8.24e-09, 8.83e-11, 1.53e-11, 4.44e-16, 5.83e-11, 5.82e-14, 5.26e-12, 8.73e-12, 0.001, 0.000389, 1.25e-08, 1.18e-05, 1.2e-14, 6.66e-16, 2.22e-16, 0.000698, 1.08e-08, 1.12e-10, 1.92e-11,
Re: [R] packGrob and dynamic resizing
On Fri, Sep 25, 2009 at 7:55 AM, baptiste auguie baptiste.aug...@googlemail.com wrote:

Thank you Paul, I was convinced I had tried this option, but I obviously hadn't! In ?packGrob, the user is warned that packing grobs can be slow. To quantify this, I compared 3 functions:

- table1 uses frameGrob and packGrob
- table2 uses frameGrob but calculates the sizes manually and uses placeGrob
- table3 creates a grid.layout and draws the grobs in the different viewports.

The three functions have (almost) the same output, but the timing differs quite substantially!

This matches my experience with ggplot2: I have been gradually moving away from frameGrob and packGrob because doing the placement myself is much faster (and for most of the cases I'm interested in, the full power of packGrob is not needed).

Hadley

-- http://had.co.nz/
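The table1/table2/table3 functions are not shown in the thread. As a rough illustration of the third approach (a grid.layout with sizes computed by hand, then drawing into each cell), here is a minimal sketch. The function draw_label_table and its pad_mm argument are hypothetical names, and it only handles a character matrix of text labels.

```r
library(grid)

# Draw a character matrix as a table of text cells, computing the column
# widths manually instead of letting packGrob negotiate them.
draw_label_table <- function(labels, pad_mm = 4) {
  stopifnot(is.matrix(labels), is.character(labels))
  grid.newpage()
  # Width of each column: widest string in that column, plus padding (mm).
  widths_mm <- apply(labels, 2, function(col)
    max(convertWidth(stringWidth(col), "mm", valueOnly = TRUE))) + pad_mm
  lay <- grid.layout(nrow(labels), ncol(labels),
                     widths  = unit(widths_mm, "mm"),
                     heights = unit(rep(1.5, nrow(labels)), "lines"))
  pushViewport(viewport(layout = lay))
  for (i in seq_len(nrow(labels))) {
    for (j in seq_len(ncol(labels))) {
      # Each cell is drawn in the viewport occupying layout position (i, j).
      grid.text(labels[i, j],
                vp = viewport(layout.pos.row = i, layout.pos.col = j))
    }
  }
  popViewport()
  invisible(widths_mm)
}

draw_label_table(matrix(c("cultivar", "A", "B", "yield", "1.2", "3.45"), ncol = 2))
```

The sizes are measured once up front, which is presumably where the speed difference over repeated packGrob calls comes from; system.time() around each variant would confirm it for a given case.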
Re: [R] Reading data
Sometimes it is easiest to open a file using a file selection widget. I keep this in my .Rprofile:

    getOpenFile <- function(...) {
      require(tcltk)
      return(tclvalue(tkgetOpenFile()))
    }

With this you can find your file and open it with

    rel <- read.table(getOpenFile(), quote = "", header = FALSE, sep = "", col.names = c("id", "orel", "nrel"))

or

    filename <- getOpenFile()
    rel <- read.table(filename, quote = "", header = FALSE, sep = "", col.names = c("id", "orel", "nrel"))

Mike

P.S. I keep a couple of functions on hand for choosing writable files and directories too:

    getSaveFile <- function(...) {
      require(tcltk)
      return(tclvalue(tkgetSaveFile()))
    }

    chooseDir <- function(...) {
      require(tcltk)
      return(tclvalue(tkchooseDirectory()))
    }
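If tcltk is not available, base R's file.choose() offers a similar interactive picker. A minimal sketch, assuming the same three-column, whitespace-separated file as above; read_rel is a hypothetical wrapper name.

```r
# Read the three-column file. With no path given, ask the user to pick one
# interactively; the default is only evaluated when `path` is not supplied.
read_rel <- function(path = file.choose()) {
  read.table(path, quote = "", header = FALSE, sep = "",
             col.names = c("id", "orel", "nrel"))
}

# Non-interactive usage with an explicit path:
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 0.50 0.60", "2 0.70 0.80"), tmp)
rel <- read_rel(tmp)
```

Calling read_rel() with no argument in an interactive session opens the file chooser instead.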
[R] Error with Mixdist in R
Dear R users, I am an electrical engineering student and have just come across a curve-fitting problem. I need to find the constituent Gaussian distribution curves fitting the data in Workbook1.txt, attached here. I tried to use mixdist in R but ran into the following problem. Can you suggest where I am going wrong?

    super <- read.table("Workbook1.txt", sep = "\t")
    plot(super)
    fitmixdata <- as.mixdata(super)
    plot(fitmixdata)
    plotfit1 <- mix(super, mixparam(c(-75, -67, -38), 10), "norm", mixconstr(consigma = "NONE"))
    Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, :
      missing value in parameter

I get this error. Awaiting a reply soon. Thanking you, Regards, Suchit Shah BOSTON

Workbook1.txt:

    -150    0
    -146    0
    -142    0
    -138    0
    -135    0
    -131    0
    -127    44.293
    -123    118.463
    -120    27.862
    -116    13.076
    -112    173.738
    -108    322.141
    -105    633.341
    -101    1153.095
    -97.2   1775.843
    -93.5   2956.544
    -89.7   4966.666
    -86     8216.293
    -82.2   13535.566
    -78.5   21288.975
    -74.8   28815.691
    -71     36041.516
    -67.3   46679.93
    -63.5   59945.395
    -59.8   73005.781
    -56     89597.742
    -52.3   114438.898
    -48.6   142680.047
    -44.8   170931.375
    -41.1   201308.688
    -37.3   219909.109
    -33.6   209581.188
    -29.8   171905.469
    -26.1   119971.742
    -22.4   71685.445
    -18.6   38779.398
    -14.9   20554.045
    -11.1   10713.763
    -7.39   5092.355
    -3.65   2304.784
    0.0962  720.503
    3.84    71.953
    7.58    0
    11.3    0
    15.1    0
    18.8    0
    22.6    0
    26.3    0
    30      0
    33.8    0
    37.5    0
    41.3    0
    45      0
    48.8    0
    52.5    0
    56.2    0
    60      0
    63.7    0
    67.5    0
    71.2    0
    74.9    0
    78.7    0
    82.4    0
    86.2    0
    89.9    0
    93.7    0
    97.4    0
    101     0
    105     0
    109     0
    112     0
    116     0
    120     0
    124     0
    127     0
    131     0
    135     0
    139     0
    142     0
    146     0
    150     0
Re: [R] Nested select
Try this:

    lines <- "lo ptcl5 ptcl99 variable
    430 . 8787 a
    430 342 2343 m
    430 . 89 mr
    431 456 4774 a
    431 299 2777 m
    431 99 96 mr
    432 333 3433 a
    432 . 7377 m
    432 . 676 mr"
    DF <- read.table(textConnection(lines), header = TRUE)
    closeAllConnections()
    subset(DF, (ptcl5 == '.') & (variable %in% c('a', 'm')))
       lo ptcl5 ptcl99 variable
    1 430     .   8787        a
    8 432     .   7377        m

-- Jim Holtman
Cincinnati, OH
What is the problem that you are trying to solve?

On Fri, Sep 25, 2009 at 4:42 AM, premmad mtechp...@gmail.com wrote:

My data:

    library(doBy)
    lines <- "lo ptcl5 ptcl99 variable
    430 . 8787 a
    430 342 2343 m
    430 . 89 mr
    431 456 4774 a
    431 299 2777 m
    431 99 96 mr
    432 333 3433 a
    432 . 7377 m
    432 . 676 mr"
    DF <- read.table(con <- textConnection(Lines), skip = 1)
    close(con)

What I want is to select lo when ptcl5 is missing and variable is either a or m. I tried the following query:

    sqldf("select lo from DF where lo=(select lo where ptcl5='.' and variable='m') or lo=(select lo where ptcl5='.' and variable='a')")

But I am getting the entire data set instead of the rows limited by the condition. Is my query right? Please help me with this.
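A variant of the same selection, sketched under the assumption that "." is purely a missing-value marker in ptcl5. Reading it with na.strings = "." keeps ptcl5 numeric, and the condition becomes an is.na() test. The read.table(text = ...) argument was added to R after this thread; the names txt and sel are illustrative.

```r
txt <- "lo ptcl5 ptcl99 variable
430 . 8787 a
430 342 2343 m
430 . 89 mr
431 456 4774 a
431 299 2777 m
431 99 96 mr
432 333 3433 a
432 . 7377 m
432 . 676 mr"

# Treat "." as NA so that ptcl5 is read as a number, not as text.
DF <- read.table(text = txt, header = TRUE, na.strings = ".",
                 stringsAsFactors = FALSE)

# Rows where ptcl5 is missing and variable is either "a" or "m".
sel <- DF[is.na(DF$ptcl5) & DF$variable %in% c("a", "m"), ]
sel$lo
# 430 432
```

This selects the same two rows (1 and 8) as the subset() call above.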
[R] error while plotting
I am getting the following errors when I try to plot the data below. I cannot figure out the cause.

    Error in plot.window(...) : need finite 'xlim' values
    In addition: Warning messages:
    1: In min(x) : no non-missing arguments to min; returning Inf
    2: In max(x) : no non-missing arguments to max; returning -Inf
    3: In min(x) : no non-missing arguments to min; returning Inf
    4: In max(x) : no non-missing arguments to max; returning -Inf

I am using the following code:

    library(multcomp)
    sig.data <- structure(list(X = 1:10,
      Cell.lines = structure(c(2L, 5L, 8L, 9L, 3L, 6L, 10L, 1L, 4L, 7L),
        .Label = c("T(70%)a-N(0%)c", "T(70%)a-N(0%)f", "T(70%)a-N(0%)i", "T(70%)c-N(0%)c", "T(70%)c-N(0%)f", "T(70%)c-N(0%)i", "T(80%)a-N(0%)c", "T(80%)a-N(0%)f", "T(90%)-N(0%)f", "T(90%)-N(0%)i"), class = "factor"),
      estimate = c(9859.74333, -5553.64802, 6227.17947, 8063.6472, 6548.86032, -8864.53103, 4752.7642, 9057.72021, -6355.67115, 5425.15635),
      lower = c(5560.57875, -9852.8126, 1928.01489, 3764.48262, 2249.69575, -13163.69561, 453.59962, 4758.55563, -10654.83573, 1125.99177),
      upper = c(14158.90791, -1254.48344, 10526.34405, 12362.81178, 10848.0249, -4565.36645, 9051.92877, 13356.88479, -2056.50657, 9724.32092),
      p.val.raw = c(1.15e-08, 5.78e-05, 1.36e-05, 3.21e-07, 6.91e-06, 6.97e-08, 0.000331, 4.87e-08, 1.04e-05, 7.63e-05),
      p.val.bon = c(2.66e-06, 0.0133, 0.00315, 7.41e-05, 0.0016, 1.61e-05, 0.0764, 1.13e-05, 0.0024, 0.0176),
      p.val.adj = c(2.65e-13, 0.000592, 2.82e-05, 9.72e-08, 6.56e-05, 8.76e-09, 0.0117, 6.22e-09, 6.44e-06, 0.000334)),
      .Names = c("X", "Cell.lines", "estimate", "lower", "upper", "p.val.raw", "p.val.bon", "p.val.adj"),
      class = "data.frame",
      row.names = c("T(70%)a-N(0%)f", "T(70%)c-N(0%)f", "T(80%)a-N(0%)f", "T(90%)-N(0%)f", "T(70%)a-N(0%)i", "T(70%)c-N(0%)i", "T(90%)-N(0%)i", "T(70%)a-N(0%)c", "T(70%)c-N(0%)c", "T(80%)a-N(0%)c"))
    rownames(sig.data) <- sig.data[, 2]
    my.hmtest <- structure(list(
      estimate = t(t(structure(sig.data[, "estimate"], .Names = rownames(sig.data)))),
      conf.int = sig.data[, 4:5],
      ctype = "ABCC4-2007"),
      class = "hmtest")
    par(mex = 0.5)  # This helps to accommodate the margins when text is getting cut off
    plot(my.hmtest, cex.axis = 0.7)
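This does not diagnose the plot method's "need finite 'xlim' values" error, which typically means the x range was computed from empty or all-missing data. As a hedged alternative that avoids the multcomp plot methods altogether, the estimates and intervals can be drawn directly with base graphics. The function plot_intervals below is a hypothetical helper, not part of multcomp; the example values are the first three rows of sig.data from the post.

```r
# Horizontal interval plot: one row per comparison, point at the estimate,
# line segment from the lower to the upper confidence limit.
plot_intervals <- function(estimate, lower, upper, labels) {
  n <- length(estimate)
  stopifnot(length(lower) == n, length(upper) == n, length(labels) == n)
  xlim <- range(c(lower, upper), finite = TRUE)
  old <- par(mar = c(4, 10, 1, 1))   # wide left margin for long labels
  on.exit(par(old))
  plot(estimate, seq_len(n), xlim = xlim, ylim = c(0.5, n + 0.5),
       pch = 19, yaxt = "n", xlab = "Estimate", ylab = "")
  segments(lower, seq_len(n), upper, seq_len(n))
  abline(v = 0, lty = 2)             # reference line at zero difference
  axis(2, at = seq_len(n), labels = labels, las = 1, cex.axis = 0.7)
  invisible(xlim)
}

plot_intervals(
  estimate = c(9859.74333, -5553.64802, 6227.17947),
  lower    = c(5560.57875, -9852.8126, 1928.01489),
  upper    = c(14158.90791, -1254.48344, 10526.34405),
  labels   = c("T(70%)a-N(0%)f", "T(70%)c-N(0%)f", "T(80%)a-N(0%)f")
)
```

With the full data frame, the call would be plot_intervals(sig.data$estimate, sig.data$lower, sig.data$upper, as.character(sig.data$Cell.lines)).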
[R] Fitting a asymmetric logistic peak curve
UseRs,

I am working on the analysis of green area growth in winter wheat and the effect of the amount of water on it. I am trying to fit an asymmetric logistic peak curve to my data, as described by Royo et al., Europ. J Agronomy 20 (2004) 419. I want to calculate the maximum green area, maximum growth rate and senescence for each cultivar in each treatment. I started by calculating the means of all cultivars in all treatments at each sampling date and fitting these data to the curve using the nls function in the stats package.

I am new to non-linear regression and I am getting the error shown below. After some searching, it seems that the problem is the starting values of the coefficients, and some posts suggest linearizing the data in order to obtain better starting values. I have no idea how to do this with my data. Any help in solving this problem will be appreciated.

Marc.

sessionInfo()
R version 2.9.2 (2009-08-24)
i486-pc-linux-gnu
locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] chron_2.3-30

x
   ndays         y
1     99 0.4047951
2    112 0.5894659
3    125 0.6570246
4    133 0.7065050
5    139 0.6634155
6    148 0.7051833
7    162 0.6794740
8    169 0.6399054
9    175 0.4850703
10   182 0.2961120

model3 <- nls(x ~ a + (b/e)*{(1+exp(ndays+d*log(e)-f)/d)^-((e+1)/e)}*{(exp(ndays+d*log(e)-f)/d)^-(e+1)/e}*(e+1)^{(e+1)/e},
              data = x, start = list(a = -1, b = 0.5, f=-0.4, d=0.6, e=2))
Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates
Re: [R] Logistic Regression for Multinomial Data using R
Hi I want to do logistic regression for multinomial data. How can I do it in R? Thanks a lot Nimal Fernando [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.delim very slow in reading files with lots of columns
Thanks, Ben. The matrix is a pure numeric matrix (6x70, 31mb). I tried the colClasses='numeric' as well as nrows=7(one of these is header line) on the matrix. Also I tested it with not setting the two options in read.delim() Here is the time spent on reading the matrix for each test. system.time( tmp - read.delim(test_data.txt)) usersystem elapsed 50985.42127.665 51013.384 system.time(tmp - read.delim(test_data.txt,colClasses=numeric,nrows=7,comment.char=)) usersystem elapsed 51301.56360.491 51362.208 It seems setting the options does not speed up the reading at all. Is it because of the header line? I will test it. Did I misunderstand something? One additional and interesting observation: The one with the options does save memory a lot. It took ~150mb, while the other took ~4GB for reading the matrix. I will try the scan() and see if it helps. Thanks! Mike -Original Message- From: Benilton Carvalho [mailto:bcarv...@jhsph.edu] Sent: Wednesday, September 23, 2009 4:56 PM To: Ping-Hsun Hsieh Cc: r-help@r-project.org Subject: Re: [R] read.delim very slow in reading files with lots of columns use the 'colClasses' argument and you can also set 'nrows'. b On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote: Hi, I am trying to read a tab-delimited file into R (Ver. 2.8). The machine I am using is 64bit Linux with 16 GB. The file is basically a matrix(~600x70) and as large as 3GB. The read.delim() ran extremely slow (hours) even with a subset of the file (31 MB with 6x70) I monitored the memory usage, and found it constantly only took less than 1% of 16GB memory. Does read.delim() have difficulty to read files with lots of columns? Any suggestions? Thanks, Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
[R] SE and CI
How can I get the standard error and confidence interval for a prediction from a multiple regression model in R? For a simple regression I used predict(xc, newdata=data.frame(var1=10.), se=T), where xc is the glm model fitted with the binomial family and var1 is the variable. I can get the upper and lower intervals of the prediction. Any help is welcome.
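One way this is commonly done for a binomial glm with several predictors is sketched below. It is only an illustration under stated assumptions: the data are simulated, the names var2, d, new and ci are made up here, and the interval is an approximate 95% Wald interval built on the link scale and then back-transformed. The key point is that newdata needs a value for every predictor in the model.

```r
# Simulated data; var2, d, new and ci are hypothetical names for illustration.
set.seed(1)
d <- data.frame(var1 = rnorm(200, mean = 10, sd = 2), var2 = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(-3 + 0.3 * d$var1 + 0.5 * d$var2))

xc <- glm(y ~ var1 + var2, family = binomial, data = d)

# newdata must supply every predictor used in the model formula
new <- data.frame(var1 = 10, var2 = 0)

# Prediction and standard error on the link (logit) scale
p   <- predict(xc, newdata = new, type = "link", se.fit = TRUE)
eta <- unname(p$fit)
se  <- unname(p$se.fit)

# Approximate 95% Wald interval on the link scale, mapped back to probabilities
z  <- qnorm(0.975)
ci <- data.frame(fit   = plogis(eta),
                 lower = plogis(eta - z * se),
                 upper = plogis(eta + z * se))
ci
```

Working on the link scale keeps the interval inside (0, 1); adding and subtracting a multiple of the response-scale standard error does not guarantee that.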
Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.
Yes, I agree that the median makes the most sense here, but there could be other measures of location that would be of interest (quartiles, some version of the rank sum). Here is some sample code for a permutation test on the medians (there are a couple of packages that will do this as well, but this is pretty straight forward with straight R code): set1 - c(1,3,2,2,4,3,3,2,2) set2 - c(4,4,4,3,3,5,4,4) sets - c(set1,set2) g1 - seq_along(set1) orig - median( sets[ -g1 ] ) - median( sets[ g1 ] ) perms - replicate( 1999, { tmp - sample(sets) median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } ) # or pb - winProgressBar(max=1999) setWinProgressBar(pb, 0) perms - replicate(1999, { setWinProgressBar( pb, getWinProgressBar(pb) + 1 ) tmp - sample(sets) median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } ) close(pb) perms - c(orig,perms) sum( perms = orig ) mean( perms = orig ) prop.test( sum(perms=orig), length(perms) ) hist(perms) abline(v=orig, col='blue') (if you want the progress bar on an os other than windows, then use the tcltk package and the tkProgressBar). Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: John Sorkin [mailto:jsor...@grecc.umaryland.edu] Sent: Thursday, September 24, 2009 2:52 PM To: Greg Snow; r-help@r-project.org Subject: Re: [R] Non-parametric test for location with two unpaired setsof data measured on ordinal scale. Greg, I used the term location because I did not want to use the terms mean or median for the exact reason that you gave; these to values can be different in a given distribution. I want to test the null hypothesis that the data come from a single distribution. This is often done by comparing a measure of location (e.g. mean for ANOVOA), but as you know the mean need not be the only measure of location that is tested. 
Giiven that my data are measured on an ordinal scale, the mean is without meaning, so I suspect that the best measure for me would be a comparison of medians, but I am open to other suggestions. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Greg Snow greg.s...@imail.org 9/24/2009 4:30 PM What do you mean by location? I can think of examples where 2 distributions have the same median but different means, or the same means but different medians. Are you willing to assume that the distributions are exactly the same under the null hypothesis? (not just the same 'center/location') I would probably do a permutation test on the difference between the means or medians (which ever you think is more meaningful), this assumes the exact same distribution under the null. You can also do a Mann-Whitney/Wilcoxin test (but I don't like explaining, or sometimes even thinking about, what it is actually testing), you could do a bootstrap confidence interval on the difference between means/medians (does not assume distributions are the same, just have same mean/median), or you could just replace all values by their ranks and do a t-test (essentially transforms the data to a uniform distribution, the CLT for the uniform kicks in around n=5, but I would simulate just to check). This is not the nice simple answer that you were probably looking for, but hopefully it gives you some things to think about that will help, -- Gregory (Greg) L. Snow Ph.D. 
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of John Sorkin Sent: Thursday, September 24, 2009 1:08 PM To: r-help@r-project.org Subject: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale. Please forgive a stats question. I have to sets of data (unpaired) measured on an ordinal scale. I want to test to see if the two sets are different (i.e. do they have the same location): set1: 1,3,2,2,4,3,3,2,2 set: 4,4,4,3,3,5,4,4 What is the most appropriate non-parametric test to test location? Thanks, John Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
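The archive has stripped the assignment arrows and comparison operators from the permutation-test code in this thread. A cleaned-up sketch of the same idea follows, without the progress bar. It assumes the lost comparison was `>=` (a one-sided test of set2 being shifted upward); the seed and the name p_value are additions for illustration.

```r
set1 <- c(1, 3, 2, 2, 4, 3, 3, 2, 2)
set2 <- c(4, 4, 4, 3, 3, 5, 4, 4)
sets <- c(set1, set2)
g1   <- seq_along(set1)            # positions belonging to group 1

# Observed difference in medians (set2 minus set1)
orig <- median(sets[-g1]) - median(sets[g1])

set.seed(42)                       # arbitrary seed, only for reproducibility
perms <- replicate(1999, {
  tmp <- sample(sets)              # shuffle the group labels
  median(tmp[-g1]) - median(tmp[g1])
})

# Include the observed statistic among the permutations
perms <- c(orig, perms)

# One-sided Monte Carlo p-value (assumes the stripped operator was >=)
p_value <- mean(perms >= orig)
p_value

hist(perms)
abline(v = orig, col = "blue")
```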
Re: [R] Logistic Regression for Multinomial Data using R
Use polr from the MASS package Nimal Fernando pnp...@gmail.com Sent by: r-help-boun...@r-project.org 09/25/2009 12:33 PM To r-help@r-project.org cc Subject Re: [R] Logistic Regression for Multinomial Data using R Hi I want to do logistic regression for multinomial data. How can I do it in R? Thanks a lot Nimal Fernando [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
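A note on the suggestion above: polr fits a proportional-odds model, which suits a response whose categories are ordered. For an unordered multinomial response, multinom from the nnet package is the usual choice. The sketch below fits both to the housing data that ships with MASS; the object names fit_ord and fit_mn are made up for illustration, and which model is appropriate depends on the poster's data, which the thread does not show.

```r
library(MASS)   # polr(): proportional-odds logistic regression (ordered response)
library(nnet)   # multinom(): multinomial logit (unordered response)

# 'housing' comes with MASS; Sat is an ordered factor (Low < Medium < High)
fit_ord <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# The same response treated as unordered categories
fit_mn <- multinom(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                   trace = FALSE)

coef(fit_ord)   # one coefficient per predictor contrast
coef(fit_mn)    # one row of coefficients per non-reference category
```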
Re: [R] read.delim very slow in reading files with lots of columns
On Fri, 25 Sep 2009, Ping-Hsun Hsieh wrote: Thanks, Ben. The matrix is a pure numeric matrix (6x70, 31mb). I tried the colClasses='numeric' as well as nrows=7(one of these is header line) on the matrix. Also I tested it with not setting the two options in read.delim() A couple of things come to mind. First, I have not read the internals of scan, but suspect that parsing a really long line may be slowing things down. Since you are attempting to read in a numeric matrix, you can simply do a global replacement of your delimiter with a newline and use scan on the result. On unix-like systems, something like tmp - scan( pipe( 'tr \t \n test_data.txt' ) ) ought to help. Second, the memory occupied by each line - once it has been processed - is spread over the full 32MB (or 3.2 GB for the 600 by 70 version) region of memory. I am guessing that this is causing your cache to work hard to put it in place. If you really want the result to be a 600 by 70 matrix, you might try to read it in smaller blocks using scan( pipe( cut ... ) ) to feed selected blocks of columns of your text file to R. HTH, Chuck Here is the time spent on reading the matrix for each test. system.time( tmp - read.delim(test_data.txt)) usersystem elapsed 50985.42127.665 51013.384 system.time(tmp - read.delim(test_data.txt,colClasses=numeric,nrows=7,comment.char=)) usersystem elapsed 51301.56360.491 51362.208 It seems setting the options does not speed up the reading at all. Is it because of the header line? I will test it. Did I misunderstand something? One additional and interesting observation: The one with the options does save memory a lot. It took ~150mb, while the other took ~4GB for reading the matrix. I will try the scan() and see if it helps. Thanks! 
Mike -Original Message- From: Benilton Carvalho [mailto:bcarv...@jhsph.edu] Sent: Wednesday, September 23, 2009 4:56 PM To: Ping-Hsun Hsieh Cc: r-help@r-project.org Subject: Re: [R] read.delim very slow in reading files with lots of columns use the 'colClasses' argument and you can also set 'nrows'. b On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote: Hi, I am trying to read a tab-delimited file into R (Ver. 2.8). The machine I am using is 64bit Linux with 16 GB. The file is basically a matrix(~600x70) and as large as 3GB. The read.delim() ran extremely slow (hours) even with a subset of the file (31 MB with 6x70) I monitored the memory usage, and found it constantly only took less than 1% of 16GB memory. Does read.delim() have difficulty to read files with lots of columns? Any suggestions? Thanks, Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
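The scan() suggestion in this thread can be tried without an external tr or cut step. The sketch below is a rough illustration, not a benchmark: it builds a small temporary tab-delimited file as a stand-in for test_data.txt (the 6 x 50 size and the names m, path, hdr, vals are made up), reads the header line separately, and reshapes the values into a matrix.

```r
# Small stand-in for test_data.txt: tab-delimited, one header line
m <- matrix(seq_len(6 * 50), nrow = 6, byrow = TRUE)
colnames(m) <- paste("V", seq_len(ncol(m)), sep = "")
path <- tempfile(fileext = ".txt")
write.table(m, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Header line first, then the purely numeric body
hdr  <- scan(path, what = character(), sep = "\t", nlines = 1, quiet = TRUE)
vals <- scan(path, what = numeric(),   sep = "\t", skip = 1,   quiet = TRUE)

# scan() returns the values row by row, so fill the matrix by row
tmp <- matrix(vals, ncol = length(hdr), byrow = TRUE,
              dimnames = list(NULL, hdr))
dim(tmp)
```

Because scan() reads into a single numeric vector, it avoids the per-column bookkeeping that read.delim() does for a data frame, which is the part that tends to hurt with very many columns.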
Re: [R] Reading data
You can use R.utils (on CRAN) to help you figure out why the file is not found or not readable. library(R.utils); pathname - C:/Documents and Settings/ashta/My Documents/R_data/rel.dat; pathname - Arguments$getReadablePathname(pathname); rel - read.table(pathname, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)); If the file is not found it gives an error an tries to tell you why, e.g. Arguments$getReadablePathname(C:/Windows/system32/cmd.exe) [1] C:/Windows/system32/cmd.exe Arguments$getReadablePathname(C:/Windows/system323/cmd.exe) Error in list(`Arguments$getReadablePathname(C:/Windows/system323/cmd.exe)` = environment, : [2009-09-25 10:11:57] Exception: Pathname not found: C:/Windows/system323/cmd.exe (C:/Windows/ exists, but nothing beyond) at throw(Exception(...)) at throw.default(Pathname not found: , pathname, reason) at throw(Pathname not found: , pathname, reason) at method(static, ...) at Arguments$getReadablePathname(C:/Windows/system323/cmd.exe) It will also tell you if the file exists, but you don't have the permission to read it. Second, your error message reports on a pathname that starts with 'file=', which I've never seen; cannot open file 'file=C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat': Invalid argument what version of R are you use, i.e. what does sessionInfo() give? Third, it is true that backslashes need to be escaped. However, *forward-slashes* work with *any platform*. I stick with the latter so I don't have to think about it. It should make no difference in your case. My $.02 /Henrik On Fri, Sep 25, 2009 at 7:32 AM, Michael A. Miller mmill...@iupui.edu wrote: Sometimes it is easiest to open a file using a file selection widget. 
I keep this in my .Rprofile: getOpenFile - function(...){ require(tcltk) return(tclvalue(tkgetOpenFile())) } With this you can find your file and open it with rel - read.table(getOpenFile(), quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) or filename - getOpenFile() rel - read.table(filename, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) Mike P.S. I keep a couple functions on hand for choosing writable files and directories too... getSaveFile - function(...){ require(tcltk) return(tclvalue(tkgetSaveFile())) } chooseDir - function(...){ require(tcltk) return(tclvalue(tkchooseDirectory())) } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
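For the "file not found" diagnosis discussed in this thread, roughly the same checks can be made with base R alone. The helper below is a hypothetical sketch (check_readable is not an R.utils function, and its messages only imitate the ones quoted above); it uses file.exists(), file.access() and normalizePath(), and the demo file is a temporary stand-in for rel.dat.

```r
# Hypothetical helper: report why a path cannot be read, else return it
check_readable <- function(pathname) {
  if (!file.exists(pathname)) {
    # Walk upwards to find the deepest directory that does exist
    parent <- dirname(pathname)
    while (!file.exists(parent) && dirname(parent) != parent) {
      parent <- dirname(parent)
    }
    stop("Pathname not found: ", pathname,
         " (", parent, " exists, but nothing beyond)")
  }
  # file.access() returns 0 on success; mode = 4 tests read permission
  if (file.access(pathname, mode = 4) != 0) {
    stop("No permission to read: ", pathname)
  }
  normalizePath(pathname)
}

# Demo with a temporary file standing in for rel.dat
demo_file <- tempfile(fileext = ".dat")
writeLines("1 0.5 0.7", demo_file)
rel <- read.table(check_readable(demo_file), header = FALSE,
                  col.names = c("id", "orel", "nrel"))
rel
```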
Re: [R] Reading data
On Fri, Sep 25, 2009 at 10:18 AM, Henrik Bengtsson h...@stat.berkeley.edu wrote: You can use R.utils (on CRAN) to help you figure out why the file is not found or not readable. library(R.utils); pathname - C:/Documents and Settings/ashta/My Documents/R_data/rel.dat; pathname - Arguments$getReadablePathname(pathname); rel - read.table(pathname, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)); If the file is not found it gives an error an tries to tell you why, e.g. Arguments$getReadablePathname(C:/Windows/system32/cmd.exe) [1] C:/Windows/system32/cmd.exe Arguments$getReadablePathname(C:/Windows/system323/cmd.exe) Error in list(`Arguments$getReadablePathname(C:/Windows/system323/cmd.exe)` = environment, : [2009-09-25 10:11:57] Exception: Pathname not found: C:/Windows/system323/cmd.exe (C:/Windows/ exists, but nothing beyond) at throw(Exception(...)) at throw.default(Pathname not found: , pathname, reason) at throw(Pathname not found: , pathname, reason) at method(static, ...) at Arguments$getReadablePathname(C:/Windows/system323/cmd.exe) It will also tell you if the file exists, but you don't have the permission to read it. Second, your error message reports on a pathname that starts with 'file=', which I've never seen; cannot open file 'file=C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat': Invalid argument what version of R are you use, i.e. what does sessionInfo() give? Did you *really* do? rel - read.table(C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) or did you try to do: rel - read.table(file=C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) but wrote? rel - read.table(file=C:/Documents and Settings/sewalem/MyDocuments/R_data/rel.dat, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) /H Third, it is true that backslashes need to be escaped. However, *forward-slashes* work with *any platform*. 
I stick with the latter so I don't have to think about it. It should make no difference in your case. My $.02 /Henrik On Fri, Sep 25, 2009 at 7:32 AM, Michael A. Miller mmill...@iupui.edu wrote: Sometimes it is easiest to open a file using a file selection widget. I keep this in my .Rprofile: getOpenFile - function(...){ require(tcltk) return(tclvalue(tkgetOpenFile())) } With this you can find your file and open it with rel - read.table(getOpenFile(), quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) or filename - getOpenFile() rel - read.table(filename, quote=, header=FALSE, sep=, col.names=c(id,orel,nrel)) Mike P.S. I keep a couple functions on hand for choosing writable files and directories too... getSaveFile - function(...){ require(tcltk) return(tclvalue(tkgetSaveFile())) } chooseDir - function(...){ require(tcltk) return(tclvalue(tkchooseDirectory())) } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] grep or other complex string matching approach to capture necessary information...
Say I have the following data:

house_number <- floor(runif(100, 200, 600))
water_evaluation <- c("No water damage", "Water damage", "Water On", "Water off", "water pipes damaged", "leaking water")
water_evaluation_selection <- floor(runif(100, 1, 6))
house_info <- data.frame(water_evaluation[water_evaluation_selection], house_number)

Suppose I only want to pull out the houses with negative water evaluations, i.e. "Water damage", "water pipes damaged", and "leaking water". Should/could I use grep to pull the house numbers out of house_info for those negative evaluations? In other words, I want to know the house numbers in house_info where the water evaluation is negative. Is there a way to use grep or another R function to get that information?

Thank you again in advance for any insights.
Re: [R] How to download from github
That's strange, my pc is not that slow, it has 3 mb of Ram. The download button doesn't respond either using my computer at work or at home. When you click the download button, Do you get a dialog box prompting you where to save the files? --- On Thu, 9/24/09, Charlie Sharpsteen ch...@sharpsteen.net wrote: From: Charlie Sharpsteen ch...@sharpsteen.net Subject: Re: [R] How to download from github To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: r-help@r-project.org Date: Thursday, September 24, 2009, 9:06 PM Hmm, clicking on the 'Download' button and then on either the 'TAR' or 'ZIP' icons is working fine for me. It might take a while for the actual download to start-- GitHub has to compress the files which can take a half a minute or more. Also, GitHub appears to be preparing for a move to a new set of servers-- this may cause some instability and weirdness in the way the website responds. Good luck! -Charlie On Thu, Sep 24, 2009 at 8:20 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote: Hi: Is my first attempt to try to download from github. Nothing happens by clicking on the 'download' button. Could anyone give me a hint on how to get all the files from the link below? Thanks http://github.com/hadley/ggplot2-bayarea/tree/0a8bf71dea38cfbf2d928eb713d24dfd928359fc Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grep or other complex string matching approach to capture necessary information...
You could use grep, but it's probably easier to use %in% (see also is.element()), e.g.: house_info[ house_info[,1] %in% c(Water damage, water pipes damaged, leaking water), ] water_evaluation.water_evaluation_selection. house_number 6 water pipes damaged 489 8 water pipes damaged 512 11 water pipes damaged 597 19 Water damage 478 21 water pipes damaged 373 23 Water damage 465 house_info[ house_info[,1] %in% c(Water damage, water pipes damaged, leaking water), 2] [1] 489 512 597 478 373 465 337 362 234 535 551 351 415 495 220 216 317 443 346 577 585 268 463 441 225 200 304 486 390 476 485 247 [33] 399 504 262 551 575 359 538 sort(unique(house_info[ house_info[,1] %in% c(Water damage, water pipes damaged, leaking water), 2])) [1] 200 216 220 225 234 247 262 268 304 317 337 346 351 359 362 373 390 399 415 441 443 463 465 476 478 485 486 489 495 504 512 535 [33] 538 551 575 577 585 597 Also, an easier way to generated random integers is sample(), e.g. sample(1:3, size=5, rep=T) [1] 3 1 2 1 1 (This is more straightforward, and more easily avoids possibly unintended errors such as floor(runif(100, 1,6) never generating a 6, but do be careful of the gotcha that sample(2:3, ...) will generate a selection of 2's and 3's, while sample(3,...) will generate samples from 1, 2, and 3.) -- Tony Plate Jason Rupert wrote: Say I have the following data: house_number-floor(runif(100, 200, 600)) water_evaluation-c(No water damage, Water damage, Water On, Water off, water pipes damaged, leaking water) water_evaluation_selection-floor(runif(100, 1,6)) house_info-data.frame(water_evaluation[water_evaluation_selection], house_number) And, that I only want to pull out the ones with negative water evaluations, i.e. Water damage, water pipes damaged, and leaking water. Should/could I use grep in order to pull the house numbers out of house_info with those negative water evaluations? I guess I want to know the house numbers from house_info where the water evaluation is negative. 
Is there a way to use grep or another R function in order to acquire that information? Thank you again in advance for any insights. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
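The two approaches from this thread, side by side, with the quotes and assignment arrows that the archive stripped put back. This is a sketch: the column name evaluation, the seed, and the regular expression are choices made here for illustration. Note the pitfall with pattern matching: "No water damage" contains "water damage", so the pattern has to be anchored.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
water_evaluation <- c("No water damage", "Water damage", "Water On",
                      "Water off", "water pipes damaged", "leaking water")
house_info <- data.frame(
  evaluation   = sample(water_evaluation, 100, replace = TRUE),
  house_number = sample(200:599, 100, replace = TRUE),
  stringsAsFactors = FALSE
)

negative <- c("Water damage", "water pipes damaged", "leaking water")

# Exact matching with %in%
by_in <- house_info$house_number[house_info$evaluation %in% negative]

# Pattern matching with grepl(); ^...$ keeps "No water damage" out
pattern <- "^water damage$|pipes damaged|leaking"
by_grep <- house_info$house_number[grepl(pattern, house_info$evaluation,
                                         ignore.case = TRUE)]

identical(by_in, by_grep)
sort(unique(by_in))
```

With a fixed, known set of labels %in% is the safer choice; grepl() earns its keep when the evaluations are free text.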
Re: [R] How to download from github
What is browser that you are using to download? Try the direct link to download: http://github.com/hadley/ggplot2-bayarea/zipball/0a8bf71dea38cfbf2d928eb713d24dfd928359fc On Fri, Sep 25, 2009 at 3:07 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote: That's strange, my pc is not that slow, it has 3 mb of Ram. The download button doesn't respond either using my computer at work or at home. When you click the download button, Do you get a dialog box prompting you where to save the files? --- On Thu, 9/24/09, Charlie Sharpsteen ch...@sharpsteen.net wrote: From: Charlie Sharpsteen ch...@sharpsteen.net Subject: Re: [R] How to download from github To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: r-help@r-project.org Date: Thursday, September 24, 2009, 9:06 PM Hmm, clicking on the 'Download' button and then on either the 'TAR' or 'ZIP' icons is working fine for me. It might take a while for the actual download to start-- GitHub has to compress the files which can take a half a minute or more. Also, GitHub appears to be preparing for a move to a new set of servers-- this may cause some instability and weirdness in the way the website responds. Good luck! -Charlie On Thu, Sep 24, 2009 at 8:20 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote: Hi: Is my first attempt to try to download from github. Nothing happens by clicking on the 'download' button. Could anyone give me a hint on how to get all the files from the link below? Thanks http://github.com/hadley/ggplot2-bayarea/tree/0a8bf71dea38cfbf2d928eb713d24dfd928359fc Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.
Greg and John, Just to throw it out there, the data sets here are small enough that you co do a fully enumerable permutation test by replacing your replicate() call with: perms - combn(17, 9, function(x) median(sets[x]) - median(sets[-x])) This is based on an off-list communication that I had with Peter Dalgaard about 3 years ago for a different scenario and gives you: choose(17, 9) [1] 24310 permutations. It does not take long: system.time(perms - combn(17, 9, function(x) median(sets[x]) - median(sets[-x]))) user system elapsed 3.863 0.019 3.898 Which yields: str(perms) num [1:24310(1d)] -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 ... table(perms) perms -2 -1.5 -1 -0.50 0.51 1.52 285 175 2595 7000 8050 875 3720 1425 185 perms - c(orig,perms) prop.test( sum(perms=orig), length(perms) ) # or binom.test(sum(perms = orig), length(perms)) # Variation on the graphic... plot(table(perms), type = h) abline(v = orig, col = blue) See ?combn for more information. HTH, Marc Schwartz On Sep 25, 2009, at 11:47 AM, Greg Snow wrote: Yes, I agree that the median makes the most sense here, but there could be other measures of location that would be of interest (quartiles, some version of the rank sum). 
Here is some sample code for a permutation test on the medians (there are a couple of packages that will do this as well, but this is pretty straight forward with straight R code): set1 - c(1,3,2,2,4,3,3,2,2) set2 - c(4,4,4,3,3,5,4,4) sets - c(set1,set2) g1 - seq_along(set1) orig - median( sets[ -g1 ] ) - median( sets[ g1 ] ) perms - replicate( 1999, { tmp - sample(sets) median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } ) # or pb - winProgressBar(max=1999) setWinProgressBar(pb, 0) perms - replicate(1999, { setWinProgressBar( pb, getWinProgressBar(pb) + 1 ) tmp - sample(sets) median( tmp[ -g1 ] ) - median( tmp[ g1 ] ) } ) close(pb) perms - c(orig,perms) sum( perms = orig ) mean( perms = orig ) prop.test( sum(perms=orig), length(perms) ) hist(perms) abline(v=orig, col='blue') (if you want the progress bar on an os other than windows, then use the tcltk package and the tkProgressBar). Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: John Sorkin [mailto:jsor...@grecc.umaryland.edu] Sent: Thursday, September 24, 2009 2:52 PM To: Greg Snow; r-help@r-project.org Subject: Re: [R] Non-parametric test for location with two unpaired setsof data measured on ordinal scale. Greg, I used the term location because I did not want to use the terms mean or median for the exact reason that you gave; these to values can be different in a given distribution. I want to test the null hypothesis that the data come from a single distribution. This is often done by comparing a measure of location (e.g. mean for ANOVOA), but as you know the mean need not be the only measure of location that is tested. Giiven that my data are measured on an ordinal scale, the mean is without meaning, so I suspect that the best measure for me would be a comparison of medians, but I am open to other suggestions. John John David Sorkin M.D., Ph.D. 
Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing)

Greg Snow greg.s...@imail.org 9/24/2009 4:30 PM

What do you mean by location? I can think of examples where 2 distributions have the same median but different means, or the same means but different medians. Are you willing to assume that the distributions are exactly the same under the null hypothesis? (not just the same 'center/location')

I would probably do a permutation test on the difference between the means or medians (whichever you think is more meaningful); this assumes the exact same distribution under the null. You can also do a Mann-Whitney/Wilcoxon test (but I don't like explaining, or sometimes even thinking about, what it is actually testing), you could do a bootstrap confidence interval on the difference between means/medians (does not assume distributions are the same, just that they have the same mean/median), or you could just replace all values by their ranks and do a t-test (essentially transforms the data to a uniform distribution, the CLT for the uniform kicks in around n=5, but I would simulate just to check).

This is not the nice simple answer that you were probably looking for, but hopefully it gives you some things to think about that will help,

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John
Re: [R] read.delim very slow in reading files with lots of columns
it may be worth it writing a script to transpose the data (in awk, it takes 10min on my laptop)... then read in the transposed data...

    system.time({x <- read.delim("testTransposed.txt", header=F, colClasses="numeric", nrow=70); x <- t(x)})
       user  system elapsed
      4.958   0.412   5.477

b

On Sep 25, 2009, at 1:35 PM, Ping-Hsun Hsieh wrote:

Thanks, Ben. The matrix is a pure numeric matrix (6x70, 31mb). I tried colClasses='numeric' as well as nrows=7 (one of these is the header line) on the matrix. I also tested it without setting the two options in read.delim(). Here is the time spent on reading the matrix for each test.

    system.time(tmp <- read.delim("test_data.txt"))
         user    system   elapsed
    50985.421    27.665 51013.384

    system.time(tmp <- read.delim("test_data.txt", colClasses="numeric", nrows=7, comment.char=""))
         user    system   elapsed
    51301.563    60.491 51362.208

It seems setting the options does not speed up the reading at all. Is it because of the header line? I will test it. Did I misunderstand something?

One additional and interesting observation: the one with the options does save a lot of memory. It took ~150mb, while the other took ~4GB for reading the matrix. I will try scan() and see if it helps.

Thanks! Mike

-----Original Message----- From: Benilton Carvalho [mailto:bcarv...@jhsph.edu] Sent: Wednesday, September 23, 2009 4:56 PM To: Ping-Hsun Hsieh Cc: r-help@r-project.org Subject: Re: [R] read.delim very slow in reading files with lots of columns

use the 'colClasses' argument and you can also set 'nrows'. b

On Sep 23, 2009, at 8:24 PM, Ping-Hsun Hsieh wrote:

Hi, I am trying to read a tab-delimited file into R (Ver. 2.8). The machine I am using is 64bit Linux with 16 GB. The file is basically a matrix (~600x70) and as large as 3GB. read.delim() ran extremely slowly (hours) even with a subset of the file (31 MB with 6x70). I monitored the memory usage, and found it constantly took less than 1% of the 16GB memory. Does read.delim() have difficulty reading files with lots of columns? Any suggestions?

Thanks, Mike
Re: [R] How to download from github
Henrique: It worked nicely, I am using IE 6.0. Thanks a lot for your help --- On Fri, 9/25/09, Henrique Dallazuanna www...@gmail.com wrote: From: Henrique Dallazuanna www...@gmail.com Subject: Re: [R] How to download from github To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: Charlie Sharpsteen ch...@sharpsteen.net, r-help@r-project.org Date: Friday, September 25, 2009, 11:15 AM What is browser that you are using to download? Try the direct link to download: http://github.com/hadley/ggplot2-bayarea/zipball/0a8bf71dea38cfbf2d928eb713d24dfd928359fc On Fri, Sep 25, 2009 at 3:07 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote: That's strange, my pc is not that slow, it has 3 mb of Ram. The download button doesn't respond either using my computer at work or at home. When you click the download button, Do you get a dialog box prompting you where to save the files? --- On Thu, 9/24/09, Charlie Sharpsteen ch...@sharpsteen.net wrote: From: Charlie Sharpsteen ch...@sharpsteen.net Subject: Re: [R] How to download from github To: Felipe Carrillo mazatlanmex...@yahoo.com Cc: r-help@r-project.org Date: Thursday, September 24, 2009, 9:06 PM Hmm, clicking on the 'Download' button and then on either the 'TAR' or 'ZIP' icons is working fine for me. It might take a while for the actual download to start-- GitHub has to compress the files which can take a half a minute or more. Also, GitHub appears to be preparing for a move to a new set of servers-- this may cause some instability and weirdness in the way the website responds. Good luck! -Charlie On Thu, Sep 24, 2009 at 8:20 PM, Felipe Carrillo mazatlanmex...@yahoo.com wrote: Hi: Is my first attempt to try to download from github. Nothing happens by clicking on the 'download' button. Could anyone give me a hint on how to get all the files from the link below? Thanks http://github.com/hadley/ggplot2-bayarea/tree/0a8bf71dea38cfbf2d928eb713d24dfd928359fc Felipe D. 
Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.
Thanks Marc,

The sampling is so easy that I often forget that we can do the exact permutation test for smaller samples (and I can never remember when small is small enough for this). With the exact permutations we really don't need to do the prop.test or binom.test, I usually do that to get the confidence interval on the p-value due to sampling from the permutations rather than doing all possible (and this tells me if I need to increase the number of permutations to be sure my p-value is precise enough). With all possible permutations, there is no sampling, and no need for an interval, the p-value is exact.

Thanks again, I need to remember combn.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111
Re: [R] read.delim very slow in reading files with lots of columns
Here is how much time it took to read a file with 10 lines and 700,000 columns per line separated with commas:

    system.time(input <- scan("/tempxx.txt", what=0, sep=','))
    Read 7000000 items
       user  system elapsed
      15.62    0.22   15.84
    object.size(input)
    5624 bytes

'scan' should be sufficient and it will not take another 10 minutes in awk.

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?

On Fri, Sep 25, 2009 at 1:17 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote:

A couple of things come to mind.

First, I have not read the internals of scan, but suspect that parsing a really long line may be slowing things down. Since you are attempting to read in a numeric matrix, you can simply do a global replacement of your delimiter with a newline and use scan on the result. On unix-like systems, something like

    tmp <- scan( pipe( 'tr "\\t" "\\n" < test_data.txt' ) )

ought to help.

Second, the memory occupied by each line - once it has been processed - is spread over the full 32MB (or 3.2 GB for the 600 by 70 version) region of memory. I am guessing that this is causing your cache to work hard to put it in place. If you really want the result to be a 600 by 70 matrix, you might try to read it in smaller blocks using scan( pipe( cut ... ) ) to feed selected blocks of columns of your text file to R.

HTH, Chuck

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
[R] On what (shared) machine will R perform faster?
Dear R-ers, need your advice on hardware (beware - I am not knowledgeable about that). I find R runs wonderfully on laptops. In my company, we decided to get some kind of a powerful computer (server?) so that we could run big jobs on it (e.g., in R, SAS, SPSS, Excel). We were thinking of something more powerful than a simple additional laptop or desktop, but something with superior computing power - something everyone could log into remotely, and something that several people could work on simultaneously. Our IT has given us what they call a virtual server, but it runs more slowly than my laptop - I checked! Any advice on what we should be looking for? What should that be? What technical characteristics should it have? And should we then just install the regular R on it or something else? Thank you very much for your advice!

-- Dimitri Liakhovitski Ninah.com dimitri.liakhovit...@ninah.com
Re: [R] read.delim very slow in reading files with lots of columns
or that! :-D thanks jim. b
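A minimal sketch of the scan() approach discussed in this thread, for a tab-delimited numeric file with one header line. The file, its size (6 rows by 1000 columns) and all object names are made up for the demonstration; the real file in the thread is far wider.

```r
# Hypothetical demo file: 6 data rows x 1000 numeric columns plus a header line.
path  <- tempfile(fileext = ".txt")
n_row <- 6
n_col <- 1000
m <- matrix(round(rnorm(n_row * n_col), 3), nrow = n_row)
write.table(m, path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = paste0("V", seq_len(n_col)))

# Read the header on its own, then let scan() pull every number in one pass.
hdr  <- scan(path, what = "", sep = "\t", nlines = 1, quiet = TRUE)
vals <- scan(path, what = numeric(), sep = "\t", skip = 1, quiet = TRUE)

# scan() returns the values in file (row-major) order, so fill by row.
x <- matrix(vals, ncol = length(hdr), byrow = TRUE,
            dimnames = list(NULL, hdr))
dim(x)
```

Because scan() is told the type up front and builds one flat vector, it avoids the per-column bookkeeping that read.delim() does for a data frame, which is presumably where the time goes with very many columns.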
[R] How to open only one file in a .gz file?
Hi, Suppose that there are multiple files in a .gz file. How do I open only one file in it? I can't find such an option in the help. Regards, Peng
[R] Function question
Hi. I was wondering how I can write a function that generates the outcome values for a user specified equation. For example, function(x^2, 4) will return back 16 and function(x^3 - 10, 2) will give back -2... I've been playing around with various lines of code but somehow, I just cannot get R to recognize the equation that I pass to it is just a random variable and doesn't need to be initialized...
Re: [R] Function question
Try this:

    foo <- function(expr, x){
        eval(substitute(expr))
    }

    foo(x^2, 4)
    foo(x^3-10, 2)

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
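A small variation on the answer above, as a sketch only: the names `eval_at` and `apply_fn` are invented here. It evaluates the captured expression against an explicit list, so the name `x` is looked up there first, and it also shows the plainer route of passing an ordinary function.

```r
# Evaluate an unevaluated expression in `x` at a given value.
# `eval_at` is a hypothetical helper name, not something from the thread.
eval_at <- function(expr, value) {
  e <- substitute(expr)                      # capture x^2 without evaluating it
  eval(e, list(x = value), parent.frame())   # resolve x from the list first
}

eval_at(x^2, 4)        # 16
eval_at(x^3 - 10, 2)   # -2

# Often simpler: pass an ordinary function instead of a bare expression.
apply_fn <- function(f, value) f(value)
apply_fn(function(x) x^2, 4)   # 16
```

The function-passing form avoids non-standard evaluation altogether, which tends to be easier to debug when the expressions get more complicated.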
[R] NLM
I am trying to understand nlm, so I generated a data set consisting of y and x using y = a + b*x + c*x^2 + N(0,10), with a=3.5, b=4.5, c=5.5. Given y and x, I am trying to use nlm to get estimates of the parameters a, b and c that minimize the least squares error. My code looks like

    f <- function(y,x,a,b,c) {sum((y-(a+b*x+c*x^2))^2)}
    nlm(f, y, x, a=3, b=4, c=5)

But it comes up with a ridiculous result. My understanding of nlm must be wrong. Please help!

Andy
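One likely cause, offered as a sketch rather than a diagnosis: nlm() minimises over the first argument of the function, which it takes to be the parameter vector, so in the call above `y` is being treated as the parameters. The version below puts the parameters first and passes the data as extra named arguments. The simulated data, the seed and the reading of N(0,10) as a standard deviation of 10 are assumptions.

```r
set.seed(1)
x <- seq(-5, 5, length.out = 200)
y <- 3.5 + 4.5 * x + 5.5 * x^2 + rnorm(length(x), sd = 10)  # sd = 10 is assumed

# The parameter vector p comes first; x and y are passed through nlm()'s `...`.
sse <- function(p, x, y) sum((y - (p[1] + p[2] * x + p[3] * x^2))^2)

fit <- nlm(sse, p = c(3, 4, 5), x = x, y = y)
fit$estimate

# The model is linear in a, b and c, so lm() gives the least-squares fit directly.
coef(lm(y ~ x + I(x^2)))
```

The two sets of estimates should agree closely; for a model that is linear in its parameters, lm() is the more natural tool and nlm() is mainly useful here as a learning exercise.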
Re: [R] Error in make.names when trying to read.table in if statement
Does this work for you?

    data_list <- list()
    filepattern <- "modrate*"
    all_files <- list.files(pattern=filepattern)
    data_list <- lapply(all_files, read.table, header=TRUE, sep=",")
Re: [R] Error in make.names when trying to read.table in if statement
On Sep 21, 2009, at 12:19 PM, Cynthia Sadler wrote:

> Hi, I'm trying to read data from a collection of CSV files for processing and graphing. All of my files begin with modrate and end with .csv. I think I have the regex working but I am stumped at trying to get read.table to work within an if statement. This works:
>
>     filepattern="modrate*"
>     files <- list.files(pattern=filepattern)
>     data <- read.table(files[1], header=TRUE, sep=",")
>
> But I cannot get this to work:
>
>     for (i in seq(along=files)) {
>         data <- read.table(files[i], header=TRUE, sep=",")
>     }
>     Error in make.names(col.names, unique = TRUE) :
>       invalid multibyte string at 'ffd8ffe0'

The error message suggests that you might have strange characters in your header lines which make.names() is choking on. What is the result of substituting readLines with parameter of n=1 for the read.table() call on those files? Or perhaps you should be using [[ rather than [ to access the files list?

> I'm sure I'm making a newbie mistake here.

Despite several years, I still consider myself a newbie.

> I'm using R version 2.9.2 (2009-08-24) on Mac OS X, and didn't find anything about this in the help archives (though, as I'm new to this, I may not have searched in the best way). Any advice? Thanks.

David Winsemius, MD Heritage Laboratories West Hartford, CT
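A possible explanation, offered as a guess: the bytes ff d8 ff e0 are how a JPEG (JFIF) file begins, and list.files() treats `pattern` as a regular expression rather than a shell glob, so "modrate*" can also match non-CSV files whose names contain "modrat". The sketch below builds a throwaway directory (all file names hypothetical) and shows a pattern anchored to the .csv extension.

```r
# Demo directory with two hypothetical CSV files and one non-CSV file
# whose name also starts with "modrate".
d <- tempfile("modrate_demo")
dir.create(d)
write.csv(data.frame(a = 1:2, b = 3:4), file.path(d, "modrate1.csv"), row.names = FALSE)
write.csv(data.frame(a = 5:6, b = 7:8), file.path(d, "modrate2.csv"), row.names = FALSE)
writeBin(as.raw(c(0xff, 0xd8, 0xff, 0xe0)), file.path(d, "modrate_plot.jpg"))

# list.files() takes a regular expression, not a glob:
# anchor the start of the name and require the .csv extension.
files <- list.files(d, pattern = "^modrate.*\\.csv$", full.names = TRUE)

data_list <- lapply(files, read.csv)
names(data_list) <- basename(files)
str(data_list)
```

Keeping the results in a named list also avoids the loop in the question overwriting `data` on every iteration.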
[R] help with multi-objective program using lpSolve API
Hello, I am struggling with R and have little experience. I need help or suggestions to create a multi-objective program. I have a table as follows http://www.nabble.com/file/p25615459/smallmodel01.xls smallmodel01.xls My constraints are subject to each origin, their corresponding numbers and the specific number for that origin constraint. This creates my single objective model just fine. However, I need to hold these constraints true, in addition to subjecting the destinations to a specific constraint as well. The destination constraint is to ensure that all transportation to each destination is filled. As in, I need to make this decision binary saying that all transportation to that specific destination is being used or not used. Thank you, Becky
[R] Starting values in “arima.sim” function
Hello,

Could someone please tell me how I can find out which starting values R has used for the simulation?

I have an AR(1) model: y(t)=0.2*y(t-1)+0.2*y(t-2) + e(t) (e(t) is distributed according to the standard normal distribution)

I need the y(0) (or y(t-1), when t=1) values for my following calculations (it is a very important parameter). Should I assume that y(0)=mean(yt) or set y(0)=0?

How can I find out which values R uses for y(0), y(-1) and so on?

Thank you very much for the answer!

Best regards, Lina Rusyte
[R] evaluate a set of symbols within an IF statement
Hello, I am writing some R code to cleanse a data set: if any of the following set of symbols is identified, then perform some actions. I am trying to write the minimum code to do this.

    tname = "VIX"
    checkticker = c("VIX", "TYX", "TNX", "IRX")
    if (tname == checkticker) {
        # perform some operations
    }

The result I get is

    tname == checkticker
    [1]  TRUE FALSE FALSE FALSE

How do I evaluate this whole list to a single boolean TRUE or FALSE? If any of these are true the whole statement is TRUE, else FALSE. This only seems to work for the first ticker; the rest don't perform the operations within the loop.

    tname = "IRX"
    tname == checkticker
    [1] FALSE FALSE FALSE  TRUE
Re: [R] evaluate a set of symbols within an IF statement
zubin-2 wrote: how do i evaluate this whole list to a single boolean True or False? If any of these are true the whole statement is True, else False. this only seems to work for the first ticker, the rest don't perform the operations within the loop.

Try %in%:

    tname %in% checkticker

-Charlie

- Charlie Sharpsteen Undergraduate Environmental Resources Engineering Humboldt State University
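To make the suggestion concrete, here is a short sketch; `is_index` is a made-up helper name and "SPY" is just an arbitrary symbol that is not in the list. For a single symbol, `%in%` returns one TRUE or FALSE, which is what `if` needs.

```r
checkticker <- c("VIX", "TYX", "TNX", "IRX")

# Hypothetical helper: is this symbol one of the tickers to cleanse?
is_index <- function(tname) tname %in% checkticker

is_index("IRX")    # TRUE
is_index("SPY")    # FALSE

# Equivalent for a single symbol: collapse the element-wise comparison.
any("IRX" == checkticker)   # TRUE

if (is_index("IRX")) {
  message("cleanse this series")
}
```

The original `if (tname == checkticker)` only looks at the first element of the comparison, which is why only the first ticker appeared to work.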
Re: [R] Non-parametric test for location with two unpaired sets of data measured on ordinal scale.
Greg and Marc,

Not that it's needed here but, of course, perm.test() in pkg:exactRankTests or oneway_test() in pkg:coin can be used. Using Marc's / Greg's computations, the (two-sided) p-value is

    sum(abs(perms) >= abs(orig)) / length(perms)
    [1] 0.01937395

perm.test() and oneway_test() give a p-value of 0.003249691, using the exact option.

Question: Why the difference?

Answer: perm.test() uses the *mean* difference instead of the median difference. (Easy to check: just replace 'median' with 'mean' in Marc's computation of perms.)

As Greg correctly points out, different test statistics can sensibly be used. But which statistic, mean difference or median difference, is more appropriate for the given data? Assumptions:

1. the null hypothesis is that the two sets of observations represent random samples from the same distribution;
2. the range of the distribution consists of a small set of integers.

Fire away!

Peter Ehlers
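The comparison Peter describes can be sketched with the data and the combn() enumeration from earlier in the thread. The helper name `exact_p` and the small tolerance guarding against floating-point ties are additions made here, not part of anyone's posted code.

```r
set1 <- c(1, 3, 2, 2, 4, 3, 3, 2, 2)
set2 <- c(4, 4, 4, 3, 3, 5, 4, 4)
sets <- c(set1, set2)
g1   <- seq_along(set1)

# Exact two-sided permutation p-value for a difference in `stat` between
# the groups, enumerating every split of the 17 values into 9 and 8.
# `exact_p` is a hypothetical helper name.
exact_p <- function(stat) {
  orig  <- stat(sets[-g1]) - stat(sets[g1])
  perms <- combn(length(sets), length(set1),
                 function(i) stat(sets[-i]) - stat(sets[i]))
  # the tiny tolerance keeps floating-point noise from breaking ties
  mean(abs(perms) >= abs(orig) - 1e-12)
}

p_median <- exact_p(median)
p_mean   <- exact_p(mean)
c(median = p_median, mean = p_mean)
```

Unlike the value quoted above, this does not append orig to the enumerated differences, so the median p-value is 470/24310 rather than 471/24311. The mean-based value is much smaller, which is the difference Peter is pointing at.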
[R] Design Package - Penalized Logistic Reg. - Query
Dear R experts,

The lrm function in the Design package can perform penalized (ridge) logistic regression. It is my understanding that the ridge solutions are not equivalent under scaling of the inputs, so one normally standardizes the inputs. Do you know whether input standardization is done internally in lrm, or would I have to do it prior to applying this function?

Also, as I'm new to R (coming from SAS), I don't know how well R will handle relatively large data sets (e.g. 1/2 million observations on 40 variables).

I'll appreciate your comments. Many thanks in advance.

Lars/
[R] Fourier Transform (FFT) Example
LOL Rolf. Yes, I am sure it isn't homework. I am working on an aeroacoustics problem and was trying to figure out how to implement a Fourier transform in R. I normally don't work in this field, so this stuff was new to me at the time of writing. I have since figured it out. Unfortunately I don't have my actual code where I am now, but here is an older version; it might have some bugs in it since I never verified this version. Anyway, I hope it helps someone, even if it's your homework! Apparently some don't realize that there are different ways of learning, learning by example being one of those ways.

    layout(matrix(c(1,2,3,4), 4, 1, byrow = TRUE))

    #SETUP
    T  <- 5.              #time 0 -> T
    dt <- 0.01            #s
    n  <- round(T/dt)     #number of samples
    F  <- 1/dt            #freq domain -F/2 -> F/2
    df <- 1/T
    t  <- (0:(n-1))*dt    #exactly n samples
    freq <- 5             #Hz

    #SIGNAL FUNCTION
    y <- 10*sin(2*pi*freq*t)

    #FREQ ARRAY
    f <- (0:(n-1))*df     #element k of fft(y) sits at (k-1)*df
    half <- 1:(n/2)       #note: 1:length(f)/2 would parse as (1:length(f))/2

    #FOURIER WORK
    Y <- fft(y)
    mag <- Mod(Y)*2/n     #Amplitude
    phase <- Arg(Y)*180/pi
    Yr <- Re(Y)
    Yi <- Im(Y)

    #PLOT SIGNALS
    plot(t, y, type="l", xlim=c(0,T))
    grid(NULL, NULL, col = "lightgray", lty = "dotted", lwd = 1)
    par(mar=c(5, 4, 0, 2) + 0.1)
    plot(f[half], phase[half], type="l", xlab="Frequency, Hz", ylab="Phase,deg")
    grid(NULL, NULL, col = "lightgray", lty = "dotted", lwd = 1)
    plot(f[half], mag[half], type="l", xlab="Frequency, Hz", ylab="Amplitude")
    grid(NULL, NULL, col = "lightgray", lty = "dotted", lwd = 1)
    pos <- half[-1]       #drop f = 0, which a log axis cannot show
    plot(f[pos], (mag^2)[pos], type="l", xlab="Frequency, Hz", ylab="Power, Amp^2", log="xy", ylim=c(10^-6,100))

    #POWER
    pref <- 20E-6         #pa
    pw <- (mag^2)[half]
    #area under the linearly interpolated power curve (trapezoid rule);
    #integrate() on an approx() interpolant can step over a spike this narrow
    p <- sum(diff(f[half]) * (head(pw, -1) + tail(pw, -1)) / 2)
    pwrDB <- 10*log10(p/pref^2)
    cat("Area under power curve: ", p, "Pa ", pwrDB, " dB\n")

Rolf Turner-3 wrote:
On 17/09/2009, at 3:39 AM, delic wrote:
I wrote a script in which I anticipated seeing a spike at 10 Hz with the function 10*sin(2*pi*10*t). I can't figure out why my plots do not show spikes at the frequencies I expect. Am I doing something wrong, or are my expectations wrong?

(a) Is this a homework question?
(b) Have you figured it out yet?
(c) Hint: You have spikes at +/- 40 in a range from -50 to 50. You *want* spikes at 10 and 90 Hz. Could it be that you haven't set your frequency vector ``f'' quite right? :-)

cheers,
Rolf Turner

P. S. You won't get spikes bang on at 10 and 90 Hz, because these are *not* Fourier frequencies when n = 256. If you want spikes in your periodogram at bang on 10 and 90 Hz, use a value of n that is divisible by 10, e.g. n=500. Why would you want a power of 2 anyhow? (Well, the fft goes faster when n is a power of 2, but who cares?)

R. T.
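Rolf's hint about the frequency vector can be checked without any plotting. The sketch below is a stripped-down illustration, not the poster's script: it assumes a 5 Hz sine sampled at 100 Hz for n = 500 points, builds the frequency axis so that element k of fft(y) maps to (k - 1) / (n * dt), and locates the largest amplitude in the first half of the spectrum. The names amp and peak_hz are introduced here for illustration.

```r
dt <- 0.01                      # sampling interval in seconds (assumed)
n <- 500                        # number of samples, so the record is 5 s long
t <- (0:(n - 1)) * dt
freq <- 5                       # signal frequency in Hz
y <- 10 * sin(2 * pi * freq * t)

Y <- fft(y)

# Element k of fft(y) corresponds to frequency (k - 1) / (n * dt)
f <- (0:(n - 1)) / (n * dt)

# Keep the first half only; the second half mirrors it for real input
half <- 1:(n / 2)
amp <- Mod(Y)[half] * 2 / n     # single-sided amplitude

peak_hz <- f[half][which.max(amp)]
peak_hz                         # frequency of the largest spectral line
max(amp)                        # its amplitude
```

Because 5 Hz is an exact multiple of the 0.2 Hz bin spacing here, all of the energy lands in one bin. With a record length that does not hold a whole number of cycles, the peak spreads over neighbouring bins, which is the point of Rolf's P. S.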
[R] renaming intercept column when retrieving coefficients from lme using coef function
I am still fairly new to R and have a fairly rudimentary question. I am trying to name a vector of coefficients retrieved from a multilevel model using the coef function. I guess the default name is Intercept and I cannot figure out how to rename it. I have tried using the code below to name the column of coefficients ind.y derived from an lme model. Unfortunately, the name ind.y is not applied to the column. What can I do to name the column?

    > toy <- data.frame(ID=c(1,1,1,2,2,2,3,3,3,4,4,4), x=rnorm(12), y=rnorm(12))
    > model.toy <- lme(y~1, random=~1|ID, data=toy)
    > coef.y <- (ind.y=coef(model.toy))
    > coef.y
      (Intercept)
    1  0.52065015
    2  0.04066776
    3  0.29793571
    4  0.11213693

Thanks,
Eric McKibben
Doctoral Candidate
I-O Psychology
Clemson University
Re: [R] renaming intercept column when retrieving coefficients from lme using coef function
Is this what you want:

    > coef.y
      (Intercept)
    1  0.03109602
    2  0.03109602
    3  0.03109603
    4  0.03109602
    > str(coef.y)
    Classes ‘coef.lme’, ‘ranef.lme’ and 'data.frame':  4 obs. of  1 variable:
     $ (Intercept): num  0.0311 0.0311 0.0311 0.0311
     - attr(*, "level")= int 1
     - attr(*, "label")= chr "Coefficients"
     - attr(*, "effectNames")= chr "(Intercept)"
     - attr(*, "standardized")= logi FALSE
     - attr(*, "grpNames")= chr "ID"
    > names(coef.y) <- 'myName'
    > coef.y
          myName
    1 0.03109602
    2 0.03109602
    3 0.03109603
    4 0.03109602

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
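For anyone wondering why the original attempt had no effect: inside parentheses, ind.y = coef(model.toy) is an ordinary assignment to a variable called ind.y, not a column label. The sketch below repeats the rename end to end. It assumes the nlme package is available, and it uses a variant of the toy data with a built-in group effect (an assumption made here so that the fit is well behaved), not the poster's exact values.

```r
library(nlme)

set.seed(1)
# Variant of the toy data: each ID gets its own shift (illustrative values)
toy <- data.frame(ID = rep(1:4, each = 3),
                  x  = rnorm(12),
                  y  = rep(c(-1, 0, 1, 2), each = 3) + rnorm(12, sd = 0.5))

model.toy <- lme(y ~ 1, random = ~ 1 | ID, data = toy)

# coef() on an lme fit returns a data-frame-like object,
# so its single column can be renamed with names<-
coef.y <- coef(model.toy)
names(coef.y) <- "ind.y"
coef.y
```

If a plain named vector is preferred, something like setNames(coef.y$ind.y, rownames(coef.y)) gives one value per ID.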
Re: [R] Design Package - Penalized Logistic Reg. - Query
On Sep 25, 2009, at 8:33 PM, Lars Bishop wrote:
Dear R experts, The lrm function in the Design package can perform penalized (Ridge) logistic regression. It is my understanding that the ridge solutions are not equivalent under scaling of the inputs, so one normally standardizes the inputs. Do you know if input standardization is done internally in lrm or I would have to do it prior to applying this function. Also, as I'm new in R (coming from SAS) I don't know how well R will handle relatively large data sets (e.g. 1/2 million observations on 40 variables).

I don't have the answer to your first question, but I routinely work with a dataset that is several times that large using the Design (and now the rms) packages. (You do need to have sufficient physical memory, but it is not R that is the limiting factor.)

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
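A rough sense of "sufficient physical memory" for the data set described can be had from simple arithmetic. The sketch below assumes all 40 variables are stored as double-precision numbers (8 bytes each); factors, integers, and the temporary copies made during model fitting would change the figure, so treat it as a lower bound for the raw data only.

```r
n_obs <- 5e5      # half a million observations
n_var <- 40       # variables, assumed all stored as doubles

bytes <- n_obs * n_var * 8   # 8 bytes per double
mib <- bytes / 2^20          # convert to mebibytes

bytes
mib
```

That comes to roughly 150 MB for one copy of the data, so a fit that makes a few internal copies (design matrix, working vectors) needs several times that.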
Re: [R] QQ plotting of various distributions...
    # same shape
    some_data <- rgamma(500, shape=6, scale=2)
    test_data <- rgamma(500, shape=6, scale=2)
    plot(sort(some_data), sort(test_data))
    # You can also use qqplot(some_data, test_data)
    abline(0,1)

    # different shape
    some_data <- rgamma(500, shape=6, scale=2)
    test_data <- rgamma(500, shape=4, scale=2)
    plot(sort(some_data), sort(test_data))
    abline(0,1)

It is helpful to assess the sampling variability by creating repeated sets of test_data and plotting all of these along with your observations to create a confidence envelope.

The SuppDists package provides the inverse Gaussian distribution.

On Thu, Sep 17, 2009 at 11:46 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:
Hello! I am trying with this question again: I would like to test a few distributional assumptions for some behavioral response data. There are a few theories about the true distribution of those data, like: normal, lognormal, gamma, ex-Gaussian (exponential-Gaussian), Wald (inverse Gaussian) etc. The best way would be via qq-plot, to show the differences to students. The first two are trivial:

    qqnorm(dat$X)
    qqnorm(log(dat$X))

Then, things are getting more hairy. I am not sure how to make plots for the rest. I tried gamma with:

    qqmath(~ X, data=dat, distribution=function(X) qgamma(X, shape, scale))

Which should be the same as:

    plot(qgamma(ppoints(dat$X), shape, scale), sort(dat$X))

Shape and scale parameters I got via the mhsmm package, which has gammafit() for shape and scale parameter estimation. Am I on the right track? Does anyone know how to plot the rest: ex-Gaussian (exponential-Gaussian), Wald (inverse Gaussian)?

Thanks,
PM
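The confidence-envelope idea mentioned above can be sketched in base R. Everything numeric here is an assumption for illustration: the "observed" sample is simulated, the shape and scale are taken as known rather than estimated, and 200 replicates with a 95% pointwise band is an arbitrary choice. One detail worth noting: the third positional argument of qgamma() is rate, so scale is passed by name.

```r
set.seed(42)
shape <- 6      # assumed known for this sketch
scale <- 2

# Stand-in for the observed data
obs <- rgamma(200, shape = shape, scale = scale)
n <- length(obs)

# Theoretical quantiles; scale is named because position 3 is 'rate'
theo <- qgamma(ppoints(n), shape = shape, scale = scale)

# Repeated simulated samples of the same size, each sorted (one per column)
sims <- replicate(200, sort(rgamma(n, shape = shape, scale = scale)))

# Pointwise 2.5% and 97.5% limits for each order statistic
envelope <- apply(sims, 1, quantile, probs = c(0.025, 0.975))

plot(theo, sort(obs), xlab = "Theoretical gamma quantiles",
     ylab = "Sorted observations")
lines(theo, envelope[1, ], lty = 2)
lines(theo, envelope[2, ], lty = 2)
abline(0, 1)
```

Points that wander outside the dashed band, especially in a systematic curve, suggest the assumed distribution does not fit. The same recipe should work for any distribution with a quantile function and a random generator, for example the inverse Gaussian functions in SuppDists.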
Re: [R] Design Package - Penalized Logistic Reg. - Query
Lars Bishop wrote:
Dear R experts, The lrm function in the Design package can perform penalized (Ridge) logistic regression. It is my understanding that the ridge solutions are not equivalent under scaling of the inputs, so one normally standardizes the inputs. Do you know if input standardization is done internally in lrm or I would have to do it prior to applying this function.

It's done internally, as buried in the documentation somewhere. Actually lrm puts the scaling factors (standard deviations for continuous variables) into the penalty matrix.

Frank

Also, as I'm new in R (coming from SAS) I don't know how well R will handle relatively large data sets (e.g. 1/2 million observations on 40 variables). I'll appreciate your comments. Many thanks in advance. Lars/

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
Re: [R] basic cubic spline smoothing
The best reference I know for this is something I wrote with Jim Ramsay and Giles Hooker: Functional Data Analysis with R and Matlab (Springer, 2009). Others may have better material.

After install.packages('fda'), I suggest you try system.file('scripts', package='fda'), as suggested in the Preface. This will point you to a subdirectory of your local installation of fda that contains files with names like fdarm-ch01.R, fdarm-ch02.R, ..., fdarm-ch11.R. You will likely be most interested in Figure 9.4, sections 9.4.2 and 9.4.3, script fdarm-ch09.R. The script by itself may answer your question. If not, you may wish to consult the book.

Hope this helps.
Spencer Graves

hm567 wrote:
hm567 wrote:
I am unsure about spar being the smoothness parameter, about where to put the standard errors of the points, and about the return of the smooth.spline function:

    Smoothing Parameter  spar= 0.5  lambda= 0.006833112

best regards,

Basically, in the implementation based on the attached paper, for a standard error of points = 1.0, the smoothing is too insensitive to the lambda smoothness parameter. From 1 to almost 0.01, there is almost no smoothing... Only from 0.01 to 0 does one start to see smoothing in action, with the limit at 0 being a straight line. Note that this implementation's parameter is (1 - parameter).

With R smooth.spline, 'spar' reflects the smoothness well in that:
. at 0%, the spline interpolates
. at 40% already, its shape is very different from the 0% one (for my implementation, they are still the same)
. at 90% it is almost a straight line
. at 100% it is definitely a straight line

This is the behavior that I wish to have. It seems I need to change my lambda with some transformation that is similar to the one in the doc of smooth.spline (spar to lambda). Perhaps the reverse one. But I can't see how to do it.

The other question is the standard errors. What do they correspond to in the doc of smooth.spline?

Regards,

--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
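On the spar-to-lambda question, ?smooth.spline documents the relation lambda = r * 256^(3*spar - 1), where r depends only on the data and weights. So for a fixed data set, lambda grows by a factor of 256^3 as spar goes from 0 to 1, which is why spar feels evenly "sensitive" across its range. The sketch below checks that ratio on simulated data. Treating the weights as 1/se^2 is the usual weighted-least-squares convention and is an assumption here about how the poster's standard errors would map onto the w argument; the se values themselves are made up.

```r
set.seed(1)
x <- seq(0, 1, length.out = 100)
se <- rep(0.2, 100)                 # hypothetical standard errors of the points
y <- sin(2 * pi * x) + rnorm(100, sd = se)

w <- 1 / se^2                       # assumed mapping from standard errors to weights

fit_lo <- smooth.spline(x, y, w = w, spar = 0.3)
fit_hi <- smooth.spline(x, y, w = w, spar = 0.8)

# Same data and weights, so r cancels and only the 256^(3*spar) part remains
ratio <- fit_hi$lambda / fit_lo$lambda
expected <- 256^(3 * (0.8 - 0.3))

c(ratio = ratio, expected = expected)
```

Inverting the documented relation, spar = (log(lambda / r) / log(256) + 1) / 3, is one way to put another implementation's lambda on a spar-like scale, provided r is computed the same way as in smooth.spline.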
Re: [R] Downloading data from internet
Bogaso wrote:
Thank you so much for that help. However, I need a little more help. On the site http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php if I scroll down then there is an option "Historical CPI Index For USA". Next, if I click on "Get Data" then another table pops up, however without any significant change in the address bar. This table holds more data, starting from 1999. Can you please help me get the values of this table?

Hi again

Well, this is a little bit more involved, as this is an HTML form and so we need to be able to emulate submitting a form with values for the different parameters the form expects, along with ensuring they are correct inputs. Ordinarily, this would involve looking at the source of the HTML document, finding the relevant form element, getting its action attribute and all its inputs, and figuring out the possible inputs. This is straightforward but involved. But we have an R package that does this reasonably well in an automated form. This is the RHTMLForms package from the www.omegahat.org/R repository. We can install it with

    install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")

Then

    library(RHTMLForms)
    ff = getHTMLFormDescription("http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php")

    # The form we want is the third one. We can determine this
    # from the names of the parameters.
    # So we request that this form description be turned into an R function
    g = createFunction(ff[[3]])

    # Now we call this.
    xx = g(2001, 2008)

    # This returns the content of an HTML document
    # so we parse it and then pass this to readHTMLTable()
    # This is why we have methods for
    library(XML)
    doc = htmlParse(xx, asText = TRUE)
    tbls = readHTMLTable(doc)

    # we want the last of the tables.
    tbls[[length(tbls)]]

So hopefully that helps solve your problem and introduces another Omegahat package that we hope people find through Google.
The RHTMLForms package is an approach to the poor-man's Web services - HTML forms - rather than REST and SOAP that are becoming more relevant each day. The RCurl and SSOAP packages address the latter.

D.

Thanks

Duncan Temple Lang wrote:
Thanks for explaining this, Charlie. Just for completeness and to make things a little easier, the XML package has a function named readHTMLTable() and you can call it with a URL and it will attempt to read all the tables in the page.

    tbls = readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')

yields a list with 10 elements, and the table of interest with the data is the 10th one.

    tbls[[10]]

The function does the XPath voodoo and sapply() work for you and uses some heuristics. There are various controls one can specify and also various methods for working with sub-parts of the HTML document directly.

D.

cls59 wrote:
Bogaso wrote:
Hi all, I want to download data from these two different sources directly into R:

http://www.rateinflation.com/consumer-price-index/usa-cpi.php
http://eaindustry.nic.in/asp2/list_d.asp

The first one is the CPI of the US and the second one is the WPI of India. Can anyone please give any clue how to download them directly into R? I want to make them zoo objects for further analysis.
Thanks,

The following site did not load for me: http://eaindustry.nic.in/asp2/list_d.asp

But I was able to extract the table from the US CPI site using Duncan Temple Lang's XML package:

    library(XML)

First, download the website into R:

    html.raw <- readLines( 'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )

Then, convert to an HTML object using the XML package:

    html.data <- htmlTreeParse( html.raw, asText = TRUE, useInternalNodes = TRUE )

A quick scan of the page source in the browser reveals that the table you want is encased in a div with a class of dynamicContent-- we will use an xpath specification[1] to retrieve all rows in that table:

    table.html <- getNodeSet( html.data, '//div[@class="dynamicContent"]/table/tr' )

Now, the data values can be extracted from the cells in the rows using a little sapply and xpathSApply voodoo:

    table.data <- t( sapply( table.html, function( row ){
      row.data <- xpathSApply( row, './td', xmlValue )
      return( row.data )
    }))

Good luck!

-Charlie

[1]: http://www.w3schools.com/XPath/xpath_syntax.asp

Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
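Since the live pages may change or disappear, the parsing step can be tried offline first. The sketch below assumes the XML package is installed and feeds readHTMLTable() a small hand-written HTML string in place of the downloaded page. The table contents are placeholder numbers invented for the example, not real CPI figures, and the column names Year and CPI are likewise hypothetical.

```r
library(XML)

# Hand-written stand-in for a downloaded page (placeholder values only)
html <- paste0(
  "<html><body><table>",
  "<tr><th>Year</th><th>CPI</th></tr>",
  "<tr><td>2007</td><td>100.5</td></tr>",
  "<tr><td>2008</td><td>103.2</td></tr>",
  "</table></body></html>"
)

# Parse the text, then let readHTMLTable() pull out every table it finds
doc <- htmlParse(html, asText = TRUE)
tbls <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Cell contents arrive as character strings, so convert before analysis
cpi <- tbls[[1]]
cpi$CPI <- as.numeric(cpi$CPI)
cpi
```

From a data frame like this, building a zoo object as Bogaso wanted would be a matter of passing the numeric column and a suitable time index to zoo(), assuming the zoo package is installed.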
Re: [R] Downloading data from internet
Thanks Duncan for your input. However, I could not install the package RHTMLForms; it says it is not available:

    > install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")
    Warning in install.packages("RHTMLForms", repos = "http://www.omegahat.org/R") :
      argument 'lib' is missing: using 'C:\Users\Arrun's\Documents/R/win-library/2.9'
    Warning message:
    In getDependencies(pkgs, dependencies, available, lib) :
      package ‘RHTMLForms’ is not available

I found this package on the net: http://www.omegahat.org/RHTMLForms/ However, it is a gz file, which I could not use as I am a Windows user. Can you please provide me with an alternate source?

Thanks,

Duncan Temple Lang wrote:
Bogaso wrote:
Thank you so much for that help. However, I need a little more help. On the site http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php if I scroll down then there is an option "Historical CPI Index For USA". Next, if I click on "Get Data" then another table pops up, however without any significant change in the address bar. This table holds more data, starting from 1999. Can you please help me get the values of this table?

Hi again

Well, this is a little bit more involved, as this is an HTML form and so we need to be able to emulate submitting a form with values for the different parameters the form expects, along with ensuring they are correct inputs. Ordinarily, this would involve looking at the source of the HTML document, finding the relevant form element, getting its action attribute and all its inputs, and figuring out the possible inputs. This is straightforward but involved. But we have an R package that does this reasonably well in an automated form. This is the RHTMLForms package from the www.omegahat.org/R repository. We can install it with

    install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")

Then

    library(RHTMLForms)
    ff = getHTMLFormDescription("http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php")

    # The form we want is the third one.
    # We can determine this
    # from the names of the parameters.
    # So we request that this form description be turned into an R function
    g = createFunction(ff[[3]])

    # Now we call this.
    xx = g(2001, 2008)

    # This returns the content of an HTML document
    # so we parse it and then pass this to readHTMLTable()
    # This is why we have methods for
    library(XML)
    doc = htmlParse(xx, asText = TRUE)
    tbls = readHTMLTable(doc)

    # we want the last of the tables.
    tbls[[length(tbls)]]

So hopefully that helps solve your problem and introduces another Omegahat package that we hope people find through Google. The RHTMLForms package is an approach to the poor-man's Web services - HTML forms - rather than REST and SOAP that are becoming more relevant each day. The RCurl and SSOAP packages address the latter.

D.

Thanks

Duncan Temple Lang wrote:
Thanks for explaining this, Charlie. Just for completeness and to make things a little easier, the XML package has a function named readHTMLTable() and you can call it with a URL and it will attempt to read all the tables in the page.

    tbls = readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')

yields a list with 10 elements, and the table of interest with the data is the 10th one.

    tbls[[10]]

The function does the XPath voodoo and sapply() work for you and uses some heuristics. There are various controls one can specify and also various methods for working with sub-parts of the HTML document directly.

D.

cls59 wrote:
Bogaso wrote:
Hi all, I want to download data from these two different sources directly into R:

http://www.rateinflation.com/consumer-price-index/usa-cpi.php
http://eaindustry.nic.in/asp2/list_d.asp

The first one is the CPI of the US and the second one is the WPI of India. Can anyone please give any clue how to download them directly into R? I want to make them zoo objects for further analysis.
Thanks,

The following site did not load for me: http://eaindustry.nic.in/asp2/list_d.asp

But I was able to extract the table from the US CPI site using Duncan Temple Lang's XML package:

    library(XML)

First, download the website into R:

    html.raw <- readLines( 'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )

Then, convert to an HTML object using the XML package:

    html.data <- htmlTreeParse( html.raw, asText = TRUE, useInternalNodes = TRUE )

A quick scan of the page source in the browser reveals that the table you want is encased in a div with a class of dynamicContent-- we will use an xpath specification[1] to retrieve all rows in that table:

    table.html <- getNodeSet( html.data, '//div[@class="dynamicContent"]/table/tr' )

Now, the data values can be extracted from the cells in the rows using a little sapply and xpathSApply voodoo:

    table.data <- t( sapply( table.html, function( row ){
      row.data <- xpathSApply( row, './td',