[R] xmlToDataFrame very slow
I have a modest-size XML file (52MB) in a format suited to xmlToDataFrame (package XML). I have successfully read it into R by splitting the file 10 ways, then running xmlToDataFrame on each part, then rbind.fill (package plyr) on the result. This takes about 530 s total, and results in a data.frame with 71k rows and object.size of 21MB.

But trying to run xmlToDataFrame on the whole file takes forever ( 1 s so far). xmlParse of this file takes only 0.8 s.

I tried running xmlToDataFrame on the first 10% of the file, then the first 10% repeated twice, then three times (with the outer tags adjusted of course). Timings:

  1 copy:   111 s = 111 per copy
  2 copies: 311 s = 155 per copy
  3 copies: 626 s = 209 per copy

The runtime is superlinear. What is going on here? Is there a better approach?

Thanks,

-s

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
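The per-copy figures in the message above can be checked with a few lines of R. This is only a sketch of the arithmetic on the reported timings; the variable names are illustrative and nothing here calls the XML package.

```r
# Reported timings for 1, 2 and 3 concatenated copies of the first 10% of the file
copies  <- 1:3
seconds <- c(111, 311, 626)

# Seconds per copy: this would stay constant if runtime were linear in input size
per_copy <- seconds / copies
print(per_copy)

# Runtime relative to the single-copy run, next to the linear expectation (1, 2, 3)
growth <- seconds / seconds[1]
print(rbind(copies = copies, observed = round(growth, 2)))

# The per-copy cost rises with every added copy, i.e. the growth is superlinear
is_superlinear <- all(diff(per_copy) > 0)
print(is_superlinear)
```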
Re: [R] execute array of functions
That won't work because R has special rules for evaluating things in the function position. Examples:

*OK*

  min(1:2)
  "min"(1:2)
  f <- min; f(1:2)
  do.call(min, list(1:2))
  do.call("min", list(1:2))          # do.call converts string->function

*Not OK*

  ("min")(1:2)                       # string in function position is not converted
  f <- "min"; f(1:2)                 # ditto
  f <- c("min","max"); f[1](1:2)     # ditto

What you need to do is make 'f' a list of *function values*, not a vector of strings:

  f <- c(min, max)

and then select the element of f with [[ ]] (select one element), not [ ] (select sublist):

  f[[1]](1:2)

Thus your example becomes

  type <- c(min, max)
  n <- 1:10
  for (a in 1:2) {
    print(type[[a]](n))
  }

Another (uglier) approach is with do.call:

  type <- c("min", "max")
  n <- 1:10
  for (a in 1:2) {
    print(do.call(type[a], list(n)))
  }

Does that help?

-s

On Tue, Feb 14, 2012 at 14:02, Muhammad Rahiz muhammad.ra...@ouce.ox.ac.uk wrote:

> Hi all,
>
> I'm trying to get the min and max of a sequence of numbers using a loop like the following. Can anyone point me to why it doesn't work? Thanks.
>
>   type <- c("min", "max")
>   n <- 1:10
>   for (a in 1:2) {
>     print(type[a](n))
>   }
>
> --
> Muhammad
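A compact, runnable variant of the advice in this message, as a sketch: it builds the collection with list() rather than c() (an assumption of this example; both produce a list of functions) and names the elements so the results are labelled.

```r
# A list of function values, not a character vector of function names
funs <- list(min = min, max = max)
n <- 1:10

# [[ ]] extracts a single function, which can then be called directly
for (name in names(funs)) {
  print(funs[[name]](n))
}

# The same computation without an explicit loop
results <- sapply(funs, function(f) f(n))
print(results)
```

Using a named list keeps the function and its label together, which avoids the string-to-function conversion that do.call performs.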
Re: [R] R development master class: NYC, Dec 12-13
> Last time, I was told that I couldn't list my R package and associated papers as a research activity with substantial impact because it was outside my official scope of work. (Even though I wrote it so I could *do* my work.)

That seems wrong. My impression is that method papers were frequent citation classics http://garfield.library.upenn.edu/classics.html. Why should a software method paper be treated worse than a (e.g.) chemical method paper?

-s

On Sun, Nov 13, 2011 at 15:58, Sarah Goslee sarah.gos...@gmail.com wrote:

> On Sun, Nov 13, 2011 at 2:55 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote:
>
>>> Some of the money I earn from these courses goes to pay for my summer salary and supports student research. It also gives me confidence that if I don't get tenure because I've been writing R packages instead of papers, I can keep doing the work I love.
>>
>> If that actually happens, that would be an amazing/colossal (not in a good way) testament to how well the rating system works in academia.
>
> I'm not in academia, but government research. I do go through a review very similar to the tenure process. Last time, I was told that I couldn't list my R package and associated papers as a research activity with substantial impact because it was outside my official scope of work. (Even though I wrote it so I could *do* my work.)
>
> I have no trouble seeing academic administrators do the same thing.
>
> Sarah
>
> --
> Sarah Goslee http://www.functionaldiversity.org
[R] Kleinberg's burst detection algorithm
Has anyone here implemented Jon Kleinberg's burst detection algorithm ("Bursty and Hierarchical Structure in Streams", http://www.cs.cornell.edu/home/kleinber/bhs.pdf)? I'd rather not reimplement it if there's already running code available.

Thanks,

-s
Re: [R] Reading name-value data
Perfect! Thanks!

By the way, I see that, unlike base rbind, it does not work for vectors and lists:

  rbind(c(a=1), c(b=2))
    => matrix(1:2, 2, 1, dimnames=list(NULL, "a")) == as.matrix(data.frame(a=1:2))

but

  rbind.fill(c(a=1), c(b=2))
    => NULL

Shouldn't it give something like

  matrix(c(1,NA,NA,2), 2, 2, dimnames=list(NULL, c("a","b")))

or

  data.frame(a=c(1,NA), b=c(NA,2))

If, on the other hand, it insists on data.frames as input, it should err out if given non-data-frames.

-s

On Thu, Jul 28, 2011 at 19:30, Hadley Wickham had...@rice.edu wrote:

> Use plyr::rbind.fill? That does match up columns by name.
>
> Hadley
>
> On Thu, Jul 28, 2011 at 5:23 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
>
>> I have a file of data where each line is a series of name-value pairs, but where the names are not necessarily the same from line to line, e.g.
>>
>>   a=1,b=2,d=5
>>   b=4,c=3,e=3
>>   a=5,d=1
>>
>> I would like to create a data frame which lines up the data in the corresponding columns. In this case, this would be
>>
>>   data.frame( a = c(1, NA, 5),
>>               b = c(2, 4, NA),
>>               c = c(NA, 3, NA),
>>               d = c(5, NA, 1),
>>               e = c(NA, 3, NA) )
>>
>> One way I can think of doing this is to read in the data as one 'long' data frame per line with a unique ID, e.g. line one becomes
>>
>>   cbind(id=1, data.frame(variable=c('a','b','d'), value=c(1,2,5)))
>>
>> then rbind all the lines and use the reshape package function 'cast'.
>>
>> Is there a more straightforward way? (I'd have thought rbind would line up columns by name, but it doesn't.)
>>
>> -s
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
[R] Reading name-value data
I have a file of data where each line is a series of name-value pairs, but where the names are not necessarily the same from line to line, e.g.

  a=1,b=2,d=5
  b=4,c=3,e=3
  a=5,d=1

I would like to create a data frame which lines up the data in the corresponding columns. In this case, this would be

  data.frame( a = c(1, NA, 5),
              b = c(2, 4, NA),
              c = c(NA, 3, NA),
              d = c(5, NA, 1),
              e = c(NA, 3, NA) )

One way I can think of doing this is to read in the data as one 'long' data frame per line with a unique ID, e.g. line one becomes

  cbind(id=1, data.frame(variable=c('a','b','d'), value=c(1,2,5)))

then rbind all the lines and use the reshape package function 'cast'.

Is there a more straightforward way? (I'd have thought rbind would line up columns by name, but it doesn't.)

-s
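For readers who want to stay in base R, the alignment described above can be sketched as follows. This is one possible approach, not the method from the thread: it assumes the values are numeric and that no name repeats within a line, and the helper name parse_line is made up for this example.

```r
input <- c("a=1,b=2,d=5", "b=4,c=3,e=3", "a=5,d=1")

# Turn one line into a named numeric vector, e.g. c(a = 1, b = 2, d = 5)
parse_line <- function(line) {
  pairs <- strsplit(strsplit(line, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  values <- as.numeric(vapply(pairs, `[`, character(1), 2))
  names(values) <- vapply(pairs, `[`, character(1), 1)
  values
}
rows <- lapply(input, parse_line)

# Union of all names seen across the lines, in sorted order
cols <- sort(unique(unlist(lapply(rows, names))))

# Indexing a named vector by an absent name yields NA, which lines up the columns
m <- t(vapply(rows, function(r) unname(r[cols]), numeric(length(cols))))
colnames(m) <- cols
aligned <- as.data.frame(m)
print(aligned)
```

The matrix is built first because every row then has the same length by construction; converting to a data frame is the last step.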
[R] Composing two n-dimensional arrays into one n+1-dimensional array
If I have 2 n-dimensional arrays, how do I compose them into an (n+1)-dimensional array? Is there a standard R function that's something like the following, but that gives clean errors, handles all the edge cases, etc.?

  abind <- function(a,b) structure( c(a,b), dim = c(dim(a), 2) )

  m1 <- array(1:6,c(2,3))
  m2 <- m1 + 10
  abind(m1,m2)
  ==>
  , , 1

       [,1] [,2] [,3]
  [1,]    1    3    5
  [2,]    2    4    6

  , , 2

       [,1] [,2] [,3]
  [1,]   11   13   15
  [2,]   12   14   16

Thanks,

-s
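A slightly more defensive version of the helper in the question might look like the sketch below. The name stack_arrays is invented here to avoid clashing with the CRAN package abind, which provides a function of that name for binding arrays; this sketch handles only the two-array case and drops dimnames.

```r
# Stack two arrays of identical shape along a new trailing dimension
stack_arrays <- function(a, b) {
  stopifnot(is.array(a), is.array(b))
  if (!identical(dim(a), dim(b))) {
    stop("arrays must have identical dimensions")
  }
  array(c(a, b), dim = c(dim(a), 2L))
}

m1 <- array(1:6, c(2, 3))
m2 <- m1 + 10L
s <- stack_arrays(m1, m2)
print(dim(s))

# Mismatched shapes now produce a clear error instead of a silently wrong result
bad <- tryCatch(stack_arrays(m1, array(1:4, c(2, 2))),
                error = function(e) conditionMessage(e))
print(bad)
```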
[R] Approximate name matching
Is there R software available for doing approximate matching of personal names? I have data about the same people produced by different organizations and the only matching key I have is the name. I know that commercial solutions exist, and I know I could code this from scratch, but I'd prefer to build on some existing free solution if it exists.

Unfortunately, the names are not standardized, and there is also a certain level of error:

  Danny Williams (nickname)
  Dan Williams (nickname)
  Daniel Williams (nickname)
  Dan William (spelling error)
  D. Williams (initials)
  Daniel Danny Williams (formal + nickname)
  Dan P. Williams (includes middle initial)
  Williams, Daniel (different convention)
  William Daniel (wrong order or missing comma + misspelling)

Is there any R software available to find likely matches, ideally with some estimate of accuracy of match?

Levenshtein distance as implemented in agrep is a useful solution for some of these cases; I was wondering if there is something that covers more cases.

For this particular application, I am not concerned with issues such as variant latinizations/transliterations (e.g. Tsung-Dao Lee ~ T.D. Lee ~ Li Zhengdao; Ghaddafi ~ Qaddhaffi), but of course if someone handles that as well...

Thanks,

-s
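As a starting point for the edit-distance cases mentioned above, base R's adist (package utils) returns generalized Levenshtein distances. The sketch below turns them into a rough similarity score; the scaling by the longer string's length is an assumption of this example rather than an established accuracy estimate, and it does nothing for nicknames, initials, or reordered names.

```r
reference  <- "Daniel Williams"
candidates <- c("Dan Williams", "Dan William", "D. Williams", "Williams, Daniel")

# Edit distance between each candidate and the reference, compared in lower case
d <- drop(adist(tolower(candidates), tolower(reference)))
names(d) <- candidates

# Scale by the longer string's length so the score lies in [0, 1]
score <- 1 - d / pmax(nchar(candidates), nchar(reference))
print(sort(score, decreasing = TRUE))
```

A pure character-level distance ranks "Williams, Daniel" poorly even though it is the same name, which illustrates why the question asks for something that covers more cases.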
Re: [R] General binary search?
Martin,

Thank you for your exploration of implementations of bsearch!

In my application, length(val) is very small (typically 2), so vectorization over val doesn't help -- though vectorization over tab could work by doing n-ary instead of 2-ary splits with something like

  match(TRUE, val < tab[L+(H-L)*(1:9/10)])

and (when H-L becomes small)

  match(TRUE, val < tab[L:H])

Then there are approaches like tries... but though I love this sort of programming, I'm trying to reuse as much well-tested, well-tuned library code as I can.

Thanks again for your ideas!

-s

On Wed, Apr 6, 2011 at 12:59, Martin Morgan mtmor...@fhcrc.org wrote:

> On 04/04/2011 01:50 PM, William Dunlap wrote:
>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Stavros Macrakis
>>> Sent: Monday, April 04, 2011 1:15 PM
>>> To: r-help
>>> Subject: [R] General binary search?
>>>
>>> Is there a generic binary search routine in a standard library which
>>> a) works for character vectors
>>> b) runs in O(log(N)) time?
>>>
>>> I'm aware of findInterval(x,vec), but it is restricted to numeric vectors.
>>
>> xtfrm(x) will convert a character (or other) vector to a numeric vector with the same ordering. findInterval can work on that. E.g.,
>>
>>   f0 <- function(x, vec) {
>>     tmp <- xtfrm(c(x, vec))
>>     findInterval(tmp[seq_along(x)], tmp[-seq_along(x)])
>>   }
>>   > f0(c("Baby", "Aunt", "Dog"), LETTERS)
>>   [1] 2 1 4
>>
>> I've never looked at its speed.
>
> For a little progress (though no 'generic binary search in a standard library'), here's the 'one-liner'
>
>   bsearch1 <- function(val, tab, L=1L, H=length(tab)) {
>     while (H >= L) {
>       M <- L + (H - L) %/% 2L
>       if (tab[M] > val) H <- M - 1L
>       else if (tab[M] < val) L <- M + 1L
>       else return(M)
>     }
>     return(L - 1L)
>   }
>
> It seems like a good candidate for the new (R-2.13) 'compiler' package, so
>
>   library(compiler)
>   bsearch2 <- cmpfun(bsearch1)
>
> And Bill's suggestion
>
>   bsearch3 <- function(val, tab) {
>     tmp <- xtfrm(c(val, tab))
>     findInterval(tmp[seq_along(val)], tmp[-seq_along(val)])
>   }
>
> which will work best when 'val' is a vector to be looked up.
> A quick look at data.table:::sortedmatch seemed to return matches, whereas Stavros is looking for lower bounds.
>
> It seems that one could shift the weight more to C code by 'vectorizing' the one-liner, first as
>
>   bsearch5 <- function(val, tab, L=1L, H=length(tab)) {
>     b <- cbind(L=rep(L, length(val)), H=rep(H, length(val)))
>     i0 <- seq_along(val)
>     repeat {
>       M <- b[i0,"L"] + (b[i0,"H"] - b[i0,"L"]) %/% 2L
>       i <- tab[M] > val[i0]
>       b[i0 + i * length(val)] <-
>         ifelse(i, M - 1L, ifelse(tab[M] < val[i0], M + 1L, M))
>       i0 <- which(b[i0, "H"] >= b[i0, "L"])
>       if (!length(i0)) break;
>     }
>     b[,"L"] - 1L
>   }
>
> and then a little more thoughtfully (though more room for improvement) as
>
>   bsearch7 <- function(val, tab, L=1L, H=length(tab)) {
>     b <- cbind(L=rep(L, length(val)), H=rep(H, length(val)))
>     i0 <- seq_along(val)
>     repeat {
>       updt <- M <- b[i0,"L"] + (b[i0,"H"] - b[i0,"L"]) %/% 2L
>       tabM <- tab[M]
>       val0 <- val[i0]
>       i <- tabM < val0
>       updt[i] <- M[i] + 1L
>       i <- tabM > val0
>       updt[i] <- M[i] - 1L
>       b[i0 + i * length(val)] <- updt
>       i0 <- which(b[i0, "H"] >= b[i0, "L"])
>       if (!length(i0)) break;
>     }
>     b[,"L"] - 1L
>   }
>
> None of bsearch 3, 5, or 7 is likely to benefit substantially from compilation.
>
> Here's a little test data set converting numeric to character as an easy cheat.
>   set.seed(123L)
>   x <- sort(as.character(rnorm(1e6)))
>   y <- as.character(rnorm(1e4))
>
> There seems to be some significant initial overhead, so we warm things up (and also introduce the paradigm for multiple look-ups in bsearch 1, 2)
>
>   warmup <- function(y, x) {
>     lapply(y, bsearch1, x)
>     lapply(y, bsearch2, x)
>     bsearch3(y, x)
>     bsearch5(y, x)
>     bsearch7(y, x)
>   }
>   replicate(3, warmup(y, x))
>
> and then time
>
>   > system.time(res1 <- unlist(lapply(y, bsearch1, x), use.names=FALSE))
>      user  system elapsed
>     2.692   0.000   2.696
>   > system.time(res2 <- unlist(lapply(y, bsearch2, x), use.names=FALSE))
>      user  system elapsed
>     1.379   0.000   1.380
>   > identical(res1, res2)
>   [1] TRUE
>   > system.time(res3 <- bsearch3(y, x)); identical(res1, res3)
>      user  system elapsed
>     8.339   0.001   8.350
>   [1] TRUE
>   > system.time(res5 <- bsearch5(y, x)); identical(res1, res5)
>      user  system elapsed
>     0.700   0.000   0.702
>   [1] TRUE
>   > system.time(res7 <- bsearch7(y, x)); identical(res1, res7)
>      user  system elapsed
>     0.222   0.000   0.222
>   [1] TRUE
>
> Martin
>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>> I'm also aware of various hashing solutions (e.g. new.env(hash=TRUE) and fastmatch), but I need the greatest-lower-bound match in my application.
>>>
>>> findInterval is also slow for large N=length(vec) because of the O(N) checking it does, as Duncan Murdoch has pointed out https://stat.ethz.ch/pipermail/r
[R] General binary search?
Is there a generic binary search routine in a standard library which

  a) works for character vectors
  b) runs in O(log(N)) time?

I'm aware of findInterval(x,vec), but it is restricted to numeric vectors.

I'm also aware of various hashing solutions (e.g. new.env(hash=TRUE) and fastmatch), but I need the greatest-lower-bound match in my application.

findInterval is also slow for large N=length(vec) because of the O(N) checking it does, as Duncan Murdoch has pointed out (https://stat.ethz.ch/pipermail/r-help/2008-September/174584.html): though its documentation says it runs in O(n * log(N)), it actually runs in O(n * log(N) + N), which is quite noticeable for largish N. But that is easy enough to work around by writing a variant of findInterval which calls find_interv_vec without checking.

-s

PS Yes, binary search is a one-liner in R, but I always prefer to use standard, fast native libraries when possible.

  binarysearch <- function(val,tab,L,H) {while (H>=L) { M=L+(H-L) %/% 2; if (tab[M]>val) H<-M-1 else if (tab[M]<val) L<-M+1 else return(M)}; return(L-1)}
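To make the greatest-lower-bound behaviour concrete, here is a small runnable sketch of the same binary search applied to a sorted character vector. The name glb_search and the test words are invented for this example, and it assumes tab is sorted under the same collation that the < and > operators use in the current locale.

```r
# Return the index of the last element of sorted 'tab' that is <= val,
# or 0 if val sorts before every element
glb_search <- function(val, tab) {
  lo <- 1L
  hi <- length(tab)
  while (hi >= lo) {
    mid <- lo + (hi - lo) %/% 2L
    if (tab[mid] > val) {
      hi <- mid - 1L
    } else if (tab[mid] < val) {
      lo <- mid + 1L
    } else {
      return(mid)            # exact match
    }
  }
  lo - 1L                    # position of the greatest lower bound
}

tab <- c("apple", "banana", "cherry", "date")   # already sorted
hits <- vapply(c("aardvark", "blueberry", "cherry", "zebra"),
               glb_search, integer(1), tab = tab)
print(hits)
```

Each lookup touches about log2(length(tab)) elements, but the loop itself runs in interpreted R, which is the overhead the post hopes a native library routine would avoid.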
[R] Stricter read.table?
read.table gives idiosyncratic results when the input is formatted strangely, for example:

  read.table(textConnection("a'b\nc'd\n"),header=FALSE,fill=TRUE,sep="",quote="'")
  => c'd a'b c'd

  read.table(textConnection("a'b\nc'd\nf'\n'\n"),header=FALSE,fill=TRUE,sep="",quote="'")
  => f' \na b c'd f' \n

Though read.table doesn't specify the syntax of its input precisely, these results don't seem particularly useful or consistent.

Is there a stricter version of read.table (perhaps in a package) that gives errors or warnings if it finds quotation marks in the middle of fields or encounters other such peculiar situations?

Thanks,

-s
[R] Quantile with discrete types
I don't understand why 'quantile' works in this case:

  > tt <- rep(c('a','b'),c(10,3))
  > sapply(0:6/6,function(q) quantile(tt,probs=q,type=1))
     0% 16.7% 33.3%   50% 66.7% 83.3%  100%
    "a"   "a"   "a"   "a"   "a"   "b"   "b"

and also

  > quantile(tt,0:5/5,type=1)
    0%  20%  40%  60%  80% 100%
   "a"  "a"  "a"  "a"  "b"  "b"

but gives an error in this, which I would have thought equivalent to the first case above:

  > quantile(tt,0:6/6,type=1)
  Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
    argument is not a numeric vector

I could of course write something like

  sort(tt)[seq(1,length(tt),length.out=7)]

-- but I'm wondering why quantile fails in this case.

Thanks,

-s
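For sortable non-numeric data, the type-1 quantile (the inverse of the empirical distribution function) needs only sorting and indexing, so it can be sketched directly. The function name quantile_type1 is made up for this example; unlike stats::quantile, this sketch does no NA handling and has no guard against floating-point error in n * probs.

```r
quantile_type1 <- function(x, probs) {
  sorted <- sort(x)
  n <- length(sorted)
  # Take the ceiling(n * p)-th order statistic, using the first element when p is 0
  sorted[pmax(1, ceiling(n * probs))]
}

tt <- rep(c("a", "b"), c(10, 3))
res <- quantile_type1(tt, 0:6/6)
print(res)
```

This sidesteps the error in the post but does not explain it; it only shows that the quantity being asked for is well defined for character data.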
Re: [R] solving cubic/quartic equations non-iteratively
On Tue, Jan 5, 2010 at 5:25 PM, Carl Witthoft c...@witthoft.com wrote:

> quote:
>> There are certainly formulas for solving polynomials numerically up to 4th degree non-iteratively, but you will almost certainly get better results using iterative methods.
>
> I must be missing something here. Why not use the analytic formulas for polynomials below 5th degree? Once you do so, your answer is as precise as the level of precision you enter for the coefficients.

Why do you believe that? Are you assuming you can perform *exact* arithmetic? Did you read the references I gave?

* George Forsythe, How do you solve a quadratic equation?
* Yves Nievergelt, How (Not) to Solve Quadratic Equations

They show that that isn't even true for quadratic equations without a lot of care.

Let's try a cubic:

  p = 100*x^3-998000*x^2-1001999*x+99

That factors exactly over the integers to:

  (x-1001)*(x-1000)*(x-999)

but plugging the floating-point coefficients (which are exactly representable as floats) into (one version of) the cubic formula (using Maxima), I get the roots

  x = 966.1329834413779+58.65086897690403i
  x = 966.1329834413779-58.65086897690403i
  x = 1067.734033117244

On the other hand, using an iterative approach, I get:

  x = 999.000278754
  x = 999.926817675
  x = 1001.07290357

Which looks better to you?

-s
Re: [R] solving cubic/quartic equations non-iteratively
There are certainly formulas for solving polynomials numerically up to 4th degree non-iteratively, but you will almost certainly get better results using iterative methods.

Even the much more trivial formula for the 2nd degree (quadratic) is tricky to implement correctly and accurately, see:

* George Forsythe, How do you solve a quadratic equation?
* Yves Nievergelt, How (Not) to Solve Quadratic Equations

Hope this helps.

-s

On Tue, Jan 5, 2010 at 10:11 AM, Mads Jeppe Tarp-Johansen s02m...@math.ku.dk wrote:

> To R-helpers,
>
> R offers the polyroot function for solving mentioned equations iteratively.
>
> However, Dr Math and Mathworld (and other places) show in detail how to solve mentioned equations non-iteratively.
>
> Do implementations for R that are non-iterative and that solve mentioned equations exist?
>
> Regards, Mads Jeppe
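The difficulty with the quadratic formula mentioned above can be seen in a small experiment. The sketch below compares the textbook formula with a commonly used cancellation-avoiding rearrangement on x^2 - 1e8*x + 1 = 0, whose small root is very close to 1e-8. The function names and the test equation are choices made for this example (they are not taken from the cited papers), and both functions assume real roots and a nonzero leading coefficient.

```r
# Textbook formula: both roots from -b +/- sqrt(discriminant)
quadratic_naive <- function(a, b, c0) {
  d <- sqrt(b^2 - 4 * a * c0)
  c((-b + d) / (2 * a), (-b - d) / (2 * a))
}

# Rearranged: add quantities of the same sign, then use x1 * x2 = c0 / a
quadratic_stable <- function(a, b, c0) {
  d <- sqrt(b^2 - 4 * a * c0)
  q <- -(b + ifelse(b >= 0, d, -d)) / 2
  c(q / a, c0 / q)
}

naive  <- quadratic_naive(1, -1e8, 1)
stable <- quadratic_stable(1, -1e8, 1)

# Relative error of the small root against the approximate true value 1e-8
rel_err <- c(naive  = abs(naive[2]  - 1e-8) / 1e-8,
             stable = abs(stable[2] - 1e-8) / 1e-8)
print(rel_err)
```

In the naive version the small root is the difference of two nearly equal numbers of size 1e8, so most of its significant digits are lost; the rearranged version never subtracts nearly equal quantities.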
Re: [R] Problem with expand.grid
Unfortunately, expand.grid doesn't validate the class of its argument, so it is reporting an internal error rather than something more intelligible.

On Tue, Dec 22, 2009 at 11:19 AM, Keith Jewell k.jew...@campden.co.uk wrote:

> Just confirming it isn't the bug fixed in 2.11.0dev, and giving an even simpler example:
>
>   R version 2.11.0 Under development (unstable) (2009-12-20 r50794)
>   > expand.grid(data.frame(y=1:10, t=1:10))
>   Error in `[[<-.data.frame`(`*tmp*`, i, value = c(1L, 2L, 3L, 4L, 5L, 6L, :
>     replacement has 100 rows, data has 10
>
> Keith Jewell k.jew...@campden.co.uk wrote in message news:hgqqja$rk...@ger.gmane.org...
>
>> Hi All,
>>
>> This example code
>>
>>   dDF <- structure(list(y = c(4.75587, 4.8451, 5.04139, 4.85733, 5.20412,
>>       5.92428, 5.69897, 4.78958, 4, 4),
>>     t = c(0, 48, 144, 192, 240, 312, 360, 0, 48, 144),
>>     Batch = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
>>     T = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
>>     pH = c(4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6),
>>     S = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
>>     N = c(0, 0, 0, 0, 0, 0, 0, 80, 80, 80)),
>>     .Names = c("y", "t", "Batch", "T", "pH", "S", "N"),
>>     row.names = c(NA, 10L), class = "data.frame")
>>   str(dDF)
>>   expand.grid(dDF)
>>
>> 'hangs' for a while and then gives an error
>>
>>   Error in `[[<-.data.frame`(`*tmp*`, i, value = c(4.75587, 4.8451, 5.04139, :
>>     replacement has 1000 rows, data has 10
>>
>> In NEWS.R-2.11.0dev I read:
>>
>>   o The new (in 2.9.0) 'stringsAsFactors' argument to expand.grid() was not working: it now does work but has default TRUE for backwards compatibility.
>>
>> but I don't think that's relevant, I have no factors.
>>
>> I'm probably being silly. Can anyone point out where?
>>
>> Best...
>>
>> Keith Jewell
>>
>> --please do not edit the information below--
>>
>> Version:
>>   platform = i386-pc-mingw32
>>   arch = i386
>>   os = mingw32
>>   system = i386, mingw32
>>   status = Patched
>>   major = 2
>>   minor = 10.1
>>   year = 2009
>>   month = 12
>>   day = 21
>>   svn rev = 50796
>>   language = R
>>   version.string = R version 2.10.1 Patched (2009-12-21 r50796)
>>
>> Windows Server 2003 x64 (build 3790) Service Pack 2
>>
>> Locale:
>>   LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>>
>> Search Path:
>>   .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
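If the goal is the full cross product of a data frame's columns, one way to keep expand.grid on documented ground is to hand it a plain list, which it accepts as a single argument. This is a sketch with a small made-up data frame, not a statement about how any particular R version treats a data.frame argument.

```r
df <- data.frame(y = 1:3, t = c(0, 48, 144))

# as.list() drops the data.frame class, leaving a named list of column vectors
grid <- expand.grid(as.list(df))
print(dim(grid))

# The grid size is the product of the column lengths, so it grows quickly:
# 7 columns of 10 values each, as in the example above, would give 10^7 rows
rows_for_example <- 10^7
print(rows_for_example)
```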
[R] Method dispatch for function
How can I determine what S3 method will be called for a particular first-argument class?

I was imagining something like

  functionDispatch('str','numeric') => utils:::str.default

but I can't find anything like this.

For that matter, I was wondering if anyone had written a version of `methods` which gave their fully qualified names if they were not visible, e.g.

  methods('str') =>
    utils:::str.data.frame
    utils:::str.default
    stats:::str.dendrogram
    stats:::str.logLik
    utils:::str.POSIXt

or

  methods('str') =>
    $utils
    str.data.frame str.default str.POSIXt
    $stats
    str.dendrogram str.logLik

Thank you,

-s
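A rough approximation of the lookup asked for above can be built on utils::getS3method, which returns NULL for a missing method when called with optional = TRUE. The helper name find_s3_method is invented for this sketch, and it is deliberately simplified: it walks only the class vector it is given plus "default", so it ignores implicit classes, internal dispatch, and S4 methods.

```r
# Return the name of the first S3 method found for 'generic' along 'classes',
# falling back to the default method; NULL if none exists
find_s3_method <- function(generic, classes) {
  for (cls in c(classes, "default")) {
    method <- getS3method(generic, cls, optional = TRUE)
    if (!is.null(method)) {
      return(paste(generic, cls, sep = "."))
    }
  }
  NULL
}

print(find_s3_method("print", "data.frame"))
print(find_s3_method("str", "numeric"))

# The namespace that owns a method can be read off the function's environment
owner <- environmentName(environment(getS3method("str", "default")))
print(owner)
```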
[R] Suppressing final spaces in data.frame printouts
When printing data.frames, R aligns columns by padding with spaces. For example,

  > print(data.frame(x=c('a','bb','ccc')),right=FALSE)
    x
  1 a  |<-- vertical bar shows end of line
  2 bb |<-- vertical bar shows end of line
  3 ccc|<-- vertical bar shows end of line

Is there some way to suppress the padding for the final column? I often have data frames which contain a handful of long strings in the final column which, when printed out, cause wraparound on all the rows, even those not containing long strings, something like this:

  > print(data.frame(q=1:3,x=c('a','bb','this is a very long string')),right=FALSE)
    q x               |
            |
  1 1 a               |
            |
  2 2 bb              |
            |
  3 3 this is a very l|
  ong string|

where I'd rather have

  > print(data.frame(q=1:3,x=c('a','bb','this is a very long string')),right=FALSE)
    q x|
  1 1 a|
  2 2 bb|
  3 3 this is a very l|
  ong string|

I could of course write my own print function for this, but was wondering if there was a standard way of doing it. If not in R, perhaps there is some way to have ESS delete the final spaces?

Thanks,

-s
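The "write my own print function" route mentioned above can be quite short. The sketch below captures the normal printed output and strips trailing blanks from each line; the name print_trimmed is made up, and the approach assumes the frame prints within options("width") so that R does not split it into column blocks.

```r
# Print a data frame left-aligned, without the padding after the last column
print_trimmed <- function(df, ...) {
  out <- capture.output(print(df, right = FALSE, ...))
  out <- sub("[ \t]+$", "", out)   # drop trailing spaces and tabs on every line
  cat(out, sep = "\n")
  invisible(out)
}

df <- data.frame(q = 1:3, x = c("a", "bb", "this is a very long string"))
trimmed <- print_trimmed(df)
```

Because the trimming happens on captured text, it works the same in a terminal and in an ESS buffer, at the cost of bypassing any print method customisation that depends on the output connection.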
Re: [R] Suppressing final spaces in data.frame printouts
Thanks for the suggestion. I'm familiar with the truncate-lines variable, but that's not quite what I was looking for. I don't want the padding spaces displayed, but I do want to see long strings at the end of the line.

Thanks anyway,

-s

On Wed, Nov 11, 2009 at 5:40 PM, Richard M. Heiberger r...@temple.edu wrote:

> Stavros Macrakis wrote:
>
>> I could of course write my own print function for this, but was wondering if there was a standard way of doing it. If not in R, perhaps there is some way to have ESS delete the final spaces?
>
> ESS, or more precisely emacs, can handle that. Use the
>
>   M-x toggle-truncate-lines
>
> command: Toggle whether to fold or truncate long lines for the current buffer.
>
> Rich
Re: [R] Suppressing final spaces in data.frame printouts
I'm adding ess-help to the addressees because apparently this needs to be solved in ESS, not in R.

Thanks! So I guess you're suggesting something like

  (add-hook 'comint-output-filter-functions
    (lambda (s)
      (save-restriction
        (narrow-to-region
          comint-last-output-start
          (+ -1 (process-mark (get-buffer-process (current-buffer)))))
        ;; stop one char before the end of the output region to avoid
        ;; deleting the space after the R prompt
        (delete-trailing-whitespace))))

I have almost succeeded in making this work right. But if it is called for an output chunk which isn't the last one (with the prompt), it can suppress spaces in the middle of the line. Test with for (i in 1:1000) print( ) for example. Any ideas? This is the sort of niggling little edge-case complication which made me hope that someone had already solved the problem in R or ESS.

-s

On Wed, Nov 11, 2009 at 8:43 PM, RICHARD M. HEIBERGER r...@temple.edu wrote:

> On Wed, Nov 11, 2009 at 8:12 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
>
>> Thanks for the suggestion. I'm familiar with the truncate-lines variable, but that's not quite what I was looking for. I don't want the padding spaces displayed, but I do want to see long strings at the end of the line.
>
> Then we can use a different emacs trick.
>
>   delete-trailing-whitespace          M-x ... RET
>     Command: Delete all the trailing whitespace across the current buffer.
>   ess-nuke-trailing-whitespace        M-x ... RET
>     Command: Nuke all trailing whitespace in the buffer.
>   whitespace-toggle-trailing-check    M-x ... RET
>     Command: Toggle the check for trailing space in the local buffer.
>
> Rich
Re: [R] Book on R programming
I recommend you skim the Chambers book at Google Books or Amazon before buying it as a guide to programming in R. It is a fascinating book, but is more a discursive reflection on the history and philosophy of R than a practical guide to programming in R. It certainly explains the rationale for many of the design decisions in R, which is great for those of us who are interested in the history of programming languages, and even the practical consequences of those design decisions, but I'm not sure it's useful as a handbook for programming in R.

-s

On Mon, Aug 31, 2009 at 8:33 AM, [Ricardo Rodriguez] Your XEN ICT Team webmas...@xen.net wrote:

> Hi,
>
> ANJAN PURKAYASTHA wrote:
>
>> Most books on R I come across describe running statistical procedures in R. Any suggestions on a good book that teaches *programming* in R?
>>
>> Thanks, Anjan
>
> This is being really useful for me...
>
> John M. Chambers (2008) Software for Data Analysis. Programming with R. Springer. http://tinyurl.com/lg7g8n
>
> HTH
>
> --
> Ricardo Rodríguez
> Your XEN ICT Team
Re: [R] Is there a construct for conditional comment?
On Thu, Aug 20, 2009 at 1:27 PM, David Winsemius dwinsem...@comcast.net wrote:

> ... But an extremely simple modification succeeds:
>
>   if ( 0 ) { commented with zero } else { commented with one }
>
> Returns:
>
>   [1] \ncommented with one\n

Yes, but of course that executes neither one nor the other. This works, though:

  eval(parse(textConnection(if (FALSE) syntactically incorrect ' code must not use double-quotes, though else print('this is a test') )))

though it is horribly ugly, so I second the suggestion to do this in your text editor if you must do it at all.

-s
[R] Keeping track of memory usage
How can I determine how much memory a given piece of my code is allocating (directly or indirectly)? -- essentially, the space analogue of system.time, something like this:

  system.space( x <- rnorm(1) )
  1 Vcells

  system.space( for (i in 1:1000) x <- rnorm(1) )
  1000 Vcells

I'm not looking for anything as fine-grained as Rprofmem or tracemem, just the overall allocations. I'm also not looking for the amount of *live* memory (that is, net of garbage collection) as reported by memory.profile or gc.

Thanks,

-s
Re: [R] Object equality for S4 objects
On Thu, Jul 30, 2009 at 12:01 PM, Martin Morgan mtmor...@fhcrc.org wrote:

S4 objects do not have the semantics of environments, but of lists (or of most other R objects), so it is as meaningful to ask why identical(s1, s2) returns TRUE as it is to ask why identical(list(x=1), list(x=1)) returns TRUE.

Thanks for the clarification. For some reason, I thought that S4 objects (unlike S3 objects) were objects in the conventional computer science sense, that is, mutable. Compare proto objects, which *are* objects in the usual sense:

  proto1 <- proto(expr= {x=23})
  proto2 <- proto1
  proto1$x <- 45
  proto2$x
  [1] 45          # proto1 and proto2 are the same object

  setClass("test",representation(a="logical"))
  [1] "test"
  s41 <- new("test")
  s42 <- s41
  s41@a <- TRUE
  s42@a           # s41 and s42 are different objects
  logical(0)

It would thus perhaps be clearer to speak of S4 values rather than S4 objects.

-s
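A minimal sketch of the same contrast using only base R and the methods package, in case proto is not installed. The class name "Flag" and the variable names are made up for illustration; an environment stands in for the proto object, since environments are the one ordinary R type with reference semantics.

```r
library(methods)  # normally attached already; harmless to load again

# Hypothetical S4 class with a single logical slot.
setClass("Flag", representation(a = "logical"))

s1 <- new("Flag")
s2 <- s1          # copies the value, not a reference
s1@a <- TRUE      # changes s1 only
s2_slot <- s2@a   # still logical(0)

# Environments behave like conventional mutable objects.
e1 <- new.env()
e2 <- e1                      # both names refer to the same environment
assign("x", 45, envir = e1)
x_seen_via_e2 <- get("x", envir = e2)
```

Here the assignment to s1@a leaves s2 untouched, while the assignment through e1 is visible through e2.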
Re: [R] Object equality for S4 objects
On Thu, Jul 30, 2009 at 4:03 PM, Martin Morgan mtmor...@fhcrc.org wrote:

S4 objects are mutable in the sense that one can write replacement methods for them

Understood, but I don't think that's the usual meaning of 'mutable'.

-s
[R] Object equality for S4 objects
To test two environments for object equality (Lisp EQ), I can use 'identical':

  e1 <- environment(local(function()x))
  e2 <- environment(local(function()x))
  identical(e1,e2)                      # compares object identity
  [1] FALSE
  identical(as.list(e1),as.list(e2))    # compares values as name-value mapping
  [1] TRUE                              # (is there a better way to do this?)

What is the corresponding function for testing whether two S4 objects are the same object? It appears that 'identical' for S4 objects compares the *value*, not the *object identity*:

  setClass("simple",representation(a="logical"))
  [1] "simple"
  s1 <- new("simple"); s2 <- new("simple")
  identical(s1,s1)
  [1] TRUE     # not surprising
  identical(s1,s2)
  [1] TRUE     # ? not comparing object identity
  s1@a <- TRUE
  s2@a <- TRUE
  identical(s1,s2)
  [1] TRUE
  s1@a <- TRUE
  s2@a <- FALSE
  identical(s1,s2)
  [1] FALSE

Thanks,

-s
Re: [R] dereferencing in R
What do you mean by 'passing an array reference' and 'dereferencing' and what do you mean by an 'R script'? What language(s) are you accustomed to?

If you mean 'passing an array value' to an 'R function', you just use the argument name. Since R uses call-by-value (modulo the substitute mechanism, which as a beginner you should avoid), modifying the array within your function does not modify the global value. Normally you'd return the value, e.g.

  ar <- array( 1:12,c(3,4))
  ar
       [,1] [,2] [,3] [,4]
  [1,]    1    4    7   10
  [2,]    2    5    8   11
  [3,]    3    6    9   12
  sum12 <- function(a) { a[1,] + a[2,] }
  sum12(ar)
  [1]  3  9 15 21      # returned value

If you want to *modify* the array ar, you should do something like this: ar[1,] <- sum12(ar)

  ar[1,] <- sum12(ar)
  ar
       [,1] [,2] [,3] [,4]
  [1,]    3    9   15   21
  [2,]    2    5    8   11
  [3,]    3    6    9   12

Does this answer your question?

-s

On Thu, Jul 16, 2009 at 9:04 AM, xin liu liux...@yahoo.com wrote:

Hi, All, I passed an array reference to the R script and do not know how to do dereferencing in the R script. Anybody has some suggestion? Many thanks
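To make the call-by-value point concrete, here is a small sketch; the function name zero_first_row is made up for illustration. The function changes its own copy of the array, and the caller sees the change only in the returned value.

```r
ar <- array(1:12, c(3, 4))

# Hypothetical helper: overwrites row 1 of its argument and returns the result.
zero_first_row <- function(a) {
  a[1, ] <- 0L
  a
}

modified <- zero_first_row(ar)

original_row <- ar[1, ]        # the caller's array is unchanged
modified_row <- modified[1, ]  # only the returned copy has the new row
```

To update ar itself, assign the result back, as in ar <- zero_first_row(ar).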
Re: [R] Simple cat statement - output truncated
Kevin,

The habitués of this mailing list get irritated when users mail in problem reports which don't include enough information to reproduce the problem, as requested in the standard footer of r-help mail (PLEASE ... provide commented, minimal, self-contained, reproducible code.) This irritation is sometimes expressed aggressively and sometimes humorously. Be thankful that you drew humorously.

So... please provide minimal, self-contained code that allows us to reproduce your problem. What is meant by self-contained? It is code that, if you type it into a fresh R, elicits your problem. This includes setting any necessary variables to appropriate values etc.

-s

On Thu, Jul 16, 2009 at 10:21 AM, rkevinbur...@charter.net wrote:

So then I am to assume that the output of 'cat' can be truncated by passing it bad arrays. That is the only difference between the reproducible code you show and mine. It is just a theory but say that the components array is not dimensioned for 4 elements. It seems a little strange if that is the case that a reference error is not thrown and just the output of the cat call is affected.

Kevin

Duncan Murdoch murd...@stats.uwo.ca wrote:

On 7/15/2009 9:53 AM, rkevinbur...@charter.net wrote:

I have a statement:

  cat("myforecast ETS(", paste(object$components[1], object$components[2], object$components[3], object$components[4], sep = ","), ") ", n, "\n")

That generates:

  cast ETS( A,N,N,FALSE ) 3

Anyone guess as to why the first 5 letters are truncated/missing?

You are probably being punished for posting non-reproducible code*. When I try a reproducible version of the line above, things look fine:

  cat("myforecast ETS(", paste("A","N","N",FALSE, sep = ","), ") ", 3, "\n")
  myforecast ETS( A,N,N,FALSE ) 3

Duncan Murdoch

* R has a new predictive punishment module. It punishes you for things it knows you will do later.
Re: [R] quoting expressions in a list
On Thu, Jul 16, 2009 at 4:44 PM, Erik Iverson eiver...@nmdp.org wrote:

I have a list of logical expressions, and I would really like it if the names of the components of the list were identical to the corresponding logical expression. So, as an example:

  df.example <- data.frame(a = 1:10, b = rnorm(10, 5))
  list.example <- list(df.example$a > 7, df.example$b 4)

Now what I'd really like is to name the components, and get the results of the following line without having to specify the right-hand side individually for each component:

  names(list.example) <- c("df.example$a > 7", "df.example$b 4")

Something like this, perhaps?:

  listx <- function(...) structure(list(...),names=tail(as.list(substitute(c(...))),-1))
  list.example <- list(df.example$a > 7, df.example$b 4)
  listx(df.example$a > 7, df.example$b 4)
  $`df.example$a > 7`
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
  $`df.example$b 4`
  [1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
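A variant sketch of the same idea that builds the names explicitly with deparse() rather than relying on names<- to coerce a list of calls to character. This is an alternative under that assumption, not the function posted above; the example vector a and the comparisons are illustrative only.

```r
# Name each element of the list after the (unevaluated) expression that produced it.
listx <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # the argument expressions, unevaluated
  labels <- vapply(exprs,
                   function(e) paste(deparse(e), collapse = " "),
                   character(1))
  structure(list(...), names = labels)
}

a <- 1:10
res <- listx(a > 7, a < 3)
res_names <- names(res)
```

Collapsing the deparse() output guards against long expressions that deparse to more than one string.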
Re: [R] Trig functions strange results
On Tue, Jul 14, 2009 at 1:45 PM, Nair, Murlidharan T mn...@iusb.edu wrote:

I am trying to calculate coordinate transformations and in the process of debugging my code using debug I found the following

  Browse[1]> direction[i]
  [1] -1.570796
  Browse[1]> cos(direction[i])
  [1] 6.123032e-17
  Browse[1]> cos(-1.570796)
  [1] 3.267949e-07

... I am not sure why I am getting one values when I am using a variable that stores the value and another when I use the value directly. Am I missing something here?

Because you are not using the same value. You say in a later message that your variable direction[i] was set to (0-90)*pi/180. So let's look at that:

  x <- (0-90)*pi/180
  x - (-1.570796)
  [1] -3.267949e-07

That is, (0-90)*pi/180 is not exactly equal to -1.570796, but rather to -1.570796326794897:

  print(x,digits=16)
  [1] -1.570796326794897

And that is equal to the calculated value. Well, almost:

  print(x,digits=17)
  [1] -1.570796326794897     # the most digits R will print for a float
  -1.570796326794897 - x
  [1] -4.440892e-16          # a very tiny difference

See the R FAQ: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

By the way, there is a bug in the R print routine which does not print out the full precision even if you specify it

  -1.5707963267948965 - x    # one more digit is actually needed
  [1] 0
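A short sketch of how one might compare such values in practice, assuming ordinary IEEE double arithmetic. It uses all.equal(), which compares within a tolerance, instead of exact equality against a rounded printout.

```r
x <- (0 - 90) * pi / 180

exactly_equal_to_printed <- (x == -1.570796)               # the printed value is rounded
close_to_half_pi         <- isTRUE(all.equal(x, -pi / 2))  # tolerance-based comparison
cos_is_tiny              <- abs(cos(x)) < 1e-15            # effectively zero, not exactly zero

# format() with more digits shows what is actually stored.
stored <- format(x, digits = 17)
```

The first comparison is FALSE because -1.570796 is only the seven-digit display of x; the other two hold.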
Re: [R] strange strsplit gsub problem 0 is this a bug or a string length limitation?
On Fri, Jul 10, 2009 at 8:58 AM, Marc Schwartz marc_schwa...@me.com wrote:

Review the Note in ?as.character: as.character truncates components of language objects to 500 characters (was about 70 before 1.3.1).

If this limitation is too hard to fix, shouldn't it at least give a warning or an error?

-s
Re: [R] Numbering sequences of non-NAs in a vector
Here's one possibility:

  vv <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
  (1+cumsum(diff(is.na(c(vv[1],vv)))==1)) * !is.na(vv)
  [1] 1 1 1 1 1 1 0 0 0 0 2 2 2 0 0 0 3 3 3 3

On Tue, Jul 7, 2009 at 5:08 PM, Krishna Tateneni taten...@gmail.com wrote:

Greetings, I have a vector of the form: [10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...] That is, a combination of sequences of non-missing values and missing values, with each sequence possibly of a different length. I'd like to create another vector which will help me pick out the sequences of non-missing values. For the example above, this would be: [1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...]. The goal ultimately is to calculate means separately for each sequence. Your help is appreciated. If I'm making this more complicated than necessary, I'd appreciate knowing that as well! Many thanks.
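Since the stated goal is a mean per run, here is a sketch of the last step. It reuses the run-numbering expression from the reply and then applies tapply() to the non-missing values; the names ok, run_id and run_means are illustrative.

```r
vv <- c(10, 8, 1, 3, 0, 8, NA, NA, NA, NA, 2, 1, 6, NA, NA, NA, 0, 5, 1, 9)

ok <- !is.na(vv)

# Run number for each non-missing value, 0 for missing values.
run_id <- (1 + cumsum(diff(is.na(c(vv[1], vv))) == 1)) * ok

# Mean of each run of non-missing values.
run_means <- tapply(vv[ok], run_id[ok], mean)
```

For this vector the three runs have means 5, 3 and 3.75.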
Re: [R] Question in using e1071 svm routine
Isn't the initial value of the variable T equal to the constant TRUE? So unless he's modified the value of T, shouldn't it work?

-s

On 7/7/09, Max Kuhn mxk...@gmail.com wrote:

Unlike Splus, R does not use T for TRUE.

On Tue, Jul 7, 2009 at 6:05 PM, Michael comtech@gmail.com wrote:

Hi all, I've got the following error message in using e1071 svm routine... Could anybody please help me? Thank you!

  model <- svm(y=factor(mytraindata[, 1]), x=mytraindata[, -1], probability=T)
  Error in if (any(co)) { : missing value where TRUE/FALSE needed
  In addition: Warning message: In FUN(newX[, i], ...) : NAs introduced by coercion

-- Max
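A small sketch of the distinction being discussed: T is an ordinary variable whose initial value is TRUE, whereas TRUE is a reserved constant. The names below are illustrative.

```r
# In a fresh session, T evaluates to TRUE.
t_default <- isTRUE(base::T)

# T can be rebound like any other variable (done here in a local scope).
t_shadowed <- local({
  T <- NA
  T
})

# TRUE itself cannot be assigned to; the attempt raises an error.
true_is_reserved <- inherits(try(eval(parse(text = "TRUE <- 0")), silent = TRUE),
                             "try-error")
```

So probability=T works unless something earlier in the session has redefined T, which is why spelling out TRUE is the safer habit.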
Re: [R] Automatically placing a legend in an area with the most white space...
install.packages('plotrix')

On Sun, Jun 28, 2009 at 3:51 PM, Jason Rupert jasonkrup...@yahoo.com wrote:

...

  Error in legend(emptyspace(rep(x_vals_1, 3), c(y1_vals, y2_vals, y3_vals)), : could not find function "emptyspace"

I've searched via RSeek, but I have not been able to find anything on this function. Is emptyspace part of a package that I need to install?
Re: [R] How to avoid ifelse statement converting factor to character
It gives me a headache, too! I think you'll have to wait for a more expert user than me to supply explanations of these behaviors and their rationales.

-s

On 6/26/09, Craig P. Pyrame crap...@gmail.com wrote:

Stavros Macrakis wrote:

On Thu, Jun 25, 2009 at 12:47 PM, Craig P. Pyrame crap...@gmail.com wrote:

The man page Stavros quotes states that the class attribute of the result is taken from 'test', which clearly is not the case:

Actually, the behavior is documented pretty clearly: The mode of the answer will be coerced from logical to accommodate first any values taken from 'yes' and then any values taken from 'no'. Whether this is a good design or not is another issue. Perhaps the justification is that it avoids evaluating the yes or no arguments (to determine their class) in cases where their value is not needed.

Thank you for pointing me to this. Now I get a headache from trying to figure out what does mode have to do with class - I thought that the class of the result should be that of test, and that the mode is something entirely different. Why does coercing the mode also affect the class? If the man page said "The class attribute is taken from test, and it will be coerced ..." or "The mode of the result is taken from test, and it will be coerced ...", would this be wrong? What is the class-mode mixture about? Why does this fail:

  r = as.raw(TRUE)
  ifelse(TRUE, r, r) = error

This gives an error which I take for saying that raw cannot be coerced to logical, but yes it can:

  as.logical(r) = TRUE

and raw can even be used as the condition vector in ifelse:

  ifelse(r, 1, 2) = 1

Best regards, Craig

Example:

  ifelse(c(T,F),1,"a") = c("1","a")

This has the same effect as

  res <- c(T,F)
  res[1] <- 1
  res[2] <- "a"

which is in fact pretty much the way it is implemented.

And also, I find myself incapable of making sense of the may in the mode of the result may depend on the value of 'test' - may in what sense?

See the examples at the end of ?ifelse

-s
Re: [R] How do I get just the two last tokens of each string in a vector?
One way is:

  a <- c("%L H*L L*H H%", "%L H* H%", "%L L*H %", "%L L*H %")
  sub("^.*(^| )([^ ]+ [^ ]+$)","\\2",a)
  [1] "L*H H%" "H* H%" "L*H %" "L*H %"

Just be aware that this is not terribly efficient for very large strings.

-s

On Fri, Jun 26, 2009 at 7:21 AM, Fredrik Karlsson dargo...@gmail.com wrote:

Dear list, Sorry for asking this very silly question on the list, but I seem to have made my life complicated by going into string manipulation in vectors. What I need is to get the last part of a string (the two last tokens, separated by a space), and of course, this should be done for all strings in a vector, creating a new vector of equal size. So,

  a <- c("%L H*L L*H H%", "%L H* H%", "%L L*H %", "%L L*H %")

should be made into a vector

  c("L*H H%", "H* H%", "L*H %", "L*H %")

I have tried strsplit, but it seems to produce a structure I cannot get to work in this context. Any ideas on how to solve this? Thankful for all the help I can get.

/Fredrik

-- Life is like a trumpet - if you don't put anything into it, you don't get anything out of it.
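Since the question mentions getting stuck with strsplit, here is a sketch of that route as an alternative to the regular expression. It assumes tokens are separated by one or more spaces; trimws() needs R 3.2.0 or later. The names tokens and last_two are illustrative.

```r
a <- c("%L H*L L*H H%", "%L H* H%", "%L L*H %", "%L L*H %")

# strsplit() returns a list with one character vector of tokens per input string.
tokens <- strsplit(trimws(a), " +")

# Keep the last two tokens of each string and glue them back together.
last_two <- vapply(tokens,
                   function(p) paste(utils::tail(p, 2), collapse = " "),
                   character(1))
```

The list that strsplit() returns is the structure that causes trouble; vapply() (or sapply()) over that list turns it back into a plain character vector of the same length as the input.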
Re: [R] How to avoid ifelse statement converting factor to character
On Wed, Jun 24, 2009 at 9:04 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:

Do not get your knickers in a twist. R works simply and straightforwardly in simple straightforward situations.

Though I find R an incredibly useful tool, alas, it is simply not true that R works simply and straightforwardly in simple straightforward situations. No doubt this is for understandable historical reasons and backwards compatibility, but there it is.

Some examples of simple straightforward situations: I think it is reasonable to expect that appending a list/vector of class X to another list/vector of class X would result in a list/vector of class X. Similarly for the union of a list/vector of class X. But in fact, not only is this not true for some of R's important classes (factors, date/time, and delta-date/time), but the result class is inconsistent by function and by class:

  ff <- factor("b")
  c(ff,ff)     = 1 1                              # class integer
  union(ff,ff) = "b"                              # class character
  tt <- as.POSIXct('2009-01-01')
  c(tt,tt)     = 2009-01-01 EST 2009-01-01 EST    # class POSIXt/POSIXct
  union(tt,tt) = 1230786000                       # class numeric
  dt <- tt - tt                                   # class difftime
  c(dt,dt)     = 0 0                              # class numeric
  union(dt,dt) = 0                                # class numeric

Similarly, the simplest, most straightforward situation I can think of for ifelse is when the yes and no arguments are identical, and in that case, I would (I think reasonably) expect that the result is of the same class as the arguments, but it is not:

  ifelse(TRUE,factor("b"),factor("b")) = 1 (integer)
  ifelse(TRUE,dd,dd) = 1230786000 (class numeric)

I hope you will agree that all of these are very simple and straightforward situations, and that R is not working simply and straightforwardly in them. The less simple and less straightforward situations are of course more complicated.

In respect of the current discussion of ifelse() --- the original problem arose because the values of ``yes'' and ``no'' were of different modes. It is obvious that in such instances a decision will have to be made about the mode of the result. The appropriateness of the designers' decision may be disputed,

Indeed.

If you don't understand what's going on, then just stick to using ifelse() only when ``yes'' and ``no'' have the same mode.

That's not enough. They have to be of a basic class as well. See above.

Bottom line: R is easy to use at any level, but in order to use it a ``high'' level you need to understand the high level. Don't attempt to run before you can crawl.

Bottom line: Some very basic things in R violate users' reasonable expectations and moreover are internally inconsistent. You have to be careful about this whenever you work in R, even at an elementary level.

-s
Re: [R] How to avoid ifelse statement converting factor to character
Erratum:

  ifelse(TRUE,dd,dd) = 1230786000 (class numeric)

should be

  ifelse(TRUE,tt,tt) = 1230786000 (class numeric)
Re: [R] How to avoid ifelse statement converting factor to character
On Thu, Jun 25, 2009 at 12:47 PM, Craig P. Pyrame crap...@gmail.com wrote:

The man page Stavros quotes states that the class attribute of the result is taken from 'test', which clearly is not the case:

Actually, the behavior is documented pretty clearly: The mode of the answer will be coerced from logical to accommodate first any values taken from 'yes' and then any values taken from 'no'. Whether this is a good design or not is another issue. Perhaps the justification is that it avoids evaluating the yes or no arguments (to determine their class) in cases where their value is not needed.

Example:

  ifelse(c(T,F),1,"a") = c("1","a")

This has the same effect as

  res <- c(T,F)
  res[1] <- 1
  res[2] <- "a"

which is in fact pretty much the way it is implemented.

And also, I find myself incapable of making sense of the may in the mode of the result may depend on the value of 'test' - may in what sense?

See the examples at the end of ?ifelse

-s
Re: [R] How to avoid ifelse statement converting factor to character
On Wed, Jun 24, 2009 at 12:34 PM, Mark Na mtb...@gmail.com wrote:

The problem is that after running the ifelse statement, data$SOCIAL_STATUS is converted from a factor to a character. Is there some way I can avoid this conversion?

I'm afraid that ifelse has very bizarre semantics when the yes and no arguments don't have the same, atomic vector, type. The quick workaround for the bizarre semantics (though it can have a significant efficiency cost) is this:

  unlist( ifelse ( condition, as.list( yes ), as.list( no ) ) )

(This isn't perfect, either, but...)

Take a look at the man page for details and the warning: The mode of the result may depend on the value of 'test', and the class attribute of the result is taken from 'test' and may be inappropriate for the values selected from 'yes' and 'no'.

Some consequences of the definition of ifelse are:

Even if the classes of the yes and no arguments are identical, the result does not necessarily have that class:

  ifelse(TRUE,as.raw(4),as.raw(5)) = error
  ifelse(TRUE,factor('x'),factor('x')) = 1 (integer)
  dates <- as.POSIXct(c('1990-1-1','2000-1-1'))
  ifelse(c(TRUE,FALSE),dates,dates) = 631170000 946702800 (double)
  ifelse(c(TRUE,FALSE),factor(c('x','y')),factor(c('y','x'))) = 1 1

If they have different classes, things get stranger:

  ifelse(c(TRUE,FALSE),c("a","b"),factor(c("c","d"))) = "a" "2"
  ifelse(c(TRUE,FALSE),list(1,2),as.raw(4))
  [[1]]
  [1] 1
  [[2]]
  [1] 04

Result is order-dependent:

  ifelse(c(TRUE,FALSE),as.raw(4),list(1,2))
  Error in ans[test & !nas] <- rep(yes, length.out = length(ans))[test & : incompatible types (from raw to logical) in subassignment type fix

Welcome to R!

-s
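For the original factor question, one alternative worth sketching is to avoid ifelse altogether and use indexed assignment, which keeps the factor class. This is not the workaround given above, just another option; it assumes the replacement value is already one of the factor's levels. The names status, test, codes and kept are illustrative.

```r
status <- factor(c("low", "high", "low"), levels = c("low", "high"))
test   <- c(TRUE, TRUE, FALSE)

# ifelse() drops the factor class and returns the underlying integer codes.
codes <- ifelse(test, status, status)

# Indexed assignment on a copy keeps the class and the levels.
kept <- status
kept[!test] <- "high"   # "high" must be an existing level
```

If the replacement value is not an existing level, the assignment produces NA with a warning, so new levels have to be added first (for example via levels<-).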
Re: [R] first value...
I think what you mean is that you want to find the position of the first non-NA value in the vector. is.na returns a boolean vector of the NA values, so:

  xx <- c(NA,NA,NA,2,3,NA,4)
  which(!is.na(xx))[1]
  [1] 4

The other proposed solution, which(diff(is.na(inc)) < 0), is incorrect:

  which(diff(is.na(xx))<0)
  [1] 3 6

-s

On Tue, Jun 23, 2009 at 10:00 AM, Alfredo Alessandrini alfreal...@gmail.com wrote:

Hi, I've a vector like this:

  inc
   [1] NA NA NA NA NA NA NA ...
  [71] NA NA NA NA NA NA NA
  [78] NA NA NA NA 13.095503 10.140119 7.989186 ...

I must obtain the position of first value of the vector... In this case is 82.

  inc[82]
  [1] 13.09550
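A brief sketch of the same lookup with an equivalent formulation via match(), plus the edge case where every element is NA. The variable names are illustrative.

```r
xx <- c(NA, NA, NA, 2, 3, NA, 4)

first_pos <- which(!is.na(xx))[1]       # position of the first non-NA value
first_alt <- match(FALSE, is.na(xx))    # same position, stops at the first hit

# If there is no non-NA value at all, both forms give NA rather than an error.
none_found <- which(!is.na(c(NA, NA)))[1]
```

Either result can be used directly as an index, e.g. xx[first_pos].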
Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)
(I am replacing R-devel and r-bugs with r-help as addressees.)

On Sat, Jun 20, 2009 at 9:45 AM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

So if I request a calculation of 0.3-0.1-0.1-0.1 and I do not get 0, that is not an issue of rounding / underflow (or whatever the correct technical term would be for that behaviour)?

No. Let's start from the beginning. In binary floating point arithmetic, all numbers are represented as a*2^b, where a and b have a fixed number of digits, so input conversion from decimal form to binary form inherently loses some precision -- that is, it rounds to the nearest binary fraction. For example, representation(0.3) is 5404319552844595 * 2^-54, about 1e-17 less than exactly 3/10, which is of course not representable in the form a*2^b.

The EXACT difference (calculating with rationals -- no roundoff errors etc.) between representation(0.3) and 3*representation(0.1) is 2^-55 (about 1e-17); the EXACT difference between representation(0.3) and representation(3*representation(0.1)) is 2^-54. As it happens, in this case, there is no rounding error at all -- the floating-point result of 0.3 - 3*0.1 is exactly -2^-54.

I thought that guard digits would mean that 0.3-0.1*3 should be calculated in higher precision than the final representation of the result, i.e., avoiding that this is not equal to 0?

Guard digits and sticky bits are techniques for more accurate rounding of individual arithmetic operations, and do not persist beyond each individual operation. They cannot create precise results out of imprecise inputs (except when they get lucky!). And even with precise inputs, they cannot create correctly rounded results with multiple operations. Consider for example (1.0 + 1.0e-15) - 1.0. The correctly rounded result of (1.0+1.0e-15) is 1.0000000000000011... And the correctly rounded result of (1.0+1.0e-15)-1.0 is 1.11e-15, which is 11% different than the mathematical result.
Perhaps you are thinking about the case where intermediate results are accumulated in higher-than-normal precision. This technique only applies in very specialized circumstances, and is not available to user code in most programming languages (including R). I don't know whether R's sum function uses this technique or some other (e.g. Kahan summation), but it does manage to give higher precision than summation with individual arithmetic operators:

  sum(c(2^63,1,-2^63)) = 1

but

  Reduce(`+`,c(2^63,1,-2^63)) = 0

I am sorry if I am not from the field... If you can suggest an online resource to help me use the right vocabulary and better understand the fundamental concepts, I am of course grateful.

I would suggest "What every computer scientist should know about floating-point arithmetic", ACM Computing Surveys 23:1 (March 1991) for the basics. Anything by Kahan (http://www.cs.berkeley.edu/~wkahan/) is interesting. Beyond elementary floating-point arithmetic, there is of course the vast field of numerical analysis, which underlies many of the algorithms used by R and other statistical systems.

-s
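The two worked examples in this message can be checked directly. This sketch assumes ordinary IEEE double arithmetic (the usual case on current hardware) and deliberately does not test the sum() versus Reduce() example, whose result depends on whether the platform accumulates in extended precision.

```r
# 0.3 - 3*0.1 is tiny but not zero.
d <- 0.3 - 3 * 0.1
d_is_zero      <- (d == 0)
d_within_tol   <- isTRUE(all.equal(0.3, 3 * 0.1))  # comparison within a tolerance

# (1.0 + 1.0e-15) - 1.0 comes back as roughly 1.11e-15, not 1e-15.
e <- (1.0 + 1.0e-15) - 1.0

# Printing more digits shows the value actually stored for 0.3.
stored_0.3 <- sprintf("%.20f", 0.3)
```

For comparisons in user code, all.equal() (wrapped in isTRUE()) is the usual way to ask whether two doubles are equal up to rounding.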
Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)
On Sat, Jun 20, 2009 at 4:10 PM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

Ah, that's probably where I went wrong. I thought R would take the 0.1, the 0.3, the 3, convert them to extended precision binary representations, do its calculations, and the reduction to normal double precision binary floats would only happen when the result was stored or printed.

This proposal is problematic in many ways. For example, it would *still* not guarantee that 0.3 - 3*0.1 == 0, since extended-precision floats have the same characteristics as normal-precision floats. Would you round to normal precision when passing arguments? Then sqrt could not produce extended-precision results. etc. etc.

I suppose R could support an extended-precision floating-point type, but that would require that the *user* choose which operations were in extended-precision and which in normal precision. (And of course it would be a lot of work to add in a complete and consistent way.)

-s
Re: [R] tapply with cbinded x
On Tue, Jun 16, 2009 at 5:16 AM, Stefan Uhmann stefan.uhm...@googlemail.com wrote:

why does this not work?

  df <- data.frame(var1 = c(3,2,1), var2 = c(6,5,4), var3 = c(9,8,7), fac = c('A', 'A', 'B'))
  tapply(cbind(df$var1, df$var2, df$var3), df$fac, mean)

Because tapply is defined for atomic vectors and not for data frames. Why? I don't know.

Does this do what you want?:

  df <- data.frame(var1 = c(3,2,1), var2 = c(6,5,4), var3 = c(9,8,7))
  fac <- c('a','a','b')
  do.call(rbind, lapply(split(df,fac),mean))
    var1 var2 var3
  a  2.5  5.5  8.5
  b  1.0  4.0  7.0

Alternatively, you can use sapply, which returns the result in matrix form.

  sapply(split(df,fac),mean)
         a b
  var1 2.5 1
  var2 5.5 4
  var3 8.5 7
  as.data.frame(t(sapply(split(df,fac),mean)))
    var1 var2 var3
  a  2.5  5.5  8.5
  b  1.0  4.0  7.0

Note that sapply's matrix output form (the so-called 'simplification') needs to be transposed.

-s
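A note for readers on current R: the code above relies on mean() returning column means for a data frame, which worked in 2009 but, as far as I know, no longer does since R 3.0.0. The sketch below gets the same tables with colMeans(), and also shows aggregate() as another option; group_means and agg are illustrative names.

```r
df  <- data.frame(var1 = c(3, 2, 1), var2 = c(6, 5, 4), var3 = c(9, 8, 7))
fac <- c("a", "a", "b")

# Column means within each group; sapply() returns variables in rows, so transpose.
group_means <- t(sapply(split(df, fac), colMeans))

# aggregate() gives the same numbers as a data frame with a grouping column.
agg <- aggregate(df, by = list(fac = fac), FUN = mean)
```

group_means has one row per group (a, b) and one column per variable, matching the tables shown in the reply.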
Re: [R] function inside ifelse
Of course functions can be used inside ifelse. They should return vectors. Be careful of the effect of recycling:

  ifelse(c(F,T,F,T,F,T),1:3,10:20)
  [1] 10  2 12  1 14  3

with functions:

  f <- function(x) x/mean(x)
  ifelse(c(F,T,F,T,F,T),sqrt(1:3),f(10:20))
  [1] 0.6666667 1.4142136 0.8000000 1.0000000 0.9333333 1.7320508

-s

On Mon, Jun 15, 2009 at 10:39 AM, Grześ gregori...@gmail.com wrote:

Could you tell me, if it's possible to create ifelse and put function inside, for example:

  ifelse ((is.na(vek)), call_fun_1(arguments), call_fun_2(arguments))
  call_fun_1 <- function(arguments) { sth... }
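A sketch of the pattern asked about, with two made-up helper functions (fill_value and scale_value are hypothetical names). The functions have to be defined before the ifelse call, and each should return a vector as long as the test.

```r
vek <- c(1, NA, 3)

# Hypothetical helpers; both return one value per element of their input.
fill_value  <- function(x) rep(0, length(x))
scale_value <- function(x) x * 10

# Elements where vek is NA come from fill_value(), the rest from scale_value().
res <- ifelse(is.na(vek), fill_value(vek), scale_value(vek))
```

Both functions are called on the whole vector; ifelse then picks elementwise, so it is not a way to skip work for some elements.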
Re: [R] Referencing data frames
On Mon, Jun 15, 2009 at 12:38 PM, Payam Minoofar payam.minoo...@meissner.com wrote:

...I would like to have a function acquire an object by reference, and within the function create new objects based on the original object and then use the name of the original object as the base for the names of the newly created objects. It seems to me that the optimal way of doing this is to have the function acquire the name of the object as a string, and then use get() to access the object, and then to use the same string to do the name formation of the new objects

Instead of creating new names through string manipulation, I'd think it would be cleaner and simpler to use the list mechanism to return a structured object, e.g.

  ddd <- function (obj) list( new1 = makenew1(obj), new2 = makenew2(obj), new3 = makenew3(obj) )

Then you'd write, e.g.

  ddx <- ddd(oldobj)
  ddx$new1      # names new1

Perhaps this will work for you

-s
Re: [R] Tables without names
On Fri, Jun 12, 2009 at 6:09 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:

> On 11/06/2009 5:35 PM, Stavros Macrakis wrote:
>> A table without names displays like a vector:
>>     unname(table(2:3))
>>     [1] 1 1
>> and preserves the table class (as with unname in general):
>>     dput(unname(table(2:3)))
>>     structure(c(1L, 1L), .Dim = 2L, class = "table")
>> Does that make sense? R is not consistent in its treatment of such unname'd tables:
>
> One of the complaints about the S3 object system is that anything can claim to be of class "foo", even if it doesn't have the right structure so that foo methods work for it.

Yes, that is one of its flaws. More specifically, in this case, operations on S3 objects can change them from being valid to being invalid.

> I think that's all you're seeing here: you've got something that is mislabelled as being of class "table".

Yes.

> The solution is "don't do that".

Agreed! But it's not clear to me how unname can *know* how not to do that in the general case. After all, unname on a vector of POSIXct's leaves a valid POSIXct object.

> ...
>> PS What is the standard way of extracting just the underlying vector? c(unname(...)) works -- is that what is recommended?
>
> I would use as.numeric(), but I don't claim it's standard.

Makes sense, as does the suggestion as.vector. So I guess the summary of 'stripping' operations is:

    c         -- strip all attributes (including most but not all classes) except for names
    unname    -- strip name attributes, but no other attributes (including class)
    unclass   -- strip only class attribute
    as.vector -- strip all attributes including class and name; convert generic vectors to atomic vectors

Am I missing others?

-s
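The summary of stripping operations above can be checked on a small atomic vector. This is only a sketch: the class "foo" and the attribute "note" are made up for the illustration, and classes that define their own c() or as.vector() methods may behave differently.

```r
# A named numeric vector with a made-up class and one extra attribute.
x <- structure(c(a = 1, b = 2), class = "foo", note = "extra attribute")

names(attributes(c(x)))        # only "names" is kept
names(attributes(unname(x)))   # "class" and "note" are kept, names are gone
names(attributes(unclass(x)))  # "names" and "note" are kept, class is gone
attributes(as.vector(x))       # NULL: everything is stripped
```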
[R] Tables without names
A table without names displays like a vector:

    > unname(table(2:3))
    [1] 1 1

and preserves the table class (as with unname in general):

    > dput(unname(table(2:3)))
    structure(c(1L, 1L), .Dim = 2L, class = "table")

Does that make sense? R is not consistent in its treatment of such unname'd tables.

In plot, they are considered erroneous input:

    > plot(unname(table(2:3)))
    Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

but in melt, they act as though they have names 1:n:

    > melt(unname(table(2:3)))
      indicies value
    1        1     1
    2        2     1

(By the way, is the spelling error built into too much code to be corrected?)

-s

PS What is the standard way of extracting just the underlying vector? c(unname(...)) works -- is that what is recommended?
Re: [R] Splicing factors without losing levels
Various people have provided technical solutions to your problem. May I suggest, though, that 'splice' isn't quite the right word for this operation?

Splicing two pieces of rope / movie film / audio tape / wires / etc. means connecting them at their ends, either at an extremity or in the middle, e.g.

    X: Y:
    Extremity splice: xx or yyxx
    Middle splice: xxxyyyx or yyyxxx

The splice itself is the point of connection (xy or yx) between two things. In normal English, splicing never refers to interspersing alternate members of X and Y.

This may seem like a minor point, but I think it is worthwhile using descriptive names for functions.

-s

On Tue, Jun 9, 2009 at 5:12 AM, Titus von der Malsburg malsb...@gmail.com wrote:

> An operation that I often need is splicing two vectors:
>
>     > splice(1:3, 4:6)
>     [1] 1 4 2 5 3 6
Re: [R] Splicing factors without losing levels
On Tue, Jun 9, 2009 at 11:16 AM, Titus von der Malsburg malsb...@gmail.com wrote:

> On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
>> This may seem like a minor point, but I think it is worthwhile using descriptive names for functions.
>
> Makes sense. I thought I've seen this use somewhere else (probably in Lisp?). What better name do you suggest for this operation?

The two meanings I can think of in Lisp for splicing are:

1) The backquote operator ,@X, which means to insert the value of X as part of the surrounding list rather than as an element of the list, e.g.

    `(a b ,@'(c d) e f) == (append '(a b) '(c d) '(e f)) => (a b c d e f)

   as opposed to

    `(a b ,'(c d) e f) == (append '(a b) (list '(c d)) '(e f)) => (a b (c d) e f)

2) The notion of inserting (typically destructively) one list in the middle of another.

I would suggest a name like 'intersperse' or 'alternate'.

-s
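For reference, a minimal sketch of the operation under the suggested name. The function name intersperse is taken from the suggestion above and is not an existing base R function; the sketch assumes two plain atomic vectors of equal length and does not attempt the factor handling that the thread title refers to.

```r
# Alternate the elements of x and y: x[1], y[1], x[2], y[2], ...
intersperse <- function(x, y) {
  stopifnot(length(x) == length(y))
  # rbind() stacks x over y; flattening the 2-row matrix column by
  # column produces the alternating order.
  as.vector(rbind(x, y))
}

intersperse(1:3, 4:6)
# [1] 1 4 2 5 3 6
```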
Re: [R] using regular expressions to retrieve a digit-digit-dot structure from a string
On Tue, Jun 9, 2009 at 7:44 AM, Mark Heckmann mark.heckm...@gmx.de wrote:

> Thanks for your help. Your answers solved the problem I posted and that is just when I noticed that I misspecified the problem ;) My problem is to separate German texts by sentences. Unfortunately I haven't found an R package doing this kind of text separation in German, so I try it manually. Just using the dot as separator fails in occasions like:
>
>     txt <- "One January 1. I saw Rick. He was born in the 19. century."

Sentence boundary disambiguation is a non-trivial problem, as you can see in your above example (cf. "I arrived on January 1. I saw Rick."). You can get ~95% accuracy fairly straightforwardly, but the last 5% are hard. Take a look at http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation, which points to other good resources.

-s
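To make the ambiguity concrete, here is a small sketch with two naive splitting rules applied to a variant of the example sentence. Both regular expressions are illustrative assumptions, not a recommended splitter: the first over-splits after the ordinal "19.", and the second, which refuses to split after a digit, wrongly glues "January 1." to the following sentence.

```r
txt <- "On January 1. I saw Rick. He was born in the 19. century."

# Rule 1: a sentence ends at any period followed by whitespace.
naive <- strsplit(txt, "(?<=\\.)\\s+", perl = TRUE)[[1]]
print(naive)

# Rule 2: as rule 1, but never split when the period follows a digit
# (to protect German ordinals such as "19.").
ordinal_aware <- strsplit(txt, "(?<=[^0-9]\\.)\\s+", perl = TRUE)[[1]]
print(ordinal_aware)
```

Neither rule gets all three sentences right, which is the kind of case that accounts for the hard last few percent.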
Re: [R] if else
On Mon, Jun 8, 2009 at 1:48 PM, Cecilia Carmo cecilia.ca...@ua.pt wrote:

> I have the following dataframe:
>
>     firm<-c(rep(1:3,4))
>     year<-c(rep(2001:2003,4))
>     X1<-rep(c(10,NA),6)
>     X2<-rep(c(5,NA,2),4)
>     data<-data.frame(firm, year,X1,X2)
>     data
>
> So I want to obtain the same dataframe with a variable X3 that is:
>
>     X1, if X2=NA
>     X2, if X1=NA
>     X1+X2 if X1 and X2 are not NA
>
> So my final data is
>
>     X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
>     finaldata<-data.frame(firm, year,X1,X2,X3)
>
> I've tried this
>
>     finaldata<-ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))
>
> But I got just NA in X3. Anyone could help me with this?

The problem here is that comparing NA to anything always gives NA, even for NA==NA. To check for NA, you need to use is.na, e.g.

    data$X3 <- ifelse( is.na(data$X1), data$X2, ifelse( is.na(data$X2), data$X1, data$X1+data$X2 ))

(you don't need to handle the is.na(X1) & is.na(X2) case specially) which you can make more compact using 'with':

    data$X3 <- with(data, ifelse( is.na(X1), X2, ifelse( is.na(X2), X1, X1+X2 )))

Hope this helps,

-s
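Putting the question's data and the is.na() solution together gives the following self-contained check. Everything here comes from the two posts; the only addition is the comparison against the expected X3 vector stated in the question.

```r
firm <- rep(1:3, 4)
year <- rep(2001:2003, 4)
X1   <- rep(c(10, NA), 6)
X2   <- rep(c(5, NA, 2), 4)
data <- data.frame(firm, year, X1, X2)

# NA == NA is NA, not TRUE, so the test must use is.na().
NA == NA

data$X3 <- with(data, ifelse(is.na(X1), X2, ifelse(is.na(X2), X1, X1 + X2)))

expected <- c(15, NA, 12, 5, 10, 2, 15, NA, 12, 5, 10, 2)
identical(data$X3, expected)
```

When both X1 and X2 are NA, the first branch returns X2, which is already NA, so no separate case is needed.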
Re: [R] if else
On Mon, Jun 8, 2009 at 3:36 PM, Don MacQueen m...@llnl.gov wrote:

Though I do agree that the way you've written the general case with any/is.na and sum/na.rm is cleaner and clearer because more general, I don't agree at all with what you say about nested ifelse's vs. a series of assignments:

> In my opinion, nested ifelse() expressions are difficult to read and understand, and therefore difficult to get right. Easier to write one expression for each of your criteria. But do the last one first

In the ifelse case, it is easy to trace exactly what happens in each case, because all the cases are disjoint. This becomes especially clear if written with a lot of whitespace and proper indentation:

    ifelse( is.na(X1),
            X2,                   # the is.na(X1) case
            ifelse( is.na(X2),    # the !is.na(X1) case
                    X1,           # the !is.na(X1) & is.na(X2) case
                    X1+X2 ))      # the !is.na(X1) & !is.na(X2) case

I suppose it might be clearer for some users at least if you wrote out *all* the cases, even though they're not necessary:

    ifelse( is.na(X1),
            ifelse( is.na(X2),    # the is.na(X1) cases
                    NA,           # the is.na(X1) & is.na(X2) case
                    X2 ),         # the is.na(X1) & !is.na(X2) case
            ifelse( is.na(X2),    # the !is.na(X1) cases
                    X1,           # the !is.na(X1) & is.na(X2) case
                    X1+X2 ))      # the !is.na(X1) & !is.na(X2) case

On the other hand, with the multiple assignment case, if you're not careful, it's easy to have different statements overwriting each other's results in unintended ways.

For those who've been around programming for a while, they may recall Dijkstra's "goto considered harmful" letter -- which is echoed by functional programming's "assignment considered harmful"!

-s
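A small sketch contrasting the two styles on the data from the earlier post. The sequential and careless versions are my own illustrations of "a series of assignments", not code from either poster; they show that the assignment style gives the same answer only when the statements are written in an order that avoids clobbering earlier results.

```r
X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)

# Nested form: the cases are disjoint, so there is no ordering to get wrong.
nested <- ifelse(is.na(X1), X2, ifelse(is.na(X2), X1, X1 + X2))

# Sequential form, careful order: general case first, then special cases.
sequential <- X1 + X2
sequential[is.na(X2)] <- X1[is.na(X2)]
sequential[is.na(X1)] <- X2[is.na(X1)]

# Sequential form, careless order: the last statement overwrites rows
# that an earlier statement had already filled in correctly.
careless <- X1
careless[is.na(X1)]  <- X2[is.na(X1)]
careless[!is.na(X2)] <- (X1 + X2)[!is.na(X2)]

identical(nested, sequential)   # same result
identical(nested, careless)     # differs where X1 is NA and X2 is not
```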
Re: [R] Done: Fast way of finding top-n values of a long vector
On Fri, Jun 5, 2009 at 4:09 AM, Allan Engelhardt all...@cybaea.com wrote:

> I'm all done now. The max2 version below is what I went with in the end for my proposed change to caret::nearZeroVar (which used the sort method). Max Kuhn will make it available on CRAN soon. It speeds up that routine by a factor 2-5 on my test cases and uses much less memory.

You can save a little in max2 like this:

    max2a = {w<-which.max(x); x[w]/max(x[-w], na.rm=TRUE);}

If you don't need to handle NA's (or if you know a priori how many there are), you can also speed up part:

    parta = {sel <- length(x)+c(-1,0); a<-sort.int(x, partial=sel, na.last=NA)[2:1]; a[1]/a[2];}

which becomes about as fast as max2.

    library(rbenchmark)
    set.seed(1); x <- runif(1e7, max=1e8);
    benchmark(
      replications=20, columns=c("test","elapsed"), order="elapsed"
      , sort  = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
      , qsrt  = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2]; a[1]/a[2];}
      , part  = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
      , parta = {end<-length(x)+c(-1,0); a<-sort.int(x, partial=end, na.last=FALSE)[end]; a[1]/a[2]; }
      , max1  = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w],na.rm=TRUE);}
      , max2  = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
      , max2a = {w<-which.max(x); x[w]/max(x[-w], na.rm=TRUE);}
    )

       test elapsed
    7 max2a    7.80
    6  max2    8.94
    4 parta    9.05
    3  part   10.72
    5  max1   20.21
    2  qsrt   49.33
    1  sort   94.18

> For what it is worth, I also made a C version ("cmax" below) which of course is faster yet again and scales nicely for returning the top n values of the array:
>
>     cmax <- function (v) {max <- vector("double",2); max <- .C("test", as.double(v), as.integer(length(v)), max, NAOK=TRUE)[[3]]; return(max[1]/max[2]);}
>
>     library(rbenchmark)
>     set.seed(1); x <- runif(1e7, max=1e8); x[1] <- NA;
>     benchmark(
>       replications=20, columns=c("test","elapsed"), order="elapsed"
>       , sort = {a<-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
>       , qsrt = {a<-sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2]; a[1]/a[2];}
>       , part = {a<-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
>       , max1 = {m<-max(x, na.rm=TRUE); w<-which(x==m)[1]; m/max(x[-w],na.rm=TRUE);}
>       , max2 = {w<-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
>       , cmax = {cmax(x);}
>     )
>     #   test elapsed
>     # 6 cmax   4.394
>     # 5 max2   8.954
>     # 4 max1  18.835
>     # 3 part  21.749
>     # 2 qsrt  46.692
>     # 1 sort  77.679
>
> Thanks for all the suggestions and comments.
>
> Allan.
>
> PS: Slightly off-topic but is there a way within the syntax of R to set up things so that 'sort' (or any function) would know it is called in a partial list context in sort(x)[1:2] and it therefore could choose to use the partial argument automatically for small [] lists? The R interpreter of course knows full well that it is going to drop all but the first two values of the result before it calls 'sort'. Perl has 'use Want' where howmany() and want(n) provides a subset of this functionality (essentially for [] lists of the form 1:n).
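For readers who want to compare the approaches without the rbenchmark package, here is a reduced sketch on a smaller vector with no NA values. It only checks that three of the strategies discussed above agree on the ratio of the largest to the second-largest value; it makes no timing claims, and the variable names are my own.

```r
set.seed(1)
x <- runif(1e5, max = 1e8)

# Full sort: simple, but orders the whole vector to use two values.
by_sort <- { a <- sort(x, decreasing = TRUE)[1:2]; a[1] / a[2] }

# Partial sort: only positions n-1 and n are guaranteed to hold the
# values they would have in a fully sorted vector.
by_partial <- {
  n <- length(x)
  a <- sort.int(x, partial = c(n - 1, n))[c(n, n - 1)]
  a[1] / a[2]
}

# Two linear passes: locate the maximum, drop it, take the next maximum.
by_max <- { w <- which.max(x); x[w] / max(x[-w]) }

c(sort = by_sort, partial = by_partial, max = by_max)
```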
[R] all.equal(0,0i)
    > all.equal(0,0i)
    [1] "Modes: numeric, complex"
    [2] "target is numeric, current is complex"
    > all.equal(1,1+0i)
    [1] "Modes: numeric, complex"
    [2] "target is numeric, current is complex"

Is this the intended behavior?

In general, all.equal is strict about argument mode, thus TRUE/1 and 1/'1' do not compare equal (unlike ==). On the other hand, 1L and 1.0 do compare equal (unlike identical).

?all.equal discusses the 'numerical' case, and mentions what metric is used for complex arguments, but doesn't make it clear whether 'complex' is considered 'numerical' (as opposed to 'numeric', which in R terms means integer or double).

-s
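The three notions of equality mentioned above can be put side by side. This sketch covers only the logical, character, integer and double cases described in the post; it deliberately leaves out the complex case, whose intended behavior is the open question.

```r
# == coerces its arguments to a common type before comparing.
TRUE == 1                    # TRUE
1 == "1"                     # TRUE

# all.equal() reports a mode mismatch instead of coercing ...
isTRUE(all.equal(TRUE, 1))   # FALSE
isTRUE(all.equal(1, "1"))    # FALSE

# ... but accepts integer vs double, which identical() does not.
isTRUE(all.equal(1L, 1.0))   # TRUE
identical(1L, 1.0)           # FALSE
```

Wrapping all.equal() in isTRUE() matters because on a mismatch it returns a character vector of differences rather than FALSE.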
Re: [R] Error:non-numeric argument in my function
Agreed, that's even better, e.g.

    Error in 1 * "a" : character argument not allowed for arithmetic operator *

For some reason (does anyone know the rationale?), in the case of factors, you don't get an error, but a more explicit warning and an NA result:

    > 2*factor(3)
    [1] NA
    Warning message:
    In Ops.factor(2, factor(3)) : * not meaningful for factors

This seems hazardous, especially since the user has to be sophisticated enough to know about options(warn=2) to get a traceback for this.

As for data frames, arithmetic operators seem to work if all the values are numeric:

    > 2*data.frame(a=1)
      a
    1 2

It's a hard problem to make useful error messages for beginning users.

-s

On Mon, Jun 1, 2009 at 4:34 AM, Patrick Burns pbu...@pburns.seanet.com wrote:

> I thought Stavros' suggestion was going to be to have the error message say what type of offending object was found. If the message said that a list of class 'data.frame' was found (probably the leading case), then that would be much more helpful.
>
> Patrick Burns
> patr...@burns-stat.com
> http://www.burns-stat.com (home of "The R Inferno" and "A Guide for the Unwilling S User")
>
> Stavros Macrakis wrote:
>> On Sun, May 31, 2009 at 6:10 PM, jim holtman jholt...@gmail.com wrote:
>>> Message is very clear:
>>>     > 1 * 'a'
>>>     Error in 1 * "a" : non-numeric argument to binary operator
>>
>> Though the user should have been able to figure this out, perhaps the error message could be improved? After all, it is not the fact that the operator is *binary* that implies that its argument must be numeric, but that it is *arithmetic*. The binary operator %in%, for example, takes non-numeric arguments. Suggested replacement error message: non-numeric argument to arithmetic operator
>>
>> -s
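The cases discussed in this thread can be reproduced in one short script. This is a sketch using standard condition handling; the exact wording of the messages depends on the R version and locale, so the code captures them rather than assuming their text.

```r
# Character operand: arithmetic raises an error.
msg <- tryCatch(1 * "a", error = function(e) conditionMessage(e))
msg

# Factor operand: the result is NA and only a warning is signalled.
res <- withCallingHandlers(
  2 * factor(3),
  warning = function(w) {
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res

# Data frame whose columns are all numeric: arithmetic works column-wise.
doubled <- 2 * data.frame(a = 1)
doubled

# A binary operator that is not arithmetic accepts non-numeric operands.
"a" %in% c("a", "b")
```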
Re: [R] Error:non-numeric argument in my function
On Sun, May 31, 2009 at 6:10 PM, jim holtman jholt...@gmail.com wrote:

> Message is very clear:
>
>     > 1 * 'a'
>     Error in 1 * "a" : non-numeric argument to binary operator

Though the user should have been able to figure this out, perhaps the error message could be improved? After all, it is not the fact that the operator is *binary* that implies that its argument must be numeric, but that it is *arithmetic*. The binary operator %in%, for example, takes non-numeric arguments.

Suggested replacement error message:

    non-numeric argument to arithmetic operator

-s
[R] max.col specification
I'm not sure I understand the max.col spec or its rationale. In particular:

* What is the significance and effect of "assuming that the entries are probabilities", as they do not seem to be limited to the interval [0,1]?
* In what contexts is it useful for max.col to consider numbers within a certain tolerance equal?
* Why is a fixed relative tolerance of 1e-5 useful? That seems many orders of magnitude greater than typical rounding errors, but arbitrary in terms of data analysis, where different data sets or statistics may have widely varying error distributions. And I'd have thought a tolerance of 0 natural in many cases.

My guess is that there is some particular kind of analysis where these are all natural background assumptions, but it is not clear what that analysis is. Also, max.col is part of 'base', so the authors must have thought that these assumptions were generally applicable.

Can someone clarify?

Thanks,

-s

On Thu, May 28, 2009 at 5:02 PM, Bert Gunter gunter.ber...@gene.com wrote:

> Try reading the man page, which says:
>
>   Details
>   When ties.method = "random", as per default, ties are broken at random. In this case, the determination of a tie assumes that the entries are probabilities: there is a relative tolerance of 1e-5, relative to the largest (in magnitude, omitting infinity) entry in the row.
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Daryl Morris
> Sent: Thursday, May 28, 2009 1:47 PM
> To: r-help@r-project.org
> Subject: [R] max.col weirdness
>
> Hi, I think there's some rounding issue with returning the max column. (running 2.9.0 on an Apple, but my buddy found it on his PC)
>
>     > x <- matrix(c(1234.568,1234.569,1234.567),1)
>     > max.col(x)
>     [1] 2
>     > x <- matrix(c(12345.568,12345.569,12345.567),1)
>     > max.col(x)
>     [1] 3
>     > x <- matrix(c(112345.568,112345.569,112345.567),1)
>     > max.col(x)
>     [1] 3
>     > max.col(-x)
>     [1] 1
>
> Thanks, Daryl
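A sketch of what the quoted man page implies for Daryl's third example. The reading that the tolerance applies only when ties.method = "random" is taken from the quoted documentation text; I am assuming that ties.method = "first" compares the values exactly, and the which.max() line is a tolerance-free alternative that does not depend on that assumption.

```r
x <- matrix(c(112345.568, 112345.569, 112345.567), nrow = 1)

# Default ties.method = "random": all three entries lie within the
# relative tolerance of the row maximum, so they count as tied and
# the reported column can change from call to call.
table(replicate(200, max.col(x)))

# Deterministic alternatives.
first_max <- max.col(x, ties.method = "first")
exact_max <- apply(x, 1, which.max)
c(first = first_max, exact = exact_max)
```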
Re: [R] maxtrix to permutation vector
Not sure what you mean by permutations here. I think what you mean is that given a matrix m, you want a matrix whose rows are c(i,j,m[i,j]) for all i and j. You can use the `melt` function in the `reshape` package for this. See below.

Hope this helps,

-s

    > library(reshape)
    > melt(matrix(1:4,2,2))
      X1 X2 value
    1  1  1     1
    2  2  1     2
    3  1  2     3
    4  2  2     4
    > big <- matrix(1:700^2,700,700)
    > head(melt(big))
      X1 X2 value
    1  1  1     1
    2  2  1     2
    3  3  1     3
    4  4  1     4
    5  5  1     5
    6  6  1     6
    > system.time(melt(big))
       user  system elapsed
       0.08    0.00    0.08

On Fri, May 29, 2009 at 2:08 PM, Ian Coe i...@connectcap.com wrote:

> Hi, Is there a way to convert a matrix into a vector representing all permutations of values and column/row headings with native R functions? I did this with 2 nested for loops and it took about 25 minutes to run on a ~700x700 matrix. I'm assuming there must be a smarter way to do this with R's vector commands, but being new to R, I'm having trouble making it work.
>
> Thanks, Ian
>
>         [a] [b] [c]
>     [d]   1   4   7
>     [e]   2   5   8
>     [f]   3   6   9
>
>     a d 1
>     a e 2
>     a f 3
>     b d 4
>     b e 5
>     b f 6
>     c d 7
>     c e 8
>     c f 9
>
> Ian Coe
> Connective Capital Management, LLC
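Since the question asked for native R functions, here is a sketch of the same reshaping in base R only, using Ian's 3x3 example. The column names col, row and value are my own choice; the ordering of the output follows the column-major storage of the matrix, which matches the listing in the question.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("d", "e", "f"), c("a", "b", "c")))

# row() and col() give the row and column index of every cell, in the
# same column-major order that as.vector(m) uses.
long <- data.frame(
  col   = colnames(m)[as.vector(col(m))],
  row   = rownames(m)[as.vector(row(m))],
  value = as.vector(m),
  stringsAsFactors = FALSE
)
long

# Another base R route: treat the matrix as a contingency table.
head(as.data.frame(as.table(m)))
```

Both versions are vectorised, so they avoid the two nested for loops entirely.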
Re: [R] maxtrix to permutation vector
Oh, I should have mentioned that the result of melt is a data.frame, not a matrix. You can convert with as.matrix if you like. I should also have shown that dimnames are carried along:

    > m <- matrix(1:4,2,2,dimnames=list(x=c('a','b'),y=c('x','y')))
    > m
       y
    x   x y
      a 1 3
      b 2 4
    > melt(m)
      x y value
    1 a x     1
    2 b x     2
    3 a y     3
    4 b y     4
    > as.matrix(melt(m))
         x   y   value
    [1,] "a" "x" "1"
    [2,] "b" "x" "2"
    [3,] "a" "y" "3"
    [4,] "b" "y" "4"

On Fri, May 29, 2009 at 2:53 PM, Stavros Macrakis macra...@alum.mit.edu wrote:

> Not sure what you mean by permutations here. I think what you mean is that given a matrix m, you want a matrix whose rows are c(i,j,m[i,j]) for all i and j. You can use the `melt` function in the `reshape` package for this. [...]
Re: [R] custom sort?
I agree that it is surprising that R doesn't provide a sort function with a comparison function as argument. Perhaps that is partly because calling out to a function for each comparison is relatively expensive; R prefers vector operations.

That said, many useful custom sorts are easy to define by reordering, possibly using the 'order' function, e.g.

    rr <- function (v) v[order( v %% 10 , v < 500, - v ) ]
    # sort first by last digit (ascending), then by whether < 500, then by magnitude (descending)

    > set.seed(2009)
    > rr(sample(1000,30))
     [1] 840 670 580 140 100 10 991 901 881 561 231 71 722 662 432 222 32 473 53
    [20] 24 645 796 86 697 607 567 397 257 77 818 568 428 198 619 569 479 439 299

Hope this helps,

-s

On Thu, May 28, 2009 at 6:06 PM, Steve Jaffe sja...@riskspan.com wrote:

> hmm, that is what I was afraid of. I considered that but thought to myself, surely there must be an easier way. I wonder why this feature isn't available. It's there in scripting languages, like perl, but also in hardcore languages like C++ where std::sort and sorted containers allow the user to provide a comparison function (even for builtin types like int). It's hard to believe that you have to jump through more hoops to do a custom sort in R than in C++ ...
>
>> You put a class on the vector...
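A deterministic check of the order()-based idea above on a tiny input. The function rr is the one from the message, with the middle key written as v < 500 (my reading of the garbled original, consistent with the output shown there); the input vector is made up.

```r
# Sort by last digit (ascending), then values >= 500 before values < 500,
# then by magnitude (descending) within each group.
rr <- function(v) v[order(v %% 10, v < 500, -v)]

v <- c(15, 25, 510, 10, 520)
rr(v)
# [1] 520 510  10  25  15
```

Each argument to order() is a sort key, so a multi-level comparison function can usually be expressed as a sequence of key vectors instead.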
[R] RODBC package: how to check whether connection is open
What is the recommended way of checking whether an RODBC connection is open?

Since odbcValidChannel is not exported from namespace RODBC, I suppose I shouldn't be using it. This is the best I could come up with, but it seems a bit 'dirty' to be using a tryCatch for something like this:

    odbcOpenp <- function(conn) tryCatch({odbcGetInfo(conn);TRUE},error=function(...)FALSE)

Suggestions?

-s
Re: [R] custom sort?
I couldn't get your suggested method to work:

    `==.foo` <- function(a,b) unclass(a)==unclass(b)
    `>.foo` <- function(a,b) unclass(a) < unclass(b)   # invert comparison
    is.na.foo <- function(a) is.na(unclass(a))
    sort(structure(sample(5),class="foo"))   #-> 1:5 -- not reversed

What am I missing?

-s

On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

> On 28/05/2009 5:34 PM, Steve Jaffe wrote:
>> Sounds simple but haven't been able to find it in docs: is it possible to sort a vector using a user-defined comparison function? Seems it must be, but sort doesn't seem to provide that option, nor does order sfaics
>
> You put a class on the vector (e.g. using class(x) <- "myvector"), then define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison methods (you'll need ==.myvector, >.myvector, and is.na.myvector).
>
> Duncan Murdoch
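A sketch of the other route Duncan mentions, defining an xtfrm method, which does reverse the sort in current versions of R. The class name foo comes from the message above; the explanation in the comments is my understanding and should be treated as an assumption: for a classed numeric vector the default xtfrm appears to use the underlying numbers directly, so comparison methods such as `>.foo` are never consulted.

```r
# xtfrm() maps an object to a numeric vector that sorts the same way.
# Negating the values makes sort() and order() run in reverse for "foo".
xtfrm.foo <- function(x) -unclass(x)

x <- structure(c(3, 1, 5, 2, 4), class = "foo")

sorted <- unclass(sort(x))   # unclass() only to get a plain vector
sorted
```

With this method in place, sort() on a "foo" vector returns the values in descending order without any comparison operators being defined.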
Re: [R] How to exclude a column by name?
On Wed, May 27, 2009 at 6:37 AM, Zeljko Vrba zv...@ifi.uio.no wrote:

> Given an arbitrary data frame, it is easy to exclude a column given its index: df[,-2]. How to do the same thing given the column name? A naive attempt df[,-"name"] did not work :)

Various ways:

Boolean index vector:

    df[ , names(df) != "name" ]

List of wanted column names:

    df[ , setdiff(names(df), "name") ]

Negated list of unwanted column indexes:

    df[ , -match("name", names(df)) ]
    df[ , -which(names(df) == "name") ]

The special 'subset' hack for column names; beware, I think this is the only place in R where you can negate a column name:

    subset(df, select = -a)

Hope this helps,

-s
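The alternatives above, run on a small made-up data frame so they can be compared directly. The data frame, the variable name and the drop = FALSE arguments are my additions; drop = FALSE keeps the result a data frame even when only one column remains.

```r
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
name <- "b"   # column to exclude (illustrative)

r1 <- df[, names(df) != name, drop = FALSE]          # logical index
r2 <- df[, setdiff(names(df), name), drop = FALSE]   # wanted names
r3 <- df[, -match(name, names(df)), drop = FALSE]    # negated position
r4 <- subset(df, select = -b)                        # subset() shorthand

sapply(list(r1, r2, r3, r4), function(d) paste(names(d), collapse = ","))

# Caveat for the which() variant: if the name is absent, which() returns
# integer(0), and negating that selects zero columns rather than all.
ncol(df[, -which(names(df) == "zzz"), drop = FALSE])
```

The logical-index and setdiff() forms simply return all columns when the name is absent, which makes them the safer choice inside functions.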
Re: [R] Defining functions - an interesting problem
The 'ties.method' argument to 'rank' is the *third* positional argument to 'rank', so either you need to put it in the third position or you need to use a named argument. The fact that the variable you're using to represent ties.method is called ties.method is irrelevant. That is, this:

    rank(x, ties.method)

is equivalent to

    rank(x, na.last = ties.method)

which is not what you want. You need to write

    rank(x, ties.method = ties.method)

or (more concise but not as clear):

    rank(x, , ties.method)

Hope this helps,

-s

On Wed, May 27, 2009 at 10:11 AM, utkarshsinghal utkarsh.sing...@global-analytics.com wrote:

> I define the following function: (Please don't wonder about the use of this function, this is just a simplified version of my actual function. And please don't spend your time in finding an alternate way of doing the same as the following does not exactly represent my function. I am only interested in a good explanation)
>
>     > f1 = function(x,ties.method="average")rank(x,ties.method)
>     > f1(c(1,1,2,4), ties.method="min")
>     [1] 1.5 1.5 3.0 4.0
>
> I don't know why it followed ties.method="average". Anyways I randomly tried the following:
>
>     > f2 = function(x,ties.method="average")rank(x,ties.method=ties.method)
>     > f2(c(1,1,2,4), ties.method="min")
>     [1] 1 1 3 4
>
> Now, it follows the ties.method="min". I don't see any explanation for this, however, I somehow mugged up that if I define it as in f1, the ties.method in rank function takes its default value which is "average" and if I define as in f2, it takes the value which is passed in f2. But even all my mugging is wasted when I tested the following:
>
>     > h = function(x, a=1)x^a
>     > g1 = function(x, a=1)h(x,a)
>     > g1(x=5, a=2)
>     [1] 25
>     > g2 = function(x, a=1)h(x,a=a)
>     > g2(x=5, a=2)
>     [1] 25
>
> Here in both the cases, h is taking the value passed through g1, and g2. Any comments/hints can be helpful.
>
> Regards
> Utkarsh
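The explanation above can be verified by looking at the argument order of rank() and rerunning the two definitions from the question. The sketch uses only what the thread already contains, plus formals() to list the arguments.

```r
# Positional matching follows the order of the formal arguments.
names(formals(rank))

# f1 passes the value positionally, so it lands in na.last and
# ties.method keeps its default ("average").
f1 <- function(x, ties.method = "average") rank(x, ties.method)

# f2 passes the value by name, so it reaches ties.method.
f2 <- function(x, ties.method = "average") rank(x, ties.method = ties.method)

r1 <- f1(c(1, 1, 2, 4), ties.method = "min")
r2 <- f2(c(1, 1, 2, 4), ties.method = "min")
r1
r2
```

The h/g1/g2 example in the question behaves the same either way only because a is the second formal argument of h, so positional and named matching happen to coincide there.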
[R] R Books listing on R-Project
I was wondering what the criteria were for including books on the Books Related to R page http://www.r-project.org/doc/bib/R-books.html. (There is no maintainer listed on this page.)

In particular, I was wondering why the following two books are not listed:

* Andrew Gelman, Jennifer Hill, *Data Analysis Using Regression and Multilevel/Hierarchical Models*. (CRAN package 'arm')
* Michael J. Crawley, *The R Book*. (reviewed, rather negatively, in *R News* *7*:2)

Is the list more or less arbitrary? Does it reflect some editorial judgment about the value of these books? If so, it might be more useful to include the books, but with critical reviews. It doesn't seem to be a matter of up-to-dateness, because 38/87 of the listed books were published in a more recent year than Gelman or Crawley.

The list is currently in reverse chronological order. I wonder if it would be useful to group the entries thematically -- I'd be happy to help on that project.

-s
[R] OWL (Web Ontology Language) in R?
Is anyone working on an R package for manipulating OWL (Web Ontology Language), either natively or via an external library? I don't see anything obviously relevant in CRAN, though of course OWL functionality could be built up starting with the XML package. Thanks, -s
Re: [R] XML parse error
On Sun, May 24, 2009 at 12:28 PM, kulwinder banipal kbani...@hotmail.com wrote:
> It is for sure a little more complicated than a plain XML file. The format of binary file is according to XML schema. I have been able to get C parser going to get information from binary with one caveat - I have to manually read the XML schema and figure out which byte means what in binary and then code it in C.

There are many ways of encoding XML in a compact binary form (cf. http://en.wikipedia.org/wiki/Binary_XML), none widely accepted yet. The XML schema does not specify the binary form.

-s
Re: [R] Class for time of day?
On Thu, May 21, 2009 at 8:28 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> It uses hours/minutes/seconds for values < 1 day and uses days and fractions of a day otherwise.

Yes, my examples were documenting this idiosyncrasy.

> For values and operations that it has not considered it falls back to the internal representation.

Yes, my examples were documenting this bad behavior.

> Most of your examples start to make sense once you realize this.

Of course I realize this. That was the point of my examples. I understand perfectly well what is *causing* the bad behavior. That doesn't make it less bad. What is the point of a class system if functions ignore the class and perform meaningless calculations on the internal representation?

-s
Re: [R] Class for time of day?
On Fri, May 22, 2009 at 10:03 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> Regarding division you could contribute that to the chron package. I've contributed a few missing items and they were incorporated.

Good to know. Maybe I'll do that.

> Giving an error when it does not understand something would be dangerous as it could break much existing code so that would probably not be possible at this stage.

But would it break any existing *correct* code? I find it hard to imagine any cases where adding 1 hour of difftime to times("12:00:00") should return 1.5 days rather than 13:00:00.

> The idea of defaulting to internal representations is based on the idea that you get many features for free since the way the internal representations work gives the right answer in many cases.

I must admit I am rather shocked by this approach. Getting something for free is a bad bargain if what you get is nonsense.

> It's best to stick with the implicit philosophy that underlies a package. If you want a different philosophy then it's really tantamount to creating a new package. I don't think that one is right and the other wrong but simply represent different viewpoints.

So you would defend the viewpoint that 1 hour is the same thing as 1 day?

-s
Re: [R] Class for time of day?
On Fri, May 22, 2009 at 12:28 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> ...The way this might appear in code is if someone wanted to calculate the number of one hour intervals in 18 hours. One could write:
>
>     t18 <- times("18:00:00")
>     t1 <- times("1:00:00")
>     as.numeric(t18) / as.numeric(t1)
>
> but since we all know that it uses internal representations unless it indicates otherwise

Um, yes, I suppose that was the attitude in the 60's and 70's, but I think we have moved on from there. cf. http://en.wikipedia.org/wiki/Data_abstraction

> a typical code snippet might shorten it to:
>
>     as.numeric(t18 / t1)
>
> and all such code would break if one were to cause that to generate an error.

(18/24 day)/(1/24 day) is the perfectly meaningful dimensionless number 18, so this code should not break with a correct implementation of '/'. (cf. http://en.wikipedia.org/wiki/Dimensional_analysis). Alas, chron gives the nonsense result of 18 days.

-s
Re: [R] Class for time of day?
On Wed, May 20, 2009 at 12:28 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> There is a times class in the chron package.

Perfect! Just what I was looking for.

On Wed, May 20, 2009 at 12:19 PM, jim holtman jholt...@gmail.com wrote:
> If you want the hours from a POSIXct, here is one way of doing it...
>
>     y <- difftime(x, trunc(x, units='days'), units='hours')

Ah, trunc.POSIXt -- I missed that one, thanks.

> It depends on what type of computations you want to do with it. You can leave it as POSIXct and carry out a lot of them. Can you specify what you want?

I am comparing irregular time series from different days, looking at the differences in intraday patterns. So I want to put them on a common 0-24h scale and then do various kinds of plots and analyses, keeping the conventional display form (10:30 etc.) when specific times display or print.

It looks as though chron:::times combined with trunc.POSIXt pretty much solves my problem, except that `times` ignores the time units:

    > as.POSIXct('2009-3-23 12:23') - trunc(as.POSIXct('2009-3-23 12:23'), "day")
    Time difference of 12.38333 hours
    > times(as.POSIXct('2009-3-23 12:23') - trunc(as.POSIXct('2009-3-23 12:23'), "day"))
    Time in days:          # seems to treat difftimes as raw numbers!!
    [1] 12.38333

Obviously I can work around this, but shouldn't `times` give an error when it encounters an object of unknown class rather than unsafely using its internal representation? Of course, better still if `times` converted correctly.

In general, `times` has other inconsistent and peculiar behavior:

    times(2)  =>  Time in days: 2
        Allows specifying multi-day periods, OK
    times(1.5)  =>  Time in days: 1.5
        Allows specifying fractional multi-day periods, OK
    times(0.5)  =>  12:00:00
        Inconsistent format compared to times(1.5)
    times("18:00:00") + times("18:00:00")  =>  Time in days: 1.5
        OK
    times("36:00:00")  =>  error
        Why does it allow times(1.5) and times("18:00:00") + times("18:00:00") to specify 1.5 days, but not 36 hours?
    times(-0.5)  =>  -0.5
        Why doesn't it print Time in days: -0.5?
    times("18:00:00") / times("1:00:00")  =>  Time in days: 18
        Incorrect dimensions; meaningless result -- should be dimensionless
    times("18:00:00") * times("10:00:00")  =>  07:30:00
        Incorrect dimensions; meaningless result.
    sin(times("18:00:00"))  =>  16:21:34
        Meaningless result -- should be error

It's nice that R has a class system, but if code ignores the class...

> There is an article on dates and times in R News 4/1.

Thanks for the pointer.

-s
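One way to sidestep the units problem described above is to strip the difftime down to a plain number only with the unit stated explicitly. The following is a minimal base-R sketch; the timestamp and the UTC time zone are assumptions chosen for illustration, and nothing here depends on chron.

```r
# Assumed sample timestamp; tz is fixed so the result does not depend on locale.
x <- as.POSIXct("2009-03-23 12:23:00", tz = "UTC")

# Elapsed time since midnight, as a difftime with explicit units.
since_midnight <- difftime(x, trunc(x, units = "days"), units = "hours")

# Convert to plain numbers only with the unit stated explicitly.
hours <- as.numeric(since_midnight, units = "hours")
days  <- as.numeric(since_midnight, units = "days")
```

The `days` value is a fraction of a day, which is the unit the transcript above shows `times` assuming for raw numbers.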
Re: [R] Functions returning functions
On Wed, May 20, 2009 at 7:21 AM, Paulo Grahl pgr...@gmail.com wrote:
>     A <- function(parameters) {
>       # calculations w/ parameters returning 'y'
>       tmpf <- function(x) {
>         # function of 'y'
>       }
>       return(tmpf)
>     }
>
> The value of the parameters are stored in an environment local to the function. Then I call
>
>     x <- something
>     B <- A(x)
>
> When R executes this last statement, does it perform all the calculations inside function A again (i.e., all the calculations that yield 'y') or the value of 'y' is already stored in the function's local environment ?

    > A <- function(q) {
    +   print("calculating y")
    +   y <- q+1
    +   function(x) print(paste("value of x:",x,"value of y:",y))
    + }
    > A(5)
    [1] "calculating y"
    function(x) print(paste("value of x:",x,"value of y:",y))
    <environment: 0x07abe2a8>
    > A(5)(4)
    [1] "calculating y"
    [1] "value of x: 4 value of y: 6"
    > A5 <- A(5)
    [1] "calculating y"
    > A5(4)
    [1] "value of x: 4 value of y: 6"
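The transcript above can be condensed into a sketch that counts how often the outer body runs. The names `make_f` and `calls` are hypothetical and stand in for `A` and its "calculations w/ parameters".

```r
calls <- 0                       # counts how often the expensive part runs

make_f <- function(q) {          # hypothetical name, mirrors A() in the message
  calls <<- calls + 1            # stands in for "calculations w/ parameters"
  y <- q + 1
  function(x) x + y              # closure: keeps y in its enclosing environment
}

f5 <- make_f(5)                  # the body of make_f runs once, here
r1 <- f5(4)                      # only the inner function runs
r2 <- f5(10)
stored_y <- environment(f5)$y    # the stored y can be inspected directly
```

So the calculations happen each time `A` itself is called, but not when the returned function is called.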
Re: [R] Too large a data set to be handled by R?
On Tue, May 19, 2009 at 11:59 PM, tsunhin wong thjw...@gmail.com wrote:
> In order to save time, I am planning to generate a data set of size 1500 x 2 with each data point a 9-digit decimal number, in order to save my time. I know R is limited to 2^31-1 and that my data set is not going to exceed this limit. But my laptop only has 2 Gb and is running 32-bit Windows / XP or Vista.

32-bit R on Windows XP with 2GB RAM has no problem with a matrix this size (not just integers, but also numerics):

    > system.time(mm <- matrix( numeric(1500 * 2), 1500, 2))
       user  system elapsed
       0.59    0.23    1.87
    > system.time(nn <- matrix( runif(1500 * 2), 1500, 2))
       user  system elapsed
       2.66    0.64   13.39
    > system.time(oo <- nn + 3)
       user  system elapsed
       0.24    0.17    0.41
    > system.time(pp <- oo - oo)
       user  system elapsed
       0.15    0.13    0.28
[R] Class for time of day?
What is the recommended class for time of day (independent of calendar date)? And what is the recommended way to get the time of day from a POSIXct object? (Not a string representation, but a computable representation.)

I have looked in the man page for DateTimeClasses, in the Time Series Analysis Task View and in Spector's Data Manipulation book but haven't found these.

Clearly I can create my own Time class and hack around with the internal representation of POSIXct, e.g.

    days <- unclass(d)/(24*3600)
    days - floor(days)

and write print.Time, `-.Time`, etc. etc. but I expect there is already a standard class or CRAN package.

-s
Re: [R] exists function on list objects gives always a FALSE
On Tue, May 19, 2009 at 12:07 PM, routík zrou...@gmail.com wrote:
>     SmoothData <- list(exists=TRUE, span=0.001)
>     exists("SmoothData$span")
>     FALSE

As others have said, this just checks for the existence of a variable with the (strange) name "SmoothData$span". In some sense, in R semantics, xxx$yyy *always* exists if xxx is a list (or other recursive object):

    > xxx <- list()
    > xxx$hello
    NULL

You might think that you can check names(xxx) to see if the slot has been explicitly set, but it depends on *how* you have explicitly set the slot to NULL:

    > xxx$hello <- 3
    > xxx$hello <- NULL
    > names(xxx)
    character(0)      # no names -- assigning to NULL kills slot
    > xxx <- list(hello=NULL)
    > names(xxx)
    [1] "hello"       # 1 name -- constructing with NULL-valued slot

Welcome to R!

-s
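Following on from the names() discussion, a common way to test for a list element is to look the name up rather than inspect the value. This is a small sketch using the poster's `SmoothData`; the other variable names are made up for illustration.

```r
SmoothData <- list(exists = TRUE, span = 0.001)

has_span  <- "span" %in% names(SmoothData)    # the element is present
has_other <- "other" %in% names(SmoothData)   # no such element

# A NULL-valued element still counts as present when checked by name,
# although is.null() on its value cannot tell it apart from a missing one.
withNull <- list(hello = NULL)
present_by_name <- "hello" %in% names(withNull)
null_value      <- is.null(withNull$hello)
null_missing    <- is.null(withNull$goodbye)
```

The name test distinguishes the two cases that `is.null` on the value conflates.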
Re: [R] Concatenating two vectors into one
If you want to concatenate the *vectors*, you need 'c', which will also coerce the elements to a common type. If you want to concatenate the corresponding *elements* of the vectors, you need 'paste', which will coerce them to character strings.

-s

On 5/18/09, Henning Wildhagen hwildha...@gmx.de wrote:
> Dear users, a very simple question: Given two vectors x and y
>
>     x <- as.character(c("A","B","C","D","E","F"))
>     y <- as.factor(c(1,2,3,4,5,6))
>
> I want to combine them into a single vector z as A1, B2, C3 and so on.
>
>     z <- x*y
>
> is not working; I tried several other functions, but did not get to the solution. Thanks for your help, Henning
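The two kinds of concatenation described in this reply look like this on the poster's data (a sketch; the result names `joined` and `z` are just for illustration):

```r
x <- c("A", "B", "C", "D", "E", "F")
y <- as.factor(c(1, 2, 3, 4, 5, 6))

# c() joins the vectors end to end (the factor is passed through as.character
# first so that its labels, not its internal codes, are combined).
joined <- c(x, as.character(y))

# paste() works element by element and coerces everything to character.
z <- paste(x, y, sep = "")
```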
[R] Generic 'diff'
I would like to apply a function 'f' to the lagged version of a vector and the vector itself. This is easy to do explicitly:

    mapply( f, v[-1], v[-length(v)] )

or in the case of a pointwise vector function, simply

    f( v[-1], v[-length(v)] )

This is essentially the same as 'diff' but with an arbitrary function, not '-'. Is there a standard way to do this? Is there any particular reason that 'diff' should not have an 'f' argument?

-s
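For concreteness, the idea in this message can be wrapped in a small helper. This is only a sketch of what such a function might look like: `lag_apply` is a hypothetical name, not an existing base R function, and the `lag` argument is an assumption modeled on `diff`.

```r
# Hypothetical helper: apply f to each element and its predecessor,
# using the same lag logic as diff(). Not an existing base R function.
lag_apply <- function(v, f, lag = 1) {
  n <- length(v)
  if (lag >= n) return(vector(mode(v), 0))
  later   <- v[(lag + 1):n]
  earlier <- v[1:(n - lag)]
  unname(mapply(f, later, earlier))
}

v <- c(1, 2, 4, 8, 16)
differences <- lag_apply(v, `-`)           # same values as diff(v)
quotients   <- lag_apply(v, `/`)           # successive quotients
lag2        <- lag_apply(v, `-`, lag = 2)  # same values as diff(v, lag = 2)
```

For a vectorized `f`, calling `f(later, earlier)` directly would avoid the per-element overhead of `mapply`.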
Re: [R] Generic 'diff'
I guess I wasn't very clear. The goal is not to define diff on a different object type, but to have a different 'subtraction' operator with the same lag logic. An easy example would be quotient instead of subtraction.

Of course I could do that by simply cutting and pasting diff.default and replacing '-'(a,b) with f(a,b), but it's cleaner to use a standard function if there is one.

-s

On Mon, May 18, 2009 at 5:05 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> You can define a new class for the object diff operates on and then define your own diff method for that. For some examples see:
>
>     methods(diff)
>
> On Mon, May 18, 2009 at 4:24 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
>> I would like to apply a function 'f' to the lagged version of a vector and the vector itself. This is easy to do explicitly:
>>
>>     mapply( f, v[-1], v[-length(v)] )
>>
>> or in the case of a pointwise vector function, simply
>>
>>     f( v[-1], v[-length(v)] )
>>
>> This is essentially the same as 'diff' but with an arbitrary function, not '-'. Is there a standard way to do this? Is there any particular reason that 'diff' should not have an 'f' argument?
>>
>> -s
Re: [R] Generic 'diff'
On Mon, May 18, 2009 at 6:00 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
> I understood what you were asking but R is an oo language so that's the model to use to do this sort of thing.

I am not talking about creating a new class with an analogue to the subtraction function. I am talking about a function which applies another function to a sequence and its lagged version. Functional arguments are used all over the place in R's base package (Xapply, sweep, outer, by, not to mention Map, Reduce, Filter, etc.) and they seem perfectly natural here.

Or perhaps I am not understanding your objection.

-s

> On Mon, May 18, 2009 at 5:48 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
>> I guess I wasn't very clear. The goal is not to define diff on a different object type, but to have a different 'subtraction' operator with the same lag logic. An easy example would be quotient instead of subtraction.
>>
>> Of course I could do that by simply cutting and pasting diff.default and replacing '-'(a,b) with f(a,b), but it's cleaner to use a standard function if there is one.
>>
>> -s
>>
>> On Mon, May 18, 2009 at 5:05 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:
>>> You can define a new class for the object diff operates on and then define your own diff method for that. For some examples see:
>>>
>>>     methods(diff)
>>>
>>> On Mon, May 18, 2009 at 4:24 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
>>>> I would like to apply a function 'f' to the lagged version of a vector and the vector itself. This is easy to do explicitly:
>>>>
>>>>     mapply( f, v[-1], v[-length(v)] )
>>>>
>>>> or in the case of a pointwise vector function, simply
>>>>
>>>>     f( v[-1], v[-length(v)] )
>>>>
>>>> This is essentially the same as 'diff' but with an arbitrary function, not '-'. Is there a standard way to do this? Is there any particular reason that 'diff' should not have an 'f' argument?
>>>>
>>>> -s
Re: [R] newbie: closing unused connection + readline
On Sat, May 16, 2009 at 8:34 AM, Aval Sarri aval.sa...@gmail.com wrote:
>     # Create a socket from which to read lines - one at a time (record)
>     reader.socket <- socketConnection( host = 'localhost', 5000, server = TRUE, blocking = TRUE, open = "r", encoding = getOption("encoding") );
>     # now read each record and split/validate it using read.table
>     repeat {
>       # here for each line I am opening new connection! how to avoid it?
>       line.raw <- textConnection(readLines( reader.socket, n = 1, ok = TRUE));

What is the function of textConnection here? Is read.table incompatible with socketConnection for some reason?

>       line.raw <- read.table(line.raw, sep=",");
>
> ...at the end of script I am getting closing unused connection warning

This is not a problem in itself. For some reason, R gives a warning when connections are garbage collected. Of course, that can be a symptom of poor connection management, but not necessarily. In the present case, you are creating many unnecessary textConnections, and R correctly garbage collects them.

-s
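If the per-line parsing is kept, the warning can be avoided by not leaving text connections open. The sketch below does not touch the socket question itself; `line` is a hypothetical stand-in for one record already read with `readLines`, and the `text` argument of `read.table` is assumed to be available in the R version in use.

```r
# Hypothetical stand-in for one record read from the socket.
line <- "1,2.5,abc"

# Option 1: read.table's `text` argument, which manages the connection itself.
rec1 <- read.table(text = line, sep = ",", stringsAsFactors = FALSE)

# Option 2: an explicit textConnection, closed as soon as it has been used.
con  <- textConnection(line)
rec2 <- read.table(con, sep = ",", stringsAsFactors = FALSE)
close(con)
```

Either way no connection is left for the garbage collector to clean up.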
Re: [R] newbie: closing unused connection + readline
On Sat, May 16, 2009 at 9:11 AM, Aval Sarri aval.sa...@gmail.com wrote:
> ...I tried something like this also:
>
>     mydataframe <- read.table (socket, sep=",");
>
> but it does not work; it says no input lines. This also:
>
>     mydataframe <- read.table (readLine(socket), sep=",");

Sorry, I didn't see this before my last email. This seems to be the real problem.

I don't understand why read.table would have a problem reading directly from a socket instead of a textConnection. Is this a bug? Some subtlety in the semantics of socketConnection as opposed to textConnection? Incorrect parameters when opening the socketConnection?

-s
Re: [R] Gamma function
What exactly is the R code you wrote for your function f? Without that, it will be hard to help you.

-s

On Sat, May 16, 2009 at 2:48 AM, Kon Knafelman konk2...@hotmail.com wrote:
> Hi Guy, I am having trouble graphing the following function
>
>     √2 Γ(n/2) / [√(n - 1) Γ((n - 1)/2)]
>
> for the values of n between 2 and 50. I know that Γ(n) = (n-1)!, which in R is factorial(n-1). When I type that into R, using y <- function(n), and then plot(y,2,50), it doesn't give me anything meaningful; in fact, it comes up with a message saying something like "in gamma(n+1)" or something along those lines. Can anyone please help? Thank you
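Since the poster's code is not shown, the following is only a guess at what was intended. It assumes the formula reads sqrt(2) Γ(n/2) / (sqrt(n-1) Γ((n-1)/2)); `ratio` is a hypothetical name. Note that Γ(n) = (n-1)! holds only for integer n, so half-integer arguments need `gamma` or `lgamma` rather than `factorial` of an integer.

```r
# Assumed reading of the formula: sqrt(2) * gamma(n/2) / (sqrt(n-1) * gamma((n-1)/2)).
# Working with lgamma() keeps the two gamma terms from overflowing.
ratio <- function(n) {
  sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
}

n <- 2:50
y <- ratio(n)
plot(n, y, type = "b", xlab = "n", ylab = "ratio")
```

Computing the values first and plotting the vectors avoids passing a function object to `plot`.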
Re: [R] can you tell what .Random.seed *was*?
On Thu, May 14, 2009 at 3:36 PM, G. Jay Kerns gke...@ysu.edu wrote:
>     set.seed(something)
>     x <- rnorm(100)
>     y <- runif(500)
>     # bunch of other stuff
>
> ... Now, I give you a copy of my script.R (with the set.seed statement removed, of course) together with the .RData file that was generated by the save.image() command. ...
>
> 1) can you tell me what my original set.seed() value was?...
> 2) is it possible *in principle* to figure out what set.seed was, given the above?

Set.seed takes an integer argument, that is, 2^32-1 distinct values (cf NA_integer_), so the very simplest approach, brute-force search, has a hope of working:

    whatseed <- function (v) {
      i <- as.integer(-2^31+1); max <- as.integer(2^31-1)
      while (i<max) { set.seed(i); if (runif(1)==v) return(i); i<-i+1 }
    }

> (OK, being able to figure it out in 2*10^68 years doesn't count, but within a couple months is acceptable.)

    > set.seed(-2^31+100000)
    > system.time(whatseed(runif(1)))
       user  system elapsed
       1.53    0.00    1.53

    2^32*(1.53/100000)/3600 => 18.25, i.e. about 18 hours

> 3) does the answer change if there is a remove(.Random.seed) command right before the save.image() command?

Depending on which RNG algorithm (RNGkind) you use, there may be cryptographic techniques that are more efficient than brute-force search, especially if the full internal state (.Random.seed) is preserved.

This all assumes that the seed is set *only* with set.seed. If .Random.seed is modified directly, there are many more possibilities for most of the RNGs.

-s
Re: [R] can you tell what .Random.seed *was*?
On Fri, May 15, 2009 at 12:07 PM, Stavros Macrakis macra...@alum.mit.edu wrote:
>     system.time(whatseed(runif(1)))

Sorry, though I got lucky and my overall result is roughly correct, this is an incorrect time measure. It should be

    r <- runif(1); system.time(whatseed(r))

because R's call-by-need semantics don't evaluate the runif before it starts running whatseed. The correct time (on my machine) is then 28 hours, not 18.

Better to avoid side-effect functions as arguments.

-s
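The call-by-need behavior behind this correction can be made visible with a small sketch. The functions `noisy_value` and `uses_arg` are hypothetical; the first stands in for an argument with a side effect (like `runif(1)`), the second for `whatseed`.

```r
events <- character(0)           # records the order in which things happen

noisy_value <- function() {      # hypothetical argument with a side effect
  events <<- c(events, "argument evaluated")
  42
}

uses_arg <- function(v) {
  events <<- c(events, "function body started")
  v + 1                          # the promise for v is forced only here
}

uses_arg(noisy_value())          # body starts first, argument is evaluated later

# Forcing the value first keeps it out of whatever is being timed:
r <- noisy_value()
timing <- system.time(uses_arg(r))
```

After the first call, `events` shows the body starting before the argument is evaluated; after the second, the order is reversed.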
[R] Converting numbers to and from raw
How can I convert an integer or double to and from their internal representation as raws of length 4 and 8? The following works for positive integers (including those represented as floats):

    # Convert integer (represented as integer or double) to sequence
    # of raw bytes, least-significant byte first.
    # intToRaw(0) => raw(0)
    # intToRaw(17^9) => 91 64 63 9c 1b
    # intToRaw(2^60/3) => 40 55 55 55 55 55 55 05 (note effect of finite precision)
    intToRaw <- function(x, n=max(0,floor(log(x)/log(256)+1))) {
      stopifnot(x>=0)
      suppressWarnings( as.raw( floor( x / 2^(8*seq(0,length=n)) ) %% 256))
    }

but I'd think there was a simpler version that just casts the integer as a bytestring internally (for type integer at least). Also, of course, it doesn't help for getting the bit-pattern of a double.

-s
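One possible answer, offered here as a sketch rather than as the reply the list actually gave: base R's `writeBin` accepts a raw vector in place of a connection and returns the bytes, and `readBin` reverses the conversion. The byte order is passed explicitly so the result does not depend on the platform.

```r
# Integer -> 4 raw bytes and back (byte order stated explicitly).
int_bytes <- writeBin(17L, raw(), endian = "little")
int_back  <- readBin(int_bytes, what = "integer", endian = "little")

# Double -> 8 raw bytes (its binary floating-point representation) and back.
dbl_bytes <- writeBin(2^60 / 3, raw(), endian = "little")
dbl_back  <- readBin(dbl_bytes, what = "double", endian = "little")
```

Unlike `intToRaw` above, this gives a fixed width (4 or 8 bytes) and handles negative values and non-integer doubles.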
[R] Inconsistency in representation of variables
In stats::D, I was wondering why variables are represented as symbols in expressions, but as strings in lists of variables:

    D(quote(x^2),"x") => 2*x
    D(quote(x^2),quote(x)) => error "Variable must be a character string"

Strings are not allowed in the expression to denote variables:

    D(quote("x"),"x") == D("k","x") => NA   (why not an error?)

-s
Re: [R] Beyond double-precision?
On Sat, May 9, 2009 at 12:17 PM, Berwin A Turlach ber...@maths.uwa.edu.au wrote:
> log(H) = log(n) - log( 1/x_1 + 1/x_2 + ... + 1/x_n)
> ...But we need to calculate the logarithm of a sum from the logarithms of the individual terms.
> ...The way to calculate log(x+y) from lx=log(x) and ly=log(y) ...
>
>     max(lx,ly) + log1p(exp(-abs(lx-ly)))

Agreed completely so far. But instead of calculating the logsum pairwise, you can do it all in one go, which is both more efficient and more accurate.

Here are some timing and accuracy measurements of the one-shot logsum compared to the loop and the Reduce versions. (Full code at the bottom of this email.) The vector version is much faster and much more accurate in general. There must be cases where the log1p method increases accuracy, but I couldn't find them.

-s

Large examples to test accuracy and speed

    Test case: runif(1e+06)
      function.  time     error
    1    logsum  0.22  9.31e-16
    2  logsum_s  0.15  9.31e-16
    3  logsum_r  9.75  3.10e-13

    Test case: rexp(1e+06)
      function.  time     error
    1    logsum  0.21 -1.40e-15
    2  logsum_s  0.15 -1.40e-15
    3  logsum_r 10.13 -1.38e-14

    Test case: abs(rnorm(1e+06))
      function.  time     error
    1    logsum  0.24 -4.38e-16
    2  logsum_s  0.14 -4.38e-16
    3  logsum_r 10.01 -8.74e-14

    Test case: rep(1, 1e+05)
      function.  time     error
    1    logsum  0.01  1.46e-16
    2  logsum_s  0.01  1.46e-16
    3  logsum_r  0.96  6.24e-14

    Test case: rep(10^-(1:10), each = 10000)
      function.  time     error
    1    logsum  0.02  6.14e-16
    2  logsum_s  0.01  6.14e-16
    3  logsum_r  0.95 -6.96e-12

More accurate even for small cases

    Test case: 1:100
      function.  time     error
    1    logsum     0 -3.60e-16
    2  logsum_s     0 -3.60e-16
    3  logsum_r     0  3.24e-15

    Test case: abs(rnorm(100))
      function.  time     error
    1    logsum     0 -3.48e-16
    2  logsum_s     0 -3.48e-16
    3  logsum_r     0 -2.09e-15

    ##
    # Fast, accurate sum in log space
    #
    logsum <- function(l) {
      maxi <- which.max(l)
      maxl <- l[maxi]
      maxl + log1p(sum(exp(l[-maxi]-maxl)))
    }

    ##
    # Simpler, perhaps less accurate sum in log space
    #
    logsum_s <- function(l) {
      maxl <- max(l)
      maxl + log(sum(exp(l-maxl)))
    }

    ##
    # Pairwise reduction
    logsum_r <- function(x)
      Reduce( function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx-ly))), x )

    function_names <- c("logsum","logsum_s","logsum_r")

    logsum_test <- function(l) {
      cat("\nTest case:",deparse(substitute(l)),"\n")
      realsum <- sum(l)
      logl <- log(l)
      results <- times <- list()
      # <<- so that the assignments reach the lists defined just above
      lapply( function_names,
              function(f) times[[f]] <<- system.time( results[[f]] <<- getFunction(f)(logl))[1])
      data.frame(`function`=function_names,
                 time=as.numeric(times),
                 error=(exp(as.numeric(results))-realsum)/realsum )
    }

    set.seed(1)
    cat("\n\nLarge examples to test accuracy and speed\n\n")
    logsum_test(runif(1000000))
    logsum_test(rexp(1000000))
    logsum_test(abs(rnorm(1000000)))
    logsum_test(rep(1,100000))
    logsum_test(rep(10^-(1:10),each=10000))
    cat("\n\nMore accurate even for small cases\n\n")
    logsum_test(1:100)
    logsum_test(abs(rnorm(100)))
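To connect the one-shot log sum back to the harmonic-mean formula quoted at the top of this message, here is a minimal sketch. `log_harmonic_mean` is a hypothetical name, and the input values are chosen only to show a case where the plain computation would underflow.

```r
# One-shot log-sum-exp, as in logsum_s from the message.
logsum_s <- function(l) {
  maxl <- max(l)
  maxl + log(sum(exp(l - maxl)))
}

# log of the harmonic mean, given only lx = log(x):
#   log(H) = log(n) - log(sum(1/x)) = log(n) - logsum(-lx)
log_harmonic_mean <- function(lx) log(length(lx)) - logsum_s(-lx)

lx <- c(-800, -801, -802)        # exp(lx) underflows to 0 in double precision
lh <- log_harmonic_mean(lx)      # finite, although 1/exp(lx) would be Inf
```

Everything stays in log space, so the result is usable even though none of the x values can be represented directly.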
Re: [R] integrate lgamma from 0 to Inf
On Wed, Apr 22, 2009 at 3:28 AM, Andreas Wittmann andreas_wittm...@gmx.de wrote:
> i try to integrate lgamma from 0 to Inf.

Both gamma and log are positive and monotonically increasing for large arguments. What can you conclude about the integrability of log(gamma(x))?

-s
[R] The assign(paste(...,i),...) idiom
Judging from the traffic on this mailing list, a lot of R beginners are trying to write things like

    assign( paste( "myvar", i), ...)

where they really should probably be writing

    myvar[i] <- ...

Do we have any idea where this bizarre habit comes from?

-s
Re: [R] (no subject)
    > mylist <- c( 2,1,3,5,4 )                   # make a vector of numbers
    > sort(mylist)
    [1] 1 2 3 4 5                                # in sorted order
    > mylist <- c( "this", "is", "a", "test")
    > sort(mylist)
    [1] "a"    "is"   "test" "this"              # in sorted order
    > order(mylist)
    [1] 3 2 4 1                                  # original positions, e.g. mylist[3] is "a"

On Sat, Apr 18, 2009 at 10:46 AM, Dan Cary daniel_c...@hotmail.co.uk wrote:
> ...all i want to know is how to arrange a set of numbers in size order without putting them in a table. just arranging them from for e.g. 2,1,3,5,4 into 1,2,3,4,5 - it must be simple but i cant find how to do it anywhere
Re: [R] Loop question
On Fri, Apr 17, 2009 at 10:12 PM, Brendan Morse morse.bren...@gmail.com wrote:
> ...I would like to automatically generate a series of matrices and give them successive names. Here is what I thought at first:
>
>     t1 <- matrix(0, nrow=250, ncol=1)
>     for(i in 1:10){ t1[i] <- rnorm(250) }
>
> What I intended was that the loop would create 10 different matrices with a single column of 250 values randomly selected from a normal distribution, and that they would be labeled t11, t12, t13, t14 etc.

Very close. But since you've started out with a *matrix* t1, your assignments to t1[i] will assign to parts of the matrix. To correct this, all you need to do is initialize t1 as a *list of matrices* or (even better) as an *empty list*, like this:

    t1 <- list()

and then assign to *elements* of the list (using [[ ]] notation), not to *sublists* of the list (which is what [ ] notation means in R), like this:

    for(i in 1:10){ t1[[i]] <- rnorm(250) }

Is that what you had in mind?

-s
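A variant of the list approach that also carries the labels the poster asked for might look like the following. This is a sketch: the element names are built with `paste` purely for illustration, and `set.seed` is there only so the example is repeatable.

```r
set.seed(1)                                   # only to make the sketch repeatable

# One list element per matrix, instead of ten separately named variables.
t1 <- lapply(1:10, function(i) matrix(rnorm(250), nrow = 250, ncol = 1))
names(t1) <- paste("t1", 1:10, sep = "")      # "t11", "t12", ..., "t110"

third    <- t1[[3]]          # [[ ]] extracts one matrix
sublist  <- t1[2:4]          # [ ] returns a sub-list
by_name  <- t1[["t13"]]      # elements can also be reached by name
```

Keeping the matrices in one list means they can be processed together with `lapply` or `sapply` instead of reconstructing variable names.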
Re: [R] Using trace
Well, yes, of course I could add the code to the function by hand. I could also calculate square roots by hand. But -- as in every other basic programming environment -- there exists an R function 'trace' which appears to automate the process, and I can't figure out how to use it to handle this most elementary and standard case. Clearly I'm missing something.

-s

On Thu, Apr 16, 2009 at 9:26 PM, ronggui ronggui.hu...@gmail.com wrote:
> Can you just print what you need to know? For example:
>
>     > fact <- function(x) {
>     +   if(x<1) ans <- 1 else ans <- x*fact(x-1)
>     +   print(sys.call())
>     +   cat(sprintf("X is %i\n",x))
>     +   print(ans)
>     + }
>     > fact(4)
>     fact(x - 1)
>     X is 0
>     [1] 1
>     fact(x - 1)
>     X is 1
>     [1] 1
>     fact(x - 1)
>     X is 2
>     [1] 2
>     fact(x - 1)
>     X is 3
>     [1] 6
>     fact(4)
>     X is 4
>     [1] 24
>
> 2009/4/13 Stavros Macrakis macra...@alum.mit.edu:
>> I would like to trace functions, displaying their arguments and return value, but I haven't been able to figure out how to do this with the 'trace' function. After some thrashing, I got as far as this:
>>
>>     fact <- function(x) if(x<1) 1 else x*fact(x-1)
>>     tracefnc <- function() dput(as.list(parent.frame()),  # parent.frame() holds arg list
>>                                 control=NULL)
>>     trace(fact,tracer=tracefnc,print=FALSE)
>>
>> but I couldn't figure out how to access the return value of the function in the 'exit' parameter. The above also doesn't work for ... arguments. (More subtly, it forces the evaluation of promises even if they are otherwise unused -- but that is, I suppose, a weird and obscure case.)
>>
>> Surely someone has solved this already? What I'm looking for is something very simple, along the lines of old-fashioned Lisp trace:
>>
>>     > (defun fact (i) (if (< i 1) 1 (* i (fact (+ i -1)))))
>>     FACT
>>     > (trace fact)
>>     (FACT)
>>     > (fact 3)
>>     1> (FACT 3)
>>     2> (FACT 2)
>>     3> (FACT 1)
>>     4> (FACT 0)
>>     <4 (FACT 1)
>>     <3 (FACT 1)
>>     <2 (FACT 2)
>>     <1 (FACT 6)
>>     6
>>
>> Can someone help?
>>
>> Thanks,
>>
>> -s
>
> --
> HUANG Ronggui, Wincent
> PhD Candidate
> Dept of Public and Social Administration
> City University of Hong Kong
> Home page: http://asrr.r-forge.r-project.org/rghuang.html
Re: [R] Using trace
Yes, that is similar to the solution in my original posting, but doesn't solve the problem I was having with that solution, namely reporting on the return value.

-s

On Fri, Apr 17, 2009 at 11:28 AM, ronggui ronggui.hu...@gmail.com wrote:

> Here is a partial solution:
>
>     > trace(fact,quote({cat(sprintf("x= %i\n",x));return}),print=T)
>     [1] "fact"
>     > fact(4)
>     Tracing fact(4) on entry
>     x= 4
>     Tracing fact(x - 1) on entry
>     x= 3
>     Tracing fact(x - 1) on entry
>     x= 2
>     Tracing fact(x - 1) on entry
>     x= 1
>     Tracing fact(x - 1) on entry
>     x= 0
>     [1] 24
>
> --
> HUANG Ronggui, Wincent
> PhD Candidate
> Dept of Public and Social Administration
> City University of Hong Kong
> Home page: http://asrr.r-forge.r-project.org/rghuang.html
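For readers on a current R, one possible way to report the return value is sketched below. It assumes R 3.2.0 or later, where base R provides returnValue() for use in exit code; that function postdates this thread, so this is not something either poster had available. The message formats are made up for illustration.

```r
# Assumes R >= 3.2.0 for returnValue(); the cat() message formats are illustrative.
fact <- function(x) if (x < 1) 1 else x * fact(x - 1)

trace(fact,
      # evaluated on entry, in the frame of the call, so the argument x is visible
      tracer = quote(cat("enter fact: x =", x, "\n")),
      # evaluated on exit; returnValue() gives the value about to be returned
      exit   = quote(cat("exit  fact: x =", x, "value =", returnValue(), "\n")),
      print  = FALSE)

result <- fact(3)

untrace(fact)  # restore the untraced function
```

This still names the argument x explicitly, so it does not address the thread's other concerns (handling of ... arguments and forcing of promises).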
Re: [R] Intersection of two sets of intervals
There is a very nice intervals package in CRAN. It is impressively efficient even for intersections of many millions of intervals. If I remember correctly, it is purely in-core, so on a 32-bit R you'll be limited to something like 100 million intervals. Is that enough for your application?

-s

On Wed, Apr 15, 2009 at 8:59 AM, Thomas Meyer t...@cornell.edu wrote:

> Hi,
>
> Algorithm question: I have two sets of intervals, where an interval is an ordered pair [a,b] of two numbers. Is there an efficient way in R to generate the intersection of two lists of same?
>
> For concreteness: I'm representing a set of intervals with a data.frame:
>
>     > list1 = as.data.frame(list(open=c(1,5), close=c(2,10)))
>     > list1
>       open close
>     1    1     2
>     2    5    10
>     > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10)))
>     > list2
>       open close
>     1  1.5   2.5
>     2  3.0  10.0
>
> How do I get the intersection which would be something like:
>
>       open close
>     1  1.5   2.0
>     2  5.0  10.0
>
> I wonder if there's some ready-built functionality that might help me out. I'm new to R and am still learning to vectorize my code and my thinking. Or maybe there's a package for interval arithmetic that I can just pull off the shelf.
>
> Thanks,
> -tom
>
> --
> Thomas Meyer
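For small inputs, the intersection asked about here can also be sketched in base R without any package. This is a brute-force illustration, not the intervals package's method: the helper name intersect_intervals is made up, intervals are assumed closed, each input is assumed to contain non-overlapping intervals, and the cost grows with the product of the two list lengths.

```r
# Hypothetical helper: pairwise intersection of two data frames of closed
# intervals with columns `open` and `close`. O(n * m), so small inputs only.
intersect_intervals <- function(a, b) {
  # every pairing of a row of `a` with a row of `b`
  idx <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  lo <- pmax(a$open[idx$i],  b$open[idx$j])   # later of the two start points
  hi <- pmin(a$close[idx$i], b$close[idx$j])  # earlier of the two end points
  keep <- lo <= hi                            # empty overlaps are dropped
  out <- data.frame(open = lo[keep], close = hi[keep])
  out <- out[order(out$open), , drop = FALSE]
  rownames(out) <- NULL
  out
}

list1 <- data.frame(open = c(1, 5),   close = c(2, 10))
list2 <- data.frame(open = c(1.5, 3), close = c(2.5, 10))
res <- intersect_intervals(list1, list2)
```

For the two example data frames from the question, res holds the intervals [1.5, 2] and [5, 10]. For millions of intervals a dedicated package, as suggested in the reply, is the better route.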
Re: [R] Automating object creation
It is certainly possible to create x2, x4, etc. using something like

    assign( sprintf("x%d",i), ...value... )

But are you sure you need separate *variables* x2, x4, etc.? Why not create a list of vectors addressable as x[2] etc.? You can do that with

    x <- list()

(to define the data type of x as allowing generic objects) then

    x[2] <- ... value ...

etc.

-s

On Tue, Apr 14, 2009 at 1:32 PM, Zachary Patterson zak.patter...@gmail.com wrote:

> I am new to R. I would like to automate the creation of a number of vectors but can't seem to get the string formatting to work. Here's what I would like to be able to do:
>
> Suppose we have a vector:
>
>     x <- c(2,4,5)
>
> I would like to be able to create a set of vectors whose names are associated with the values in x - e.g.
>
>     x2 <- 0
>     x4 <- 0
>     x5 <- 0
>
> I have tried with a for loop and eval and sprintf, paste, etc. but end up with the following error:
>
>     Error in sprintf("%s%i", x, 1) <- 0 :
>       target of assignment expands to non-language object
>
> How can I assign a string formatted name to a vector?
>
> Any help appreciated,
> Zak
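Both routes mentioned in this reply can be sketched side by side. The list name vals and the choice to key the list by name (rather than by position, as in the reply) are assumptions made for illustration.

```r
x <- c(2, 4, 5)

# Route 1: separate variables x2, x4, x5, created by constructing their names
for (i in x) assign(sprintf("x%d", i), 0)

# Route 2: a single list whose elements are addressed by name
vals <- list()
for (i in x) vals[[sprintf("x%d", i)]] <- 0

x4            # variable created by assign()
vals[["x4"]]  # the corresponding list element
```

The list version avoids having to build variable names again (with get()) whenever the values are read back.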
Re: [R] Forcing the extrapolation of loess through the origin
On Tue, Apr 14, 2009 at 1:08 PM, jimm-pa...@gmx.de wrote:

> I'm fitting a line to my dataset. Later I want to predict missing values that exceed the [min,max] interval of my empirical data, therefore I choose surface="direct" for extrapolation.
>
>     l1 <- loess(y1~x1,span=0.1,data.frame(x=x1,y=y1),control=loess.control(surface="direct"))
>
> In my application it is highly important that the fitted line intercepts at the point of origin. Is it possible to do this in R?

Well, you could always add lots of artificial data points x=0, y=0 ..., like this (note that the formula must refer to the padded columns x and y of the data frame, not to the original x1 and y1):

    l1 <- loess(y~x,span=0.1,data.frame(x=c(rep(0,100),x1),y=c(rep(0,100),y1)),control=loess.control(surface="direct"))

which will eventually drive f(0) to near 0, but surely that will create fitting artifacts.

-s
Re: [R] Physical Units in Calculations
On Sun, Apr 12, 2009 at 11:01 PM, bill.venab...@csiro.au wrote:

> It is, however, an interesting problem and there are the tools there to handle it. Basically you need to create a class for each kind of measure you want to handle (length, area, volume, weight, and so on) and then overload the arithmetic operators so that they can handle arguments of the appropriate class.

I'd think it would be far simpler and cleaner to have a single dimensioned-units class with a slot for magnitude and one for the power of each dimension -- M, L, T are uncontroversial, pick your system for electromagnetism and thermodynamics. Once you have that, you have not just mass, length, and time, but also area, volume, density, acceleration, viscosity, etc. etc.

It would of course be nice if the existing difftime class could be fit into this, as it is currently pretty much a second-class citizen. For example, c of two time differences is currently a numeric vector, losing its units (hours, days, etc.) completely.

One of the difficulties of adding units would be, I suspect, making them work nicely with the rest of the system. For example, although sum is defined abstractly in terms of '+', as far as I can tell sum.units would have to be overloaded explicitly. Similarly for mean, cumsum, rle, var, %*%, etc. etc.

-s
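The single-class design proposed here can be sketched with S3 classes. Everything named below (quantity, dims, the field names) is hypothetical and chosen for illustration; it covers only M, L and T and only three operators, and a real implementation would need much more.

```r
# Hypothetical S3 sketch: one class holding a magnitude plus the power of each
# base dimension (M, L, T).
dims <- function(M = 0, L = 0, T = 0) c(M = M, L = L, T = T)

quantity <- function(value, dims) {
  structure(list(value = value, dims = dims), class = "quantity")
}

# Multiplying adds dimension powers; dividing subtracts them.
"*.quantity" <- function(e1, e2) quantity(e1$value * e2$value, e1$dims + e2$dims)
"/.quantity" <- function(e1, e2) quantity(e1$value / e2$value, e1$dims - e2$dims)

# Adding is only meaningful when the dimensions agree.
"+.quantity" <- function(e1, e2) {
  if (!isTRUE(all.equal(e1$dims, e2$dims))) stop("incompatible dimensions")
  quantity(e1$value + e2$value, e1$dims)
}

len   <- quantity(10, dims(L = 1))  # a length
dur   <- quantity(2,  dims(T = 1))  # a duration
speed <- len / dur                  # dimensions L^1 T^-1
area  <- len * len                  # dimensions L^2
```

Here area and speed fall out of the same class without being declared separately, which is the point made in the message. As it also notes, functions such as sum() or mean() would still need their own methods; defining "+" alone does not make them work on these objects.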
Re: [R] Finding the 5th percentile
    quantile( dsamp100, 0.05 )

On Mon, Apr 13, 2009 at 10:41 AM, Henry Cooper henry.1...@hotmail.co.uk wrote:

>     dsamp100 <- coef(100,39.83,5739,2869.1,49.44)
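A small self-contained illustration of the call, using made-up data in place of dsamp100 (whose contents are not shown in the thread):

```r
# Made-up sample standing in for dsamp100
x <- 1:101

# 5th percentile; the result is a length-1 numeric vector named "5%"
q05 <- quantile(x, probs = 0.05)

# Several percentiles can be requested at once
qs <- quantile(x, probs = c(0.05, 0.5, 0.95))
```

By default quantile() interpolates linearly between order statistics (its type = 7 method); other definitions are available through the type argument.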
Re: [R] Concatenation, was Re: Physical Units in Calculations
On Mon, Apr 13, 2009 at 5:15 AM, Peter Dalgaard p.dalga...@biostat.ku.dk wrote:

> Stavros Macrakis wrote:
>> ...c of two time differences is currently a numeric vector, losing its units (hours, days, etc.) completely.
>
> That's actually a generic feature/issue of c(). ... There is some potential for redesigning this, using a concat() generic which should do the Right Thing for all classed vector-like objects. (There is such a function in Splus, but I don't think their data frame code is using it.)

That would be a very good thing. The current design is very confusing and difficult to learn for new users, especially for factors.

I would be very happy to have a 'logical' concatenation as well as a 'physical' one. For instance, I'd expect the levels of factors to be merged: concat(factor(1:3),factor(3:4)) should be factor(c(1,2,3,3,4)), not c(1,2,3,1,2).

-s
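The 'logical' concatenation of factors described here can be sketched with a small helper. The name concat_factors is hypothetical; it is not the concat() generic discussed above, only an illustration of the level-merging behaviour the message asks for.

```r
# Hypothetical helper: concatenate factors by their labels, not their integer codes
concat_factors <- function(...) {
  parts <- lapply(list(...), as.character)  # recover each factor's labels
  factor(unlist(parts))                     # rebuild one factor over all labels
}

f <- concat_factors(factor(1:3), factor(3:4))
as.character(f)  # the labels, in order
levels(f)        # the merged level set
```

Note that factor() sorts the merged levels rather than keeping them in order of first appearance. Later versions of R (4.1.0 onward) changed c() itself so that combining factors yields a factor over the combined levels, which is close to what this message asks for.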
[R] Using trace
I would like to trace functions, displaying their arguments and return value, but I haven't been able to figure out how to do this with the 'trace' function. After some thrashing, I got as far as this:

    fact <- function(x) if(x<1) 1 else x*fact(x-1)
    tracefnc <- function() dput(as.list(parent.frame()), # parent.frame() holds arg list
                                control=NULL)
    trace(fact,tracer=tracefnc,print=FALSE)

but I couldn't figure out how to access the return value of the function in the 'exit' parameter. The above also doesn't work for ... arguments. (More subtly, it forces the evaluation of promises even if they are otherwise unused -- but that is, I suppose, a weird and obscure case.)

Surely someone has solved this already? What I'm looking for is something very simple, along the lines of old-fashioned Lisp trace:

    > (defun fact (i) (if (< i 1) 1 (* i (fact (+ i -1)))))
    FACT
    > (trace fact)
    (FACT)
    > (fact 3)
    1> (FACT 3)
      2> (FACT 2)
        3> (FACT 1)
          4> (FACT 0)
          <4 (FACT 1)
        <3 (FACT 1)
      <2 (FACT 2)
    <1 (FACT 6)
    6

Can someone help?

Thanks,
-s