from:"Stavros Macrakis"

[R] xmlToDataFrame very slow

2013-07-30 Thread Stavros Macrakis

I have a modest-size XML file (52MB) in a format suited to xmlToDataFrame
(package XML).

I have successfully read it into R by splitting the file 10 ways then
running xmlToDataFrame on each part, then rbind.fill (package plyr) on the
result. This takes about 530 s total, and results in a data.frame with 71k
rows and object.size of 21MB.

But trying to run xmlToDataFrame on the whole file takes forever ( 1 s
so far). xmlParse of this file takes only 0.8 s.

I tried running xmlToDataFrame on the first 10% of the file, then the first
10% repeated twice, then three times (with the outer tags adjusted of
course). Timings:

1 copy:   111 s = 111 s per copy
2 copies: 311 s = 155 s per copy
3 copies: 626 s = 209 s per copy

The runtime is superlinear.  What is going on here? Is there a better
approach?
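To put a number on "superlinear", the three timings above can be fitted on a log-log scale. This is only a rough sketch from three data points; the variable names are illustrative, and the fitted exponent says nothing about the cause.

```r
# Timings reported above: copies of the 10% chunk, elapsed seconds.
copies  <- c(1, 2, 3)
elapsed <- c(111, 311, 626)

# Seconds per copy; this would be constant if runtime were linear in size.
per_copy <- elapsed / copies

# Slope of log(time) against log(size) estimates k in time ~ size^k.
fit <- lm(log(elapsed) ~ log(copies))
k <- unname(coef(fit)[2])

print(round(per_copy, 1))
print(round(k, 2))
```

An exponent near 1.5 would mean that a tenfold larger input takes roughly thirty times as long, which is consistent with the whole file being far slower than ten separate parts.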

Thanks,

  -s

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] execute array of functions

2012-02-14 Thread Stavros Macrakis

That won't work because R has special rules for evaluating things in the
function position.  Examples:

*OK*

min(1:2)
"min"(1:2)
f<-min; f(1:2)
do.call(min,list(1:2))
do.call("min",list(1:2))  # do.call converts string->function


*Not OK*

("min")(1:2)  # string in function position is not converted
f<-"min"; f(1:2)   # ditto
f<- c("min","max");  f[1](1:2)  # ditto


What you need to do is make 'f' a list of *function values*, not a vector
of strings:

f<- c(min,max)


and then select the element of f with [[ ]] (select one element), not [ ]
(select sublist):

f[[1]](1:2)


Thus your example becomes

type<- c(min,max)
n   <- 1:10
for (a in 1:2){
print(type[[a]](n)) }

Another (uglier) approach is with do.call:

type<- c("min","max")
n   <- 1:10
for (a in 1:2){
print(do.call(type[a],list(n))) }
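
The two working variants can also be written without an explicit loop. A small sketch, assuming the goal is just to apply each function to the same vector; the names funs, fun_names, by_value and by_name are illustrative and not from the original question:

```r
n <- 1:10

# A list of function values: each element is already callable.
funs <- c(min, max)
by_value <- sapply(funs, function(f) f(n))

# A character vector of names: look each one up explicitly with match.fun().
fun_names <- c("min", "max")
by_name <- sapply(fun_names, function(nm) match.fun(nm)(n))

print(by_value)
print(by_name)
```

match.fun does openly what do.call does implicitly when handed a string, so the string-to-function conversion is visible in the code.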


Does that help?

 -s

On Tue, Feb 14, 2012 at 14:02, Muhammad Rahiz
muhammad.ra...@ouce.ox.ac.uk wrote:

 Hi all,

 I'm trying to get the min and max of a sequence of number using a loop
 like the folllowing. Can anyone point me to why it doesn't work.

 Thanks.

 type<- c("min","max")
 n   <- 1:10
 for (a in 1:2){
 print(type[a](n)) }


 --
 Muhammad




Re: [R] R development master class: NYC, Dec 12-13

2011-11-15 Thread Stavros Macrakis

 Last time, I was told that I couldn't list my R package and associated
 papers as a research activity with substantial impact because it was
 outside my official scope of work. (Even though I wrote it so I could
 *do* my work.)

That seems wrong.  My impression is that method papers were frequent
citation classics (http://garfield.library.upenn.edu/classics.html).  Why
should a software method paper be treated worse than (e.g.) a chemical
method paper?

   -s

On Sun, Nov 13, 2011 at 15:58, Sarah Goslee sarah.gos...@gmail.com wrote:

 On Sun, Nov 13, 2011 at 2:55 PM, Steve Lianoglou
 mailinglist.honey...@gmail.com wrote:

  Some of the money I earn from these courses goes to pay for my summer
  salary and supports student research. It also gives me confidence that
  if I don't get tenure because I've been writing R packages instead of
  papers, I can keep doing the work I love.
 
  If that actually happens, that would be an amazing/colossal (not in a
  good way) testament to how well the rating system works in academia.

 I'm not in academia, but government research. I do go through a review
 very similar to the tenure process. Last time, I was told that I couldn't
 list
 my R package and associated papers as a research activity with substantial
 impact because it was outside my official scope of work. (Even though I
 wrote it so I could *do* my work.) I have no trouble seeing academic
 administrators do the same thing.

 Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org




[R] Kleinberg's burst detection algorithm

2011-10-21 Thread Stavros Macrakis

Has anyone here implemented Jon Kleinberg's burst detection algorithm
(Bursty and Hierarchical Structure in Streams
http://www.cs.cornell.edu/home/kleinber/bhs.pdf)?

I'd rather not reimplement it if there's already running code available.

Thanks,

-s


Re: [R] Reading name-value data

2011-07-29 Thread Stavros Macrakis

Perfect!  Thanks!

By the way, I see that, unlike base rbind, it does not work for vectors and
lists:

rbind(c(a=1),c(b=2)) => matrix(1:2,2,1,dimnames=list(NULL,"a"))
== as.matrix(data.frame(a=1:2))

but

 rbind.fill(c(a=1),c(b=2)) => NULL

Shouldn't it give something like

 matrix(c(1,NA,NA,2),2,2,dimnames=list(NULL,c("a","b")))
or
 data.frame(a=c(1,NA),b=c(NA,2))

If, on the other hand, it insists on data.frames as input, it should raise
an error when given non-data-frames.

-s


On Thu, Jul 28, 2011 at 19:30, Hadley Wickham had...@rice.edu wrote:

 Use plyr::rbind.fill?   That does match up columns by name.
 Hadley

 On Thu, Jul 28, 2011 at 5:23 PM, Stavros Macrakis macra...@alum.mit.edu
 wrote:
  I have a file of data where each line is a series of name-value pairs,
 but
  where the names are not necessarily the same from line to line, e.g.
 a=1,b=2,d=5
 b=4,c=3,e=3
 a=5,d=1
  I would like to create a data frame which lines up the data in the
  corresponding columns.  In this case, this would be
 data.frame( a = c(1, NA, 5), b = c(2, 4, NA), c = c(NA, 3, NA),
             d = c(5, NA, 1), e = c(NA, 3, NA) )
  One way I can think of doing this is to read in the data as one 'long'
 data
  frame per line with a unique ID, e.g. line one becomes
   cbind(id=1,data.frame(variable=c('a','b','d'),value=c(1,2,5)))
  then rbind all the lines and use the reshape package function 'cast'.
  Is there a more straightforward way?  (I'd have thought rbind would line
 up
  columns by name, but it doesn't.)
  -s
 
 



 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/



[R] Reading name-value data

2011-07-28 Thread Stavros Macrakis

I have a file of data where each line is a series of name-value pairs, but
where the names are not necessarily the same from line to line, e.g.

   a=1,b=2,d=5
   b=4,c=3,e=3
   a=5,d=1

I would like to create a data frame which lines up the data in the
corresponding columns.  In this case, this would be

   data.frame( a = c(1, NA, 5), b = c(2, 4, NA), c = c(NA, 3, NA),
               d = c(5, NA, 1), e = c(NA, 3, NA) )

One way I can think of doing this is to read in the data as one 'long' data
frame per line with a unique ID, e.g. line one becomes

 cbind(id=1,data.frame(variable=c('a','b','d'),value=c(1,2,5)))

then rbind all the lines and use the reshape package function 'cast'.

Is there a more straightforward way?  (I'd have thought rbind would line up
columns by name, but it doesn't.)
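
For what it's worth, the alignment can also be done in base R alone. A minimal sketch, assuming every value is numeric and every pair has exactly the form name=value; parse_line and the other names are illustrative, and real input would come from readLines() rather than the inline vector used here:

```r
lines <- c("a=1,b=2,d=5", "b=4,c=3,e=3", "a=5,d=1")

# Turn one "name=value,name=value" line into a named numeric vector.
parse_line <- function(line) {
  pairs <- strsplit(strsplit(line, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  vals <- as.numeric(vapply(pairs, function(p) p[2], character(1)))
  names(vals) <- vapply(pairs, function(p) p[1], character(1))
  vals
}

rows <- lapply(lines, parse_line)
cols <- sort(unique(unlist(lapply(rows, names))))

# Indexing by a name that is absent yields NA, which lines the columns up.
aligned <- do.call(rbind, lapply(rows, function(r) unname(r[cols])))
colnames(aligned) <- cols
df <- as.data.frame(aligned)
print(df)
```

This avoids the long-format detour through 'cast', at the cost of building an intermediate matrix.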

-s


[R] Composing two n-dimensional arrays into one n+1-dimensional array

2011-06-13 Thread Stavros Macrakis

If I have two n-dimensional arrays, how do I compose them into an
(n+1)-dimensional array?

Is there a standard R function that's something like the following, but that
gives clean errors, handles all the edge cases, etc.

abind <- function(a,b)  structure( c(a,b), dim = c(dim(a), 2) )

m1 <- array(1:6,c(2,3))
m2 <- m1 + 10
abind(m1,m2)

==>

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]   11   13   15
[2,]   12   14   16
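
The CRAN package abind provides a general abind() with an 'along' argument (note the name clash with the helper defined above), and that is probably the closest thing to a standard answer. Purely as a base-R sketch of the kind of checking asked about, with stack_arrays being a made-up name:

```r
# Stack any number of equally-shaped arrays along a new last dimension.
stack_arrays <- function(...) {
  arrs <- list(...)
  if (length(arrs) == 0L) stop("need at least one array")
  d <- dim(arrs[[1]])
  if (is.null(d)) stop("arguments must have a 'dim' attribute")
  same <- vapply(arrs, function(a) identical(dim(a), d), logical(1))
  if (!all(same)) stop("all arrays must have identical dimensions")
  array(unlist(arrs, use.names = FALSE), dim = c(d, length(arrs)))
}

m1 <- array(1:6, c(2, 3))
m2 <- m1 + 10
r <- stack_arrays(m1, m2)
print(dim(r))
```

Dimnames are dropped here; carrying them over is one of the edge cases a library version would be expected to handle.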

Thanks,

 -s


[R] Approximate name matching

2011-05-09 Thread Stavros Macrakis

Is there R software available for doing approximate matching of personal
names?

I have data about the same people produced by different organizations and
the only matching key I have is the name. I know that commercial solutions
exist, and I know I could code this from scratch, but I'd prefer to build on
some existing free solution if it exists.

Unfortunately, the names are not standardized, and there is also a certain
level of error:

   Danny Williams (nickname)
   Dan Williams (nickname)
   Daniel Williams (nickname)
   Dan William (spelling error)
   D. Williams (initials)
   Daniel "Danny" Williams (formal + nickname)
   Dan P. Williams (includes middle initial)
   Williams, Daniel (different convention)
   William Daniel (wrong order or missing comma + misspelling)

Is there any R software available to find likely matches, ideally with some
estimate of accuracy of match?  Levenshtein distance as implemented in agrep
is a useful solution for some of these cases; I was wondering if there is
something that covers more cases.
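
As a baseline for comparison, a few of the cases above can be handled by normalising the names first and then computing edit distances with adist() from package utils. This is only a sketch: normalise_name is a hypothetical helper, and it knows nothing about nicknames or initials.

```r
# Lower-case, reorder "Last, First", and strip punctuation and extra spaces.
normalise_name <- function(x) {
  x <- tolower(x)
  x <- sub("^([^,]+),[[:space:]]*(.+)$", "\\2 \\1", x)
  x <- gsub("[^a-z ]", "", x)
  x <- gsub(" +", " ", x)
  gsub("^ | $", "", x)
}

candidates <- c("Danny Williams", "Dan William", "D. Williams",
                "Dan P. Williams", "Williams, Daniel")
reference <- "Daniel Williams"

# Generalized Levenshtein distance of each candidate to the reference.
d <- drop(adist(normalise_name(candidates), normalise_name(reference)))
names(d) <- candidates
print(sort(d))
```

The distances are raw edit counts, not match probabilities, so they would still need a threshold or a scoring model on top.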

For this particular application, I am not concerned with issues such as
variant latinizations/transliterations (e.g. Tsung-Dao Lee ~ T.D. Lee ~ Li
Zhengdao; Ghaddafi ~ Qaddhaffi), but of course if someone handles that as
well

Thanks,

-s


Re: [R] General binary search?

2011-04-06 Thread Stavros Macrakis

Martin,

Thank you for your exploration of implementations of bsearch!

In my application, length(val) is very small (typically 2), so vectorization
over val doesn't help -- though vectorization over tab could work by doing
n-ary instead of 2-ary splits with something like

 match(TRUE, val < tab[L+(H-L)*(1:9/10)])

and (when H-L becomes small)

 match(TRUE,val < tab[L:H])

Then there are approaches like tries... but though I love this sort of
programming, I'm trying to reuse as much well-tested, well-tuned library
code as I can.

Thanks again for your ideas!

 -s

On Wed, Apr 6, 2011 at 12:59, Martin Morgan mtmor...@fhcrc.org wrote:

 On 04/04/2011 01:50 PM, William Dunlap wrote:

 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Stavros Macrakis
 Sent: Monday, April 04, 2011 1:15 PM
 To: r-help
 Subject: [R] General binary search?

 Is there a generic binary search routine in a standard library which

a) works for character vectors
b) runs in O(log(N)) time?

 I'm aware of findInterval(x,vec), but it is restricted to
 numeric vectors.


 xtfrm(x) will convert a character (or other) vector to
 a numeric vector with the same ordering.  findInterval
 can work on that.  E.g.,
  f0 <- function(x, vec) {
      tmp <- xtfrm(c(x, vec))
      findInterval(tmp[seq_along(x)], tmp[-seq_along(x)])
  }
  > f0(c("Baby", "Aunt", "Dog"), LETTERS)
  [1] 2 1 4
 I've never looked at its speed.


 For a little progress (though no 'generic binary search in a standard
 library'), here's the 'one-liner'

 bsearch1 <-
     function(val, tab, L=1L, H=length(tab))
 {
     while (H >= L) {
         M <- L + (H - L) %/% 2L
         if (tab[M] > val) H <- M - 1L
         else if (tab[M] < val) L <- M + 1L
         else return(M)
     }
     return(L - 1L)
 }

 It seems like a good candidate for the new (R-2.13) 'compiler' package, so

 library(compiler)
 bsearch2 <- cmpfun(bsearch1)

 And Bill's suggestion

 bsearch3 <- function(val, tab) {
     tmp <- xtfrm(c(val, tab))
     findInterval(tmp[seq_along(val)], tmp[-seq_along(val)])
 }

 which will work best when 'val' is a vector to be looked up.

 A quick look at data.table:::sortedmatch seemed to return matches, whereas
 Stavros is looking for lower bounds.

 It seems that one could shift the weight more to C code by 'vectorizing'
 the one-liner, first as

 bsearch5 <-
     function(val, tab, L=1L, H=length(tab))
 {
     b <- cbind(L=rep(L, length(val)), H=rep(H, length(val)))
     i0 <- seq_along(val)
     repeat {
         M <- b[i0,"L"] + (b[i0,"H"] - b[i0,"L"]) %/% 2L
         i <- tab[M] > val[i0]
         b[i0 + i * length(val)] <-
             ifelse(i, M - 1L, ifelse(tab[M] < val[i0], M + 1L, M))
         i0 <- which(b[i0, "H"] >= b[i0, "L"])
         if (!length(i0)) break;
     }
     b[,"L"] - 1L
 }

 and then a little more thoughtfully (though more room for improvement) as

 bsearch7 <-
     function(val, tab, L=1L, H=length(tab))
 {
     b <- cbind(L=rep(L, length(val)), H=rep(H, length(val)))
     i0 <- seq_along(val)
     repeat {
         updt <- M <- b[i0,"L"] + (b[i0,"H"] - b[i0,"L"]) %/% 2L
         tabM <- tab[M]
         val0 <- val[i0]
         i <- tabM < val0
         updt[i] <- M[i] + 1L
         i <- tabM > val0
         updt[i] <- M[i] - 1L
         b[i0 + i * length(val)] <- updt
         i0 <- which(b[i0, "H"] >= b[i0, "L"])
         if (!length(i0)) break;
     }
     b[,"L"] - 1L
 }

 none of bsearch 3, 5, or 7 is likely to benefit substantially from
 compilation.

 Here's a little test data set converting numeric to character as an easy
 cheat.

 set.seed(123L)
 x <- sort(as.character(rnorm(1e6)))
 y <- as.character(rnorm(1e4))

 There seems to be some significant initial overhead, so we warm things up
 (and also introduce the paradigm for multiple look-ups in bsearch 1, 2)

 warmup <- function(y, x) {
     lapply(y, bsearch1, x)
     lapply(y, bsearch2, x)
     bsearch3(y, x)
     bsearch5(y, x)
     bsearch7(y, x)
 }
 replicate(3, warmup(y, x))

 and then time

 > system.time(res1 <- unlist(lapply(y, bsearch1, x), use.names=FALSE))
    user  system elapsed
   2.692   0.000   2.696
 > system.time(res2 <- unlist(lapply(y, bsearch2, x), use.names=FALSE))
    user  system elapsed
   1.379   0.000   1.380
 > identical(res1, res2)
 [1] TRUE
 > system.time(res3 <- bsearch3(y, x)); identical(res1, res3)
    user  system elapsed
   8.339   0.001   8.350
 [1] TRUE
 > system.time(res5 <- bsearch5(y, x)); identical(res1, res5)
    user  system elapsed
   0.700   0.000   0.702
 [1] TRUE
 > system.time(res7 <- bsearch7(y, x)); identical(res1, res7)
    user  system elapsed
   0.222   0.000   0.222
 [1] TRUE

 Martin



 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


 I'm also aware of various hashing solutions (e.g.
 new.env(hash=TRUE) and
 fastmatch), but I need the greatest-lower-bound match in my
 application.

 findInterval is also slow for large N=length(vec) because of the O(N)
 checking it does, as Duncan Murdoch has pointed
 outhttps://stat.ethz.ch/pipermail/r

[R] General binary search?

2011-04-04 Thread Stavros Macrakis

Is there a generic binary search routine in a standard library which

   a) works for character vectors
   b) runs in O(log(N)) time?

I'm aware of findInterval(x,vec), but it is restricted to numeric vectors.

I'm also aware of various hashing solutions (e.g. new.env(hash=TRUE) and
fastmatch), but I need the greatest-lower-bound match in my application.

findInterval is also slow for large N=length(vec) because of the O(N)
checking it does, as Duncan Murdoch has pointed out
(https://stat.ethz.ch/pipermail/r-help/2008-September/174584.html):
though its documentation says it runs in O(n * log(N)), it actually runs
in O(n * log(N) + N), which is quite noticeable for largish N.  But that
is easy enough to work around by writing a variant of findInterval which
calls find_interv_vec without checking.

-s

PS Yes, binary search is a one-liner in R, but I always prefer to use
standard, fast native libraries when possible

binarysearch <- function(val,tab,L,H) {while (H>=L) { M=L+(H-L) %/% 2; if
(tab[M]>val) H<-M-1 else if (tab[M]<val) L<-M+1 else return(M)};
return(L-1)}
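
Spelled out with defaults and comments, the same one-liner reads as follows. This is a sketch, not a library routine: glb_search is an illustrative name, 'tab' is assumed to be sorted under the same collation that '<' and '>' use in the current locale, and with duplicated values the index returned is that of some matching element, not necessarily the last.

```r
# Greatest-lower-bound binary search on a sorted vector `tab`.
# Returns the index of an element equal to val if there is one, otherwise the
# index of the last element below val, or 0 if val sorts before tab[1].
glb_search <- function(val, tab) {
  lo <- 1L
  hi <- length(tab)
  while (hi >= lo) {
    mid <- lo + (hi - lo) %/% 2L
    if (tab[mid] > val) {
      hi <- mid - 1L
    } else if (tab[mid] < val) {
      lo <- mid + 1L
    } else {
      return(mid)
    }
  }
  lo - 1L
}

idx <- vapply(c("Baby", "Aunt", "Dog", "Q"), glb_search, integer(1),
              tab = LETTERS)
print(idx)
```

Each lookup touches O(log(N)) elements, but every comparison is interpreted R code, which is the overhead a native routine would avoid.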


[R] Stricter read.table?

2010-12-10 Thread Stavros Macrakis

read.table gives idiosyncratic results when the input is formatted
strangely, for example:

read.table(textConnection("a'b\nc'd\n"),header=FALSE,fill=TRUE,sep="",quote="'")
  => c'd a'b c'd

read.table(textConnection("a'b\nc'd\nf'\n'\n"),header=FALSE,fill=TRUE,sep="",quote="'")
  => f'  \na b   c'd f'  \n

Though read.table doesn't specify the syntax of its input precisely, these
results don't seem particularly useful or consistent.

Is there a stricter version of read.table (perhaps in a package) that gives
errors or warnings if it finds quotation marks in the middle of fields or
encounters other such peculiar situations?
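
Failing a stricter reader, one workaround is to screen the raw lines before handing them to read.table. A rough sketch, assuming whitespace-separated fields and the single quote as the only quote character; find_suspect_lines is a hypothetical helper and the two rules it applies are heuristics:

```r
# Flag lines with a quote embedded inside a field, or an odd number of quotes.
find_suspect_lines <- function(lines) {
  embedded <- grepl("[^[:space:]']'+[^[:space:]']", lines)
  n_quotes <- nchar(gsub("[^']", "", lines))
  unbalanced <- n_quotes %% 2 == 1
  which(embedded | unbalanced)
}

lines <- c("a'b", "x 'y z' w", "f'", "plain")
suspect <- find_suspect_lines(lines)
print(suspect)
```

Lines that pass the screen can then go to read.table(text = ...) as usual; the rest can be reported with their line numbers.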

Thanks,

 -s


[R] Quantile with discrete types

2010-12-10 Thread Stavros Macrakis

I don't understand why 'quantile' works in this case:

 tt <- rep(c('a','b'),c(10,3))
 sapply(0:6/6,function(q) quantile(tt,probs=q,type=1))
   0% 16.7% 33.3%   50% 66.7% 83.3%  100%
  a   a   a   a   a   b   b

and also

 quantile(tt,0:5/5,type=1)
  0%  20%  40%  60%  80% 100%
 a  a  a  a  b  b

but gives an error in this, which I would have thought equivalent to the
first case above:

 quantile(tt,0:6/6,type=1)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
  argument is not a numeric vector

I could of course write something like
sort(tt)[seq(1,length(tt),length.out=7)] -- but I'm wondering why quantile
fails in this case.
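
In the meantime, a type-1 quantile (the inverse of the empirical distribution function) is short enough to write directly for any sortable type. A sketch, with type1_quantile as an illustrative name and a small fudge factor against rounding in n * probs; it is not meant to reproduce every detail of quantile():

```r
# Type-1 quantile: the ceiling(n * p)-th order statistic (the first one for p = 0).
type1_quantile <- function(x, probs) {
  sx <- sort(x)
  n <- length(sx)
  j <- pmax(1, ceiling(n * probs - 1e-9))
  sx[j]
}

tt <- rep(c("a", "b"), c(10, 3))
q <- type1_quantile(tt, 0:6 / 6)
print(q)
```

For this tt the result agrees with the sapply version shown above.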

Thanks,

-s


Re: [R] solving cubic/quartic equations non-iteratively

2010-01-22 Thread Stavros Macrakis

On Tue, Jan 5, 2010 at 5:25 PM, Carl Witthoft c...@witthoft.com wrote:

 quote:
  There are certainly formulas for solving polynomials numerically up to
 4th degree non-iteratively, but you will almost certainly get better results
 using iterative methods.



 I must be missing something here.  Why not use the analytic formulas for
 polynomials below 5th degree?  Once you do so, your answer is as precise as
 the level of precision you enter for the coefficients.


Why do you believe that? Are you assuming you can perform *exact*
arithmetic?  Did you read the references I gave?

* George Forsythe, How do you solve a quadratic equation?
* Yves Nievergelt, How (Not) to Solve Quadratic Equations

They show that that isn't even true for quadratic equations without a lot of
care.

Let's try a cubic:

 p =  100*x^3-998000*x^2-1001999*x+99

That factors exactly over the integers to:

(x-1001)*(x-1000)*(x-999)

but plugging the floating-point coefficients (which are exactly
representable as floats) into (one version of) the cubic formula (using
Maxima), I get the roots

 x = 966.1329834413779+58.65086897690403i
 x = 966.1329834413779-58.65086897690403i
 x = 1067.734033117244

On the other hand, using an iterative approach, I get:

x = 999.000278754
x = 999.926817675
x = 1001.07290357

Which looks better to you?

-s


Re: [R] solving cubic/quartic equations non-iteratively

2010-01-05 Thread Stavros Macrakis

There are certainly formulas for solving polynomials numerically up to 4th
degree non-iteratively, but you will almost certainly get better results
using iterative methods.

Even the much more trivial formula for the 2nd degree (quadratic) is tricky
to implement correctly and accurately, see:

* George Forsythe, How do you solve a quadratic equation?
* Yves Nievergelt, How (Not) to Solve Quadratic Equations
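
The kind of trouble those papers discuss is easy to reproduce. A small illustration with coefficients chosen by me for the purpose (they are not taken from either paper): the textbook formula loses the small root to cancellation, while a rearranged form that avoids the subtraction keeps it.

```r
# x^2 - 1e8*x + 1 has roots very close to 1e8 and 1e-8.
a <- 1; b <- -1e8; c0 <- 1
disc <- sqrt(b^2 - 4 * a * c0)

# Textbook formula: subtracts two nearly equal numbers.
naive_small <- (-b - disc) / (2 * a)

# Rearranged: compute the large root without cancellation, then use
# the product of the roots (c0 / a) to get the small one.
q <- -(b + sign(b) * disc) / 2
stable_large <- q / a
stable_small <- c0 / q

print(c(naive = naive_small, stable = stable_small))
```

The same care is needed, with more cases, in the cubic and quartic formulas.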

Hope this helps.

   -s

On Tue, Jan 5, 2010 at 10:11 AM, Mads Jeppe Tarp-Johansen 
s02m...@math.ku.dk wrote:

 To R-helpers,

 R offers the polyroot function for solving mentioned equations iteratively.

 However, Dr Math and Mathworld (and other places) show in detail how to
 solve mentioned equations non-iteratively.

 Do implementations for R that are non-iterative and that solve mentioned
 equations exists?

 Regards, Mads Jeppe




Re: [R] Problem with expand.grid

2009-12-22 Thread Stavros Macrakis

Unfortunately, expand.grid doesn't validate the class of its argument, so it
is reporting an internal error rather than something more intelligible.
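
If the intent is the full cross product of the columns, passing a plain list sidesteps the problem. A sketch using Keith's smaller example, assuming that is what was wanted:

```r
df <- data.frame(y = 1:10, t = 1:10)

# as.list() strips the data.frame class, leaving a named list of vectors.
grid <- expand.grid(as.list(df))
print(dim(grid))
```

With the seven-column dDF from the original report the same call would produce 10^7 rows, which may explain the long pause before the error.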

On Tue, Dec 22, 2009 at 11:19 AM, Keith Jewell k.jew...@campden.co.uk wrote:

 Just confirming it isn't the bug fixed in 2.11.0dev, and giving an even
 simpler example:

 R version 2.11.0 Under development (unstable) (2009-12-20 r50794)

  expand.grid(data.frame(y=1:10, t=1:10))
 Error in `[[<-.data.frame`(`*tmp*`, i, value = c(1L, 2L, 3L, 4L, 5L, 6L,  :
  replacement has 100 rows, data has 10

 Keith Jewell k.jew...@campden.co.uk wrote in message
 news:hgqqja$rk...@ger.gmane.org...
  Hi All,
 
  This example code
  
  dDF <- structure(list(y = c(4.75587, 4.8451, 5.04139, 4.85733, 5.20412,
  5.92428, 5.69897, 4.78958, 4, 4), t = c(0, 48, 144, 192, 240,
  312, 360, 0, 48, 144), Batch = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1
  ), T = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2), pH = c(4.6, 4.6, 4.6,
  4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6), S = c(0, 0, 0, 0, 0, 0, 0,
  0, 0, 0), N = c(0, 0, 0, 0, 0, 0, 0, 80, 80, 80)), .Names = c("y",
  "t", "Batch", "T", "pH", "S", "N"), row.names = c(NA, 10L), class =
  "data.frame")
  str(dDF)
  expand.grid(dDF)
  
  'hangs' for a while and then gives an error
 
  Error in `[[<-.data.frame`(`*tmp*`, i, value = c(4.75587, 4.8451, 5.04139, :
   replacement has 1000 rows, data has 10
 
  In NEWS.R-2.11.0dev I read:
 o The new (in 2.9.0) 'stringsAsFactors' argument to expand.grid()
  was not working: it now does work but has default TRUE for
  backwards compatibility.
 
  but I don't think that's relevant, I have no factors.
 
  I'm probably being silly. Can anyone point out where?
 
  Best...
 
  Keith Jewell
 
  --please do not edit the information below--
 
  Version:
  platform = i386-pc-mingw32
  arch = i386
  os = mingw32
  system = i386, mingw32
  status = Patched
  major = 2
  minor = 10.1
  year = 2009
  month = 12
  day = 21
  svn rev = 50796
  language = R
  version.string = R version 2.10.1 Patched (2009-12-21 r50796)
 
  Windows Server 2003 x64 (build 3790) Service Pack 2
 
  Locale:
  LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
  Kingdom.1252;LC_MONETARY=English_United
  Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
 
  Search Path:
  .GlobalEnv, package:stats, package:graphics, package:grDevices,
  package:utils, package:datasets, package:methods, Autoloads, package:base
 




[R] Method dispatch for function

2009-11-18 Thread Stavros Macrakis

How can I determine what S3 method will be called for a particular
first-argument class?

I was imagining something like functionDispatch('str','numeric') =>
utils:::str.default, but I can't find anything like this.

For that matter, I was wondering if anyone had written a version of
`methods` which gave their fully qualified names if they were not visible,
e.g.

methods('str') =>
utils:::str.data.frame    utils:::str.default
stats:::str.dendrogram    stats:::str.logLik    utils:::str.POSIXt

or

methods('str') =>
 $utils
   str.data.frame str.default str.POSIXt
 $stats
   str.dendrogram str.logLik
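
Something close to the first wish can be pieced together from getS3method() in package utils. A sketch only: s3_dispatch is a made-up name, the caller supplies the class vector, and implicit classes (for instance an integer vector also dispatching on "numeric") are not worked out automatically.

```r
# Walk the class vector, then "default", and report the first S3 method found,
# qualified by the namespace its definition lives in.
s3_dispatch <- function(generic, classes) {
  for (cl in c(classes, "default")) {
    m <- utils::getS3method(generic, cl, optional = TRUE)
    if (!is.null(m)) {
      ns <- environmentName(environment(m))
      return(paste0(ns, ":::", generic, ".", cl))
    }
  }
  NULL
}

print(s3_dispatch("str", "numeric"))
print(s3_dispatch("str", class(data.frame())))
```

Applying the same environmentName(environment(...)) trick to each method returned by methods() would give the grouped-by-package listing.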

Thank you,

 -s


[R] Suppressing final spaces in data.frame printouts

2009-11-11 Thread Stavros Macrakis

When printing data.frames, R aligns columns by padding with spaces.
For example,

print(data.frame(x=c('a','bb','ccc')),right=FALSE)
  x
1 a  |<-- vertical bar shows end of line
2 bb |<-- vertical bar shows end of line
3 ccc|<-- vertical bar shows end of line

Is there some way to suppress the padding for the final column? I
often have data frames which contain a handful of long strings in the
final column which, when printed out, cause wraparound on all the
rows, even those not containing long strings, something like this:

print(data.frame(q=1:3,x=c('a','bb','this is a very long string')),right=FALSE)
  q x   |
  |
1 1 a   |
  |
2 2 bb  |
  |
3 3 this is a very l|
ong string|

where I'd rather have

print(data.frame(q=1:3,x=c('a','bb','this is a very long string')),right=FALSE)
  q x|
1 1 a|
2 2 bb|
3 3 this is a very l|
ong string|

I could of course write my own print function for this, but was
wondering if there was a standard way of doing it.  If not in R,
perhaps there is some way to have ESS delete the final spaces?
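
For reference, the write-my-own-print-function route is only a few lines. A sketch, with print_trimmed as a hypothetical name; it captures the normal printout and strips trailing whitespace from each line before writing it:

```r
print_trimmed <- function(df, ...) {
  # Capture what print() would have shown, one string per output line.
  out <- utils::capture.output(print(df, right = FALSE, ...))
  # Drop the padding at the end of every line.
  out <- sub("[[:space:]]+$", "", out)
  writeLines(out)
  invisible(out)
}

df <- data.frame(q = 1:3, x = c("a", "bb", "this is a very long string"))
out <- print_trimmed(df)
```

This only helps when the long column is the last one, since padding in earlier columns is needed to keep the later ones aligned.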

Thanks,

 -s


Re: [R] Suppressing final spaces in data.frame printouts

2009-11-11 Thread Stavros Macrakis

Thanks for the suggestion. I'm familiar with the truncate-lines variable,
but that's not quite what I was looking for.  I don't want the padding
spaces displayed, but I do want to see long strings at the end of the line.

Thanks anyway,

   -s

On Wed, Nov 11, 2009 at 5:40 PM, Richard M. Heiberger r...@temple.edu wrote:

 Stavros Macrakis wrote:

 I could of course write my own print function for this, but was
 wondering if there was a standard way of doing it.  If not in R,
 perhaps there is some way to have ESS delete the final spaces?


 ESS, or more precisely emacs, can handle that.  Use the M-x
 toggle-truncate-lines command:
Toggle whether to fold or truncate long lines for the current buffer.

 Rich






Re: [R] Suppressing final spaces in data.frame printouts

2009-11-11 Thread Stavros Macrakis

I'm adding ess-help to the addressees because apparently this needs to be
solved in ESS, not in R.

Thanks!  So I guess you're suggesting something like

(add-hook 'comint-output-filter-functions
  (lambda (s)
    (save-restriction
      (narrow-to-region comint-last-output-start
        (+ -1 (process-mark (get-buffer-process (current-buffer)))))
        ;; stop one char before the end of the output region to avoid
        ;; deleting the space after the R prompt
      (delete-trailing-whitespace))))

I have almost succeeded in making this work right. But if it is called for
an output chunk which isn't the last one (with the prompt), it can suppress
spaces in the middle of the line.  Test with

  for (i in 1:1000) print(
  )

for example. Any ideas? This is the sort of niggling little edge-case
complication which made me hope that someone had already solved the problem
in R or ESS

-s

On Wed, Nov 11, 2009 at 8:43 PM, RICHARD M. HEIBERGER r...@temple.edu wrote:

 On Wed, Nov 11, 2009 at 8:12 PM, Stavros Macrakis macra...@alum.mit.edu
 wrote:
  Thanks for the suggestion. I'm familiar with the truncate-lines
 variable,
  but that's not quite what I was looking for.  I don't want the padding
  spaces displayed, but I do want to see long strings at the end of the
 line.

 Then we can use a different emacs trick.

 delete-trailing-whitespace        M-x ... RET
  Command: Delete all the trailing whitespace across the current buffer.
 ess-nuke-trailing-whitespace      M-x ... RET
  Command: Nuke all trailing whitespace in the buffer.
 whitespace-toggle-trailing-check  M-x ... RET
  Command: Toggle the check for trailing space in the local buffer.

 Rich



Re: [R] Book on R programming

2009-08-31 Thread Stavros Macrakis

I recommend you skim the Chambers book at Google Books or Amazon before
buying it as a guide to programming in R.

It is a fascinating book, but is more a discursive reflection on the history
and philosophy of R than a practical guide to programming in R.  It
certainly explains the rationale for many of the design decisions in R,
which is great for those of us who are interested in the history of
programming languages, and even the practical consequences of those design
decisions, but I'm not sure it's useful as a handbook for programming in R.

-s

On Mon, Aug 31, 2009 at 8:33 AM, [Ricardo Rodriguez] Your XEN ICT Team 
webmas...@xen.net wrote:

 Hi,

 ANJAN PURKAYASTHA wrote:

 Most books on R I come across describe running statistical procedures in
 R.
 Any suggestions on a good book that teaches *programming* in R?
 Thanks,
 Anjan


 This is being really useful for me...

 John M. Chambers (2008) Software for Data Analysis. Programming with R.
 Springer.

 http://tinyurl.com/lg7g8n


 HTH

 --
 Ricardo Rodríguez
 Your XEN ICT Team



Re: [R] Is there a construct for conditional comment?

2009-08-21 Thread Stavros Macrakis

On Thu, Aug 20, 2009 at 1:27 PM, David Winsemius dwinsem...@comcast.net wrote:
...
 But an extremely simple modification succeeds:

 if ( 0 ) {"
 commented with zero
 "} else {"
 commented with one
 "}

 Returns:
 [1] "\ncommented with one\n"

Yes, but of course that executes neither one nor the other.  This works, though:

eval(parse(textConnection(if (FALSE) "
  syntactically  incorrect ' code must not use double-quotes, though
" else "
 print('this is a test')
")))

though it is horribly ugly, so I second the suggestion to do this in
your text editor if you must do it at all.
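
A slightly shorter variant of the same trick, sketched here on the assumption that parse(text=) is acceptable in place of a text connection. The two code strings are invented for the illustration; only the selected one is ever parsed, so the other can contain anything that fits in a string:

```r
# Only the selected string is handed to the parser; the other branch is
# never parsed, so it may hold text that is not valid R.
code <- if (FALSE) "this is ' not valid R" else "sum(1:10)"
result <- eval(parse(text = code))
result
```

The same caveat applies: this is ugly, and an editor-level solution is usually preferable.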

  -s


[R] Keeping track of memory usage

2009-08-20 Thread Stavros Macrakis

How can I determine how much memory a given piece of my code is
allocating (directly or indirectly)? -- essentially, the space
analogue of system.time, something like this:

  system.space( x <- rnorm(1) )
  1 Vcells

  system.space( for (i in 1:1000) x <- rnorm(1) )
  1000 Vcells

I'm not looking for anything as fine-grained as Rprofmem or tracemem,
just the overall allocations.  I'm also not looking for the amount of
*live* memory (that is, net of garbage collection) as reported by
memory.profile or gc.

Thanks,

  -s


Re: [R] Object equality for S4 objects

2009-07-30 Thread Stavros Macrakis

On Thu, Jul 30, 2009 at 12:01 PM, Martin Morgan mtmor...@fhcrc.org wrote:
 S4 objects do not have the semantics of environments, but of lists (or of 
 most other R objects), so it is as meaningful to ask why identical(s1, s2) 
 returns TRUE as it is to ask why identical(list(x=1), list(x=1)) returns TRUE.

Thanks for the clarification.

For some reason, I thought that S4 objects (unlike S3 objects) were
objects in the conventional computer science sense, that is, mutable.
Compare proto objects, which *are* objects in the usual sense:

> proto1 <- proto(expr= {x=23})
> proto2 <- proto1
> proto1$x <- 45
> proto2$x
[1] 45    # proto1 and proto2 are the same object

> setClass("test",representation(a="logical"))
[1] "test"
> s41 <- new("test")
> s42 <- s41
> s41@a <- TRUE
> s42@a      # s41 and s42 are different objects
logical(0)

It would thus perhaps be clearer to speak of S4 values rather than
S4 objects.
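
The contrast can also be reproduced without the proto package, on the assumption that a plain environment is an acceptable stand-in for a mutable object. The class name Flag and the variable names are made up for this sketch:

```r
library(methods)

# S4 instance: assignment copies the value
setClass("Flag", representation(a = "logical"))
s1 <- new("Flag")
s2 <- s1
s1@a <- TRUE            # modifies s1 only
length(s2@a)            # s2 still holds the empty prototype

# Environment: assignment copies the reference
e1 <- new.env()
e2 <- e1
assign("a", TRUE, envir = e1)
get("a", envir = e2)    # the change is visible through e2 as well
```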

-s


Re: [R] Object equality for S4 objects

2009-07-30 Thread Stavros Macrakis

On Thu, Jul 30, 2009 at 4:03 PM, Martin Morgan mtmor...@fhcrc.org wrote:
 S4 objects are mutable in the sense that one can write replacement methods 
 for them

Understood, but I don't think that's the usual meaning of 'mutable'.

-s


[R] Object equality for S4 objects

2009-07-29 Thread Stavros Macrakis

To test two environments for object equality (Lisp EQ), I can use 'identical':

> e1 <- environment(local(function()x))
> e2 <- environment(local(function()x))
> identical(e1,e2)  # compares object identity
[1] FALSE
> identical(as.list(e1),as.list(e2))    # compares values as name-value mapping
[1] TRUE    # (is there a better way to do this?)

What is the corresponding function for testing whether two S4 objects
are the same object?  It appears that 'identical' for S4 objects
compares the *value*, not the *object identity*:

> setClass("simple",representation(a="logical"))
[1] "simple"
> s1 <- new("simple"); s2 <- new("simple")
> identical(s1,s1)
[1] TRUE   # not surprising
> identical(s1,s2)
[1] TRUE   # ? not comparing object identity
> s1@a <- TRUE
> s2@a <- TRUE
> identical(s1,s2)
[1] TRUE
> s1@a <- TRUE
> s2@a <- FALSE
> identical(s1,s2)
[1] FALSE

Thanks,

  -s


Re: [R] dereferencing in R

2009-07-16 Thread Stavros Macrakis

What do you mean by 'passing an array reference' and 'dereferencing' and
what do you mean by an 'R script'?  What language(s) are you accustomed to?

If you mean 'passing an array value' to an 'R function', you just use the
argument name.  Since R uses call-by-value (modulo the substitute mechanism,
which as a beginner you should avoid), modifying the array within your
function does not modify the global value.  Normally you'd return the value,
e.g.

> ar <- array( 1:12,c(3,4))
> ar
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

> sum12 <- function(a) { a[1,] + a[2,] }
> sum12(ar)
[1]  3  9 15 21     # returned value

If you want to *modify* the array ar, you should do something like this:

    ar[1,] <- sum12(ar)

> ar[1,] <- sum12(ar)
> ar
     [,1] [,2] [,3] [,4]
[1,]    3    9   15   21
[2,]    2    5    8   11
[3,]    3    6    9   12
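
To make the call-by-value point concrete, here is a small sketch. The helper zero_first_row is invented for the illustration and is not part of any package:

```r
# Hypothetical helper: overwrites the first row of its own copy of `a`
zero_first_row <- function(a) {
  a[1, ] <- 0
  a
}

ar <- array(1:12, c(3, 4))
changed <- zero_first_row(ar)

ar[1, ]        # the caller's array is untouched
changed[1, ]   # the modification exists only in the returned value
```

Assigning the result back (ar <- zero_first_row(ar)) is what makes the change stick.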

Does this answer your question?

   -s
On Thu, Jul 16, 2009 at 9:04 AM, xin liu liux...@yahoo.com wrote:


 Hi, All,

 I passed an array reference to the R script and do not know how to do
 dereferencing in the R script. Anybody has some suggestion?

 Many thanks


Re: [R] Simple cat statement - output truncated

2009-07-16 Thread Stavros Macrakis

Kevin,

The habitués of this mailing list get irritated when users mail in
problem reports which don't include enough information to reproduce
the problem, as requested in the standard footer of r-help mail
("PLEASE ... provide commented, minimal, self-contained, reproducible
code.") This irritation is sometimes expressed aggressively and
sometimes humorously. Be thankful that you drew humorously.

So... please provide minimal, self-contained code that allows us to
reproduce your problem.  What is meant by "self-contained"?  It is
code that, if you type it into a fresh R, elicits your problem.  This
includes setting any necessary variables to appropriate values etc.
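
For instance, a self-contained version of the statement in question might look like the sketch below. The contents of `object` and `n` are stand-ins assumed for illustration, since the real values were never posted:

```r
# Stand-in data (assumed): the real `object` and `n` were not shown
object <- list(components = c("A", "N", "N", "FALSE"))
n <- 3

# capture.output() is used only so the printed line can be inspected
line <- capture.output(
  cat("myforecast ETS(",
      paste(object$components[1], object$components[2],
            object$components[3], object$components[4], sep = ","),
      ") ", n, "\n")
)
line
```

Anyone can paste this into a fresh session and see whether the leading characters go missing.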

-s

On Thu, Jul 16, 2009 at 10:21 AM, rkevinbur...@charter.net wrote:

 So then I am to assume that the output of 'cat' can be truncated by passing 
 it bad arrays. That is the only difference between the reproducible code 
 you show and mine. It is just a theory but say that the components array is 
 not dimensioned for 4 elements. It seems a little strange if that is the 
 case that a reference error is not thrown and just the output of the cat call 
 is affected.

 Kevin

  Duncan Murdoch murd...@stats.uwo.ca wrote:
  On 7/15/2009 9:53 AM, rkevinbur...@charter.net wrote:
   I have a statement:
  
        cat("myforecast ETS(", paste(object$components[1],
    object$components[2], object$components[3], object$components[4], sep =
    ","), ") ", n, "\n")
  
   That generates:
  
   cast ETS( A,N,N,FALSE )  3
  
   Anyone guess as to why the first 5 letters are truncated/missing?
 
  You are probably being punished for posting non-reproducible code*.
 
  When I try a reproducible version of the line above, things look fine:
 
     cat("myforecast ETS(", paste("A","N","N",FALSE, sep = ","), ") ", 3,
   "\n")
  myforecast ETS( A,N,N,FALSE )  3
 
 
  Duncan Murdoch
 
  * R has a new predictive punishment module.  It punishes you for things
  it knows you will do later.


Re: [R] quoting expressions in a list

2009-07-16 Thread Stavros Macrakis

On Thu, Jul 16, 2009 at 4:44 PM, Erik Iverson eiver...@nmdp.org wrote:
 I have a list of logical expressions, and I would really like it if the 
 names of the components of the list were identical to the corresponding 
 logical expression.

 So, as an example:

 df.example <- data.frame(a = 1:10, b = rnorm(10, 5))

 list.example <- list(df.example$a > 7,
                      df.example$b < 4)

 Now what I'd really like is to name the components, and get the results of 
 the following line without having to specify the right-hand side individually 
 for each component:

 names(list.example) <- c("df.example$a > 7", "df.example$b < 4")

Something like this, perhaps?:

> listx <- function(...)
+   structure(list(...),names=tail(as.list(substitute(c(...))),-1))
> list.example <- list(df.example$a > 7, df.example$b < 4)
> listx(df.example$a > 7, df.example$b < 4)
$`df.example$a > 7`
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

$`df.example$b < 4`
 [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE
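
A more explicit spelling of the same idea, as a sketch only: it deparses each unevaluated argument itself instead of relying on the coercion done when names are assigned. The name named_list is hypothetical, and set.seed is there just to make the random column repeatable:

```r
# named_list() is a hypothetical name for this sketch
named_list <- function(...) {
  exprs  <- as.list(substitute(list(...)))[-1]    # unevaluated arguments
  labels <- vapply(exprs,
                   function(e) paste(deparse(e), collapse = " "),
                   character(1))
  stats::setNames(list(...), labels)              # evaluated values, labelled
}

set.seed(1)
df.example <- data.frame(a = 1:10, b = rnorm(10, 5))
res <- named_list(df.example$a > 7, df.example$b < 4)
names(res)
```

Collapsing the deparse() output keeps very long expressions on one label instead of splitting them into several strings.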


Re: [R] Trig functions strange results

2009-07-14 Thread Stavros Macrakis

On Tue, Jul 14, 2009 at 1:45 PM, Nair, Murlidharan T mn...@iusb.edu wrote:

 I am trying to calculate coordinate transformations and in the process of
 debugging my code using debug I found the following

 Browse[1]> direction[i]
 [1] -1.570796
 Browse[1]> cos(direction[i])
 [1] 6.123032e-17
 Browse[1]> cos(-1.570796)
 [1] 3.267949e-07
 ...
 I am not sure why I am getting one value when I am using a variable that
 stores the value and another when I use the value directly.  Am I missing
 something here?


Because you are not using the same value.  You say in a later message that
your variable direction[i] was set to (0-90)*pi/180.  So let's look at that:

> x <- (0-90)*pi/180
> x - (-1.570796)
[1] -3.267949e-07

That is, (0-90)*pi/180 is not exactly equal to -1.570796, but rather to
-1.570796326794897:

> print(x,digits=16)
[1] -1.570796326794897

And that is equal to the calculated value.

Well, almost:

> print(x,digits=17)
[1] -1.570796326794897      # the most digits R will print for a float
> -1.570796326794897 - x
[1] -4.440892e-16           # a very tiny difference

See the R FAQ:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

By the way, there is a bug in the R print routine which does not print out
the full precision even if you specify it

> -1.5707963267948965 - x      # one more digit is actually needed
[1] 0
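
In practice the usual remedy is to compare within a tolerance rather than exactly. A minimal sketch, where the tolerance 1e-6 and the threshold 1e-15 are arbitrary choices made for this example:

```r
x <- (0 - 90) * pi / 180

x == -1.570796                                      # exact comparison with the printed value
isTRUE(all.equal(x, -1.570796, tolerance = 1e-6))   # comparison within a tolerance
sprintf("%.17g", x)                                 # 17 significant digits via C-style formatting
abs(cos(x)) < 1e-15                                 # "zero" up to rounding error
```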


Re: [R] strange strsplit gsub problem 0 is this a bug or a string length limitation?

2009-07-10 Thread Stavros Macrakis

On Fri, Jul 10, 2009 at 8:58 AM, Marc Schwartz marc_schwa...@me.com wrote:


 Review the Note in ?as.character:
 as.character truncates components of language objects to 500 characters
 (was about 70 before 1.3.1).


If this limitation is too hard to fix, shouldn't it at least give a warning
or an error?

-s


Re: [R] Numbering sequences of non-NAs in a vector

2009-07-07 Thread Stavros Macrakis

Here's one possibility:

> vv <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
> (1+cumsum(diff(is.na(c(vv[1],vv)))==1)) * !is.na(vv)
 [1] 1 1 1 1 1 1 0 0 0 0 2 2 2 0 0 0 3 3 3 3
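
Since the stated goal is a mean per run, the run numbers can be fed straight to tapply. A sketch, assuming the gaps should be marked NA as in the original request (the names run and run_means are made up):

```r
vv  <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
run <- (1 + cumsum(diff(is.na(c(vv[1], vv))) == 1)) * !is.na(vv)

run[run == 0] <- NA                  # mark the gaps as NA instead of 0
run_means <- tapply(vv, run, mean)   # positions with an NA group are dropped
run_means
```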



On Tue, Jul 7, 2009 at 5:08 PM, Krishna Tateneni taten...@gmail.com wrote:

 Greetings, I have a vector of the form:
 [10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...]  That is, a
 combination
 of sequences of non-missing values and missing values, with each sequence
 possibly of a different length.

 I'd like to create another vector which will help me pick out the sequences
 of non-missing values.  For the example above, this would be:
 [1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...].  The goal ultimately
 is
 to calculate means separately for each sequence.

 Your help is appreciated.  If I'm making this more complicated than
 necessary, I'd appreciate knowing that as well!

 Many thanks.


Re: [R] Question in using e1071 svm routine

2009-07-07 Thread Stavros Macrakis

Isn't the initial value of the variable T equal to the constant TRUE?

So unless he's modified the value of T, shouldn't it work?
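
A quick way to see the difference between the variable T and the constant TRUE. This is only a sketch; the name shadowed is invented, and local() keeps the reassignment from leaking into the session:

```r
identical(T, TRUE)     # T starts out bound to TRUE

shadowed <- local({
  T <- 0               # legal: T is an ordinary variable, unlike TRUE
  isTRUE(T)
})
shadowed               # FALSE: inside the block, T no longer means TRUE
```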

  -s

On 7/7/09, Max Kuhn mxk...@gmail.com wrote:
 Unlike Splus, R does not use T for TRUE.

 On Tue, Jul 7, 2009 at 6:05 PM, Michael comtech@gmail.com wrote:
 Hi all,

 I've got the following error message in using e1071 svm routine...

 Could anybody please help me?

 Thank you!

 -
 model - svm(y=factor(mytraindata[, 1]), x=mytraindata[, -1],
 probability=T)
 Error in if (any(co)) { : missing value where TRUE/FALSE needed
 In addition: Warning message:
 In FUN(newX[, i], ...) : NAs introduced by coercion





 --

 Max


Re: [R] Automatically placing a legend in an area with the most white space...

2009-06-28 Thread Stavros Macrakis

install.packages('plotrix')

On Sun, Jun 28, 2009 at 3:51 PM, Jason Rupert jasonkrup...@yahoo.com wrote:

 ...
 Error in legend(emptyspace(rep(x_vals_1, 3), c(y1_vals, y2_vals, y3_vals)),  :
   could not find function "emptyspace"

 I've searched via RSeek, but I have not been able to find anything on this
 function.

 Is emptyspace part of a package that I need to install?



Re: [R] How to avoid ifelse statement converting factor to character

2009-06-26 Thread Stavros Macrakis

It gives me a headache, too!  I think you'll have to wait for a more
expert user than me to supply explanations of these behaviors and
their rationales.

 -s


On 6/26/09, Craig P. Pyrame crap...@gmail.com wrote:
 Stavros Macrakis wrote:
 On Thu, Jun 25, 2009 at 12:47 PM, Craig P. Pyrame crap...@gmail.com wrote:

 The man page Stavros quotes states that the class attribute of the result
 is
 taken from 'test', which clearly is not the case:


 Actually, the behavior is documented pretty clearly:

  The mode of the answer will be coerced from logical to
  accommodate first any values taken from 'yes' and then
  any values taken from 'no'.

 Whether this is a good design or not is another issue  Perhaps the
 justification is that it avoids evaluating the yes or no arguments (to
 determine their class) in cases where their value is not needed.


 Thank you for pointing me to this.  Now I get a headache from trying to
 figure out what does mode have to do with class - I thought that the
 class of the result should be that of test, and that the mode is
 something entirely different.  Why does coercing the mode also affect
 the class?  If the man page said "The class attribute is taken from
 test, and it will be coerced ..." or "The mode of the result is taken
 from test, and it will be coerced ...", would this be wrong?  What is
 the class-mode mixture about?

 Why does this fail:

   r = as.raw(TRUE)
   ifelse(TRUE, r, r) => error

 This gives an error which I take for saying that raw cannot be coerced
 to logical, but yes it can:

   as.logical(r) => TRUE

 and raw can even be used as the condition vector in ifelse:

   ifelse(r, 1, 2) => 1

 Best regards,
 Craig


 Example:

  ifelse(c(T,F),1,"a") => c("1","a")

 This has the same effect as

 res <- c(T,F)
 res[1] <- 1
 res[2] <- "a"

 which is in fact pretty much the way it is implemented.


 And also, I find myself incapable of making sense of the may in the
 mode
 of the result may depend on the value of 'test' - may in what sense?


 See the examples at the end of ? ifelse

  -s





Re: [R] How do I get just the two last tokens of each string in a vector?

2009-06-26 Thread Stavros Macrakis

One way is:

> a <- c("%L H*L L*H H%", "%L H* H%", "%L L*H %", "%L L*H %")

> sub("^.*(^| )([^ ]+ [^ ]+$)","\\2",a)
[1] "L*H H%" "H* H%"  "L*H %"  "L*H %"

Just be aware that this is not terribly efficient for very large strings.
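
Since strsplit was mentioned as the sticking point, here is a sketch of that route as well. The helper name last_two is made up, and it assumes tokens are separated by one or more plain spaces:

```r
# last_two() is a hypothetical helper name
last_two <- function(x) {
  tokens <- strsplit(trimws(x), " +")              # one character vector per string
  vapply(tokens,
         function(tk) paste(tail(tk, 2), collapse = " "),
         character(1))
}

a <- c("%L H*L L*H H%", "%L H* H%", "%L L*H %", "%L L*H %")
last_two(a)
```

strsplit returns a list with one element per input string, which is why the per-element step goes through vapply.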

-s

On Fri, Jun 26, 2009 at 7:21 AM, Fredrik Karlsson dargo...@gmail.com wrote:
 Dear list,

 Sorry for asking this very silly question on the list, but I seem to
 have made my life complicated by going into string manipulation in
 vectors.
 What I need is to get the last part of a string (the two last tokens,
 separated by a space), and of course, this should be done for all
 strings in a vector, creating a new vector of equal size.

 So,

 a <- c("%L H*L L*H H%", "%L H* H%", "%L L*H %", "%L L*H %")

 should be made into a vector

  c("L*H H%", "H* H%", "L*H %", "L*H %")

 I have tried strsplit, but it seems to produce a structure I cannot
 get to work in this context. Any ideas on how to solve this?

 Thankful for all the help I can get.

 /Fredrik


 --
 Life is like a trumpet - if you don't put anything into it, you don't
 get anything out of it.


Re: [R] How to avoid ifelse statement converting factor to character

2009-06-25 Thread Stavros Macrakis

On Wed, Jun 24, 2009 at 9:04 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:
  Do not get your knickers in a twist.  R works simply and straightforwardly
  in simple straightforward situations.

Though I find R an incredibly useful tool, alas, it is simply not true
that R works simply and straightforwardly in simple straightforward
situations.  No doubt this is for understandable historical reasons
and backwards compatibility, but there it is.

Some examples of simple straightforward situations:

I think it is reasonable to expect that appending a list/vector of
class X to another list/vector of class X would result in a
list/vector of class X.  Similarly for the union of a list/vector of
class X. But in fact, not only is this not true for some of R's
important classes (factors, date/time, and delta-date/time), but the
result class is inconsistent by function and by class:

ff <- factor("b")
c(ff,ff)     => 1 1    # class integer
union(ff,ff) => "b"    # class character

tt <- as.POSIXct('2009-01-01')
c(tt,tt)     => "2009-01-01 EST" "2009-01-01 EST" # class POSIXt/POSIXct
union(tt,tt) => 1230786000    # class numeric

dt <- tt - tt   # class difftime
c(dt,dt)     => 0 0  # class numeric
union(dt,dt) => 0    # class numeric

Similarly, the simplest, most straightforward situation I can think of
for ifelse is when the yes and no arguments are identical, and in that
case, I would (I think reasonably) expect that the result is of the
same class as the arguments, but it is not:

 ifelse(TRUE,factor("b"),factor("b")) => 1 (integer)
 ifelse(TRUE,dd,dd) => 1230786000 (class numeric)

I hope you will agree that all of these are very simple and
straightforward situations, and that R is not working simply and
straightforwardly in them.

The less simple and less straightforward situations are of course more
complicated.

  In respect of the current discussion of ifelse() --- the original problem
  arose because the values of ``yes'' and ``no'' were of different modes. It is
  obvious that in such instances a decision will have to be made about the mode
  of the result.  The appropriateness of the designers' decision may be
  disputed,

Indeed.

 If you don't understand what's going on, then just stick to using
 ifelse() only when ``yes'' and ``no'' have the same mode.

That's not enough.  They have to be of a basic class as well.  See above.

 Bottom line:  R is easy to use at any level, but in order to use it at a
 ``high'' level you need to understand the high level.  Don't attempt
 to run before you can crawl.

Bottom line: Some very basic things in R violate users' reasonable
expectations and moreover are internally inconsistent.  You have to be
careful about this whenever you work in R, even at an elementary
level.

   -s


Re: [R] How to avoid ifelse statement converting factor to character

2009-06-25 Thread Stavros Macrakis

Erratum:
     ifelse(TRUE,dd,dd) => 1230786000 (class numeric)
should be
     ifelse(TRUE,tt,tt) => 1230786000 (class numeric)


Re: [R] How to avoid ifelse statement converting factor to character

2009-06-25 Thread Stavros Macrakis

On Thu, Jun 25, 2009 at 12:47 PM, Craig P. Pyrame crap...@gmail.com wrote:
 The man page Stavros quotes states that the class attribute of the result is
 taken from 'test', which clearly is not the case:

Actually, the behavior is documented pretty clearly:

 The mode of the answer will be coerced from logical to
 accommodate first any values taken from 'yes' and then
 any values taken from 'no'.

Whether this is a good design or not is another issue.  Perhaps the
justification is that it avoids evaluating the yes or no arguments (to
determine their class) in cases where their value is not needed.

Example:

 ifelse(c(T,F),1,"a") => c("1","a")

This has the same effect as

res <- c(T,F)
res[1] <- 1
res[2] <- "a"

which is in fact pretty much the way it is implemented.
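
To make that concrete, here is a stripped-down sketch of the mechanism. It is not the actual source of ifelse: the name simple_ifelse is invented, and NA handling and the other details of the real function are left out.

```r
# Simplified illustration only; not the real ifelse() source
simple_ifelse <- function(test, yes, no) {
  ans <- test                                               # start as logical
  ans[test]  <- rep(yes, length.out = length(ans))[test]    # may coerce ans
  ans[!test] <- rep(no,  length.out = length(ans))[!test]   # may coerce again
  ans
}

simple_ifelse(c(TRUE, FALSE), 1, "a")
```

Each subassignment coerces the whole result vector upward as needed, which is why the mode of the answer depends on which of the yes and no values are actually used.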

 And also, I find myself incapable of making sense of the may in the mode
 of the result may depend on the value of 'test' - may in what sense?

See the examples at the end of ? ifelse

 -s


Re: [R] How to avoid ifelse statement converting factor to character

2009-06-24 Thread Stavros Macrakis

On Wed, Jun 24, 2009 at 12:34 PM, Mark Namtb...@gmail.com wrote:
 The problem is that after running the ifelse statement, data$SOCIAL_STATUS
 is converted from a factor to a character.
 Is there some way I can avoid this conversion?

I'm afraid that ifelse has very bizarre semantics when the yes and no
arguments don't have the same, atomic vector, type.

The quick workaround for the bizarre semantics (though it can have a
significant efficiency cost) is this:

   unlist( ifelse ( condition, as.list( yes ), as.list( no ) ) )

(This isn't perfect, either, but...)
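
For the specific case of keeping a factor a factor, one option that sidesteps ifelse altogether is indexed assignment. A minimal sketch, assuming the replacement value is already one of the factor's levels; the data and variable names below are invented:

```r
status <- factor(c("low", "high", "low", "mid"))   # made-up data
cond   <- c(TRUE, FALSE, FALSE, TRUE)

result <- status
result[cond] <- "high"     # "high" is an existing level, so the class is kept

class(result)
as.character(result)
```

If the new value is not among the levels, the assignment yields NA with a warning, so the levels would have to be extended first.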

Take a look at the man page for details and the warning:

 The mode of the result may depend on the value of 'test', and the
 class attribute of the result is taken from 'test' and may be
 inappropriate for the values selected from 'yes' and 'no'.

Some consequences of the definition of ifelse are:

Even if the classes of the yes and no arguments are identical, the
result does not necessarily have that class:

ifelse(TRUE,as.raw(4),as.raw(5)) => error

ifelse(TRUE,factor('x'),factor('x')) => 1  (integer)

dates <- as.POSIXct(c('1990-1-1','2000-1-1'))
ifelse(c(TRUE,FALSE),dates,dates)  =>  631170000 946702800  (double)

ifelse(c(TRUE,FALSE),factor(c('x','y')),factor(c('y','x'))) => 1 1

If they have different classes, things get stranger:

ifelse(c(TRUE,FALSE),c("a","b"),factor(c("c","d")))  =>  "a" "2"

ifelse(c(TRUE,FALSE),list(1,2),as.raw(4))
[[1]]
[1] 1

[[2]]
[1] 04

Result is order-dependent:

ifelse(c(TRUE,FALSE),as.raw(4),list(1,2))
Error in ans[test & !nas] <- rep(yes, length.out = length(ans))[test &  :
  incompatible types (from raw to logical) in subassignment type fix

Welcome to R!

 -s


Re: [R] first value...

2009-06-23 Thread Stavros Macrakis

I think what you mean is that you want to find the position of the first
non-NA value in the vector.  is.na returns a boolean vector of the NA
values, so:

> xx <- c(NA,NA,NA,2,3,NA,4)
> which(!is.na(xx))[1]
[1] 4

The other proposed solution,

    which(diff(is.na(inc)) < 0)

is incorrect:

> which(diff(is.na(xx))<0)
[1] 3 6
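
An equivalent one-liner, in case it reads more clearly. This is a sketch, with first_non_na as a made-up helper name:

```r
# first_non_na() is a hypothetical helper name
first_non_na <- function(x) match(FALSE, is.na(x))

xx <- c(NA, NA, NA, 2, 3, NA, 4)
first_non_na(xx)              # same position as which(!is.na(xx))[1]
first_non_na(c(NA, NA))       # NA when every element is missing
```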

   -s

On Tue, Jun 23, 2009 at 10:00 AM, Alfredo Alessandrini alfreal...@gmail.com wrote:

 Hi,

 I've a vector like this:

  > inc
   [1] NA NA NA NA NA NA NA ...
  [71] NA NA NA NA NA NA NA
  [78] NA NA NA NA 13.095503 10.140119  7.989186

...

 I must obtain the position of first value of the vector...

 In this case is 82.

  inc[82]
 [1] 13.09550



Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)

2009-06-20 Thread Stavros Macrakis

(I am replacing R-devel and r-bugs with r-help as addressees.)

On Sat, Jun 20, 2009 at 9:45 AM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

 So if I request a calculation of 0.3-0.1-0.1-0.1 and I do not get 0,
 that is not an issue of rounding / underflow (or whatever the correct
 technical term would be for that behaviour)?


No.  Let's start from the beginning.

In binary floating point arithmetic, all numbers are represented as a*2^b,
where a and b have a fixed number of digits, so input conversion from
decimal form to binary form inherently loses some precision -- that is, it
rounds to the nearest binary fraction.

For example, representation(0.3) is 5404319552844595 * 2^-54, about 1e-17
less than exactly 3/10, which is of course not representable in the form
a*2^b.

The EXACT difference (calculating with rationals -- no roundoff errors etc.)
between representation(0.3) and 3*representation(0.1) is 2^-55 (about
2.8e-17); the EXACT difference between representation(0.3) and
representation(3*representation(0.1)) is 2^-54.  As it happens, in this
case, there is no rounding error at all -- the floating-point result of 0.3
- 3*0.1 is exactly -2^-54.
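
These claims can be checked from the R prompt. A small sketch, assuming the usual IEEE 754 double-precision arithmetic:

```r
# Scaling by a power of two is exact, so this exposes the stored significand
0.3 * 2^54 == 5404319552844595

sprintf("%.20f", 0.3)        # decimal expansion of the value actually stored
0.3 - 3 * 0.1 == -2^-54      # the difference is a single power of two
```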

 I thought that guard digits would mean that 0.3-0.1*3 should be calculated
 in higher precision than the final representation of the result, i.e.,
 avoiding that this is not equal to 0?


Guard digits and sticky bits are techniques for more accurate rounding of
individual arithmetic operations, and do not persist beyond each individual
operation.  They cannot create precise results out of imprecise inputs
(except when they get lucky!).  And even with precise inputs, they cannot
create correctly rounded results with multiple operations.  Consider for
example (1.0 + 1.0e-15) - 1.0.  The correctly rounded result of
(1.0+1.0e-15) is 1.0000000000000011...  And the correctly rounded result of
(1.0+1.0e-15)-1.0 is 1.11e-15, which is 11% different than the mathematical
result.
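
The same example at the prompt, again assuming IEEE 754 doubles; the variable names are made up for the sketch:

```r
exact    <- 1.0e-15
computed <- (1.0 + 1.0e-15) - 1.0

computed                        # close to, but not equal to, 1e-15
(computed - exact) / exact      # relative error of roughly 11%
```

Both operations are individually correctly rounded; the error comes entirely from the first rounding being magnified by the cancellation.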

Perhaps you are thinking about the case where intermediate results are
accumulated in higher-than-normal precision.  This technique only applies in
very specialized circumstances, and is not available to user code in most
programming languages (including R).  I don't know whether R's sum function
uses this technique or some other (e.g. Kahan summation), but it does manage
to give higher precision than summation with individual arithmetic
operators:

   sum(c(2^63,1,-2^63)) => 1
but
   Reduce(`+`,c(2^63,1,-2^63)) => 0

I am sorry if I am not from the field... If you can suggest an online
 resource to help me use the right vocabulary and better understand the
 fundamental concepts, I am of course grateful.


I would suggest "What every computer scientist should know about
floating-point arithmetic", *ACM Computing Surveys* *23*:1 (March 1991), for
the basics.  Anything by Kahan (http://www.cs.berkeley.edu/~wkahan/) is
interesting.  Beyond elementary floating-point arithmetic, there is of
course the vast field of numerical analysis, which underlies many of the
algorithms used by R and other statistical systems.

-s


Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)

2009-06-20 Thread Stavros Macrakis

On Sat, Jun 20, 2009 at 4:10 PM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

 Ah, that's probably where I went wrong. I thought R would take the
 0.1, the 0.3, the 3, convert them to extended precision binary
 representations, do its calculations, and the reduction to normal
 double precision binary floats would only happen when the result was
 stored or printed.


This proposal is problematic in many ways.  For example, it would *still*
not guarantee that 0.3 - 3*0.1 == 0, since extended-precision floats have
the same characteristics as normal-precision floats.  Would you round to
normal precision when passing arguments?  Then sqrt could not produce
extended-precision results. etc. etc.

I suppose R could support an extended-precision floating-point type, but
that would require that the *user* choose which operations were in
extended-precision and which in normal precision. (And of course it would be
a lot of work to add in a complete and consistent way.)
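
A short sketch of the usual workaround in base R: compare with a tolerance rather than expecting exact equality. The variable name d is mine.

```r
d <- 0.3 - 3 * 0.1
d == 0                             # exact comparison fails
sprintf("%.3g", d)                 # a tiny nonzero residue

# Tolerance-based comparisons
isTRUE(all.equal(0.3, 3 * 0.1))
abs(d) < sqrt(.Machine$double.eps)
```

all.equal uses a relative tolerance of roughly 1.5e-8 by default, which is far larger than the rounding error here.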

   -s


Re: [R] tapply with cbinded x

2009-06-16 Thread Stavros Macrakis

On Tue, Jun 16, 2009 at 5:16 AM, Stefan Uhmann stefan.uhm...@googlemail.com
 wrote:

 why does this not work?

 df <- data.frame(var1 = c(3,2,1), var2 = c(6,5,4), var3 = c(9,8,7),
                  fac = c('A', 'A', 'B'))
 tapply(cbind(df$var1, df$var2, df$var3), df$fac, mean)


Because tapply is defined for atomic vectors and not for data frames.  Why?
I don't know.

Does this do what you want?:

 df <- data.frame(var1 = c(3,2,1), var2 = c(6,5,4), var3 = c(9,8,7))
 fac <- c('a','a','b')
 do.call(rbind, lapply(split(df,fac),colMeans))
  var1 var2 var3
a  2.5  5.5  8.5
b  1.0  4.0  7.0

Alternatively, you can use sapply, which returns the result in matrix form.

 sapply(split(df,fac),colMeans)
       a b
var1 2.5 1
var2 5.5 4
var3 8.5 7
 as.data.frame(t(sapply(split(df,fac),colMeans)))
  var1 var2 var3
a  2.5  5.5  8.5
b  1.0  4.0  7.0

Note that sapply's matrix output form (the so-called 'simplification') needs
to be transposed.
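
Another option, sketched here with the same toy data: tapply is fine one column at a time, and aggregate does the per-group computation for every column at once. The name res is mine.

```r
df  <- data.frame(var1 = c(3, 2, 1), var2 = c(6, 5, 4), var3 = c(9, 8, 7))
fac <- c("a", "a", "b")

# tapply works column by column, because each column is an atomic vector
tapply(df$var1, fac, mean)

# aggregate applies FUN to every column within each group
res <- aggregate(df, by = list(fac = fac), FUN = mean)
res
```

The result of aggregate is a data frame with the grouping variable as its first column, so no transposing is needed.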

 -s


Re: [R] function inside ifelse

2009-06-15 Thread Stavros Macrakis

Of course functions can be used inside ifelse.  They should return vectors.

Be careful of the effect of recycling:

ifelse(c(F,T,F,T,F,T),1:3,10:20)
[1] 10  2 12  1 14  3

with functions:

 f <- function(x) x/mean(x)
 ifelse(c(F,T,F,T,F,T),sqrt(1:3),f(10:20))
[1] 0.6666667 1.4142136 0.8000000 1.0000000 0.9333333 1.7320508
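
Applied to the question below, a sketch might look like this. The helpers fill_value and scale_value are hypothetical stand-ins for call_fun_1 and call_fun_2; note that they are defined before ifelse is called and that each returns a vector as long as its input, so no recycling surprises occur.

```r
# Hypothetical helpers standing in for call_fun_1 / call_fun_2
fill_value  <- function(v) rep(0, length(v))
scale_value <- function(v) v / max(v, na.rm = TRUE)

vek <- c(2, NA, 4, NA)

# Each function is evaluated on the whole vector;
# ifelse then picks element by element.
res <- ifelse(is.na(vek), fill_value(vek), scale_value(vek))
res
```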

  -s

On Mon, Jun 15, 2009 at 10:39 AM, Grześ gregori...@gmail.com wrote:


 Could you tell me, if it's possible to create ifelse and put function
 inside, for example:

 code{
 ifelse ((is.na(vek)), call_fun_1(arguments), call_fun_2(arguments))

 call_fun_1 <- function(arguments)
 { sth...
 }
 }
 --



Re: [R] Referencing data frames

2009-06-15 Thread Stavros Macrakis

On Mon, Jun 15, 2009 at 12:38 PM, Payam Minoofar 
payam.minoo...@meissner.com wrote:

 ...I would like to have a function acquire an object by reference, and
 within the function create new objects based on the original object and then
 use the name of the original object as the base for the names of the newly
 created objects.

 It seems to me that the optimal way of doing this is to have the function
 acquire the name of the object as a string, and then use get() to access the
 object, and then to use the same string to do the name formation of the new
 objects


Instead of creating new names through string manipulation, I'd think it
would be cleaner and simpler to use the list mechanism to return a
structured object, e.g.

ddd <- function (obj) list( new1 = makenew1(obj), new2 = makenew2(obj),
                            new3 = makenew3(obj) )

Then you'd write, e.g.

   ddx <- ddd(oldobj)
   ddx$new1   # names new1

Perhaps this will work for you
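
A runnable sketch of the same idea. The three makenew functions are placeholders I made up so that the example executes; substitute whatever actually builds the derived objects.

```r
# Hypothetical stand-ins for makenew1 / makenew2 / makenew3
makenew1 <- function(obj) head(obj, 2)
makenew2 <- function(obj) summary(obj)
makenew3 <- function(obj) nrow(obj)

ddd <- function(obj) {
  list(new1 = makenew1(obj), new2 = makenew2(obj), new3 = makenew3(obj))
}

oldobj <- data.frame(a = 1:5, b = letters[1:5])
ddx <- ddd(oldobj)
names(ddx)
ddx$new3
```

The caller chooses the name (ddx here), and the pieces are reached with $, so no get/assign string manipulation is needed.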

  -s


Re: [R] Tables without names

2009-06-12 Thread Stavros Macrakis

On Fri, Jun 12, 2009 at 6:09 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:

 On 11/06/2009 5:35 PM, Stavros Macrakis wrote:

 A table without names displays like a vector:

 unname(table(2:3))
[1] 1 1

 and preserves the table class (as with unname in general):

 dput(unname(table(2:3)))
structure(c(1L, 1L), .Dim = 2L, class = "table")

 Does that make sense?  R is not consistent in its treatment of such
 unname'd
 tables:


 One of the complaints about the S3 object system is that anything can claim
 to be of class "foo", even if it doesn't have the right structure so that
 "foo" methods work for it.


Yes, that is one of its flaws.  More specifically, in this case, operations
on S3 objects can change them from being valid to being invalid.


 I think that's all you're seeing here:  you've got something that is
 mislabelled as being of class table.


Yes.


 The solution is don't do that.


Agreed!  But it's not clear to me how unname can *know* how not to do that
in the general case.  After all, unname on a vector of POSIXct's leaves a
valid POSIXct object.

...
 PS What is the standard way of extracting just the underlying vector?
 c(unname(...)) works -- is that what is recommended?


 I would use as.numeric(), but I don't claim it's standard.


Makes sense, as does the suggestion as.vector.  So I guess the summary of
'stripping' operations is:

c  --- strip all attributes (including most but not all classes) except for
names
unname -- strip name attributes, but no other attributes (including class)
unclass -- strip only class attribute
as.vector -- strip all attributes including class and names from atomic
vectors (lists are returned as lists and keep their names)

Am I missing others?
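
A quick way to check the summary above on the table from this thread; this is just base R, with tab as my own variable name.

```r
tab <- table(2:3)

c(tab)           # plain integer vector; names "2" and "3" kept
unname(tab)      # names gone, class "table" kept
unclass(tab)     # class gone, dim and dimnames kept
as.vector(tab)   # bare integer vector

# as.vector leaves a list as a list, names included
as.vector(list(a = 1, b = 2))
```

Wrapping any of these in str() or attributes() shows exactly what survived.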

   -s


[R] Tables without names

2009-06-11 Thread Stavros Macrakis

A table without names displays like a vector:

 unname(table(2:3))
[1] 1 1

and preserves the table class (as with unname in general):

 dput(unname(table(2:3)))
structure(c(1L, 1L), .Dim = 2L, class = "table")

Does that make sense?  R is not consistent in its treatment of such unname'd
tables:

In plot, they are considered erroneous input:

 plot(unname(table(2:3)))
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ

but in melt, they act as though they have names 1:n:

melt(unname(table(2:3)))
  indicies value
1        1     1
2        2     1

(By the way, is the spelling error built into too much code to be
corrected?)

-s

PS What is the standard way of extracting just the underlying vector?
c(unname(...)) works -- is that what is recommended?


Re: [R] Splicing factors without losing levels

2009-06-09 Thread Stavros Macrakis

Various people have provided technical solutions to your problem.

May I suggest, though, that 'splice' isn't quite the right word for this
operation?  Splicing two pieces of rope / movie film / audio tape / wires /
etc. means connecting them at their ends, either at an extremity or in the
middle, e.g.

X:  xxxxxx
Y:  yyyyyy
Extremity splice: xxxxxxyyyyyy  or  yyyyyyxxxxxx
Middle splice: xxxyyyyyyxxx  or  yyyxxxxxxyyy

The splice itself is the point of connection (xy or yx) between two things.

In normal English, splicing never refers to interspersing alternate members
of X and Y.

This may seem like a minor point, but I think it is worthwhile using
descriptive names for functions.

 -s


On Tue, Jun 9, 2009 at 5:12 AM, Titus von der Malsburg
malsb...@gmail.com wrote:

 An operation that I often need is splicing two vectors:

   splice(1:3, 4:6)
  [1] 1 4 2 5 3 6



Re: [R] Splicing factors without losing levels

2009-06-09 Thread Stavros Macrakis

On Tue, Jun 9, 2009 at 11:16 AM, Titus von der Malsburg
malsb...@gmail.com wrote:

 On Tue, Jun 09, 2009 at 11:04:03AM -0400, Stavros Macrakis wrote:
  This may seem like a minor point, but I think it is worthwhile using
  descriptive names for functions.

 Makes sense.  I thought I've seen this use somewhere else (probably in
 Lisp?).  What better name do you suggest for this operation?


The two meanings I can think of in Lisp for splicing are

1) The backquote operator ,@X, which means to insert the value of X as part
of the surrounding list rather than as an element of the list, e.g.
`(a b ,@'(c d) e f) == (append '(a b) '(c d) '(e f)) => (a b c d e f), as
opposed to `(a b ,'(c d) e f) == (append '(a b) (list '(c d)) '(e f)) =>
(a b (c d) e f).

2) The notion of inserting (typically destructively) one list in the middle
of another.

I would suggest a name like 'intersperse' or 'alternate'.
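
For what it's worth, here is one way such a function could be written under the name intersperse. This is only a sketch, not any of the solutions posted earlier in the thread: it assumes both inputs have the same length, and for two factors it goes through character so that the union of the levels is kept.

```r
intersperse <- function(x, y) {
  stopifnot(length(x) == length(y))
  if (is.factor(x) && is.factor(y)) {
    # keep every level of both inputs, used or not
    lev <- union(levels(x), levels(y))
    return(factor(c(rbind(as.character(x), as.character(y))), levels = lev))
  }
  # rbind stacks x over y; c() then reads the matrix column by column
  c(rbind(x, y))
}

intersperse(1:3, 4:6)

f <- intersperse(factor(c("a", "b"), levels = c("a", "b", "c")),
                 factor(c("d", "a")))
f
levels(f)
```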

-s


Re: [R] using regular expressions to retrieve a digit-digit-dot structure from a string

2009-06-09 Thread Stavros Macrakis

On Tue, Jun 9, 2009 at 7:44 AM, Mark Heckmann mark.heckm...@gmx.de wrote:

 Thanks for your help. Your answers solved the problem I posted and that is
 just when I noticed that I misspecified the problem ;)
 My problem is to separate a German texts by sentences. Unfortunately I
 haven't found an R package doing this kind of text separation in German, so
 I try it manually.

 Just using the dot as separator fails in occasions like:
 txt <- "One January 1. I saw Rick. He was born in the 19. century."


Sentence boundary disambiguation is a non-trivial problem, as you can see in
your above example (cf. "I arrived on January 1. I saw Rick.").  You can get
~95% accuracy fairly straightforwardly, but the last 5% are hard.  Take a
look at http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation, which
points to other good resources.
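
To show why the easy part is easy and the rest is not, here is a naive heuristic in base R. The rule (split at whitespace that follows a period not preceded by a digit and that precedes a capital letter) is my own simplification, and [A-Z] ignores umlauts, so treat it as a toy.

```r
txt <- "On January 1. I saw Rick. He was born in the 19. century."

# Split at whitespace that follows "<non-digit>." and precedes a capital.
# [A-Z] is an ASCII simplification; German text would also need umlauts.
parts <- strsplit(txt, "(?<=[^0-9]\\.)\\s+(?=[A-Z])", perl = TRUE)[[1]]
parts
```

The rule correctly leaves "19. century" alone, but it also fails to split after "January 1.", which is a real sentence boundary. That ambiguity is exactly the hard part.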

   -s


Re: [R] if else

2009-06-08 Thread Stavros Macrakis

On Mon, Jun 8, 2009 at 1:48 PM, Cecilia Carmo cecilia.ca...@ua.pt wrote:

 I have the following dataframe:
 firm <- c(rep(1:3,4))
 year <- c(rep(2001:2003,4))
 X1 <- rep(c(10,NA),6)
 X2 <- rep(c(5,NA,2),4)
 data <- data.frame(firm, year,X1,X2)
 data

 So I want to obtain the same dataframe with a variable X3 that is:
 X1, if X2=NA
 X2, if X1=NA
 X1+X2 if X1 and X2 are not NA

 So my final data is
 X3 <- c(15,NA,12,5,10,2,15,NA,12,5,10,2)
 finaldata <- data.frame(firm, year,X1,X2,X3)

 I've tried this

 finaldata <- ifelse(data$X1==NA,ifelse(data$X2==NA,NA,X2),ifelse(data$varvendas==NA,X1,X1+X2))
 But I got just NA in X3.
 Anyone could help me with this?


The problem here is that comparing NA to anything always gives NA, even for
NA==NA.  To check for NA, you need to use is.na, e.g.

data$X3 <- ifelse( is.na(data$X1), data$X2,
                   ifelse( is.na(data$X2), data$X1, data$X1+data$X2 ) )

(you don't need to handle the is.na(X1) & is.na(X2) case specially)

which you can make more compact using 'with':

data$X3 <- with(data, ifelse( is.na(X1), X2, ifelse( is.na(X2), X1, X1+X2 )))
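
Put together as a self-contained check against the expected X3 from the question (same data, base R only):

```r
firm <- rep(1:3, 4)
year <- rep(2001:2003, 4)
X1   <- rep(c(10, NA), 6)
X2   <- rep(c(5, NA, 2), 4)
data <- data.frame(firm, year, X1, X2)

NA == NA   # NA, not TRUE, which is why the == NA tests never fire

data$X3 <- with(data, ifelse(is.na(X1), X2, ifelse(is.na(X2), X1, X1 + X2)))
data$X3
```

When both X1 and X2 are NA the first branch returns X2, which is NA, so that case needs no branch of its own.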

Hope this helps,

   -s


Re: [R] if else

2009-06-08 Thread Stavros Macrakis

On Mon, Jun 8, 2009 at 3:36 PM, Don MacQueen m...@llnl.gov wrote:

Though I do agree that the way you've written the general case with any/
is.na and sum/na.rm is cleaner and clearer because more general, I don't
agree at all with what you say about nested ifelse's vs. a series of
assignments:


 In my opinion, nested ifelse() expressions are difficult to read and
 understand, and therefore difficult to get right.
 Easier to write one expression for each of your criteria. But do the last
 one first


In the ifelse case, it is easy to trace exactly what happens in each case,
because all the cases are disjoint.  This becomes especially clear if
written with a lot of whitespace and proper indentation:

ifelse( is.na(X1),
        X2,                   # the is.na(X1) case
        ifelse( is.na(X2),    # the !is.na(X1) case
                X1,           # the !is.na(X1) & is.na(X2) case
                X1+X2 ))      # the !is.na(X1) & !is.na(X2) case

I suppose it might be clearer for some users at least if you wrote out *all*
the cases, even though they're not necessary:

ifelse( is.na(X1),
        ifelse( is.na(X2),    # the is.na(X1) cases
                NA,           # the is.na(X1) & is.na(X2) case
                X2 ),         # the is.na(X1) & !is.na(X2) case
        ifelse( is.na(X2),    # the !is.na(X1) cases
                X1,           # the !is.na(X1) & is.na(X2) case
                X1+X2 ))      # the !is.na(X1) & !is.na(X2) case

On the other hand, with the multiple assignment case, if you're not careful,
it's easy to have different statements overwriting each other's results in
unintended ways.  For those who've been around programming for a while, they
may recall Dijkstra's "goto considered harmful" letter -- which is echoed by
functional programming's "assignment considered harmful"!
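
A small illustration of the overwriting hazard, using made-up four-element vectors. The assignment-based version is my reconstruction of the "do the last one first" advice, not Don's actual code.

```r
X1 <- c(10, NA, 10, NA)
X2 <- c(5, NA, NA, 2)

# Nested ifelse: each element falls into exactly one branch
nested <- ifelse(is.na(X1), X2, ifelse(is.na(X2), X1, X1 + X2))

# Series of assignments, general case first: gives the same answer
seq_ok <- X1 + X2
seq_ok[is.na(X1)] <- X2[is.na(X1)]
seq_ok[is.na(X2)] <- X1[is.na(X2)]

# Same statements with the general case last: it silently overwrites
# the two special cases handled before it
seq_bad <- rep(NA_real_, length(X1))
seq_bad[is.na(X1)] <- X2[is.na(X1)]
seq_bad[is.na(X2)] <- X1[is.na(X2)]
seq_bad <- X1 + X2

nested
seq_ok
seq_bad
```

Both styles can be made to work; the difference is that the ifelse version cannot depend on statement order.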

-s


Re: [R] Done: Fast way of finding top-n values of a long vector

2009-06-05 Thread Stavros Macrakis

On Fri, Jun 5, 2009 at 4:09 AM, Allan Engelhardt all...@cybaea.com wrote:

 I'm all done now.  The max2 version below is what I went with in the end
 for my proposed change to caret::nearZeroVar (which used the sort method).
 Max Kuhn will make it available on CRAN soon.  It speeds up that routine by
 a factor 2-5 on my test cases and uses much less memory.


You can save a little in max2 like this:

max2a = {w <- which.max(x); x[w]/max(x[-w], na.rm=TRUE);}

If you don't need to handle NA's (or if you know a priori how many there
are), you can also speed up part:

  parta = {sel <- length(x)+c(-1,0); a <- sort.int(x, partial=sel,
na.last=NA)[rev(sel)]; a[1]/a[2];}

which becomes about as fast as max2.
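
As a sanity check that the variants compute the same ratio, here is a small sketch without the benchmarking package. The function names are mine, the vector is smaller than in the timings below, and it assumes there are no NAs.

```r
set.seed(1)
x <- runif(1e5, max = 1e8)

# Full sort, then take the two largest
top2_sort <- function(x) {
  a <- sort(x, decreasing = TRUE)[1:2]
  a[1] / a[2]
}

# Partial sort: only positions n-1 and n are guaranteed to be in place
top2_partial <- function(x) {
  n <- length(x)
  a <- sort.int(x, partial = c(n - 1, n))[c(n, n - 1)]
  a[1] / a[2]
}

# Two passes with max, as in max2a
top2_max <- function(x) {
  w <- which.max(x)
  x[w] / max(x[-w])
}

c(top2_sort(x), top2_partial(x), top2_max(x))
```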

 library(rbenchmark)
set.seed(1); x <- runif(1e7, max=1e8);
benchmark(
  replications=20,
  columns=c("test","elapsed"),
  order="elapsed"
, sort = {a <- sort(x, decreasing=TRUE, na.last=NA)[1:2];
  a[1]/a[2];}
, qsrt = {a <- sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2];
  a[1]/a[2];}
, part = {a <- sort.int(-x, partial=1:2, na.last=NA)[1:2];
  a[1]/a[2];}
, parta = {end <- length(x)+c(-1,0);
   a <- sort.int(x, partial=end, na.last=FALSE)[rev(end)];
   a[1]/a[2]; }
, max1 = {m <- max(x, na.rm=TRUE);
  w <- which(x==m)[1];
  m/max(x[-w],na.rm=TRUE);}
, max2 = {w <- which.max(x);
  max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
, max2a = {w <- which.max(x);
  x[w]/max(x[-w], na.rm=TRUE);}
)

   test elapsed
7 max2a    7.80
6  max2    8.94
4 parta    9.05
3  part   10.72
5  max1   20.21
2  qsrt   49.33
1  sort   94.18


 For what it is worth, I also made a C version (cmax below) which of
 course is faster yet again and scales nicely for returning the top n values
 of the array:

 cmax <- function (v) {max <- vector("double",2); max <- .C("test",
 as.double(v), as.integer(length(v)), max, NAOK=TRUE)[[3]];
 return(max[1]/max[2]);}

 library(rbenchmark)
 set.seed(1); x <- runif(1e7, max=1e8); x[1] <- NA;
 benchmark(
 replications=20,
 columns=c("test","elapsed"),
 order="elapsed"
 , sort = {a <- sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
 , qsrt = {a <- sort(x, decreasing=TRUE, na.last=NA, method="quick")[1:2];
 a[1]/a[2];}
 , part = {a <- sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
 , max1 = {m <- max(x, na.rm=TRUE); w <- which(x==m)[1];
 m/max(x[-w],na.rm=TRUE);}
 , max2 = {w <- which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
 , cmax = {cmax(x);}
 )
 #   test elapsed
 # 6 cmax   4.394
 # 5 max2   8.954
 # 4 max1  18.835
 # 3 part  21.749
 # 2 qsrt  46.692
 # 1 sort  77.679

 Thanks for all the suggestions and comments.

 Allan.


 PS: Slightly off-topic but is there a way within the syntax of R to set up
 things so that 'sort' (or any function) would know it is called in a partial
 list context in sort(x)[1:2] and it therefore could choose to use the
 partial argument automatically for small [] lists?  The R interpreter of
 course knows full well that it is going to drop all but the first two values
 of the result before it calls 'sort'.  Perl has 'use Want' where howmany()
 and want(n) provides a subset of this functionality (essentially for []
 lists of the form 1:n).


[R] all.equal(0,0i)

2009-06-02 Thread Stavros Macrakis

 all.equal(0,0i)
[1] "Modes: numeric, complex"
[2] "target is numeric, current is complex"

 all.equal(1,1+0i)
[1] "Modes: numeric, complex"
[2] "target is numeric, current is complex"

Is this the intended behavior?

In general, all.equal is strict about argument mode, thus TRUE/1 and 1/'1'
do not compare equal (unlike ==).  On the other hand, 1L and 1.0 do compare
equal (unlike identical).

? all.equal discusses the 'numerical' case, and mentions what metric is used
for complex arguments, but doesn't make it clear whether 'complex' is
considered 'numerical' (as opposed to 'numeric', which in R terms means
integer or double).
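
The comparisons mentioned above, collected into one base-R snippet, plus one possible workaround (coercing the numeric side to complex first). The workaround is my own suggestion, not something the documentation prescribes.

```r
# == coerces across modes; all.equal does not
TRUE == 1
1 == "1"
isTRUE(all.equal(TRUE, 1))
isTRUE(all.equal(1, "1"))

# integer vs double: equal for all.equal, not for identical
isTRUE(all.equal(1L, 1.0))
identical(1L, 1.0)

# the case from this post, and a workaround: coerce first
all.equal(1, 1 + 0i)
isTRUE(all.equal(as.complex(1), 1 + 0i))
```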

 -s


Re: [R] Error:non-numeric argument in my function

2009-06-01 Thread Stavros Macrakis

Agreed, that's even better, e.g.

Error in 1 * "a" : character argument not allowed for arithmetic
operator *

For some reason (does anyone know the rationale?), in the case of factors,
you don't get an error, but a more explicit warning and an NA result:

 2*factor(3)
[1] NA
Warning message:
In Ops.factor(2, factor(3)) : * not meaningful for factors

This seems hazardous, especially since the user has to be sophisticated
enough to know about options(warn=2) to get a traceback for this.
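
For code that should not silently continue with an NA, one local alternative to options(warn=2) is to catch the condition where it occurs. A sketch, assuming an English locale for the message text:

```r
# The character case is an error ...
err_msg <- tryCatch(1 * "a", error = function(e) conditionMessage(e))
err_msg

# ... while the factor case is only a warning, which tryCatch can intercept
warn_msg <- tryCatch(2 * factor(3), warning = function(w) conditionMessage(w))
warn_msg

# Without a handler the computation goes on with NA
res <- suppressWarnings(2 * factor(3))
res
```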

As for data frames, arithmetic operators seem to work if all the values are
numeric:

 2*data.frame(a=1)
  a
1 2

It's a hard problem to make useful error messages for beginning users

-s


On Mon, Jun 1, 2009 at 4:34 AM, Patrick Burns pbu...@pburns.seanet.com wrote:

 I thought Stavros' suggestion was going
 to be to have the error message say what
 type of offending object was found.  If
 the message said that a list of class
 'data.frame' was found (probably the leading
 case), then that would be much more helpful.

 Patrick Burns
 patr...@burns-stat.com
 +44 (0)20 8525 0696
 http://www.burns-stat.com
 (home of The R Inferno and A Guide for the Unwilling S User)

 Stavros Macrakis wrote:

 On Sun, May 31, 2009 at 6:10 PM, jim holtman jholt...@gmail.com wrote:

  Message is very clear:

  1 * 'a'

 Error in 1 * "a" : non-numeric argument to binary operator



 Though the user should have been able to figure this out, perhaps the
 error
 message could be improved? After all, it is not the fact that the operator
 is *binary* that implies that its argument must be numeric, but that it is
 *arithmetic*. The binary operator %in%, for example, takes non-numeric
 arguments.

 Suggested replacement error message:

 non-numeric argument to arithmetic operator

   -s


Re: [R] Error:non-numeric argument in my function

2009-05-31 Thread Stavros Macrakis

On Sun, May 31, 2009 at 6:10 PM, jim holtman jholt...@gmail.com wrote:

 Message is very clear:

  1 * 'a'
 Error in 1 * "a" : non-numeric argument to binary operator


Though the user should have been able to figure this out, perhaps the error
message could be improved? After all, it is not the fact that the operator
is *binary* that implies that its argument must be numeric, but that it is
*arithmetic*. The binary operator %in%, for example, takes non-numeric
arguments.

Suggested replacement error message:

 non-numeric argument to arithmetic operator

   -s


[R] max.col specification

2009-05-29 Thread Stavros Macrakis

I'm not sure I understand the max.col spec or its rationale.  In particular:

* What is the significance and effect of assuming that the entries are
probabilities, as they do not seem to be limited to the interval [0,1]?
* In what contexts is it useful for max.col to consider numbers within a
certain tolerance equal?
* Why is a fixed relative tolerance of 1e-5 useful? That seems many orders
of magnitude greater than typical rounding errors, but arbitrary in terms of
data analysis, where different data sets or statistics may have widely
varying error distributions. And I'd have thought a tolerance of 0 natural
in many cases.

My guess is that there is some particular kind of analysis where these are
all natural background assumptions, but it is not clear what that analysis
is.

Also, max.col is part of 'base', so the authors must have thought that these
assumptions were generally applicable.  Can someone clarify?
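
For anyone who just wants the exact answer in the meantime, here is a sketch using one of the matrices from the original report. As I read the man page text quoted in this thread, the tolerance applies only to the default random tie-breaking, so I am assuming that ties.method = "first" compares exactly.

```r
x <- matrix(c(12345.568, 12345.569, 12345.567), nrow = 1)

# Spread of the row relative to its largest entry: well below 1e-5,
# so under the documented rule all three entries count as tied.
(max(x) - min(x)) / max(abs(x))

# Exact alternatives that do not depend on the random tie-breaking
apply(x, 1, which.max)
max.col(x, ties.method = "first")
```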

Thanks,

  -s



On Thu, May 28, 2009 at 5:02 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Try reading the man page, which says:

 Details

 When ties.method = "random", as per default, ties are broken at random. In
 this case, the determination of a tie assumes that the entries are
 probabilities: there is a relative tolerance of 1e-5, relative to the
 largest (in magnitude, omitting infinity) entry in the row.


 Bert Gunter
 Genentech Nonclinical Biostatistics

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On
 Behalf Of Daryl Morris
 Sent: Thursday, May 28, 2009 1:47 PM
 To: r-help@r-project.org
 Subject: [R] max.col weirdness

 Hi,
 I think there's some rounding issue with returning the max column.
 (running 2.9.0 on an Apple, but my buddy found it on his PC)

   x <- matrix(c(1234.568,1234.569,1234.567),1)
   max.col(x)
 [1] 2
   x <- matrix(c(12345.568,12345.569,12345.567),1)
   max.col(x)
 [1] 3
   x <- matrix(c(112345.568,112345.569,112345.567),1)
   max.col(x)
 [1] 3
   max.col(-x)
 [1] 1

 Thanks, Daryl


Re: [R] maxtrix to permutation vector

2009-05-29 Thread Stavros Macrakis

Not sure what you mean by permutations here.  I think what you mean is
that given a matrix m, you want a matrix whose rows are c(i,j,m[i,j]) for
all i and j.  You can use the `melt` function in the `reshape` package for
this.  See below.

Hope this helps,

 -s

 library(reshape)
 melt(matrix(1:4,2,2))
  X1 X2 value
1  1  1     1
2  2  1     2
3  1  2     3
4  2  2     4

 big <- matrix(1:700^2,700,700)
 head(melt(big))
  X1 X2 value
1  1  1     1
2  2  1     2
3  3  1     3
4  4  1     4
5  5  1     5
6  6  1     6
 system.time(melt(big))
   user  system elapsed
   0.08    0.00    0.08

On Fri, May 29, 2009 at 2:08 PM, Ian Coe i...@connectcap.com wrote:

 Hi,

   Is there a way to convert a matrix into a vector representing all
 permutations of values and column/row headings with native R functions?
 I did this with 2 nested for loops and it took about 25 minutes to run
 on a ~700x700 matrix.  I'm assuming there must be a smarter way to do
 this with R's vector commands, but being new to R, I'm having trouble
 making it work.

 Thanks,

 Ian

      [a] [b] [c]
 [d]   1   4   7
 [e]   2   5   8
 [f]   3   6   9

 a d 1
 a e 2
 a f 3
 b d 4
 b e 5
 b f 6
 c d 7
 c e 8
 c f 9


Re: [R] maxtrix to permutation vector

2009-05-29 Thread Stavros Macrakis

Oh, I should have mentioned that the result of melt is a data.frame, not a
matrix.  You can convert with as.matrix if you like.

I should also have shown that dimnames are carried along:

 m <- matrix(1:4,2,2,dimnames=list(x=c('a','b'),y=c('x','y')))

 m
   y
x   x y
  a 1 3
  b 2 4

 melt(m)
  x y value
1 a x     1
2 b x     2
3 a y     3
4 b y     4

 as.matrix(melt(m))
     x   y   value
[1,] "a" "x" "1"
[2,] "b" "x" "2"
[3,] "a" "y" "3"
[4,] "b" "y" "4"



On Fri, May 29, 2009 at 2:53 PM, Stavros Macrakis macra...@alum.mit.edu wrote:

 Not sure what you mean by permutations here.  I think what you mean is
 that given a matrix m, you want a matrix whose rows are c(i,j,m[i,j]) for
 all i and j.  You can use the `melt` function in the `reshape` package for
 this.  See below.

 Hope this helps,

  -s

  library(reshape)
  melt(matrix(1:4,2,2))
   X1 X2 value
 1  1  1     1
 2  2  1     2
 3  1  2     3
 4  2  2     4

 big <- matrix(1:700^2,700,700)
  head(melt(big))
   X1 X2 value
 1  1  1     1
 2  2  1     2
 3  3  1     3
 4  4  1     4
 5  5  1     5
 6  6  1     6
  system.time(melt(big))
    user  system elapsed
    0.08    0.00    0.08



Re: [R] custom sort?

2009-05-28 Thread Stavros Macrakis

I agree that it is surprising that R doesn't provide a sort function with a
comparison function as argument. Perhaps that is partly because calling out
to a function for each comparison is relatively expensive; R prefers vector
operations.

That said, many useful custom sorts are easy to define by reordering,
possibly using the 'order' function, e.g.

rr <- function (v) v[order( v %% 10 , v < 500, - v ) ]
# sort first by last digit (ascending), then by whether < 500, then by
# magnitude (descending)

set.seed(2009)
rr(sample(1000,30))
 [1] 840 670 580 140 100  10 991 901 881 561 231  71 722 662 432 222  32
473  53
[20]  24 645 796  86 697 607 567 397 257  77 818 568 428 198 619 569 479 439
299
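
A tiny fixed example makes the ordering easier to follow than a random sample. I am assuming the comparison in rr is v < 500, which is the reading consistent with the output above.

```r
rr <- function(v) v[order(v %% 10, v < 500, -v)]

# last digit 0: 840, 140, 10; last digit 1: 991, 231, 71
rr(c(10, 840, 71, 991, 140, 231))
```

Each argument to order is one sort key, with later keys breaking ties in earlier ones; a leading minus sign reverses a numeric key.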

Hope this helps,

 -s

On Thu, May 28, 2009 at 6:06 PM, Steve Jaffe sja...@riskspan.com wrote:


 hmm, that is what I was afraid of. I considered that but thought to myself,
 surely there must be an easier way.  I wonder why this feature isn't
 available. It's there in scripting languages, like perl, but also in
 hardcore languages like C++ where std::sort and sorted containers allow
 the user to provide a comparison function (even for builtin types like
 int).
 It's hard to believe that you have to jump through more hoops to do a
 custom
 sort in R than in C++ ...


 You put a class on the vector...

 --
 View this message in context:
 http://www.nabble.com/custom-sort--tp23770565p23770964.html
 Sent from the R help mailing list archive at Nabble.com.


[R] RODBC package: how to check whether connection is open

2009-05-28 Thread Stavros Macrakis

What is the recommended way of checking whether an RODBC connection is open?

Since odbcValidChannel is not exported from namespace RODBC, I suppose I
shouldn't be using it.

This is the best I could come up with, but it seems a bit 'dirty' to be
using a tryCatch for something like this:

  odbcOpenp <- function(conn)
     tryCatch({odbcGetInfo(conn);TRUE},error=function(...)FALSE)
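
The same pattern, stripped of anything RODBC-specific, looks like this. It is a sketch only: `succeeds` is a made-up helper name, and it says nothing about whether RODBC offers a more direct test.

```r
# Hypothetical helper: TRUE if evaluating `expr` raises no error, FALSE otherwise.
# `expr` is a lazily evaluated promise, so it only runs inside tryCatch().
succeeds <- function(expr) {
  tryCatch({ expr; TRUE }, error = function(e) FALSE)
}

ok  <- succeeds(sqrt(4))                       # no error raised
bad <- succeeds(stop("connection is closed"))  # error caught
print(c(ok = ok, bad = bad))
```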

Suggestions?

  -s


Re: [R] custom sort?

2009-05-28 Thread Stavros Macrakis

I couldn't get your suggested method to work:

  `==.foo` <- function(a,b) unclass(a)==unclass(b)
  `>.foo` <- function(a,b) unclass(a) < unclass(b) # invert comparison
  is.na.foo <- function(a)is.na(unclass(a))

  sort(structure(sample(5),class="foo"))  #-> 1:5  -- not reversed

What am I missing?
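
For comparison, the other route mentioned in the quoted suggestion (a conversion to numeric via an xtfrm method) can be sketched as below. The class name "foo" is the one used above; xtfrm is called explicitly here so the example does not depend on where the method happens to be defined.

```r
# An xtfrm method returns a numeric vector that sorts the way the
# classed object should sort. Negating the values reverses the order.
xtfrm.foo <- function(x) -unclass(x)

x <- structure(c(3, 1, 5, 2, 4), class = "foo")

# order() consults xtfrm() for classed objects; the explicit call
# makes that step visible.
reversed <- unclass(x)[order(xtfrm(x))]
print(reversed)
```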

   -s

On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

> On 28/05/2009 5:34 PM, Steve Jaffe wrote:
>
> > Sounds simple but haven't been able to find it in docs: is it possible to
> > sort a vector using a user-defined comparison function? Seems it must be,
> > but sort doesn't seem to provide that option, nor does order sfaics
>
> You put a class on the vector (e.g. using class(x) <- "myvector"), then
> define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
> methods (you'll need ==.myvector, >.myvector, and is.na.myvector).
>
> Duncan Murdoch



Re: [R] How to exclude a column by name?

2009-05-27 Thread Stavros Macrakis

On Wed, May 27, 2009 at 6:37 AM, Zeljko Vrba zv...@ifi.uio.no wrote:

> Given an arbitrary data frame, it is easy to exclude a column given its
> index: df[,-2].  How to do the same thing given the column name?  A naive
> attempt df[,-name] did not work :)


Various ways:

Boolean index vector:

df[ , names(df) != name ]

List of wanted column names:

df[ , setdiff(names(df), name) ]

Negated list of unwanted column indexes:

   df[ , -match(name,names(df)) ]
   df[ , -which(names(df) == name) ]

The special 'subset' hack for column names; beware, I think this is the only
place in R where you can negate a column name.

   subset(df , select = -a )
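
A small runnable version of the variants above, on a made-up three-column data frame (the column names a, b, c are only for illustration):

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
name <- "b"

by_logical <- df[, names(df) != name]            # boolean index vector
by_setdiff <- df[, setdiff(names(df), name)]     # wanted column names
by_match   <- df[, -match(name, names(df))]      # negated column index
by_subset  <- subset(df, select = -b)            # subset()'s negated name

# With only one column left, `[` returns a bare vector unless drop = FALSE.
one_col <- df[, names(df) == "a", drop = FALSE]

# Caveat: if the name is absent, which() returns integer(0), and the
# negated index then selects no columns at all rather than all of them.
none <- df[, -which(names(df) == "zzz"), drop = FALSE]
```

The boolean and setdiff forms do not have that last pitfall, which is one reason to prefer them in code that may see unexpected column names.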

Hope this helps,

 -s


Re: [R] Defining functions - an interesting problem

2009-05-27 Thread Stavros Macrakis

The 'ties.method' argument to 'rank' is the *third* positional argument to
'rank', so either you need to put it in the third position or you need to
use a named argument.

The fact that the variable you're using to represent ties.method is called
ties.method is irrelevant.  That is, this:

  rank(x,ties.method)

is equivalent to

 rank(x, na.last = ties.method)

which is not what you want.

You need to write

 rank(x, ties.method = ties.method)

or (more concise but not as clear):

 rank(x, , ties.method)
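
The matching rule can be seen without 'rank' at all. In this sketch, `show_ties` is a made-up stand-in with the same first three parameter names, and it simply reports which value ended up in ties.method:

```r
# Stand-in with the same leading parameters as rank(); returns ties.method.
show_ties <- function(x, na.last = "keep", ties.method = "average") ties.method

# Positional: the caller's ties.method lands in the *second* slot (na.last).
w1 <- function(x, ties.method = "average") show_ties(x, ties.method)

# Named: the value is matched to the parameter of the same name.
w2 <- function(x, ties.method = "average") show_ties(x, ties.method = ties.method)

r1 <- w1(1, ties.method = "min")   # inner default is used
r2 <- w2(1, ties.method = "min")   # caller's value is used
print(c(r1, r2))
```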

Hope this helps,

  -s

On Wed, May 27, 2009 at 10:11 AM, utkarshsinghal 
utkarsh.sing...@global-analytics.com wrote:

> I define the following function:
> (Please don't wonder about the use of this function, this is just a
> simplified version of my actual function. And please don't spend your time
> in finding an alternate way of doing the same as the following does not
> exactly represent my function. I am only interested in a good explanation)
>
> > f1 = function(x,ties.method="average")rank(x,ties.method)
> > f1(c(1,1,2,4), ties.method="min")
> [1] 1.5 1.5 3.0 4.0
>
> I don't know why it followed ties.method="average".
> Anyways I randomly tried the following:
>
> > f2 = function(x,ties.method="average")rank(x,ties.method=ties.method)
> > f2(c(1,1,2,4), ties.method="min")
> [1] 1 1 3 4
> Now, it follows the ties.method="min"
>
> I don't see any explanation for this, however, I somehow mugged up that if
> I define it as in f1, the ties.method in rank function takes its default
> value which is "average" and if I define as in f2, it takes the value
> which is passed in f2.
>
> But even all my mugging is wasted when I tested the following:
>
> > h = function(x, a=1)x^a
> > g1 = function(x, a=1)h(x,a)
> > g1(x=5, a=2)
> [1] 25
>
> > g2 = function(x, a=1)h(x,a=a)
> > g2(x=5, a=2)
> [1] 25
>
> Here in both the cases, h is taking the value passed through g1, and
> g2.
>
> Any comments/hints can be helpful.
>
> Regards
> Utkarsh


[R] R Books listing on R-Project

2009-05-27 Thread Stavros Macrakis

I was wondering what the criteria were for including books on the "Books
Related to R" page <http://www.r-project.org/doc/bib/R-books.html>. (There is
no maintainer listed on this page.)

In particular, I was wondering why the following two books are not listed:

* Andrew Gelman, Jennifer Hill, *Data Analysis Using Regression and
Multilevel/Hierarchical Models*. (CRAN package 'arm')

* Michael J. Crawley, *The R Book*. (reviewed, rather negatively, in
*R News* *7*:2)

Is the list more or less arbitrary?  Does it reflect some editorial judgment
about the value of these books? If so, it might be more useful to include
the books, but with critical reviews.  It doesn't seem to be a matter of
up-to-dateness, because 38/87 of the listed books were published in a more
recent year than Gelman or Crawley.

The list is currently in reverse chronological order.  I wonder if it would
be useful to group the entries thematically -- I'd be happy to help on that
project.

  -s


[R] OWL (Web Ontology Language) in R?

2009-05-26 Thread Stavros Macrakis

Is anyone working on an R package for manipulating OWL (Web Ontology
Language), either natively or via an external library?

I don't see anything obviously relevant in CRAN, though of course OWL
functionality could be built up starting with the XML package.

Thanks,

-s


Re: [R] XML parse error

2009-05-24 Thread Stavros Macrakis

On Sun, May 24, 2009 at 12:28 PM, kulwinder banipal kbani...@hotmail.com wrote:

> It is for sure a little more complicated than a plain XML file.  The format
> of binary file is according to XML schema. I have been able to get C parser
> going to get information from binary with one caveat - I have to manually
> read the XML schema and figure out which byte means what in binary and
> then code it in C.


There are many ways of encoding XML in a compact binary form (cf.
http://en.wikipedia.org/wiki/Binary_XML), none widely accepted yet. The XML
schema does not specify the binary form.

 -s


Re: [R] Class for time of day?

2009-05-22 Thread Stavros Macrakis

On Thu, May 21, 2009 at 8:28 PM, Gabor Grothendieck ggrothendi...@gmail.com
wrote:

> It uses hours/minutes/seconds for values < 1 day and uses days and
> fractions of a day otherwise.

Yes, my examples were documenting this idiosyncrasy.

> For values and operations that it has not considered it falls back to
> the internal representation.

Yes, my examples were documenting this bad behavior.

> Most of your examples start to make sense once you realize this.


Of course I realize this.  That was the point of my examples.  I
understand perfectly well what is *causing* the bad behavior.  That doesn't
make it less bad.

What is the point of a class system if functions ignore the class and
perform meaningless calculations on the internal representation?

-s


Re: [R] Class for time of day?

2009-05-22 Thread Stavros Macrakis

On Fri, May 22, 2009 at 10:03 AM, Gabor Grothendieck 
ggrothendi...@gmail.com wrote:

> Regarding division you could contribute that to the chron package.
> I've contributed a few missing items and they were incorporated.

Good to know.  Maybe I'll do that.

> Giving an error when it does not understand something would be
> dangerous as it could break much existing code so that would
> probably not be possible at this stage.

But would it break any existing *correct* code?  I find it hard to imagine
any cases where adding 1 hour of difftime to times("12:00:00") should return
1.5 days rather than 13:00:00.

> The idea of defaulting to internal representations is based on
> the idea that you get many features for free since the way the
> internal representations work gives the right answer in many
> cases.

I must admit I am rather shocked by this approach.  Getting something for
free is a bad bargain if what you get is nonsense.

> It's best to stick with the implicit philosophy that
> underlies a package.  If you want a different philosophy then
> it's really tantamount to creating a new package.  I don't
> think that one is right and the other wrong but simply
> represent different viewpoints.


So you would defend the viewpoint that 1 hour is the same thing as 1 day?

 -s


Re: [R] Class for time of day?

2009-05-22 Thread Stavros Macrakis

On Fri, May 22, 2009 at 12:28 PM, Gabor Grothendieck 
ggrothendi...@gmail.com wrote:

> ...The way this might appear in code is if someone wanted to calculate the
> number of one hour intervals in 18 hours.  One could write:
>
> t18 <- times("18:00:00")
> t1 <- times("1:00:00")
> as.numeric(t18) / as.numeric(t1)
>
> but since we all know that it uses internal representations unless it
> indicates otherwise

Um, yes, I suppose that was the attitude in the 60's and 70's, but I think
we have moved on from there.  cf.
http://en.wikipedia.org/wiki/Data_abstraction

> a typical code snippet might shorten it to:
>
> as.numeric(t18 / t1)
>
> and all such code would break if one were to cause that to generate an
> error.


(18/24 day)/(1/24 day) is the perfectly meaningful dimensionless number 18,
so this code should not break with a correct implementation of '/'.  (cf.
http://en.wikipedia.org/wiki/Dimensional_analysis).  Alas, chron gives the
nonsense result of 18 days.
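
For what it's worth, here is the unit-aware version of that calculation using base R's difftime class instead of chron. This is a sketch, not a statement about how chron ought to be implemented; the point is only that converting both durations to the same unit first gives a plain number.

```r
# Two durations given in different units.
t18 <- as.difftime(18, units = "hours")
t1  <- as.difftime(60, units = "mins")

# as.numeric() on a difftime accepts a `units` argument, so both values
# are expressed in hours before dividing; the units cancel.
ratio <- as.numeric(t18, units = "hours") / as.numeric(t1, units = "hours")
print(ratio)
```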

-s


Re: [R] Class for time of day?

2009-05-21 Thread Stavros Macrakis

On Wed, May 20, 2009 at 12:28 PM, Gabor Grothendieck 
ggrothendi...@gmail.com wrote:

> There is a times class in the chron package.


Perfect!  Just what I was looking for.

On Wed, May 20, 2009 at 12:19 PM, jim holtman jholt...@gmail.com wrote:

> If you want the hours from a POSIXct, here is one way of doing it...
>
> > y <- difftime(x, trunc(x, units='days'), units='hours')


Ah, trunc.POSIXt -- I missed that one, thanks.
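
Spelled out on a concrete timestamp (the date and the UTC time zone are arbitrary choices for the sketch):

```r
# A fixed timestamp; tz = "UTC" avoids any daylight-saving surprises.
x <- as.POSIXct("2009-03-23 12:23:00", tz = "UTC")

# Midnight of the same day, then the elapsed time since midnight in hours.
midnight <- trunc(x, units = "days")
since_midnight <- difftime(x, midnight, units = "hours")

# A plain number of hours, convenient for a common 0-24h plotting scale.
hours <- as.numeric(since_midnight, units = "hours")
print(hours)
```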

> It depends on what type of computations you want to do with it.  You can
> leave it as POSIXct and carry out a lot of them.  Can you specify what you
> want?


I am comparing irregular time series from different days, looking at the
differences in intraday patterns.  So I want to put them on a common 0-24h
scale and then do various kinds of plots and analyses, keeping the
conventional display form (10:30 etc.) when specific times display or
print.  It looks as though chron:::times combined with trunc.POSIXt pretty
much solves my problem, except that `times` ignores the time units:

> as.POSIXct('2009-3-23 12:23')-trunc(as.POSIXct('2009-3-23 12:23'),"day")
Time difference of 12.38333 hours
> times(as.POSIXct('2009-3-23 12:23')-trunc(as.POSIXct('2009-3-23 12:23'),"day"))
Time in days:      <<< seems to treat difftimes as raw numbers!!
[1] 12.38333

Obviously I can work around this, but shouldn't `times` give an error when
it encounters an object of unknown class rather than unsafely using its
internal representation?  Of course, better still if `times` converted
correctly.

In general, `times` has other inconsistent and peculiar behavior:

times(2) => Time in days: 2
    Allows specifying multi-day periods, OK
times(1.5) => Time in days: 1.5
    Allows specifying fractional multi-day periods, OK
times(0.5) => 12:00:00
    Inconsistent format compared to times(1.5)
times("18:00:00") + times("18:00:00") => Time in days: 1.5
    OK
times("36:00:00") => error
    Why does it allow times(1.5) and times("18:00:00") + times("18:00:00")
    to specify 1.5 days, but not 36 hours?
times(-0.5) => -0.5
    Why doesn't it print "Time in days: -0.5"?
times("18:00:00")/times("1:00:00") => Time in days: 18
    Incorrect dimensions; meaningless result -- should be dimensionless
times("18:00:00") * times("10:00:00") => 07:30:00
    Incorrect dimensions; meaningless result.
sin(times("18:00:00")) => 16:21:34
    Meaningless result -- should be error

It's nice that R has a class system, but if code ignores the class

> There is an article on dates and times in R News 4/1.


Thanks for the pointer.

  -s


Re: [R] Functions returning functions

2009-05-20 Thread Stavros Macrakis

On Wed, May 20, 2009 at 7:21 AM, Paulo Grahl pgr...@gmail.com wrote:

> A <- function(parameters) {
>     # calculations w/ parameters returning 'y'
>     tmpf <- function(x) { # function of 'y' }
>     return(tmpf)
> }
>
> The value of the parameters are stored in an environment local to the
> function. Then I call
> x <- something
> B <- A(x)
>
> When R executes this last statement, does it perform all the
> calculations inside function A again (i.e., all the calculations that
> yield 'y') or the value of 'y' is already stored in the function's local
> environment?

> A <- function(q) {
+   print("calculating y")
+   y <- q+1
+   function(x) print(paste("value of x:",x,"value of y:",y))
+ }
> A(5)
[1] "calculating y"
function(x) print(paste("value of x:",x,"value of y:",y))
<environment: 0x07abe2a8>
> A(5)(4)
[1] "calculating y"
[1] "value of x: 4 value of y: 6"
> A5 <- A(5)
[1] "calculating y"
> A5(4)
[1] "value of x: 4 value of y: 6"
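
The same point with a counter instead of print statements: the body of the outer function runs once per call to it, and the returned function then reuses the stored value. A sketch with made-up names (`make_adder`, `outer_calls`):

```r
outer_calls <- 0

make_adder <- function(q) {
  outer_calls <<- outer_calls + 1   # counts how often this body runs
  y <- q + 1                        # computed once, kept in the closure
  function(x) x + y
}

add6 <- make_adder(5)   # body runs here, once
a <- add6(4)            # reuses the stored y; body does not run again
b <- add6(10)
print(c(a = a, b = b, outer_calls = outer_calls))
```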



Re: [R] Too large a data set to be handled by R?

2009-05-20 Thread Stavros Macrakis

On Tue, May 19, 2009 at 11:59 PM, tsunhin wong thjw...@gmail.com wrote:

> In order to save time, I am planning to generate a data set of size
> 1500 x 2 with each data point a 9-digit decimal number, in order
> to save my time.
> I know R is limited to 2^31-1 and that my data set is not going to
> exceed this limit. But my laptop only has 2 Gb and is running 32-bit
> Windows / XP or Vista.


32-bit R on Windows XP with 2GB RAM has no problem with a matrix this size
(not just integers, but also numerics):

> system.time(mm <- matrix( numeric(1500 * 2), 1500, 2))
   user  system elapsed
   0.59    0.23    1.87
> system.time(nn <- matrix( runif(1500 * 2), 1500, 2))
   user  system elapsed
   2.66    0.64   13.39
> system.time(oo <- nn + 3)
   user  system elapsed
   0.24    0.17    0.41
> system.time(pp <- oo - oo)
   user  system elapsed
   0.15    0.13    0.28
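
A quick way to estimate in advance whether a numeric matrix will fit is 8 bytes per element. The sketch below uses made-up dimensions purely for illustration, and `matrix_bytes` is a hypothetical helper name:

```r
# Rough size of a numeric (double) matrix: 8 bytes per element,
# ignoring the small fixed header R adds to every object.
matrix_bytes <- function(nrow, ncol) nrow * ncol * 8

# Hypothetical dimensions, chosen only for the example.
est    <- matrix_bytes(1000, 1000)
actual <- as.numeric(object.size(matrix(0, 1000, 1000)))
print(c(estimate = est, actual = actual))
```

Keep in mind that operations such as `nn + 3` allocate a result of the same size, so peak usage is a multiple of the estimate.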


[R] Class for time of day?

2009-05-20 Thread Stavros Macrakis

What is the recommended class for time of day (independent of calendar
date)?

And what is the recommended way to get the time of day from a POSIXct
object? (Not a string representation, but a computable representation.)

I have looked in the man page for DateTimeClasses, in the Time Series
Analysis Task View and in Spector's Data Manipulation book but haven't found
these. Clearly I can create my own Time class and hack around with the
internal representation of POSIXct, e.g.

days <- unclass(d)/(24*3600)
days - floor(days)

and write print.Time, `-.Time`, etc. etc. but I expect there is already a
standard class or CRAN package.

   -s


Re: [R] exists function on list objects gives always a FALSE

2009-05-19 Thread Stavros Macrakis

On Tue, May 19, 2009 at 12:07 PM, routík zrou...@gmail.com wrote:

> > SmoothData <- list(exists=TRUE, span=0.001)
> > exists("SmoothData$span")
> FALSE

As others have said, this just checks for the existence of a variable with
the (strange) name "SmoothData$span".

In some sense, in R semantics, xxx$yyy *always* exists if xxx is a list (or
other recursive object):

  > xxx <- list()
  > xxx$hello
  NULL

You might think that you can check names(xxx) to see if the slot has been
explicitly set, but it depends on *how* you have explicitly set the slot to
NULL:

xxx$hello <- 3
xxx$hello <- NULL
names(xxx)
   character(0)  # no names -- assigning NULL kills the slot
xxx <- list(hello=NULL)
names(xxx)
   [1] "hello"   # 1 name -- constructing with a NULL-valued slot
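
Putting the pieces together in one runnable sketch (the variable names are reused from the examples above):

```r
SmoothData <- list(exists = TRUE, span = 0.001)

# exists() looks for a *variable* with that literal name; there is none.
var_exists <- exists("SmoothData$span")

# Asking the list itself which names it carries.
has_span  <- "span"   %in% names(SmoothData)
has_other <- "nosuch" %in% names(SmoothData)

# $ on a missing element quietly gives NULL ...
missing_is_null <- is.null(SmoothData$nosuch)

# ... and so does a slot that is present but holds NULL.
xxx <- list(hello = NULL)
present_but_null <- "hello" %in% names(xxx) && is.null(xxx$hello)
```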

Welcome to R!

-s


Re: [R] Concatenating two vectors into one

2009-05-18 Thread Stavros Macrakis

If you want to concatenate the *vectors*, you need 'c', which will
also coerce the elements to a common type.

If you want to concatenate the corresponding *elements* of the
vectors, you need 'paste', which will coerce them to character
strings.
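
A short sketch of the difference, on shortened versions of the two vectors from the question:

```r
x <- c("A", "B", "C")
y <- factor(c("1", "2", "3"))

# Element-wise: paste() converts both to character and joins pair by pair.
z <- paste(x, y, sep = "")

# Vector-wise: c() appends one vector to the other. The factor is converted
# explicitly so its labels, not its internal codes, are what gets appended.
both <- c(x, as.character(y))

print(z)
print(both)
```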

 -s


On 5/18/09, Henning Wildhagen hwildha...@gmx.de wrote:
> Dear users,
>
> a very simple question:
>
> Given two vectors x and y
>
> x<-as.character(c("A","B","C","D","E","F"))
> y<-as.factor(c("1","2","3","4","5","6"))
>
> i want to combine them into a single vector z as "A1", "B2", "C3" and so on.
>
> z<-x*y is not working, i tried several others function, but did not get to
> the solution.
>
> Thanks for your help,
>
> Henning



[R] Generic 'diff'

2009-05-18 Thread Stavros Macrakis

I would like to apply a function 'f' to the lagged version of a vector and
the vector itself.

This is easy to do explicitly:

  mapply( f, v[-1], v[-length(v)] )

or in the case of a pointwise vector function, simply

  f( v[-1], v[-length(v)] )

This is essentially the same as 'diff' but with an arbitrary function, not
'-'.

Is there a standard way to do this? Is there any particular reason that
'diff' should not have an 'f' argument?
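
To make the request concrete, here is roughly what I have in mind, written as a standalone function. `lagged_apply` is a made-up name, not an existing base function, and it assumes 'f' is vectorized:

```r
# Apply a vectorized binary function f to each element and the element
# `lag` positions before it. With f = `-` this matches diff(v, lag).
lagged_apply <- function(v, f = `-`, lag = 1L) {
  n <- length(v)
  if (lag >= n) return(v[0])          # nothing to pair up
  f(v[(lag + 1L):n], v[seq_len(n - lag)])
}

d <- lagged_apply(c(1, 3, 6, 10))           # differences
q <- lagged_apply(c(1, 2, 4, 8), `/`)       # successive quotients
q2 <- lagged_apply(c(1, 2, 4, 8), `/`, 2L)  # quotients at lag 2
```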

-s


Re: [R] Generic 'diff'

2009-05-18 Thread Stavros Macrakis

I guess I wasn't very clear.  The goal is not to define diff on a different
object type, but to have a different 'subtraction' operator with the same
lag logic.  An easy example would be quotient instead of subtraction. Of
course I could do that by simply cutting and pasting diff.default and
replacing '-'(a,b) with f(a,b), but it's cleaner to use a standard function
if there is one.

  -s

On Mon, May 18, 2009 at 5:05 PM, Gabor Grothendieck ggrothendi...@gmail.com
 wrote:

 You can define a new class for the object diff operates
 on and then define your own diff method for that. For
 some examples see:

 methods(diff)



 On Mon, May 18, 2009 at 4:24 PM, Stavros Macrakis macra...@alum.mit.edu
 wrote:
  I would like to apply a function 'f' to the lagged version of a vector
 and
  the vector itself.
 
  This is easy to do explicitly:
 
   mapply( f, v[-1], v[-length(v)] )
 
  or in the case of a pointwise vector function, simply
 
   f( v[-1], v[-length(v)] )
 
  This is essentially the same as 'diff' but with an arbitrary function,
 not
  '-'.
 
  Is there a standard way to do this? Is there any particular reason that
  'diff' should not have an 'f' argument?
 
 -s
 

Re: [R] Generic 'diff'

2009-05-18 Thread Stavros Macrakis

On Mon, May 18, 2009 at 6:00 PM, Gabor Grothendieck ggrothendi...@gmail.com
 wrote:

 I understood what you were asking but R is an oo language so
 that's the model to use to do this sort of thing.


I am not talking about creating a new class with an analogue to the
subtraction function.  I am talking about a function which applies another
function to a sequence and its lagged version.

Functional arguments are used all over the place in R's base package
(Xapply, sweep, outer, by, not to mention Map,  Reduce, Filter, etc.) and
they seem perfectly natural here.
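
A few of those in action, just to show how ordinary it is to pass a function as an argument (the input vector is arbitrary):

```r
v <- c(1, 2, 4, 8)

# mapply: apply a two-argument function pairwise over two vectors.
ratios <- mapply(function(a, b) a / b, v[-1], v[-length(v)])

# Reduce: fold a binary function over a vector.
total <- Reduce(`+`, v)

# Filter: keep the elements for which a predicate is TRUE.
big <- Filter(function(e) e > 2, v)
```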

Or perhaps I am not understanding your objection.

  -s


 On Mon, May 18, 2009 at 5:48 PM, Stavros Macrakis macra...@alum.mit.edu
 wrote:
  I guess I wasn't very clear.  The goal is not to define diff on a
 different
  object type, but to have a different 'subtraction' operator with the same
  lag logic.  An easy example would be quotient instead of subtraction. Of
  course I could do that by simply cutting and pasting diff.default and
  replacing '-'(a,b) with f(a,b), but it's cleaner to use a standard
 function
  if there is one.
 
-s
 
  On Mon, May 18, 2009 at 5:05 PM, Gabor Grothendieck
  ggrothendi...@gmail.com wrote:
 
  You can define a new class for the object diff operates
  on and then define your own diff method for that. For
  some examples see:
 
  methods(diff)
 
 
 
  On Mon, May 18, 2009 at 4:24 PM, Stavros Macrakis 
 macra...@alum.mit.edu
  wrote:
   I would like to apply a function 'f' to the lagged version of a vector
   and
   the vector itself.
  
   This is easy to do explicitly:
  
mapply( f, v[-1], v[-length(v)] )
  
   or in the case of a pointwise vector function, simply
  
f( v[-1], v[-length(v)] )
  
   This is essentially the same as 'diff' but with an arbitrary function,
   not
   '-'.
  
   Is there a standard way to do this? Is there any particular reason
 that
   'diff' should not have an 'f' argument?
  
  -s
  

Re: [R] newbie: closing unused connection + readline

2009-05-16 Thread Stavros Macrakis

On Sat, May 16, 2009 at 8:34 AM, Aval Sarri aval.sa...@gmail.com wrote:
> # Create a socket from which to read lines - one at a time (record)
> reader.socket <- socketConnection( host = 'localhost', 5000,
>                                    server = TRUE, blocking = TRUE,
>                                    open = "r", encoding = getOption("encoding") );
> # now read each record and split/validate it using read.table
> repeat {
>  # here for each line I am opening new connection! how to avoid it?
>  line.raw <- textConnection(readLines( reader.socket, n = 1, ok = TRUE));

What is the function of textConnection here?  Is read.table
incompatible with socketConnection for some reason?

>  line.raw <- read.table(line.raw, sep=",");

> ...at the end of script I am getting "closing unused connection" warning

This is not a problem in itself.  For some reason, R gives a warning
when connections are garbage collected.  Of course, that can be a
symptom of poor connection management, but not necessarily.

In the present case, you are creating many unnecessary
textConnections, and R correctly garbage collects them.
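
One way to avoid leaving connections for the garbage collector is to let read.table manage the temporary connection itself through its `text` argument. In this sketch a textConnection with made-up records stands in for the socket, so it says nothing about socket behavior as such:

```r
# Stand-in for the socket: three comma-separated records.
src <- textConnection(c("1,a", "2,b", "3,c"))

rows <- list()
repeat {
  line <- readLines(src, n = 1)
  if (length(line) == 0) break        # no more input
  # read.table(text = ...) parses the string without leaving a
  # connection open afterwards.
  rows[[length(rows) + 1]] <- read.table(text = line, sep = ",",
                                         stringsAsFactors = FALSE)
}
close(src)                            # close what we opened ourselves

records <- do.call(rbind, rows)
print(records)
```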

-s


Re: [R] newbie: closing unused connection + readline

2009-05-16 Thread Stavros Macrakis

On Sat, May 16, 2009 at 9:11 AM, Aval Sarri aval.sa...@gmail.com wrote:
> ...I tried something like this also:
>
> mydataframe <- read.table (socket, sep=",");
>
> but does not work says "no input lines".
>
> this also.
>
> mydataframe <- read.table (readLine(socket), sep=",");

Sorry, I didn't see this before my last email.  This seems to be the
real problem.

I don't understand why read.table would have a problem reading
directly from a socket instead of a textConnection.  Is this a bug?
Some subtlety in the semantics of socketConnection as opposed to
textConnection?  Incorrect parameters when opening the
socketConnection?

-s


Re: [R] Gamma function

2009-05-16 Thread Stavros Macrakis

What exactly is the R code you wrote for your function f?  Without
that, it will be hard to help you.
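
In the meantime, here is one guess at what you are after. It assumes the intended expression is sqrt(2) * gamma(n/2) / (sqrt(n-1) * gamma((n-1)/2)); the function name `ratio_fn` is made up. Note that it uses R's gamma() directly, since n/2 is not always an integer.

```r
# Assumed reading of the formula in the question.
ratio_fn <- function(n) {
  sqrt(2) * gamma(n / 2) / (sqrt(n - 1) * gamma((n - 1) / 2))
}

n    <- 2:50
vals <- ratio_fn(n)

# Only draw the graph when run interactively.
if (interactive()) plot(n, vals, type = "b")
```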

 -s

On Sat, May 16, 2009 at 2:48 AM, Kon Knafelman konk2...@hotmail.com wrote:

> Hi Guy,
>
> I am having trouble graphing the following function
>
> √2 Γ(n/2) / [√(n - 1) Γ((n - 1)/2)] for the values of n between 2 and 50.
>
> i know that Γ(n) = (n-1)!, which in R is factorial(n-1)
>
> When i type that into R, using y <- function(n).
> and then plot(y,2,50), it doesnt give me anything meaningful, in fact, it
> comes up with a message saying something like "in gamma(n+1) ploted" or
> something along those lines.
>
> Can anyone please help?
>
> thanks you


Re: [R] can you tell what .Random.seed was?

2009-05-15 Thread Stavros Macrakis

On Thu, May 14, 2009 at 3:36 PM, G. Jay Kerns gke...@ysu.edu wrote:
> set.seed(something)
> x <- rnorm(100)
> y <- runif(500)
> # bunch of other stuff
...
> Now, I give you a copy of my script.R (with the set.seed statement
> removed, of course) together with the .RData file that was generated
> by the save.image() command.
...
> 1) can you tell me what my original set.seed() value was?...
> 2) is it possible *in principle* to figure out what set.seed was,
> given the above?

Set.seed takes an integer argument, that is, 2^32-1 distinct values
(cf NA_integer_), so the very simplest approach, brute-force search,
has a hope of working:

whatseed <- function (v)  {
   i <- as.integer(-2^31+1); max <- as.integer(2^31-1)
   while (i<max) { set.seed(i); if (runif(1)==v) return(i); i <- i+1L }
}

> (OK, being able to figure it out in 2*10^68 years
> doesn't count, but within a couple months is acceptable.)

set.seed(-2^31+100000)
system.time(whatseed(runif(1)))
   user  system elapsed
   1.53    0.00    1.53

2^32*(1.53/100000)/3600
= 18.25
18 hours
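
A scaled-down version that finishes instantly, to show the search really does recover the seed. It is a sketch only: `find_seed` is a made-up name, the candidate range is deliberately tiny, and it assumes (as above) that the first runif value after set.seed is known.

```r
# Try each candidate seed and compare the first uniform draw with the target.
# Note: this overwrites the global RNG state as a side effect.
find_seed <- function(target, candidates) {
  for (s in candidates) {
    set.seed(s)
    if (runif(1) == target) return(s)
  }
  NA_integer_
}

set.seed(1234)
target <- runif(1)          # evaluated *before* the search starts

found <- find_seed(target, 1:5000)
print(found)
```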

> 3) does the answer change if there is a
> remove(.Random.seed)
> command right before the save.image() command?

Depending on which RNG algorithm (RNGkind) you use, there may be
cryptographic techniques that are more efficient than brute-force
search, especially if the full internal state (.Random.seed) is
preserved.

This all assumes that the seed is set *only* with set.seed.  If
.Random.seed is modified directly, there are many more possibilities
for most of the RNGs.

 -s


Re: [R] can you tell what .Random.seed was?

2009-05-15 Thread Stavros Macrakis

On Fri, May 15, 2009 at 12:07 PM, Stavros Macrakis
macra...@alum.mit.edu wrote:
 system.time(whatseed(runif(1)))

Sorry, though I got lucky and my overall result is roughly correct,
this is an incorrect time measure.  It should be

r <- runif(1); system.time(whatseed(r))

because R's call-by-need semantics don't evaluate the runif before it
starts running whatseed.  The correct time (on my machine) is then 28
hours, not 18.

Better to avoid functions with side effects as arguments.
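A small sketch of the effect, using a made-up function f (not from the original post): the argument is a promise, so a runif(1) passed directly is only drawn when the function first touches it, by which time the seed has already been reset.

```r
# v is a promise: runif(1) is not evaluated until v is first used,
# which here happens after set.seed(1) has already run inside f.
f <- function(v) { set.seed(1); v }

set.seed(42)
lazy <- f(runif(1))      # drawn from seed 1, not from seed 42

set.seed(1)
reference <- runif(1)    # first draw from seed 1

set.seed(42)
r <- runif(1)            # forced up front, drawn from seed 42
eager <- f(r)            # passing the value avoids the surprise
```

Here lazy equals reference, while eager keeps the value drawn under seed 42.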

 -s


[R] Converting numbers to and from raw

2009-05-15 Thread Stavros Macrakis

How can I convert an integer or double to and from their internal
representation as raws of length 4 and 8?

The following works for positive integers (including those represented
as floats):

# Convert integer (represented as integer or double) to sequence
# of raw bytes, least-significant byte first.
# intToRaw(0) => raw(0)
# intToRaw(17^9) => 91 64 63 9c 1b
# intToRaw(2^60/3) => 40 55 55 55 55 55 55 05 (note effect of finite precision)

intToRaw <- function(x, n=max(0,floor(log(x)/log(256)+1))) {
  stopifnot(x>=0)
  suppressWarnings(
   as.raw( floor( x / 2^(8*seq(0,length=n)) ) %% 256))
  }

but I'd think there was a simpler version that just casts the integer
as a bytestring internally (for type integer at least).  Also, of
course, it doesn't help for getting the bit-pattern of a double.
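One possible approach, offered as a sketch rather than as the answer the thread arrived at: base R's writeBin and readBin accept a raw vector in place of a connection, which exposes the stored bytes of an integer or a double directly. The variable names below are illustrative only.

```r
# Bytes of a 32-bit integer, least-significant byte first
int_bytes <- writeBin(17L, raw(), endian = "little")

# Bytes of a double (IEEE 754 binary64), least-significant byte first
dbl_bytes <- writeBin(1.0, raw(), endian = "little")

# Round trip from the raw bytes back to numbers
int_back <- readBin(int_bytes, what = "integer", n = 1, endian = "little")
dbl_back <- readBin(dbl_bytes, what = "double",  n = 1, endian = "little")
```

Unlike the arithmetic version above, this also works for negative integers and for the bit pattern of a double.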

   -s


[R] Inconsistency in representation of variables

2009-05-11 Thread Stavros Macrakis

In stats::D, I was wondering why variables are represented as symbols
in expressions, but as strings in lists of variables:

D(quote(x^2),"x") => 2*x
D(quote(x^2),quote(x)) => error "Variable must be a character string"

Strings are not allowed in the expression to denote variables:

D(quote("x"),"x") == D("k","x") => NA  (why not an error?)

-s


Re: [R] Beyond double-precision?

2009-05-11 Thread Stavros Macrakis

On Sat, May 9, 2009 at 12:17 PM, Berwin A Turlach
ber...@maths.uwa.edu.au wrote:
 log(H) = log(n) - log( 1/x_1 + 1/x_2 + ... + 1/x_n)
...But we need to calculate the logarithm of a sum from the logarithms of the 
individual terms.

 ...The way to calculate log(x+y) from lx=log(x) and ly=log(y) ...
  max(lx,ly) + log1p(exp(-abs(lx-ly)))

Agreed completely so far. But instead of calculating the logsum
pairwise, you can do it all in one go, which is both more efficient
and more accurate.

Here are some timing and accuracy measurements of the one-shot logsum
compared to the loop and the Reduce versions. (Full code at the bottom
of this email.) The vector version is much faster and much more
accurate in general.  There must be cases where the log1p method
increases accuracy, but I couldn't find them.
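To see why the sum has to be done in log space at all, here is a minimal sketch with two terms far below the range of a double. The name logsumexp and the test values are chosen for illustration; the function is the same idea as the one-shot version discussed here, not a different algorithm.

```r
# One-shot log-sum-exp: factor out the largest term so that every
# exponent is <= 0 and the largest term contributes exactly exp(0) = 1.
logsumexp <- function(l) {
  m <- max(l)
  m + log(sum(exp(l - m)))
}

l <- c(-1000, -1001)          # logs of two numbers too small for a double
naive  <- log(sum(exp(l)))    # exp(-1000) underflows to 0, so this is -Inf
stable <- logsumexp(l)        # -1000 + log(1 + exp(-1))
```

The naive form loses the answer entirely, while the shifted form keeps full precision.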

-s

Large examples to test accuracy and speed

Test case: runif(1e+06)
  function. time    error
1    logsum 0.22 9.31e-16
2  logsum_s 0.15 9.31e-16
3  logsum_r 9.75 3.10e-13

Test case: rexp(1e+06)
  function.  time     error
1    logsum  0.21 -1.40e-15
2  logsum_s  0.15 -1.40e-15
3  logsum_r 10.13 -1.38e-14

Test case: abs(rnorm(1e+06))
  function.  time     error
1    logsum  0.24 -4.38e-16
2  logsum_s  0.14 -4.38e-16
3  logsum_r 10.01 -8.74e-14

Test case: rep(1, 1e+05)
  function. time    error
1    logsum 0.01 1.46e-16
2  logsum_s 0.01 1.46e-16
3  logsum_r 0.96 6.24e-14

Test case: rep(10^-(1:10), each = 10000)
  function. time     error
1    logsum 0.02  6.14e-16
2  logsum_s 0.01  6.14e-16
3  logsum_r 0.95 -6.96e-12

More accurate even for small cases

Test case: 1:100
  function. time     error
1    logsum    0 -3.60e-16
2  logsum_s    0 -3.60e-16
3  logsum_r    0  3.24e-15

Test case: abs(rnorm(100))
  function. time     error
1    logsum    0 -3.48e-16
2  logsum_s    0 -3.48e-16
3  logsum_r    0 -2.09e-15


##
# Fast, accurate sum in log space
#
logsum <- function(l) {
 maxi <- which.max(l)
 maxl <- l[maxi]
 maxl + log1p(sum(exp(l[-maxi]-maxl))) }
##

##
# Simpler, perhaps less accurate sum in log space
#
logsum_s <- function(l) {
 maxl <- max(l)
 maxl + log(sum(exp(l-maxl))) }
##



# Pairwise reduction
logsum_r <- function(x) Reduce( function(lx, ly) max(lx, ly) +
log1p(exp(-abs(lx-ly))), x )


function_names <- c("logsum","logsum_s","logsum_r")

logsum_test <- function(l) {
  cat("\nTest case:",deparse(substitute(l)),"\n")
  realsum <- sum(l)
  logl <- log(l)
  results <- times <- list()
  # <<- stores into the times/results lists of logsum_test itself
  lapply( function_names, function(f) times[[f]] <<- system.time(
results[[f]] <<- getFunction(f)(logl))[1])
  data.frame(`function`=function_names,
 time=as.numeric(times),
 error=(exp(as.numeric(results))-realsum)/realsum )
}

set.seed(1)

cat("\n\nLarge examples to test accuracy and speed\n\n")
logsum_test(runif(1000000))
logsum_test(rexp(1000000))
logsum_test(abs(rnorm(1000000)))
logsum_test(rep(1,100000))
logsum_test(rep(10^-(1:10),each=10000))

cat("\n\nMore accurate even for small cases\n\n")
logsum_test(1:100)
logsum_test(abs(rnorm(100)))


Re: [R] integrate lgamma from 0 to Inf

2009-04-27 Thread Stavros Macrakis

On Wed, Apr 22, 2009 at 3:28 AM, Andreas  Wittmann
andreas_wittm...@gmx.de wrote:
 i try to integrate lgamma from 0 to Inf.

Both gamma and log are positive and monotonically increasing for large
arguments.

What can you conclude about the integrability of log(gamma(x))?

  -s


[R] The assign(paste(...,i),...) idiom

2009-04-20 Thread Stavros Macrakis

Judging from the traffic on this mailing list, a lot of R beginners
are trying to write things like

  assign( paste( "myvar", i), ...)

where they really should probably be writing

  myvar[i] <- ...
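A side-by-side sketch of the two styles, with made-up names (squares, sq1, sq2, ...), shows what each costs when the values are read back:

```r
n <- 5

# Indexed container: one object, elements addressed by position
squares <- numeric(n)
for (i in 1:n) squares[i] <- i^2

# The assign/paste idiom: n separate variables sq1, sq2, ..., sq5
for (i in 1:n) assign(paste("sq", i, sep = ""), i^2)

# Reading the values back
total_indexed  <- sum(squares)
total_assigned <- sum(sapply(1:n, function(i) get(paste("sq", i, sep = ""))))
```

Both totals agree, but the second form needs get() and string pasting for every access, while the first is an ordinary vector.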

Do we have any idea where this bizarre habit comes from?

  -s


Re: [R] (no subject)

2009-04-18 Thread Stavros Macrakis

> mylist <- c( 2,1,3,5,4 )              # make a vector of numbers
> sort(mylist)
[1] 1 2 3 4 5                           # in sorted order

> mylist <- c( "this", "is", "a", "test")
> sort(mylist)
[1] "a"    "is"   "test" "this"         # in sorted order
> order(mylist)
[1] 3 2 4 1                             # original positions, e.g. mylist[3] is "a"
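A slightly longer sketch of how sort and order relate; the vector labels is invented for the example:

```r
x <- c(2, 1, 3, 5, 4)

ascending  <- sort(x)
descending <- sort(x, decreasing = TRUE)

# order() returns the permutation that sorts x, so x[order(x)] equals sort(x).
perm <- order(x)
same <- identical(x[perm], ascending)

# The same permutation can reorder a parallel vector.
labels <- c("two", "one", "three", "five", "four")
labels_sorted <- labels[perm]
```

order is the one to reach for when several vectors have to be rearranged together.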


On Sat, Apr 18, 2009 at 10:46 AM, Dan Cary daniel_c...@hotmail.co.uk wrote:
 ...all i want to know is how to arrange a set of numbers in size order 
 without putting them in a table. just arranging them from for e.g. 2,1,3,5,4 
 into 1,2,3,4,5 - it must be simple but i cant find how to do it anywhere


Re: [R] Loop question

2009-04-18 Thread Stavros Macrakis

On Fri, Apr 17, 2009 at 10:12 PM, Brendan Morse morse.bren...@gmail.com wrote:
 ...I would like to automatically generate a series of matrices and
 give them successive names. Here is what I thought at first:

 t1<-matrix(0, nrow=250, ncol=1)

 for(i in 1:10){
        t1[i]<-rnorm(250)
 }

 What I intended was that the loop would create 10 different matrices with a
 single column of 250 values randomly selected from a normal distribution,
 and that they would be labeled t11, t12, t13, t14 etc.

Very close.  But since you've started out with a *matrix* t1, your
assignments to t1[i] will assign to parts of the matrix.  To correct
this, all you need to do is initialize t1 as a *list of matrices* or
(even better) as an *empty list*, like this:

   t1 <- list()

and then assign to *elements* of the list (using [[ ]] notation), not
to *sublists* of the list (which is what [ ] notation means in R),
like this:

for(i in 1:10){
   t1[[i]] <- rnorm(250)
}
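If one-column matrices with names such as t11, t12, ... are really wanted, the list can hold those too. A sketch under that assumption (the seed and the naming scheme are illustrative, not from the original exchange):

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible

# Ten 250 x 1 matrices of normal draws, held in one list
t1 <- lapply(1:10, function(i) matrix(rnorm(250), nrow = 250, ncol = 1))

# Optional names in the style the question asked for: t11, t12, ..., t110
names(t1) <- paste("t1", 1:10, sep = "")

third      <- t1[[3]]        # access by position
also_third <- t1[["t13"]]    # access by name
```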

Is that what you had in mind?

   -s


Re: [R] Using trace

2009-04-17 Thread Stavros Macrakis

Well, yes, of course I could add the code to the function by hand.  I
could also calculate square roots by hand.  But -- as in every other
basic programming environment -- there exists an R function 'trace'
which appears to automate the process, and I can't figure out how to
use it to handle this most elementary and standard case.  Clearly I'm
missing something.

  -s

On Thu, Apr 16, 2009 at 9:26 PM, ronggui ronggui.hu...@gmail.com wrote:
 Can you just print what you need to know? For example:

 fact <- function(x) {
 + if(x<1) ans <- 1 else ans <- x*fact(x-1)
 + print(sys.call())
 + cat(sprintf("X is %i\n",x))
 + print(ans)
 + }
 fact(4)
 fact(x - 1)
 X is 0
 [1] 1
 fact(x - 1)
 X is 1
 [1] 1
 fact(x - 1)
 X is 2
 [1] 2
 fact(x - 1)
 X is 3
 [1] 6
 fact(4)
 X is 4
 [1] 24


 2009/4/13 Stavros Macrakis macra...@alum.mit.edu:
 I would like to trace functions, displaying their arguments and return
 value, but I haven't been able to figure out how to do this with the
 'trace' function.

 After some thrashing, I got as far as this:

     fact <- function(x) if(x<1) 1 else x*fact(x-1)
     tracefnc <- function() dput(as.list(parent.frame()),  # parent.frame() holds arg list
                                 control=NULL)
     trace(fact,tracer=tracefnc,print=FALSE)

 but I couldn't figure out how to access the return value of the
 function in the 'exit' parameter.  The above also doesn't work for
 ... arguments.  (More subtly, it forces the evaluation of promises
 even if they are otherwise unused -- but that is, I suppose, a weird
 and obscure case.)

 Surely someone has solved this already?

 What I'm looking for is something very simple, along the lines of
 old-fashioned Lisp trace:

 (defun fact (i) (if (< i 1) 1 (* i (fact (+ i -1)))))
 FACT
 (trace fact)
 (FACT)
 (fact 3)
  1 (FACT 3)
    2 (FACT 2)
      3 (FACT 1)
        4 (FACT 0)
        4 (FACT 1)
      3 (FACT 1)
    2 (FACT 2)
  1 (FACT 6)
 6

 Can someone help? Thanks,

         -s





 --
 HUANG Ronggui, Wincent
 PhD Candidate
 Dept of Public and Social Administration
 City University of Hong Kong
 Home page: http://asrr.r-forge.r-project.org/rghuang.html



Re: [R] Using trace

2009-04-17 Thread Stavros Macrakis

Yes, that is similar to the solution in my original posting, but
doesn't solve the problem I was having with that solution, namely
reporting on the return value.

   -s

On Fri, Apr 17, 2009 at 11:28 AM, ronggui ronggui.hu...@gmail.com wrote:
 Here is a partial solution:

 trace(fact,quote({cat(sprintf("x= %i\n",x));return}),print=T)
 [1] "fact"
 fact(4)
 Tracing fact(4) on entry
 x= 4
 Tracing fact(x - 1) on entry
 x= 3
 Tracing fact(x - 1) on entry
 x= 2
 Tracing fact(x - 1) on entry
 x= 1
 Tracing fact(x - 1) on entry
 x= 0
 [1] 24


 --
 HUANG Ronggui, Wincent
 PhD Candidate
 Dept of Public and Social Administration
 City University of Hong Kong
 Home page: http://asrr.r-forge.r-project.org/rghuang.html



Re: [R] Intersection of two sets of intervals

2009-04-15 Thread Stavros Macrakis

There is a very nice intervals package in CRAN.  It is impressively
efficient even for intersections of many millions of intervals.  If I
remember correctly, it is purely in-core, so on a 32-bit R you'll be
limited to something like 100 million intervals.  Is that enough for
your application?
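For small inputs like the ones in the question, the intersection can also be sketched in plain base R. This is not how the intervals package works, and it compares every pair of intervals, so it is only reasonable for short lists. The function name intersect_intervals is made up for the example; the open/close column names follow the question.

```r
# Pairwise intersection of two sets of intervals stored as data frames
# with columns open and close. Quadratic in the number of intervals.
intersect_intervals <- function(a, b) {
  idx   <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  open  <- pmax(a$open[idx$i],  b$open[idx$j])    # later of the two starts
  close <- pmin(a$close[idx$i], b$close[idx$j])   # earlier of the two ends
  keep  <- open < close                           # drop empty overlaps
  res   <- data.frame(open = open[keep], close = close[keep])
  res[order(res$open), , drop = FALSE]
}

list1 <- data.frame(open = c(1, 5),   close = c(2, 10))
list2 <- data.frame(open = c(1.5, 3), close = c(2.5, 10))
result <- intersect_intervals(list1, list2)
```

With the data from the question, result holds the two intervals [1.5, 2] and [5, 10].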

  -s

On Wed, Apr 15, 2009 at 8:59 AM, Thomas Meyer t...@cornell.edu wrote:
 Hi,

 Algorithm question: I have two sets of intervals, where an interval is an
 ordered pair [a,b] of two numbers. Is there an efficient way in R to
 generate the intersection of two lists of same?

 For concreteness: I'm representing a set of intervals with a data.frame:

 list1 = as.data.frame(list(open=c(1,5), close=c(2,10)))
 list1
  open close
 1    1     2
 2    5    10

 list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10)))
 list2
  open close
 1  1.5   2.5
 2  3.0  10.0

 How do I get the intersection which would be something like:
  open close
 1  1.5   2.0
 2  5.0  10.0

 I wonder if there's some ready-built functionality that might help me out.
 I'm new to R and am still learning to vectorize my code and my thinking. Or
 maybe there's a package for interval arithmetic that I can just pull off the
 shelf.

 Thanks,

 -tom

 --
 Thomas Meyer




Re: [R] Automating object creation

2009-04-14 Thread Stavros Macrakis

It is certainly possible to create x2, x4, etc. using something like
assign( sprintf("x%d",i), ...value... ).

But are you sure you need separate *variables* x2, x4, etc.?  Why not
create a list of vectors addressable as x[[2]] etc.?

You can do that with x <- list() (to define the data type of x as
allowing generic objects) then x[[2]] <- ... value ... etc.
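If the names x2, x4, x5 matter, a named list keeps them without creating separate variables. A sketch, where the list name vals is made up for the example:

```r
x <- c(2, 4, 5)

# One list instead of separate variables x2, x4, x5
vals <- list()
for (i in x) vals[[sprintf("x%d", i)]] <- 0

vals[["x4"]] <- c(1, 2, 3)      # update one element by name
element_names <- names(vals)    # the generated names, in creation order
```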

-s

On Tue, Apr 14, 2009 at 1:32 PM, Zachary Patterson
zak.patter...@gmail.com wrote:
 I am new to R. I would like to automate the creation of a number of
 vectors but can't seem to get the string formatting to work.

 Here's what I would like to be able to do:

 Suppose we have a vector:
 x <- c(2,4,5)

 I would like to be able to create a set of vectors whose names are
 associated with the values in x - e.g.

 x2 <- 0
 x4 <- 0
 x5 <- 0

 I have tried with a for loop and eval and sprintf, paste, etc. but end
 up with the following error:

 Error in sprintf("%s%i", "x", 1) <- 0 :
   target of assignment expands to non-language object

 How can I assign a string formatted name to a vector?

 Any help appreciated,
 Zak




Re: [R] Forcing the extrapolation of loess through the origin

2009-04-14 Thread Stavros Macrakis

On Tue, Apr 14, 2009 at 1:08 PM,  jimm-pa...@gmx.de wrote:
 I'm fitting a line to my dataset. Later I want to predict missing values that 
 exceed the [min,max] interval of my empirical data, therefore I choose 
 surface=direct for extrapolation.

 l1<-loess(y1~x1,span=0.1,data.frame(x=x1,y=y1),control=loess.control(surface="direct"))

 In my application it is highly important that the fitted line intercepts at 
 the point of origin. Is it possible to do this in R?

Well, you could always add lots of artificial data points x=0, y=0
..., like this:

l1<-loess(y~x,span=0.1,data.frame(x=c(rep(0,100),x1),y=c(rep(0,100),y1)),control=loess.control(surface="direct"))

which will eventually drive f(0) to near 0, but surely that will
create fitting artifacts.

  -s


Re: [R] Physical Units in Calculations

2009-04-13 Thread Stavros Macrakis

On Sun, Apr 12, 2009 at 11:01 PM,  bill.venab...@csiro.au wrote:
 It is, however, an interesting problem and there are the tools there to 
 handle it.  Basically you need to create a class for each kind of measure you 
 want to handle (length, area, volume, weight, and so on) and then 
 overload the arithmetic operators so that they can handle arguments of the 
 appropriate class.

I'd think it would be far simpler and cleaner to have a single
dimensioned-units class with a slot for magnitude and one for the
power of each dimension -- M, L, T are uncontroversial, pick your
system for electromagnetism and thermodynamics.  Once you have
that, you have not just mass, length, and time, but also area, volume,
density, acceleration, viscosity, etc. etc.
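A very rough sketch of such a class, using S3 and only the three uncontroversial dimensions. Everything here (the class name quantity, the field names, the argument names mass, len, time) is invented for illustration and is far from a complete design:

```r
# A quantity is a magnitude plus one exponent per base dimension.
quantity <- function(value, mass = 0, len = 0, time = 0) {
  structure(list(value = value, dims = c(mass = mass, len = len, time = time)),
            class = "quantity")
}

# Multiplication adds the exponents, division subtracts them.
"*.quantity" <- function(e1, e2) {
  d <- e1$dims + e2$dims
  quantity(e1$value * e2$value, d[["mass"]], d[["len"]], d[["time"]])
}
"/.quantity" <- function(e1, e2) {
  d <- e1$dims - e2$dims
  quantity(e1$value / e2$value, d[["mass"]], d[["len"]], d[["time"]])
}

# Addition is only defined for identical dimensions.
"+.quantity" <- function(e1, e2) {
  stopifnot(identical(e1$dims, e2$dims))
  quantity(e1$value + e2$value, e1$dims[["mass"]], e1$dims[["len"]], e1$dims[["time"]])
}

speed <- quantity(10, len = 1) / quantity(2, time = 1)   # L T^-1
area  <- quantity(3, len = 1) * quantity(4, len = 1)     # L^2
bad   <- try(quantity(1, len = 1) + quantity(1, time = 1), silent = TRUE)
```

Area, speed and the rest fall out of the exponent arithmetic with no extra classes, and adding a length to a time is rejected.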

It would of course be nice if the existing difftime class could be fit
into this, as it is currently pretty much a second-class citizen.  For
example, c of two time differences is currently a numeric vector,
losing its units (hours, days, etc.) completely.

One of the difficulties of adding units would be, I suspect, making
them work nicely with the rest of the system.  For example, although
sum is defined abstractly in terms of '+', as far as I can tell
sum.units would have to be overloaded explicitly. Similarly for mean,
cumsum, rle, var, %*%, etc. etc.

  -s


Re: [R] Finding the 5th percentile

2009-04-13 Thread Stavros Macrakis

quantile( dsamp100, 0.05 )

On Mon, Apr 13, 2009 at 10:41 AM, Henry Cooper henry.1...@hotmail.co.uk wrote:
 dsamp100<-coef(100,39.83,5739,2869.1,49.44)


Re: [R] Concatenation, was Re: Physical Units in Calculations

2009-04-13 Thread Stavros Macrakis

On Mon, Apr 13, 2009 at 5:15 AM, Peter Dalgaard
p.dalga...@biostat.ku.dk wrote:
 Stavros Macrakis wrote:
 ...c of two time differences is currently a numeric vector,
 losing its units (hours, days, etc.) completely.

 That's actually a generic feature/issue of c(). ...

 There is some potential for redesigning this, using a concat() generic which
 should do the Right Thing for all classed vector-like objects. (There is
 such a function in Splus, but I don't think their data frame code is using it.)

That would be a very good thing. The current design is very confusing
and difficult to learn for new users, especially for factors.

I would be very happy to have a 'logical' concatenation as well as a
'physical' one.  For instance, I'd expect the levels of factors to be
merged: concat(factor(1:3),factor(3:4)) should be
factor(c(1,2,3,3,4)), not c(1,2,3,1,2).
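Until something like concat exists, the 'logical' concatenation can be sketched by going through the labels rather than the integer codes. The variable names are illustrative:

```r
f1 <- factor(1:3)
f2 <- factor(3:4)

# Go through the labels, not the integer codes, so the levels are merged.
merged <- factor(c(as.character(f1), as.character(f2)))

# Concatenating the underlying integer codes directly loses the labels.
codes <- c(as.integer(f1), as.integer(f2))
```

merged has the four levels 1 to 4, while codes is the bare vector 1 2 3 1 2.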

 -s


[R] Using trace

2009-04-12 Thread Stavros Macrakis

I would like to trace functions, displaying their arguments and return
value, but I haven't been able to figure out how to do this with the
'trace' function.

After some thrashing, I got as far as this:

fact <- function(x) if(x<1) 1 else x*fact(x-1)
tracefnc <- function() dput(as.list(parent.frame()),  # parent.frame() holds arg list
                            control=NULL)
trace(fact,tracer=tracefnc,print=FALSE)

but I couldn't figure out how to access the return value of the
function in the 'exit' parameter.  The above also doesn't work for
... arguments.  (More subtly, it forces the evaluation of promises
even if they are otherwise unused -- but that is, I suppose, a weird
and obscure case.)

Surely someone has solved this already?

What I'm looking for is something very simple, along the lines of
old-fashioned Lisp trace:

> (defun fact (i) (if (< i 1) 1 (* i (fact (+ i -1)))))
FACT
> (trace fact)
(FACT)
> (fact 3)
  1 (FACT 3)
    2 (FACT 2)
      3 (FACT 1)
        4 (FACT 0)
        4 (FACT 1)
      3 (FACT 1)
    2 (FACT 2)
  1 (FACT 6)
6
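Output in this style can be approximated in R without 'trace' at all, by wrapping the function by hand. This is only a sketch and not a solution from the thread; the helper name make_traced is invented, and the wrapper forces all arguments, so it shares the promise-forcing caveat mentioned above.

```r
# Wrap f so that each call prints its arguments on entry and its value
# on exit, indented by call depth.
make_traced <- function(f, name) {
  force(f); force(name)   # capture the original function before rebinding
  depth <- 0
  function(...) {
    depth <<- depth + 1
    on.exit(depth <<- depth - 1)
    indent <- strrep("  ", depth - 1)
    cat(indent, depth, " (", name, " ", paste(list(...), collapse = " "), ")\n", sep = "")
    value <- f(...)
    cat(indent, depth, " (", name, " ", format(value), ")\n", sep = "")
    value
  }
}

fact <- function(x) if (x < 1) 1 else x * fact(x - 1)
fact <- make_traced(fact, "FACT")   # recursive calls now go through the wrapper
result <- fact(3)
```

The force(f) call matters: without it, f would be looked up lazily after fact has been rebound, and the wrapper would call itself forever.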

Can someone help? Thanks,

 -s

