from:"Charles C. Berry"

Re: [R] fitting cosine curve

2017-06-21 Thread Charles C. Berry

On Wed, 21 Jun 2017, J C Nash wrote:

Using a more stable nonlinear modeling tool will also help, but key is to get
the periodicity right.

The model is linear up to omega after transformation as Don and I noted.

Taking a guess that 2*pi/240 = 0.0262 is about right for omega:

rsq <- function(x) {t2<-t*x;summary(lm(y~cos(t2)+sin(t2)))$r.squared}
vrsq <- Vectorize(rsq)
optimise(rsq, c(0.8,1.2)*2*pi/240,maximum=TRUE)

$maximum
[1] 0.02794878

$objective
[1] 0.8127072

curve(vrsq,0.025,0.03)

Isn't this stable enough?

And as you note

plot(lm(y~cos(t*0.0279)+sin(t*0.0279)))

reveals lack-of-fit.

Of course there are some other issues not addressed here such as 
possible autoregression.

HTH,

Chuck

y=c(16.82, 16.72, 16.63, 16.47, 16.84, 16.25, 16.15, 16.83, 17.41, 17.67,
17.62, 17.81, 17.91, 17.85, 17.70, 17.67, 17.45, 17.58, 16.99, 17.10)
t=c(7,  37,  58,  79,  96, 110, 114, 127, 146, 156, 161, 169, 176, 182,
190, 197, 209, 218, 232, 240)

lidata <- data.frame(y=y, t=t)

#I use the method to fit a curve, but it is different from the real curve,
#which can be seen in the figure.
linFit  <- lm(y ~ cos(t))
library(nlsr)
#fullFit <- nls(y ~ A*cos(omega*t+C) + B,
#start=list(A=coef(linFit)[1],B=coef(linFit)[2],C=0,omega=.4))
#omega cannot be set to 1, don't know why.
fullFit <- nlxb(y ~ A*cos(omega*t+C) + B, data=lidata,
start=list(A=coef(linFit)[1],B=coef(linFit)[2],C=0,omega=.04), trace=TRUE)
co <- coef(fullFit)
fit <- function(x, a, b, c, d) {a*cos(b*x+c)+d}
plot(x=t, y=y)
curve(fit(x, a=co['A'], b=co['omega'], c=co['C'],d=co['B']), add=TRUE
,lwd=2, col="steelblue")
jstart <- list(A=20, B=100, C=0, omega=0.01)
jfit <- nlxb(y ~ A*cos(omega*t+C) + B, data=lidata,
   start=jstart, trace=TRUE)
co <- coef(jfit)
fit <- function(x, a, b, c, d) {a*cos(b*x+c)+d}
plot(x=t, y=y)
curve(fit(x, a=co['A'], b=co['omega'], c=co['C'],d=co['B']), add=TRUE
 ,lwd=2, col="steelblue")

JN

On 2017-06-21 12:06 AM, lily li wrote:

I'm trying the different parameters, but don't know what the error is:
Error in nlsModel(formula, mf, start, wts) :
   singular gradient matrix at initial parameter estimates

Thanks for any suggestions.

On Tue, Jun 20, 2017 at 7:37 PM, Don Cohen <don-r-h...@isis.cs3-inc.com>
wrote:

If you know the period and want to fit phase and amplitude, this is
equivalent to fitting a * sin + b * cos

  > >>> > I don't know how to set the approximate starting values.

I'm not sure what you meant by that, but I suspect it's related to
phase and amplitude.

  > >>> > Besides, does the method work for sine curve as well?

sin is the same as cos with a different phase
Any combination of a and b above = c * sin (theta + d) for
some value of c and d and = e * cos (theta + f) for some value
of e and f.
Also for any c,d and for any e,f there is an a,b.
the c and e are what I'm calling amplitude, the d and f are what
I'm calling phase.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Charles C. Berry Dept of Family Medicine & Public Health
cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
http://biostat.ucsd.edu/ccberry.htm

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fitting cosine curve

2017-06-20 Thread Charles C. Berry


On Tue, 20 Jun 2017, lily li wrote:


Hi R users,

I have a question about fitting a cosine curve. I don't know how to set the
approximate starting values.


See

Y.L. Tong (1976) Biometrics 32:85-94

The method is known as `cosinor' analysis.  It takes advantage of the 
*intrinsic* linearity of y = a + b * cos( omega*t - c ), when omega is 
given.


It you are scratching your head saying 'that thing is not linear', you 
need to go back to your linear models text and review what `linearity' 
actually refers to.  Also, reading the Tong paper is recomended as you 
will need the transformations given there in any case.


What you end up doing is fitting

fit <- lm(y~cos(t.times.omega)+sin(t.times.omega))

and then transforming coef(fit) to get back a, b, and c. So, you only need 
to have omega. If it is not obvious what value to use,  then that will be 
more of a challenge.


The paper gives asymptotics for the dispersion matrix of (a, b, c), I 
recall.



Besides, does the method work for sine curve as well?


Seriously? See

https://en.wikipedia.org/wiki/List_of_trigonometric_identities#Shifts_and_periodicity

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Determining which.max() within groups

2017-06-07 Thread Charles C. Berry


On Tue, 6 Jun 2017, Morway, Eric wrote:


Using the dataset below, I got close to what I'm after, but not quite all
the way there.  Any suggestions appreciated:

Daily <- read.table(textConnection(" Date  wyrQ
1911-04-01 1990 4.530695
1911-04-02 1990 4.700596
1911-04-03 1990 4.898814
1911-04-04 1990 5.097032
1911-04-05 1991 5.295250
1911-04-06 1991 6.569508
1911-04-07 1991 5.861587
1911-04-08 1991 5.153666
1911-04-09 1992 4.445745
1911-04-10 1992 3.737824
1911-04-11 1992 3.001586
1911-04-12 1992 3.001586
1911-04-13 1993 2.350298
1911-04-14 1993 2.661784
1911-04-16 1993 3.001586
1911-04-17 1993 2.661784
1911-04-19 1994 2.661784
1911-04-28 1994 3.369705
1911-04-29 1994 3.001586
1911-05-20 1994 2.661784"),header=TRUE)

aggregate(Q ~ wyr, data = Daily, which.max)

# gives:
#wyr Q
# 1 1990 4
# 2 1991 2
# 3 1992 1
# 4 1993 3
# 5 1994 2

I can 'see' that it is returning the which.max() relative to each
grouping.  Is there a way to instead return the absolute position (row) of
the max value within each group.  i.e.:

# Would instead like to have
# wyr  Q
# 1  1990  4
# 2  1991  6
# 3  1992  9
# 4  1993  15
# 5  1994  18

The icing on the cake would be to get the Julien Day corresponding to the
date on which each year's maximum occurs?




Like this:


which.max.by.wyr <- with(Daily, which( ave( Q, wyr, FUN=max) == Q))
cbind( Daily[ which.max.by.wyr, ], index=which.max.by.wyr )

 Date  wyrQ index
4  1911-04-04 1990 5.097032 4
6  1911-04-06 1991 6.569508 6
9  1911-04-09 1992 4.445745 9
15 1911-04-16 1993 3.00158615
18 1911-04-28 1994 3.36970518

If there are ties in Q and you do not want more than one max value listed, 
you can add a litle fuzz to randomly pick one. i.e.



fuzz <- runif(nrow(Daily), 0, 1e-10)
which.max.by.wyr <- with(Daily, which(ave(Q+fuzz,wyr,FUN=max)==Q+fuzz))



If you want the first tied value, then sort fuzz before determining 
which.max.by.wyr.


HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data import R: some explanatory variables not showing up correctly in summary

2017-06-01 Thread Charles C. Berry


On Thu, 1 Jun 2017, Rui Barradas wrote:


Hello,

In order for us to help we need to know how you've imported your data. What 
was the file type? What instructions have you used to import it? Did you use 
base R or a package?
Give us a minimal but complete code example that can reproduce your 
situation.


Hope this helps,

Rui Barradas


Absolutely.

It would also help to see what the unique values of each column 
*really* are. To that end run and report the results of this:


lapply(your.data.frame, function(x) unique(as.character(x)))

I'll bet you have both "combination" and "combination " as values or 
something similar where two different strings look to your eye to be the 
same when printed by summary().


HTH,

Chuck



Em 01-06-2017 11:02, Tara Adcock escreveu:

Hi,

I have a question regarding data importing into R.

When I import my data into R and review the summary, some of my explanatory 
variables are being reported as if instead of being one variable, they are 
two with the same name. See below for an example;


Behav person Behav dog   Position
   **combination  : 38   combination  :  4** Bank:372
   **combination  :  7   combination  :  4**   **Island  :119**
 fast :123   fast : 15 **Island  : 11**
 slow :445   slow : 95   Land:  3
 stat :111   stat : 14   Water   :230

Also, all of the distances I have imported are showing up in the summary 
along with a line entitled "other". However, I haven't used any other 
distances?


DistanceDistance.dog
2-10m  :184 <50m   : 35
<50m   :156 2-10m  : 27
10-20m :156 20-30m : 23
20-30m : 91 30-40m : 16
40-50m : 57 10-20m : 13
**(Other): 82   (Other): 18**

I have checked my data sheet over and over again and I think standardised 
the data, but the issue keeps arising. I'm assuming I need to clean the 
data set but as a nearly complete novice in R I am not certain how to do 
this. Any help at all with this would be much appreciated. Thanks so much.


Kind Regards,

Tara Adcock.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





Charles C. Berry Dept of Family Medicine & Public Health
cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
http://biostat.ucsd.edu/ccberry.htm

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] odfWeave - A loop of the "same" data

2017-06-01 Thread Charles C. Berry


On Thu, 1 Jun 2017, POLWART, Calum (COUNTY DURHAM AND DARLINGTON NHS FOUNDATION 
TRUST) via R-help wrote:

Before I go and do this another way - can I check if anyone has a way of 
looping through data in odfWeave (or possibly sweave) to do a repeating 
analysis on subsets of data?


For simplicity lets use mtcars dataset in R to explain.  Dataset looks like 
this:


mtcars

  mpg cyl disp  hp drat   wt ...
Mazda RX4 21.0   6  160 110 3.90 2.62 ...
Mazda RX4 Wag 21.0   6  160 110 3.90 2.88 ...
Datsun 71022.8   4  108  93 3.85 2.32 ...
  

Say I wanted to have a 'catalogue' style report from mtcars, where on 
each page I would perhaps have the Rowname as a heading and then plot a 
graph of mpg highlighting that specific car


Then add a page break and *do the same for the next car*.  I can manually do 
this of course, but it is effectively a loop something like this:

for (n in length(mtcars$mpg)) {
barplot (mtcars$mpg, col=c(rep(1,n-1),2,rep(1,length(mtcars$mpg)-n)))
}

There is a odfWeave page break function so I can do that sort of thing 
(I think).  But I don't think I can output more than one image can I? 
In reality I will want several images and a table per "catalogue" page.


At the moment I think I need to create a master odt document, and create 
individual catalogue pages.  And merge them into one document - but that 
feels clunky (unless I can script the merge!)


Anyone got a better way?



For a complex template inside a loop, I'd probably do as Jeff suggests and 
use a knitr child document for ease of developing and debugging the 
template.


But for the simple case you describe I'd use a brew script to
unroll the loop.

You would write your input file as usual, but put a brew script in the
right place, then run brew on the input file to produce an
intermediate file that unrolls the loop, then weave the intermediate
file to get your desired result.  Here is a simple example of such you 
can run in an R session (assuming the brew package is installed) and see 
the results printed out.


--8<---cut here---start->8---

brew::brew(text="

Everything before the loop

<% for (i in 1:10) { %>
Print the value of i
<% print(i) %> or better yet
\\Sexpr{<%= i %>}
<% } %>

everything after

")

--8<---cut here---end--->8---

The double backslash is needed in the literal string used here.  If
you put that script in a file using an editor, you would just use a
single backslash.

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Make sure a data frame has been "fun through" a function

2017-02-21 Thread Charles C. Berry


On Tue, 21 Feb 2017, stephen sefick wrote:


Sorry for not being clear. I have never used S3 methods before. Below is
some R code that sketches out my idea. Is this a sensible solution?



Sure. See comments (untested) inline.

Chuck


test_data <- data.frame(a=1:10, b=1:10, c=1:10)

functionA <- function(x, impossible_genotype){
   ##some data processing
   y <- x

   ##return S3 to be able to use impossible genotype later
   class(y) <- append(class(y),"genotypes")


 class(y) <- c("genotypes",class(y))



   attr(y, "impossible_genotype") <- impossible_genotype

   return(y)
}

test_data_genotypes <- functionA(test_data, impossible_genotype="Ref")

functionB <- function(x){
   ##stop if pre-processed with functionA
   if(sum(class(x)=="genotypes")!=1){stop("Need to pre-process data with
functionA")}


if(!(inherits("genotypes")){
stop("Need to pre-process data with functionA")}


or in functionA you could skip the class()<- and just set the
"impossible_genotypes" attribute to FALSE when there are none such.

Then here test

 if (is.null(attr(x,"impossible_genotypes"))){
stop("Need to pre-process data with functionA")
} else {
return(alleles)
}




   ##use this later in functionB to
   impossible_genotype <- attributes(x)$impossible_genotype


 impossible_genotype <- attr(x,"impossible_genotype")


   alleles <- c("Ref", "Alt")

   coded_genotype <- alleles[alleles!=impossible_genotype]


 maybe `!is.element(alleles,impossible_genotype)' is safer than `!='





   return(coded_genotype)
}

##stop if not pre-processed with functionA
functionB(test_data)

##processed with functionA
functionB(test_data_genotypes)



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Make sure a data frame has been "fun through" a function

2017-02-20 Thread Charles C. Berry


On Mon, 20 Feb 2017, stephen sefick wrote:


Hello,

I would like to add something to a data frame that is 1) invisible to the
user, 2) has no side effects, and 3) I can test for in a following
function. Is this possible? I am exploring classes and attributes and I
have thought about using a list (but 1 and 2 not satisfied). Any help would
be greatly appreciated.



Depends on exactly what you mean by `invisible' and `side effects'.

You can do this (but I am not necessarily recommending this):


add.stuff <- function(x,...){

+ class(x)<- c("more.stuff",class(x))
+ attr(x,"stuff")<- list(...)
+ x}




And printing and model functions will be unaffected:


df <- data.frame(a=1:3,b=letters[1:3])
df2 <- add.stuff(df,comment="wow", length="3 rows")
df2

  a b
1 1 a
2 2 b
3 3 c

attr(df2,"stuff")

$comment
[1] "wow"

$length
[1] "3 rows"


all.equal(lm(a~b,df),lm(a~b,df2)) # only call should differ

[1] "Component “call”: target, current do not match when deparsed"




And if you need some generics to take account of the "stuff" attribute, 
you can write the methods to do that.


---

Another solution is to put your data.framne in a package and then have 
other objects hold the 'stuff' stuff. Once your package is loaded or 
imported, the user will have access to the data in a way that might be 
said to be `invisible' in ordinary usage.


---

But seriously, you should say *why* you want to do this. There are 
probably excellent solutions that do not involve directly altering the 
data.frame and may not involve putting together a package.


HTH,

Chuck
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [ESS] feature request : completion of available packages name

2017-01-03 Thread Charles C. Berry


On Tue, 3 Jan 2017, Martin Maechler wrote:


On Tue, Jan 3, 2017 at 6:30 PM, Samuel BARRETO
 wrote:

Hi,

Do you think it would be difficult to add some kind of completion backend
to complete the package names when typing `library(` ?


[deleted]



More seriously: We do something like that already in ESS for quite a
while now but I don't recall the details.


I suspect Martin knows this, but ...

C-c C-e C-l

runs ess-load-library.

With `ido' enabled, it puts a list in the minibuffer where fuzzy matching 
will narrow it down to the package one wants. Hitting TAB will 
display all the package names that fuzzily match in a completion buffer.


Thanks for reminding me of the nifty feature!

Chuck

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help

Re: [R] MC_CORES and mc.cores for parallel package

2016-12-07 Thread Charles C. Berry


On Wed, 7 Dec 2016, Marc Girondot via R-help wrote:


Hi,

From the documentation of ?options

Options set in package parallel
These will be set when package parallel (or its namespace) is loaded if not 
already set.


mc.cores:
a integer giving the maximum allowed number of additional R processes allowed 
to be run in parallel to the current R process. Defaults to the setting of 
the environment variable MC_CORES if set. Most applications which use this 
assume a limit of 2 if it is unset.




As advertised.

- Start R with no MC_CORES specified:
- check environment var
- set environment var
- check options
- THEN load parallel
- check option again

 Sys.getenv("MC_CORES")
[1] ""

Sys.setenv("MC_CORES"=3L)
options("mc.cores")

$mc.cores
NULL


library(parallel)
options("mc.cores")

$mc.cores
[1] 3

---

I think you confused things by loading parallel *before* setting the 
environment var.


HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Frequency of a character in a string

2016-11-14 Thread Charles C. Berry


On Mon, 14 Nov 2016, Marc Schwartz wrote:




On Nov 14, 2016, at 11:26 AM, Charles C. Berry <ccbe...@ucsd.edu> wrote:

On Mon, 14 Nov 2016, Bert Gunter wrote:


[stuff deleted]


Hi,

Both gsub() and strsplit() are using regex based pattern matching 
internally. That being said, they are ultimately calling .Internal code, 
so both are pretty fast.


For comparison:

## Create a 1,000,000 character vector
set.seed(1)
Vec <- paste(sample(letters, 100, replace = TRUE), collapse = "")


nchar(Vec)

[1] 100

## Split the vector into single characters and tabulate

table(strsplit(Vec, split = "")[[1]])


   a b c d e f g h i j k l
38664 38442 38282 38496 38540 38623 38548 38288 38143 38493 38184 38621
   m n o p q r s t u v w x
38306 38725 38705 38144 38529 38809 38575 38355 38386 38364 38904 38310
   y z
38265 38299


## Get just the count of "a"

table(strsplit(Vec, split = "")[[1]])["a"]

   a
38664


nchar(gsub("[^a]", "", Vec))

[1] 38664


## Check performance

system.time(table(strsplit(Vec, split = "")[[1]])["a"])

  user  system elapsed
 0.100   0.007   0.107


system.time(nchar(gsub("[^a]", "", Vec)))

  user  system elapsed
 0.270   0.001   0.272


So, the above would suggest that using strsplit() is somewhat faster 
than using gsub(). However, as Chuck notes, in the absence of more 
exhaustive benchmarking, the difference may or may not be more 
generalizable.



Whether splitting on fixed strings rather than treating them as
regex'es (i.e.`fixed=TRUE') makes a big difference seems to depend on
what you split:

First repeating what Marc did...


system.time(table(strsplit(Vec, split = "",fixed=TRUE)[[1]])["a"])

   user  system elapsed
  0.132   0.010   0.139 

system.time(table(strsplit(Vec, split = "",fixed=FALSE)[[1]])["a"])

   user  system elapsed
  0.130   0.010   0.138

... fixed=TRUE hardly matters. But the idiom I proposed...


system.time(sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=TRUE)) - 1))

   user  system elapsed
  0.017   0.000   0.018 

system.time(sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=FALSE)) - 1))

   user  system elapsed
  0.104   0.000   0.104




... is 5 times faster with fixed=TRUE for this case.

This result matchea Marc's count:


sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=FALSE)) - 1)

[1] 38664




Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Frequency of a character in a string

2016-11-14 Thread Charles C. Berry


On Mon, 14 Nov 2016, Bert Gunter wrote:


Yes, but it need some help, since nchar gives the length of the
*entire* string; e.g.

## to count "a" 's  :


x <-(c("abbababba","bbabbabbaaaba"))
nchar(gsub("[^a]","",x))

[1] 4 6

This is one of about 8 zillion ways to do this in base R if you don't
want to use a specialized package.

Just for curiosity: Can anyone comment on what is the most efficient
way to do this using base R pattern matching?



Most efficient? There probably is no uniformly most efficient way to do 
this as the timing will depend on the distribution of "a" in the atoms of 
any vector as well as the length of the vector.


But here is one way to avoid the regular expression matching:

lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) - 1


Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Alternative to apply in base R

2016-11-08 Thread Charles C. Berry


On Tue, 8 Nov 2016, Doran, Harold wrote:


Without reaching out to another package in R, I wonder what the best way is to 
speed enhance the following toy example? Over the years I have become very 
comfortable with the family of apply functions and generally not good at 
finding an improvement for speed.

This toy example is small, but my real data has many millions of rows 
and the same operations is repeated many times and so finding a less 
expensive alternative would be helpful.


mm <- matrix(rnorm(100), ncol = 10)
rn <- apply(mm, 1, prod)



If the real example has only 10 columns, try this:


y <- mm[,1]
for (i in 2:10) y[] <- y*mm[,i]
all.equal(y,rn)


If it has many more columns, I would `reach out' to the inline package and 
write 3 lines of C or Fortran to do the operation.


HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Dealing with -Inf in a maximisation problem.

2016-11-06 Thread Charles C. Berry


On Mon, 7 Nov 2016, Rolf Turner wrote:


On 07/11/16 13:07, William Dunlap wrote:

Have you tried reparameterizing, using logb (=log(b)) instead of b?


Uh, no.  I don't think that that makes any sense in my context.

The "b" values are probabilities and must satisfy a "sum-to-1" constraint. 
To accommodate this constraint I re-parametrise via a "logistic" style 
parametrisation --- basically


  b_i = exp(z_i)/[sum_j exp(z_j)], j = 1, ... n

with the parameters that the optimiser works with being z_1, ..., z_{n-1} 
(and with z_n == 0 for identifiability).  The objective function is of the 
form sum_i(a_i * log(b_i)),



This is sum_i(a_i * z_i) - sum(a_i)*log(sum_j(exp(z_j)), isn't it?

So you don't need to evaluate b_i here, do you?

Large values of z_j will lead to exp(z_j) == Inf, but using

sum_i(a_i * (z_i-max.z)) - sum(a_i)*log(sum_j(exp(z_j-max.z))

will handle that.

HTH,

Chuck

p.s. Regarding "advice from younger and wiser heads", I probably cannot 
claim to be either.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function ave() with seq_along returning char sequence instead of numeric

2016-11-01 Thread Charles C. Berry


On Mon, 31 Oct 2016, Jeff Newmiller wrote:

The help page describes the first argument x as a numeric... it is not 
designed to accept character,


Actually it is so designed, but not advertised as such. See below.


so the fact that you get anything even close to right is just a bonus.

As the doctor says, "if it hurts, don't do that".

ave( rep( 1, length( v ), v, FUN=seq_along )
--


[snip]


Reading the code of `ave` and then `split<-.default`, you will see subset 
replacement, "x[i]<- ...", on the argument 'x'. So, the issue is having 
FUN and that replacement (and possible coercion) yield something 
useful/sensible. In other words, class(x) need not be "numeric".


For instance, operating on "Date" objects:


# start at 2016-01-02, step 10 days, ...
x <- as.Date("2016-01-01")+seq(1,1000,by=10)
z <- rep(1:10, 10)
 class(ave(x,z)) # Date class is preserved

[1] "Date"

ave(x,z) # mean date
  [1] "2017-03-27" "2017-04-06" "2017-04-16" "2017-04-26" ... 

ave(x,z,FUN=min) # earliest date

  [1] "2016-01-02" "2016-01-12" "2016-01-22" "2016-02-01" ...

However, trying to describe this feature in the help page without a lot of 
detail and examples might confuse more users than it would enlighten.


--
Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Escaping quotes WAS: Storing long string with white space in variable

2016-10-19 Thread Charles C. Berry


On Wed, 19 Oct 2016, g.maub...@weinwolf.de wrote:


Hi All,

I would like to store a long string with white space in a variable:

-- cut --
 # Create README.md
 readme <- "---
title: "Your project title here"
author: "Author(s) name(s) here"
date: "Current date here"
output: html_document
---


[snip]


"
cat(readme, file = "README.md")

-- cut --

I am looking for an equivalent to Pythons """  """ long string feature.



Whitespace isn;t the issue here. Embedded quotes are the problem.

See

?Quotes

In this case, if your replace the first and last double quotes with single 
quotes, you will get a single string of 464 characters, which is what you 
want.


In other cases where you have embedded single quotes, you can escape them 
and the double quotes and get what you want.


[snip]


PS: This is a template for a project folder for each project. I would like
to create it with R script instead of distributing it as a template file.
This way one needs only the R script to setup a project like this:



[rest deleted]

There are templating procedures available. Package brew might be worth a 
look. It can take a template string with embedded R code and return 
a string with the R results spliced in. Package knitr has some ability to 
deal with templates, too, and I think there are other packages.


Jeff Newmiller's suggestion to make a package seems right to me whether 
you use brew or not. Long term it will probably be the easiest path.


HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Looping through data tables (or data frames) by removing previous individuals

2016-10-03 Thread Charles C. Berry


On Mon, 3 Oct 2016, Frank S. wrote:


Dear R users,




[deleted]

I want to get a list of "k" data tables (or data frames) so that each 
contains those individuals who for the first time are at least 65, 
looping on each of the dates of vector "v". Let's consider the following 
example with 5 individuals:



dt <- data.table(
  id = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07", "1943-09-07", 
"1943-12-31")),
  sex = as.factor(rep(c(0, 1), c(2, 3)))
  )

v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by ="year") # k=4


I would expect to obtain k=4 data tables so that:
dt_p1: contains id = 1 (he is for the first time at least 65 on date v[1])
dt_p2: is NULL (no subject reach for the first time 65 on date v[2])
dt_p3: contains id = 2 & id = 3 (they are for the first time at least 65 on 
v[3])
dt_p4: contains id = 4 & id = 5 (they are for the first time at least 65 on 
v[4])




Here is a start (using a data.frame for dt):


vp <- as.POSIXlt( c( as.Date("1000-01-01"), v ))
vp$year <- vp$year-65
dt.cut <- as.numeric(cut(as.POSIXlt(dt$fborn),vp))
split(dt,factor(dt.cut, 1:length(v)))

$`1`
  id  fborn sex
1  1 1935-07-25   0

$`2`
[1] idfborn sex
<0 rows> (or 0-length row.names)

$`3`
  id  fborn sex
2  2 1942-10-05   0
3  3 1942-09-07   1

$`4`
  id  fborn sex
4  4 1943-09-07   1
5  5 1943-12-31   1


See
  ?as.POSIXlt
  ?cut.POSIXt
  ?split

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-03 Thread Charles C. Berry


On Sat, 3 Sep 2016, Bert Gunter wrote:


Chuck et. al.:

As I said previously, my intuition about the relative efficiency of
tapply() and duplicated() in the context of this thread was wrong.


My `intuition' was wrong, too.

But tapply() uses split() which runs quite fast. So not a big surprise, 
but if you look thru tapply() you'll notice it is well crafted in other 
ways. In particular, the way the `f' arg of split is constructed makes a 
big difference in timing (using a for loop to build up a mixed radix 
number). In fact interaction(f,g) needs about 3 times the time of 
tapply(f,list(f,g)) for just building an index.


Thanks for following up.

Best,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread Charles C. Berry


On Fri, 2 Sep 2016, Bert Gunter wrote:
[snip]


The "trick" is to use tapply() to select the necessary row indices of
your data frame and forget about all the do.call and rbind stuff. e.g.



I agree the way to go is "select the necessary row indices" but I get 
there a different way. See below.



set.seed(1001)
df <- data.frame(f =factor(sample(LETTERS[1:4],100,rep=TRUE)),

+  g <- factor(sample(letters[1:6],100,rep=TRUE)),
+  y = runif(100))


ix <- seq_len(nrow(df))

ix <- with(df,tapply(ix,list(f,g),function(x)x[length(x)]))
ix

  a  b   c  d  e  f
A 94 69 100 59 80 87
B 89 57  65 90 75 88
C 85 92  86 95 97 62
D 47 73  72 74 99 96



  jx <- which( !duplicated( df[,c("f","g")], fromLast=TRUE ))

  xtabs(jx~f+g,df[jx,]) ## Show equivalence to Bert's `ix'

   g
f a   b   c   d   e   f
  A  94  69 100  59  80  87
  B  89  57  65  90  75  88
  C  85  92  86  95  97  62
  D  47  73  72  74  99  96


Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with big data and parallel computing: 500, 000 x 4 linear models

2016-08-08 Thread Charles C. Berry


On Mon, 8 Aug 2016, Ellis, Alicia M wrote:

I have a large dataset with ~500,000 columns and 1264 rows.  Each column 
represents the percent methylation at a given location in the genome. 
I need to run 500,000 linear models for each of 4 predictors of interest 
in the form of:



Methylation.stie1 ~ predictor1 + covariate1+ covariate2 + ... covariate9
...and save only the pvalue for the predictor

The original methylation data file had methylation sites as row labels 
and the individuals as columns so I read the data in chunks and 
transposed it so I now have 5 csv files (chunks) with columns 
representing methylation sites and rows as individuals.


I was able to get results for all of the regressions by running each 
chunk of methylation data separately on our supercomputer using the code 
below.


This sounds like a problem for my old laptop, not a supercomputer.

You might want to review the algebra and geometry of least squares.

In particular, covariate1 ... covariate9 are the same 1264 x 9 matrix for 
every problem IIUC. So, you can compute the QR decomposition for that 
matrix (and the unit vector `intercept') *once* and use it in all the 
problems.


Using that decomposition, find the residuals for the regressands and for 
`predictor1' (etc) regressors. The rest is simple least squares. You 
compute the correlation coefficient of the residuals of a regressand and 
those of a regressor, for each combination. Make a table of critical 
values for the p-value(s) you require - remember to get the degrees of 
freedom right (i.e. account for the covariates). These correlations of 
residuals are the partial correlations given the covariates, and a test on 
one of them is algebraically equal to the test on regression coefficient 
for corresponding regressand and regressor in a modelthat also includes 
those 9 covariates.


See:

 ?qr
 ?lm.fit

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to make the "apply" faster

2016-07-09 Thread Charles C. Berry


On Sat, 9 Jul 2016, Debasish Pai Mazumder wrote:


I have 4-dimension array x(lat,lon,time,var)

I am using "apply" to calculate over time
new = apply(x,c(1,2,4),FUN=function(y) {length(which(y>=70))})

This is very slow. Is there anyway make it faster?


If dim(x)[3] << prod(dim(x)[-3]),

new <-  Reduce("+",lapply(1:dim(x)[3],function(z) x[,,z,]>=70))

will be faster.

However, if you can follow Peter Langfelder's suggestion to use rowSums, 
that would be best. Even using rowSums(aperm(x,c(1,2,4,3)>=70,dims=3) and 
paying the price of aperm() might be better.


Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to identify runs or clusters of events in time

2016-07-01 Thread Charles C. Berry



See below

On Fri, 1 Jul 2016, Mark Shanks wrote:


Hi,


Imagine the two problems:


1) You have an event that occurs repeatedly over time. You want to 
identify periods when the event occurs more frequently than the base 
rate of occurrence. Ideally, you don't want to have to specify the 
period (e.g., break into months), so the analysis can be sensitive to 
scenarios such as many events happening only between, e.g., June 10 and 
June 15 - even though the overall number of events for the month may not 
be much greater than usual. Similarly, there may be a cluster of events 
that occur from March 28 to April 3. Ideally, you want to pull out the 
base rate of occurrence and highlight only the periods when the 
frequency is less or greater than the base rate.




A good place to start is:

Siegmund, D. O., N. R. Zhang, and B. Yakir. "False discovery rate
for scanning statistics." Biometrika 98.4 (2011): 979-985.

and

Aldous, David. Probability approximations via the Poisson clumping 
heuristic. Vol. 77. Springer Science & Business Media, 2013.


---

A nice illustration of how scan statistcis can be used is:

Aberdein, Jody, and David Spiegelhalter. "Have London's roads
become more dangerous for cyclists?." Significance 10.6 (2013):
46-48.




2) Events again occur repeatedly over time in an inconsistent way. 
However, this time, the event has positive or negative outcomes - such 
as a spot check of conformity to regulations. You again want to know 
whether there is a group of negative outcomes close together in time. 
This analysis should take into account the negative outcomes as well 
though. E.g., if from June 10 to June 15 you get 5 negative outcomes and 
no positive outcomes it should be flagged. On the other hand, if from 
June 10 to June 15 you get 5 negative outcomes interspersed between many 
positive outcomes it should be ignored.



I'm guessing that there is some statistical approach designed to look at 
these types of issues. What is it called?


`Scan statistic' is a good search term. `Poisson clumping', too.

What package in R implements it? I basically just need to know where to 
start.





There are some R packages.

CRAN has packages SNscan and graphscan, which sound like they 
might interest you.


My BioConductor package geneRxCluster:

http://bioconductor.org/packages/release/bioc/html/geneRxCluster.html

seeks clusters in a binary sequence as described in detail at

http://bioinformatics.oxfordjournals.org/content/30/11/1493

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Order of formula terms in model.matrix

2016-01-17 Thread Charles C. Berry


On Sun, 17 Jan 2016, Lars Bishop wrote:

I’d appreciate your help on understanding the following. 



It is not very clear to me from the model.matrix documentation, why 
simply changing the order of terms in the formula may change the number 
of resulting columns. Please note I’m purposely not including main 
effects in the model formula in this case.



IIRC, there are some heuristics involved harking back to the White Book. I 
recall there have been discussions of whether and how this could be fixed 
before on this list and or R-devel, but I cannot seem to lay my browser on 
them right now.






set.seed(1)
x1 <- rnorm(100)
f1 <- factor(sample(letters[1:3], 100, replace = TRUE))
trt <- sample(c(-1,1), 100, replace = TRUE)
df <- data.frame(x1=x1, f1=f1, trt=trt)

dim(model.matrix( ~ x1:trt + f1:trt, data = df))
[1] 100 4

dim(model.matrix(~ f1:trt + x1:trt, data = df))
[1] 100 5



By `x1:trt' I guess you mean the same thing as `I(x1*trt)'.

If you use the latter form, the issue you raise goes away.

Note that `I(some.expr)' gives you the ability to force the behavior of 
model.matrix to be exactly what you want by suitably crafting `some.expr', 
heuristics notwithstanding.


HTH,

Chuck
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Add sequence numbers to lines with the same ID: How can this be accomplished?

2015-10-24 Thread Charles C. Berry


On Sat, 24 Oct 2015, Bert Gunter wrote:


Rolf's solution works for the situation where all duplicated values
are contiguous, which may be what you need. However, I wondered how it
could be done if this were not the case. Below is an answer. It is not
as efficient or elegant as Rolf's solution for the contiguous case I
think; maybe someone will come up with something better.


The often underappreciated `ave' comes to mind. viz.,

ave(w,w,FUN=seq_along)
and
ave(ID,ID,FUN=seq_along)

agree with the results below.

Of course, ave(...) is just split/unsplit in guise, further our discussion 
of a month or two back.


Best,

Chuck


But I think
it works. Here's an example with code:


w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
w

[1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3

d <- 0+duplicated(w)
for(x in unique(w)){

+   i <- w==x
+   d[i]<-1+ cumsum(d[i])
+
+ }

d

[1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3

As always, corrections and/or improvements welcome.

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
  -- Clifford Stoll


On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner <r.tur...@auckland.ac.nz> wrote:

On 25/10/15 11:28, John Sorkin wrote:


I have a file that has (1) Line numbers, (2) IDs. A given ID number can
appear in more than one row. For each row with a repeated ID, I want to add
a number that gives the sequence number of the repeated ID number. The R
code below demonstrates what I want to have, without any attempt to produce
the result, as I have no idea how to accomplish my goal.


line <- c(1,2,3,4,5,6,7,8,9,10)
ID<-c(1,1,2,3,4,5,6,7,8,8)
cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID
8")
cbind(line,ID)
Seq <-  c(1,2,1,1,1,1,1,1,1,2)
cat("Sequence numbers within ID added to the data")
cbind(line,ID,Seq)



I *think* that

  unlist(lapply(rle(ID)$lengths,seq_len))

gives what you want.  At least it does for the given example.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Charles C. Berry Dept of Family Medicine & Public Health
cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Linear regression with a rounded response variable

2015-10-21 Thread Charles C. Berry


On Wed, 21 Oct 2015, Ravi Varadhan wrote:

Hi, I am dealing with a regression problem where the response variable, 
time (second) to walk 15 ft, is rounded to the nearest integer.  I do 
not care for the regression coefficients per se, but my main interest is 
in getting the prediction equation for walking speed, given the 
predictors (age, height, sex, etc.), where the predictions will be real 
numbers, and not integers.  The hope is that these predictions should 
provide unbiased estimates of the "unrounded" walking speed. These 
sounds like a measurement error problem, where the measurement error is 
due to rounding and hence would be uniformly distributed (-0.5, 0.5).




Not the usual "measurement error model" problem, though, where the errors 
are in X and not independent of XB.


Look back at the proof of the unbiasedness of least squares under the 
Gauss-Markov setup. The errors in Y need to have expectation zero.


From your description (but see caveat below) this is true of walking 
*time*, but not not exactly true of walking *speed* (modulo the usual 
assumptions if they apply to time). In fact if E(epsilon) = 0 were true of 
unrounded time, it would not be true of unrounded speed (and vice versa).




Are there any canonical approaches for handling this type of a problem?


Work out the bias analytically? Parametric bootstrap? Data augmentation 
and friends?



What is wrong with just doing the standard linear regression?



Well, what do the actual values look like?

If half the subjects have a value of 5 seconds and the rest are split 
between 4 and 6, your assertion that rounding induces an error of 
dunif(epsilon,-0.5,0.5) is surely wrong (more positive errors in the 6 
second group and more negative errors in the 4 second group under any 
plausible model).



HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Compare two normal to one normal

2015-09-23 Thread Charles C. Berry


On Tue, 22 Sep 2015, John Sorkin wrote:


Charles,


I am not sure the answer to me question, given a dataset, how can one 
compare the fit of a model of the fits the data to a mixture of two 
normal distributions to the fit of a model that uses a single normal 
distribution, can be based on the glm model you suggest.


Well you *did* ask how to calculate the log-likelihood of a fitted normal 
density, didn't you? That is what I responded to. You can check that 
result longhand as sum( dnorm( y, y.mean, y.std , log=TRUE ) ) and get the 
same result (as long as you used ML estimates of the mean and standard 
deviation).





I have used normalmixEM to fit the data to a mixture of two normal 
curves. The model estimates four (perhaps five) parameters: mu1, sd^2 1, 
mu2, sd^2, (and perhaps lambda, the mixing proportion. The mixing 
proportion may not need to be estimated, it may be determined once once 
specifies mu1, sd^2 1, mu2, and sd^2.) Your model fits the data to a 
model that contains only the mean, and estimates 2 parameters mu0 and 
sd0^2.  I am not sure that your model and mine can be considered to be 
nested. If I am correct I can't compare the log likelihood values from 
the two models. I may be wrong. If I am, I should be able to perform a 
log likelihood test with 2 (or 3, I am not sure which) DFs. Are you 
suggesting the models are nested? If so, should I use 3 or 2 DFs?


As Rolf points out there is a literature on such tests (and Googling 'test 
finite mixture' covers much of it).


Do you really want a test? If you merely want to pick a winner from two 
candidate models there are other procedures. k-fold crossvalidation 
of the loglikelihood ratio statistic seems like an easy, natural approach.


HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Compare two normal to one normal

2015-09-22 Thread Charles C. Berry


On Tue, 22 Sep 2015, John Sorkin wrote:



In any event, I still don't know how to fit a single normal distribution 
and get a measure of fit e.g. log likelihood.




Gotta love R:


y <- rnorm(10)
logLik(glm(y~1))

'log Lik.' -17.36071 (df=2)

HTH,

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple if function

2015-09-17 Thread Charles C. Berry


On Thu, 17 Sep 2015, Berend Hasselman wrote:




On 17 Sep 2015, at 01:42, Dénes Tóth  wrote:



On 09/16/2015 04:41 PM, Bert Gunter wrote:

Yes! Chuck's use of mapply is exactly the split/combine strategy I was
looking for. In retrospect, exactly how one should think about it.
Many thanks to all for a constructive discussion .

-- Bert


Bert Gunter



Use mapply like this on large problems:

unsplit(
  mapply(
  function(x,z) eval( x, list( y=z )),
  expression( A=y*2, B=y+3, C=sqrt(y) ),
  split( dat$Flow, dat$ASB ),
  SIMPLIFY=FALSE),
  dat$ASB)

Chuck




Is there any reason not to use data.table for this purpose, especially if 
efficiency is of concern?

---

# load data.table and microbenchmark
library(data.table)
library(microbenchmark)
#
# prepare data
DF <- data.frame(
   ASB = rep_len(factor(LETTERS[1:3]), 3e5),
   Flow = rnorm(3e5)^2)
DT <- as.data.table(DF)
DT[, ASB := as.character(ASB)]
#
# define functions
#
# Chuck's version
fnSplit <- function(dat) {
   unsplit(
   mapply(
   function(x,z) eval( x, list( y=z )),
   expression( A=y*2, B=y+3, C=sqrt(y) ),
   split( dat$Flow, dat$ASB ),
   SIMPLIFY=FALSE),
   dat$ASB)
}
#
# data.table-way (IMHO, much easier to read)
fnDataTable <- function(dat) {
   dat[,
   result :=
   if (.BY == "A") {
   2 * Flow
   } else if (.BY == "B") {
   3 + Flow
   } else if (.BY == "C") {
   sqrt(Flow)
   },
   by = ASB]
}
#
# benchmark
#
microbenchmark(fnSplit(DF), fnDataTable(DT))
identical(fnSplit(DF), fnDataTable(DT)[, result])

---

Actually, in Chuck's version the unsplit() part is slow. If the order is not of 
concern (e.g., DF is reordered before calling fnSplit), fnSplit is comparable 
to the DT-version.



But David’s version is faster than Chuck’s fnSplit. I modified David’s solution 
slightly to get a result that is identical to fnSplit.

# David's version
# my modification to return a vector just like fnSplit
fnDavid <- function(dat) {
   z <- mapply(
 function(x,z) eval( x, list( y=z )),
 expression(A= y*2, B=y+3, C=sqrt(y) ),
 split( dat$Flow, dat$ASB ),
 USE.NAMES=FALSE, SIMPLIFY=TRUE
   )
   as.vector(t(z))
}

Added this to Dénes's code.
Benchmarking  with R package rbenchmark and testing result like this

library(rbenchmark)
benchmark(fnSplit(DF), fnDataTable(DT),fnDavid(DF))
identical(fnSplit(DF), fnDataTable(DT)[, result])
identical(fnSplit(DF), fnDavid(DF))

gave this:

test replications elapsed relative user.self sys.self user.child
2 fnDataTable(DT)  100   0.8291.000 0.7620.066  0
3 fnDavid(DF)  100   1.6151.948 1.5150.098  0
1 fnSplit(DF)  100   2.8783.472 2.6850.190  0
 sys.child
2 0
3 0
1 0


identical(fnSplit(DF), fnDataTable(DT)[, result])

[1] TRUE

identical(fnSplit(DF), fnDavid(DF))

[1] TRUE


The above `TRUE' depends on the structure of ASB here. identical(...) is 
often FALSE in the general case. A permutation of ASB is enough to show 
this:



DF$ASB <- sample(DF$ASB)
identical(fnSplit(DF), fnDavid(DF))

[1] FALSE




unsplit() is the price you pay to cope with general orderings.

Chuck
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple if function

2015-09-15 Thread Charles C. Berry


On Tue, 15 Sep 2015, Bert Gunter wrote:


Thanks to both Davids.

I realize that these things are often a matter of aesthetics -- and
hence have little rational justification -- but I agree with The Other
David: eval(parse) seems to me to violate R's soul( it makes R a macro
language instead of a functional one).

However, mapply(... switch) effectively loops through the frame row by
row. Aesthetically, I like it; but it seems inefficient. If there are
e.g. 1e6 rows in say 10 categories, I think Jeff's approach should do
much better.  I'll try to generate some actual data to see unless
someone else beats me to it.


Use mapply like this on large problems:

unsplit(
mapply(
function(x,z) eval( x, list( y=z )),
expression( A=y*2, B=y+3, C=sqrt(y) ),
split( dat$Flow, dat$ASB ),
SIMPLIFY=FALSE),
dat$ASB)

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple Integrals

2015-08-29 Thread Charles C. Berry


On Fri, 28 Aug 2015, Shant Ch via R-help wrote:


Hello all,


For a study I want to find E|X1/3+X2/3+X3/3-X4| for variables following 
lognormal distribution. To get the value we need to use four integrals.


So far, so good.



This is the code which I is used

  fx-function(x){
    dlnorm(x,meanlog=2.185,sdlog=0.562)
  }
U31-integrate(function(y1) { sapply(y1, function(y1) { 
+  integrate(function(y2){  sapply(y2, function(y2) {
+  integrate(function(x1){  sapply(x1, function(x1) { 
+  integrate(function(x2)

+   abs(y1/3+y2/3+x1/3-x2)*fx(y1)*fx(y2)*fx(x1)*fx(x2),0, Inf)$value
+  })},0, Inf)$value })},0, Inf)$value})},0,Inf)$value
The error I received is the following:
Error in integrate(function(y2) { :
  maximum number of subdivisions reached

I can understand the problem,


This is NOT a problem for which a numerical solution (apart from 
evaluating exp(x) and then doing some arithmetic) is required.


You are calculating the expectation of a sum of random variables. And from 
your code, these are independent random variables.


There is a well known calculus of expectations. Consult a book on 
probability theory. The expectation of a lognormal distribution is 
both well known and easy to work out. So is the expectation of a constant 
times a random variable. The expectation of a sum of random variables is 
also well known.


So write down the expectation of the lognormal. Render that expression as 
an R function.


Write down the expectation of a random variable times a constant. Render 
that expression as an R function.


Write down the expression for expectation of a sum of independent 
random variables as a function of the expectations of the random 
variables.


Lastly, write an R function that calls all of the above to yield the 
expectation of a sum of lognormal variables times fixed constants.


You do not need to use (and should not use!) the integrate() function to 
accomplish your aim.


Chuck

p.s. Nesting calls to integrate() is almost certainly a very bad approach 
to any multiple integration problem. If you ever do need to solve a 
multiple integration problem numerically, consult an expert before trying 
to write the solution as R code.



but I am unable to figure out what can be done.. It would be great if 
you can let me know a solution to the problem so as to find a value for 
the integral.


Shant

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry Dept of Family Medicine  Public Health
cberry at ucsd edu   UC San Diego / La Jolla, CA 92093-0901
http://famprevmed.ucsd.edu/faculty/cberry/
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Multiple Integrals

2015-08-29 Thread Charles C. Berry


On Sat, 29 Aug 2015, Shant Ch wrote:


Hello Dr. Berry,

I know the theoretical side but note we are not talking about 
expectation of sums rather expectation of ABSOLUTE value of the function 
(X1/3+X2/3+X3/3-X4), i.e. E|X1/3+X2/3+X3/3-X4| , I don#39;t think this 
can be handled for log normal distribution by integrals by hand.




Sorry! My tired eyes missed the absolute value.

FWIW, there are some quadrature packages on CRAN.

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Do grep() and strsplit() use different regex engines?

2015-07-11 Thread Charles C. Berry


On Sat, 11 Jul 2015, Bert Gunter wrote:


David/Jeff:

Thank you both.

You seem to confirm that my observation of an infelicity in
strsplit() is real. That is most helpful.

I found nothing in David's message 2 code that was surprising. That
is, the splits shown conform to what I would expect from \\b . But
not to what I originally showed and David enlarged upon in his first
message. I still don't really get why a split should occur at every
letter.

Jeff may very well have found the explanation, but I have not gone
through his code.

If the infelicities noted (are there more?) by David and me are not
really bugs -- and I would be frankly surprised if they were -- I
would suggest that perhaps they deserve mention in the strsplit() man
page. Something to the effect that \b and \ should not be used as
split characters... .


Bert et al,

?strsplit already says:

If empty matches occur, in particular if split has length 0, x is split 
into single characters.


And there are various ways that empty matches can happen besides using 
\\b as the split arg. But there would be no harm in adding your cases to 
'in particular ...'


The comment in the code (src/main/grep.c: line 493) suggests this was a 
deliberate decision. However, similar functions in other languages do not 
do this.


For example, emacs `(split-string red green \\b)' gives

( red   green )

as the result.

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Programming R to avoid loops

2015-04-18 Thread Charles C. Berry


On Sat, 18 Apr 2015, Brant Inman wrote:


I have two large data frames with the following structure:


df1

 id   date test1.result
1  a 2009-08-28  1
2  a 2009-09-16  1
3  b 2008-08-06  0
4  c 2012-02-02  1
5  c 2010-08-03  1
6  c 2012-08-02  0


df2

 id   date test2.result
1  a 2011-02-03  1
2  b 2011-09-27  0
3  b 2011-09-01  1
4  c 2009-07-16  0
5  c 2009-04-15  0
6  c 2010-08-10  1



I need to match items in df2 to those in df1 with specific matching 
criteria. I have written a looped matching algorithm that works, but it 
is very slow with my large datasets. I am requesting help on making a 
version of this code that is faster and “vectorized so to speak.


As I see in your posted code, you match id's exactly, dates according to a 
range, and count the number of positive test result in the second 
data.frame.


For this, the countOverlaps() function of the GenomicRanges package will 
do the trick with suitably defined GRanges objects. Something like:


require(GenomicRanges)

date1 - as.integer( as.Date( df1$date, %Y-%m-%d ))
date2 - as.integer( as.Date( df2$date, %Y-%m-%d ))

lagdays - 30L
predays - -30L

gr1 - GRanges(seqnames=df1$id, IRanges(start=date1,width=1),strand=*)

gr2 - GRanges(seqnames=df2$id,
   IRanges(start=date2+predays,end=date2+lagdays),
   strand=*)[ df2$test2.result==1,]

df1$test2.count - countOverlaps(gr1,gr2)


For the example data.frames (as rendered by Jim Lemon's code), this yields


df1

  id   date test1.result test2.count
1  a 2009-08-281   0
2  a 2009-09-161   0
3  b 2008-08-060   0
4  c 2012-02-021   0
5  c 2010-08-031   1
6  c 2012-08-020   0

The GenomicRanges package is at

http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html

where you will find installation instructions and links to vignettes.

HTH,

Chuck
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Requesting function for A/B testing

2015-03-06 Thread Charles C. Berry


On Sat, 7 Mar 2015, Rolf Turner wrote:



On 06/03/15 22:34, Namratha K wrote:


Dear Sir/Madam,
I am a student pursuing MCA .As i am doing an project using R language .I
want to implement A/B testing using R language.I am searching in google
from past few days and not able to implement .So i request u to kindly help
me by sending function or code on A/B testing method using R language.



talk soon


Huh?  Or, to put it another way, WTF?


Rolf,

Old wine, new bottle.

According to http://en.wikipedia.org/wiki/A/B_testing,

A/B testing has been marketed by some as a change in philosophy and
 business strategy in certain niches, though the approach is identical to
 a between-subjects design, which is commonly used in a variety of
 research traditions[refs]

A friend who is a management consultant tells me that the core ideas in 
his discipline never change, but every five years or so a new set of 
buzzwords is introduced to describe them. It makes it seem like a change 
in philosophy and business strategy has occurred. And of course the 
clients will need to hire consultants who know those buzzwords to stay 
current. And the consultants will need to sign up for continuing education 
courses to learn those buzzwords.


===

Namratha,

Read R-intro. Either from your R installation or at

http://cran.r-project.org/doc/manuals/r-release/R-intro.html

Then start R and enter

ls(package:stats,pat=test)

at the prompt and push ENTER. Browse the help pages for the functions 
listed. Run the examples.


That should get you started.

Chuck

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Not finding superclass in library

2015-02-23 Thread Charles C. Berry


On Mon, 23 Feb 2015, Ramiro Barrantes wrote:

Thank you for pointing this out.  I had no idea about the distinction 
but there are some good references on the matter 
(http://www.r-bloggers.com/packages-v-libraries-in-r/).  I am pasting 
the corrected version below, any suggestions appreciated:






I have a package that I created that defines a parent class, 
assayObject.




I created other classes that inherit from it, say assayObjectDemo. 
Each one of those classes I wanted to make in its own directory separate 
from where the assayObject is defined (there are reasons for that).




But now if I do:

library(assayObject)  #--- where parent object is defined
source(assayObjectDemo.R)

where assayObjectDemo.R is just:

setClass(assayObjectDemo,contains=assayObject)
createDemoAssayObject - function() {
 df - data.frame()
 assay-new(Class=assayObjectDemo)
 assay
}

I get:

Error in reconcilePropertiesAndPrototype(name, slots, prototype, superClasses,  
:
 no definition was found for superclass “assayObject” in the specification of 
class “assayObjectDemo”

What can I do?



Start with a reproducible example. Here is one:

  library(Matrix)
  tmpf - tempfile(fileext=.R)
  cat('setClass(MatrixDemo,contains=Matrix)',file=tmpf)
  source(tmpf)
  slotNames(MatrixDemo)

And it produces the expected output without error:

 [1] Dim  Dimnames

Since this works fine for a widely used package and fails for your
(unspecified) package, I suspect there is a problem with your package.

If I had to guess, I'd say it is a NAMESPACE issue. Be sure your
exportClasses directive is correctly formed per Section 1.5.6
Namespaces with S4 classes and methods of R-exts.

r-devel might be a better venue for this discussion.

HTH,

Chuck
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsetting a list of lists using lapply

2015-02-20 Thread Charles C. Berry


On Fri, 20 Feb 2015, Aron Lindberg wrote:


Hmm…Chuck’s solution may actually be problematic because there are several 
entries which at the deepest level are called “sha”, but that should not be 
included, such as:





input[[67]]$content[[1]]$commit$tree$sha




and




input[[67]]$content[[1]]$parents[[1]]$sha





it’s only the “sha” that fit the following subsetting pattern that should be 
included:





input[[i]]$content[[1]]$sha[1]





This should be straightforward. Look at what grepl() is doing.

And look at what names(unlist(input)) yields.

You can either write a regular expression to handle this (perhaps 
content.sha$) or write other grepl() expressions to select (or get rid 
of) the desired (or unwanted) pattern.


See ?grepl and the page on regular expression referenced there.

HTH,

Chuck
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with expression

2011-01-24 Thread Charles C. Berry


On Tue, 25 Jan 2011, David Scott wrote:

I have a problem with expressions. I am trying to create a title where the 
parameter of interest is displayed as a Greek character. Which parameter is 
being considered is stored in a character variable.


As an example, if I have

param - alpha


param - as.name(alpha)


HTH,

Chuck


and then do

plot(0, 0, main = bquote(Parameter==.(param)))

then in the title I get Parameter = alpha,
whereas I want the Greek character alpha.

David Scott


--
_
David Scott Department of Statistics
 The University of Auckland, PB 92019
 Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function of probability for normal distribution

2011-01-19 Thread Charles C. Berry


On Wed, 19 Jan 2011, JClark wrote:




Dear Greg Snow,

I'm a biologist trying to write a mathematical formula for a doubly
truncated normal distribution to be used in the language R. I realise this
is simple stuff for a mathematician but I'm stumped.
Wikipedia gives what seems a fairly simple formula - with function = maths
with mean and standard deviation - but also phi - WHAT IS PHI !!?? -



On

http://en.wikipedia.org/wiki/Truncated_normal_distribution

it says (by copy-and-paste)

The density function involves \scriptstyle{\phi(\cdot)} \ , which is the 
probability density function of the standard normal distribution and 
\scriptstyle{\Phi(\cdot)} \ , its cumulative distribution function.


There are some links there that you can follow to get up to speed.


especially how do I write this in R and why is the top phi in italics ??


Hmmm. Try the posting guide's suggestions. This seems to help:

?distribution
[stuff deleted]
For the normal distribution see dnorm.

and
?dnorm
[stuff deleted]
dnorm gives the density, pnorm gives the distribution function,
qnorm gives the quantile function, and rnorm generates random
deviates.

so match up 'density' and 'distribution function' in the ?dnorm page and 
the wiki page and you should be able to put it together.


(FWIW, ?density and ??density are not so helpful.)

BTW

 \frac{\frac{1}{\sigma}\phi(\frac{x - \mu}{\sigma})}

(copy and pasted from the wiki page) can be rendered as

dnorm(x , mu, sigma ) in R.

HTH,

Chuck


Hoping you can help.

Yours sincerely,

Jeremy Clark







--
View this message in context: 
http://r.789695.n4.nabble.com/Re-R-function-for-Probabilities-for-the-standard-normal-distribution-tp903639p3225457.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Parameters/data that live globally

2011-01-18 Thread Charles C. Berry


On Tue, 18 Jan 2011, analys...@hotmail.com wrote:


I am coming to R from Fortran and I used to use fixed size arrays in
named common. common /name1/array(100)

The contents of array can be accessed/modified if and only if this
line occurs in the function.  Very helpful if different functions need
different global data (can have name2, name3 etc. for common data
blocks).

Is there a way to do this in R?


Sure.

But probably better to work in R as the developers intended, rather than 
trying to simulate COMMON blocks per se.


See

?list
?environment
?search
10.7 Scope in Intro to R
?get
?assign

You can collect objects in an environment and pass it to a function or put 
it in the search path. Then you can get/assign objects.


Probably it is easier to work with a list. Here is a toy example


foo - function(object){ object$x - object$y + 1; object}
my.list - list( y = 4:5 )
my.list - foo(my.list)
my.list

$y
[1] 4 5

$x
[1] 5 6





HTH,

Chuck




Thanks for any help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] list concatenation

2011-01-11 Thread Charles C. Berry


On Tue, 11 Jan 2011, Georg Otto wrote:


Dear R gurus,


first let me apologize for a question that might hve been answered
before. I was not able to find the solution yet. I want to concatenate
two lists of lists at their lowest level.

Suppose I have two lists of lists:

list.1 - list(I=list(A=c(a, b, c), B=c(d, e, f)),

  II=list(A=c(g, h, i), B=c(j, k, l)))


list.2 - list(I=list(A=c(A, B, C), B=c(D, E, F)),

  II=list(A=c(G, H, I), B=c(J, K, L)))




list.1

$I
$I$A
[1] a b c

$I$B
[1] d e f


$II
$II$A
[1] g h i

$II$B
[1] j k l



list.2

$I
$I$A
[1] A B C

$I$B
[1] D E F


$II
$II$A
[1] G H I

$II$B
[1] J K L


Now I want to concatenate list elements of the lowest levels, so the
result looks like this:


$I
$I$A
[1] a b c A B C

$I$B
[1] d e f D E F


$II
$II$A
[1] g h i G H I

$II$B
[1] j k l J K L


Has anybody a good solution for that?




mapply( function(x,y) mapply(c, x, y, SIMPLIFY=FALSE),

+ list.1, list.2, SIMPLIFY=FALSE )
$I
$I$A
[1] a b c A B C

$I$B
[1] d e f D E F


$II
$II$A
[1] g h i G H I

$II$B
[1] j k l J K L

HTH,

Chuck








Best,

Georg

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Memory Needed for Regression

2011-01-10 Thread Charles C. Berry


On Mon, 10 Jan 2011, efreeman wrote:


I'm looking for a formula for memory usage in standard regression; that
is, if I have X rows with Y predictors, how much memory is needed? I'm
speccing out a system, and I'd like to be able to get enough memory
that we can do some fairly large regressions.



install.packages(biglm)
require(biglm)

Then see

?biglm

biglm creates a linear model object that uses only p^2 memory for p 
variables. It can be updated with more data using update. This allows 
linear regression on data sets larger than memory.



If you want to get serious about this look in Golub and Van Loan* (Sorry, 
my copy is not at hand so I cannot be more specific. Maybe there is a 
section like Updating Matrix Factorizations that says what is needed.)


Also, see

Algorithm AS274 Applied Statistics (1992) Vol.41, No. 2

which is what biglm() refers to. And maybe read the source code of 
biglm() if you are planning on using that package.


HTH,

Chuck

* @book{golub1996matrix,
  title={{Matrix computations}},
  author={Golub, G.H. and Van Loan, C.F.},
  isbn={0801854148},
  year={1996},
  publisher={Johns Hopkins Univ Pr}
}




==Ed Freeman


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question on list objects

2011-01-08 Thread Charles C. Berry


On Sat, 8 Jan 2011, Ron Michael wrote:


Hi, I have 2 questions on list object:
?
1. Suppose I have a matrix like:
dat - matrix(1:9,3)
?
Now I want to replicate this entire matrix 3 times and put entire result in a list 
object. Means, if res is the resulting list then I should have:
?
res[[1]]=dat, res[[2]]=dat, res[[3]]=dat
?
How can I do that in the easilest manner?


See

?rep
...
Examples
...
## replicate a list
...


?
2. Suppose I have 2 list objects:
list1 - list2 - vector(list, length=2)
for(i in 1:2) {
??list1[[i]] - matrix(rnorm(15), 3)
??list2[[i]] - matrix(rnorm(15), 3)
?}

How can I add these 2 list objects? I have tried with just list1+list2, however 
it is generating some error.
?


See

?mapply

take note of the SIMPLIFY arg, which you want as FALSE

HTH,

Chuck



Would be grateful for any help.
?
Thanks,


[[alternative HTML version deleted]]




Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Normal Distribution Quantiles

2011-01-08 Thread Charles C. Berry


On Sat, 8 Jan 2011, Rainer Schuermann wrote:


Sounds like homework, which is not an encouraged use of the Rhelp
list. You can either do it in theory...




It is _from_ a homework but I have the solution already (explicitly got 
that done first!) - this was the pasted Latex code (apologies for that, 
but in plain text it looks unreadable[1], and I thought everybody here 
has his / her favorite Latrex editor open all the time anyway...). I'm 
just looking, for my own advancement and programming training, for a way 
of doing that in R - which, from your and Dennis' reply, doesn't seem to 
exist.


Your question was 'how do I find the smallest integer $n$ such that...', 
right?


Using uniroot and pnorm, you could solve for real $n$ and then round up.

Doing this, I find that in greater than 95% of trials, your bushwalker 
would be done in 105 days or less.


Or you could use findInterval, sapply, and pnorm to get all of the $n$s in 
one expression.


HTH,

Chuck





I would _not_ misuse the list for getting homework done easily, I will not ask 
learning statistics questions here, and I will always try to find the 
solution myself before posting something here, I promise!

Thanks anyway for the simulation advice,
Rainer


   (4000 - (40*n))   -329
[1] --- = 
 1200
  (10*(n^-))
 2




On Saturday 08 January 2011 14:56:20 you wrote:


On Jan 8, 2011, at 6:56 AM, Rainer Schuermann wrote:


This is probably embarrassingly basic, but I have spent quite a few
hours in Google and RSeek without getting a clue - probably I'm
asking the wrong questions...

There is this guy who has decided to walk through Australia, a total
distance of 4000 km. His daily portion (mean) is 40km with an sd of
10 km. I want to calculate the number of days it takes to arrive
with 80, 90, 95, 99% probability.
I know how to do this manually, eg. for 95%
$\Phi \left( \frac{4000-40n}{10 \sqrt{n}}  \right) \leq 0.05$
find the z score...

but how would I do this in R? Not qnorm(), but what is it?


Sounds like homework, which is not an encouraged use of the Rhelp
list. You can either do it in theory or you can simulate it. Here's a
small step toward a simulation approach.

 cumsum(rnorm(100, mean=40, sd=10))
   [1]   41.90617   71.09148  120.55569  159.56063  229.73167
255.35290  300.74655
snipped
  [92] 3627.25753 3683.24696 3714.11421 3729.41203 3764.54192
3809.15159 3881.71016
  [99] 3917.16512 3932.00861
 cumsum(rnorm(100, mean=40, sd=10))
   [1]   38.59288   53.82815  111.30052  156.58190  188.15454
207.90584  240.64078
snipped
  [92] 3776.25476 3821.90626 3876.64512 3921.16797 3958.83472
3992.33155 4045.96649
  [99] 4091.66277 4134.45867

The first realization did not make it in the expected 100 days so
further efforts should extend the simulation runs to maybe 120 days.
The second realization had him making it on the 98th day. There is an
R replicate() function available once you get a function running that
will return a specific value for an instance. This one might work:
 min(which(cumsum(rnorm(120, mean=40, sd=10)) = 4000) )
[1] 97

If you wanted a forum that does not explicitly discourage homework and
would be a better place to ask theory and probability questions, there
is CrossValidated:
http://stats.stackexchange.com/faq



Thanks in advance,
and apologies for the level of question...
Rainer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] pattern recognition with paths

2011-01-05 Thread Charles C. Berry


On Wed, 5 Jan 2011, Benjamin Polidore wrote:


I'm trying to identify patterns among various paths like the following:

http://i.imgur.com/bQPI3.png

If I plot these, I can observe intuitively two different patterns: a front
loaded (1 and 3) and a backloaded (2,4) progress path:

http://i.imgur.com/L5qwZ.png

I have thousands of observations like the above table, and I want to use R
to identify clusters of these paths.  I looked at spatstat, but it seems
more relevant to points than paths.


Hmmm. Is this what you are after?

http://en.wikipedia.org/wiki/Functional_data_analysis

It is a hefty topic.

There is a substantial literature on characterizing curves. Just  Google

Functional Data Analysis

for a start and look at the 'fda' and 'MFDA' packages.

HTH,

Chuck



Thanks,
bp

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Repeated Indexing / Sequence Operation

2010-12-31 Thread Charles C. Berry


On Fri, 31 Dec 2010, Paolo Rossi wrote:


Hi Everyone,

quick question before the end of the year.

I have soem indices to select data from a bigger sample. I want to select n
days before each index and n days after the index. Any clever way to do it.


For heavy duty applications involving intervals - many intervals, finding 
overlapping intervals, set operations, et cetera, you may want to use the 
IRanges package:


http://www.bioconductor.org/help/bioc-views/release/bioc/html/IRanges.html



A for loop would do but I wanted to know if there is a moreR-friendly way to
approach this

Example
# InitialIndices
i2 = (90, 190, 290)



Like this:


 i2 - c(90, 190, 290)
as.vector( IRanges( start= i2 - 5, end = i2 + 5 ) )

 [1]  85  86  87  88  89  90  91  92  93  94  95 185 186 187 188 189 190 191 192
[20] 193 194 195 285 286 287 288 289 290 291 292 293 294 295




HTH,

Chuck


# Indices I want to end up with
i3 = c(85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 185, 186, 187, 188, 189,
190, 191, 192, 193, 194, 195
285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295)
# A way to get Final Indices
SampleWidth
i3 = c(i2)
for (j in seq(1, SampleWidth )) {
i3 = c(i3, i2 + j )
i3 = c(i3, i2 - j )
}


I tried to tackle this with seq and the apply families but got nowhere

Thanks and Happy New Year

Paolo

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] forcing evaluation of a char string argument

2010-12-23 Thread Charles C. Berry


On Wed, 22 Dec 2010, rballen wrote:



Why does x in assign(x) correctly evaluate to rank where
UseMethod(func) does not get correctly evaluated?


Because it is the body of a function definition.

If you want to plug in the value of 'func' in the body of that function, 
you need to do something like this:


toGeneric - 
function(func) {

env-environment(get(func))

# default method of new generic = the original function
assign(paste(func,.default,sep=),get(func),pos=env)
foo - function(x,...) {}
lf - list(x=func)
body.foo - substitute(UseMethod(x),lf)
body(foo)-body.foo
assign(func,foo,pos=env)
}

BTW, are you sure you know what 'env' evaluates to?? (It is NOT the 
environment of the object named by the value of func in the parent.frame 
of toGeneric.)


HTH,

Chuck


Can we use as.call(list(UseMethod,func))?


assign(paste(func,.default,sep=),get(func),pos=env)

assign(func,function(x,...) UseMethod(func),pos=env)


On Wed, Dec 22, 2010 at 9:03 PM, William Dunlap [via R]
ml-node+3161542-1898476570-206...@n4.nabble.com wrote:

Try the following, which I haven't tested much
and needs more error checking (e.g., to see that
the function is not already generic and that
its argument list is compatible with (x,...)).
I put in the print statements to show what
the calls to substitute() do.

toGeneric - function (funcName) {
?? ?? stopifnot(is.character(funcName))
?? ?? funcItself - get(funcName)
?? ?? stopifnot(is.function(funcItself))
?? ?? envir - environment(funcItself)
?? ?? tmp - substitute(funcSymbol - function(x, ...)
UseMethod(funcName),
?? ?? ?? ?? list(funcSymbol = as.symbol(funcName), funcName = funcName))
?? ?? print(tmp)
?? ?? eval(tmp, envir = envir)
?? ?? tmp - substitute(defaultSymbol - funcItself, list(defaultSymbol =
as.symbol(paste(sep = .,
?? ?? ?? ?? funcName, default)), funcItself = funcItself))
?? ?? print(tmp)
?? ?? eval(tmp, envir = envir)
}

E.g.,

?? ?? wsx - function(x, base=2)log(x, base=base)
?? ?? toGeneric(wsx)
?? ??wsx - function(x, ...) UseMethod(wsx)
?? ??wsx.default - function (x, base = 2)
?? ??log(x, base = base)
?? ?? wsx
?? ??function (x, ...)
?? ??UseMethod(wsx)
?? ?? wsx.default
?? ??function (x, base = 2)
?? ??log(x, base = base)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


-Original Message-
From: [hidden email]
[mailto:[hidden email]] On Behalf Of rballen
Sent: Wednesday, December 22, 2010 2:42 PM
To: [hidden email]
Subject: [R] forcing evaluation of a char string argument


I'm trying to make a function to turn a regular function into
an S3 generic
one. I want myMethod to be:

function(x,...) UseMethod(myMethod)

But I keep getting:

function(x,...) UseMethod(func)

Here's the function:

toGeneric-function(func) {
env-environment(get(func))

# default method of new generic = the original function
assign(paste(func,.default,sep=),get(func),pos=env)

assign(func,function(x,...) UseMethod(func),pos=env)
}

toGeneric(myMethod)

I messed around with force, substitute, and deparse, but I
can't get any of
those to help.

Thanks.
--
View this message in context:
http://r.789695.n4.nabble.com/forcing-evaluation-of-a-char-str

ing-argument-tp3161365p3161365.html

Sent from the R help mailing list archive at Nabble.com.

__
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



View message @
http://r.789695.n4.nabble.com/forcing-evaluation-of-a-char-string-argument-tp3161365p3161542.html
To unsubscribe from forcing evaluation of a char string argument, click
here.


--
View this message in context: 
http://r.789695.n4.nabble.com/forcing-evaluation-of-a-char-string-argument-tp3161365p3161666.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]




Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help with function

2010-12-17 Thread Charles C. Berry


On Fri, 17 Dec 2010, Iain Gallagher wrote:


Hello List

I'm moving this over from the bioC list as, although the problem I'm working on 
is biological, the current bottle neck is my poor understanding of R.

I wonder if someone would help me with the following function.



Here is how I'd take it apart.

Either

 1) put browser() as the first line of the function,then feed lines to the
browser one -by-one to see where it hangs,

 2) trace(cumulMetric) , then try to run it (much like 1, but it will
handle feeding the lines of the function more easily)

 3) options( error = recover ), then run it till it hits the error, then
browser thru the frames to see what is where

See

?browser
?trace
?recover

as background.

HTH,

Chuck


cumulMetric - function(deMirPresGenes, deMirs){
   
#need to match position of each miR in deMirPresGenes with its FC to form a 
vector of FC in correct order
    fc - deMirs
    fcVector - as.numeric(with (fc, FC[match(deMirPresGenes[,4], Probe)] ) )

    #multiply fc by context score for each interaction
    metric - fcVector * as.numeric(deMirPresGenes[,11])
    geneMetric - cbind(deMirPresGenes[,2], as.numeric(metric))

    #make cumul weighted score
    listMetric - unstack(geneMetric, as.numeric(geneMetric[,2])~geneMetric[,1])
    listMetric - as.data.frame(sapply(listMetric,sum)) #returns a dataframe
    colnames(listMetric) - c('cumulMetric')

    #return whole list
    return(listMetric)
}

deMirPresGenes looks like this:

Gene.ID    Gene.Symbol    Species.ID    miRNA    Site.type    UTR_start    
UTR_end    X3pairing_contr    local_AU_contr    position_contr    
context_score    context_percentile
22848    AAK1    9606    hsa-miR-183    2    1546    1552    -0.026    -0.047   
 0.099    -0.135    47
19    ABCA1    9606    hsa-miR-183    2    1366    1372    -0.011    -0.048    
0.087    -0.133    46
20    ABCA2    9606    hsa-miR-495    2    666    672    -0.042    -0.092    
-0.035    -0.33    93
23456    ABCB10    9606    hsa-miR-183    3    1475    1481    0.003    -0.109  
  -0.05    -0.466    98
6059    ABCE1    9606    hsa-miR-495    2    1474    1480    0.005    -0.046    
0.006    -0.196    58
55324    ABCF3    9606    hsa-miR-1275    3    90    96    0.007    0.042    
-0.055    -0.316    94

although it is much longer in 'real life'.

The aim of the function is to extract a dataframe of gene symbols along with a 
weighted score from the above data. The weighted score is the FC column of 
deMirs * the context_score column of deMirPresGenes. This is easy peasy!

Where I'm falling down is that if I run this function it complains that 
'geneMetric' can't be found. Hmm - I've run it all line by line (i.e. not as a 
function) and it works but wrapped up like this it fails!

e.g.


testF2 - cumulMetric(testF1, deMirs$up)

Error in eval(expr, envir, enclos) : object 'geneMetric' not found

deMirs$up looks like this:

Probe    FC
hsa-miR-183    2.63
hsa-miR-1275    2.74
hsa-miR-495    3.13
hsa-miR-886-3p    3.73
hsa-miR-886-5p    3.97
hsa-miR-144*    6.62
hsa-miR-451    7.94

In an effort to debug this I have examined each object using 'print' statements 
(as suggested by cstrato on the bioC list).

All the objects in the function up until listMetric look ok in terms of 
structure and length. But the error persists and listMetric is not made?!?! Odd.

I have added some comments to the output below.


tf2-cumulMetric(tf1, deMirs$up)#deMirs$up is a dataframe (see above - it is 
the same as deMirs)



[1] 2.63 2.63 3.13 2.63 3.13 2.74 # print fcVector - looks ok
[1] -0.35505 -0.34979 -1.03290 -1.22558 -0.61348 -0.86584 # print metric - 
looks ok
[1] 1045 # length of metric - is correct
     sym      metric    # print geneMetric - looks ok
[1,] AAK1   -0.35505
[2,] ABCA1  -0.34979
[3,] ABCA2  -1.0329
[4,] ABCB10 -1.22558
[5,] ABCE1  -0.61348
[6,] ABCF3  -0.86584
[1] 1045 # nrow of geneMetric - is correct
Error in eval(expr, envir, enclos) : object 'geneMetric' not found




Could someone possibly point out where I falling down.

Thanks

iain


sessionInfo()

R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
[3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8   
[5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8   
[7] LC_PAPER=en_GB.utf8       LC_NAME=C               
[9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C     

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.12.0






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry

Re: [R] Confidence Intervals for Odds Ratios in multivariate logistic regression

2010-12-08 Thread Charles C. Berry


On Wed, 8 Dec 2010, S.M. Raghavan wrote:


Hi all,

I am trying to fit a logistic regression for a bivariate response using five
independent variables in a stepwise procedure. My outputs look okay but does
any one know (or is there any literature on) how the confidence intervals
are calculated for the reported odds ratios..?



Bert and David have wanred you about the misleading results that 
confidence intervals can give with stepwise procedures.


There are a number of approaches that are not so misleading.

For one such and perhaps some insights into the problems that David and 
Bert were pointing out, try this:


install.packages( BMA )
library( BMA )
?bic.glm
example( bic.glm )

HTH,

Chuck



Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] grep for strings

2010-12-04 Thread Charles C. Berry


On Sun, 5 Dec 2010, Santosh Srinivas wrote:


I am trying to find the function where I can search for a pattern in a
text string (I thought I could use grep for this but no :().


Correct, but reading

?grep

and running

example( grep )

should show you how to do that search.




x

[1] abcdefghijkl

I want to find the positions (i.e. equivalent of nchar) for cd and
in case there are multiple hits .. then the results as a array


You will need to deal with the possibility that there are more 'cd's in 
some elements of 'text' than in others.


So, it is not obvious that an _array_ will work without some special 
tooling.


HTH,

Chuck





Thank you.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to pass selection criteria in a function

2010-12-01 Thread Charles C. Berry


On Wed, 1 Dec 2010, cmccar...@bmcc.cuny.edu wrote:


Hi,
Suppose I have the following data

name score
Abel??? 88
Baker? 54
Charlie??? 77

stored a? table called myData.


I want to write a function that will create a table which is a subset of myData 
containing those have a score  75.

I know I can do this with the following command:
subset(myData, score  75)

But I would like to do this via a function, something like:

newTable - function( data,? criteria){
??? subset( data, criteria) }

and then calling: newTable(myData, score  75)

But this doesn't work. I am sure there is a simple way to do this,
but I am stuck! Please help. Thanks!


Simple? Maybe not so much!

You are trying to pass objects without evaluating them. subset is rather 
special in the way it works. Here is one way:



foo - function(x,...){

+ mc - match.call()
+ mc[[1]] - as.name(subset)
+ eval(mc)
+ }

foo(iris, Petal.Width2.4 )

Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
101  6.3 3.3  6.0 2.5 virginica
110  7.2 3.6  6.1 2.5 virginica
145  6.7 3.3  5.7 2.5 virginica


Reading the code at the top of lm shows how this kind of strategy can be 
used.


You might look at

?bquote

for another way to skin this cat.


HTH,

Chuck




Chris?McCarthy, PhD
Department?of?Mathematics?BMCC
Office:?N522???Extension:?5235
cmccar...@bmcc.cuny.edu
[[alternative HTML version deleted]]




Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] more flexible ave

2010-11-30 Thread Charles C. Berry


On Tue, 30 Nov 2010, Adaikalavan Ramasamy wrote:


Here is a possible solution using sweep instead of ave:

  df - data.frame(site = c(a, a, a, b, b, b),
   gr = c(total, x1, x2, x1, total,x2),
   value1 = c(212, 56, 87, 33, 456, 213),
   value2 = c(1546, 560, 543, 234, 654, 312) )



lm() and friends provide a simple approach:



cbind( df, percent =

+   df[,-(1:2)] /
+ predict( lm( cbind(value1,value2) ~ gr*site, df),
+  new=data.frame(site=df$site,gr='total' ))
+ )
  sitegr value1 value2 percent.value1 percent.value2
1a total212   1546 1.  1.000
2ax1 56560 0.26415094  0.3622251
3ax2 87543 0.41037736  0.3512290
4bx1 33234 0.07236842  0.3577982
5b total456654 1.  1.000
6bx2213312 0.46710526  0.4770642




HTH,

Chuck


 sdf - split(df, df$site)

 out - lapply( sdf, function(mat){

small.mat - mat[ , -c(1,2)]
totals- mat[ which( mat[ , gr] == total ), -c(1,2) ]
totals- as.numeric(totals)

percent=sweep( small.mat, MARGIN=2, STATS=totals, FUN=/ )
colnames(percent) - paste(percent_, colnames(percent), sep=)
return( cbind(mat, percent) )
  } )

 do.call(rbind, out)

  sitegr value1 value2 percent_value1 percent_value2
  a.1a total212   1546 1.  1.000
  a.2ax1 56560 0.26415094  0.3622251
  a.3ax2 87543 0.41037736  0.3512290
  b.4bx1 33234 0.07236842  0.3577982
  b.5b total456654 1.  1.000
  b.6bx2213312 0.46710526  0.4770642

Also I think it might be more efficient to replace your gr variable with a 
binary 0,1 where 1 indicates the total. That way you don't have to generate 
x1, x2, x3, 


Regards, Adai


On 30/11/2010 14:42, Patrick Hausmann wrote:

 Hi all,

 I would like to calculate the percent of the total per group for this
 data.frame:

 df- data.frame(site = c(a, a, a, b, b, b),
gr = c(total, x1, x2, x1, total,x2),
value1 = c(212, 56, 87, 33, 456, 213))
 df

 calcPercent- function(df) {

   df- transform(df, pct_val1 = ave(df[, -c(1:2)], df$gr,
 FUN = function(x)
 x/df[df$gr == total, value1]) )
}

 # This works as intended...
 w- lapply(split(df, df$site), calcPercent)
 w- do.call(rbind, w)
 w

 # ... but when I add a new column
 df$value2- c(1546, 560, 543, 234, 654, 312)

 # the result is not what I want...
 w- lapply(split(df, df$site), calcPercent)
 w- do.call(rbind, w)
 w

 Clearly I have to change the function, (particularly value1) - but
 how... I've also played around with apply but without any success.

 Thanks for any help!
 Patrick

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] the first. from SAS in R

2010-11-23 Thread Charles C. Berry


On Tue, 23 Nov 2010, Joel wrote:



Is there any similar function in R to the first. in SAS?

What it dose is:

Lets say we have this table:

 a b  c
 1 1  5
 1 0  2
 2 0  2
 2 0 NA
 2 9  2
 3 1  3


and then I want do to do one thing the first time the number 1 appers in a
and something else the secund time 1 appers in a and so on.

so

something similar to:

if first.a {
a$d-1
}else{
a$d-0
}

This would give me

 a b  c b
 1 1  5 1
 1 0  2 0
 2 0  2 1
 2 0 NA 0
 2 9  2 0
 3 1  3 1

Is there such a function in R or anything similar?


See

?duplicated

then try

a$d - ifelse( duplicated( a$a ), 0 , 1 )

and

a$d.2 - as.numeric( !duplicated( a$a ) )

HTH,

Chuck




thx

//Joel

--
View this message in context: 
http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3055417.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] the first. from SAS in R

2010-11-23 Thread Charles C. Berry


On Tue, 23 Nov 2010, Dennis Murphy wrote:


Interesting. Check this out:

u - sample(c(TRUE, FALSE), 10, replace = TRUE)

u

[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE

class(u)

[1] logical

u + 0

[1] 0 0 1 0 0 1 0 0 0 0

0 + u

[1] 0 0 1 0 0 1 0 0 0 0

v - rpois(10, 3)

!duplicated(v)

[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

class(!duplicated(v))

[1] logical

!duplicated(v) + 0

[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

0 + !duplicated(v)

[1] 1 0 1 1 1 1 0 1 0 1

# Now assign !duplicated(v) to an object:

w - !duplicated(v)
class(w)

[1] logical

0 + w

[1] 1 0 1 1 1 1 0 1 0 1

w + 0

[1] 1 0 1 1 1 1 0 1 0 1

I can see *what* is going on, but what is the reason for it? I see another
notebook entry coming :)


See

?Arithmetic

and read the paragraph under Details starting 'Logical vectors'

Chuck



Dennis

On Tue, Nov 23, 2010 at 6:12 AM, David Winsemius dwinsem...@comcast.netwrote:



On Nov 23, 2010, at 8:33 AM, Joel wrote:



Is there any similar function in R to the first. in SAS?

What it dose is:

Lets say we have this table:

 a b  c
 1 1  5
 1 0  2
 2 0  2
 2 0 NA
 2 9  2
 3 1  3


and then I want do to do one thing the first time the number 1 appers in a
and something else the secund time 1 appers in a and so on.

so

something similar to:

if first.a {
a$d-1
}else{
a$d-0
}



The duplicated function which returns a logical vector with those features
can easily be coerced to numeric.

df$d - as.numeric(!duplicated(df$a))


I was a bit puzzled about my failure to get coercion by the method which I
thought was supposed to work, namely adding 0.

df$e - !duplicated(df$a)+0  # does not coerce

df$e - 0 + !duplicated(df$a) # pre-adding 0 does coerce

Maybe the rules on coercion were amended.

--
David


 This would give me


 a b  c b
 1 1  5 1
 1 0  2 0
 2 0  2 1
 2 0 NA 0
 2 9  2 0
 3 1  3 1

Is there such a function in R or anything similar?


thx

//Joel

--
View this message in context:
http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3055417.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



David Winsemius, MD
West Hartford, CT


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Slow update(insert) on Data.frame

2010-11-23 Thread Charles C. Berry


On Tue, 23 Nov 2010, Joel wrote:



Hi

When I try to update an number in a large data.frame by its pos It's really
slow it take almost a sec to do this and I wonder why and if where is any
faster way to update a number in a data.frame

ive tried

DF$col[POS]-number
DF[xPOS,yPOS]-number


See

?tracemem

then try

tracemem( DF )
DF$col[POS]-number
DF[xPOS,yPOS]-number

then
mat - as.matrix( DF )
tracemem( mat )
mat[ 2,3 ] - 4
mat[ 15 ] - 5

If your data.frame can be represented by a matrix, a single update will 
likely be faster as it can be done without making any copies.


If you are doing more than one update, it will help to do them all at 
once. Reread


?Subscript

and section 5.2 and 5.3 of Intro to R if you are unsure how to do that.

HTH,

Chuck



Thx
//Joel
--
View this message in context: 
http://r.789695.n4.nabble.com/Slow-update-insert-on-Data-frame-tp3055707p3055707.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] the first. from SAS in R

2010-11-23 Thread Charles C. Berry


On Tue, 23 Nov 2010, David Winsemius wrote:


On Nov 23, 2010, at 11:04 AM, Charles C. Berry wrote:


On Tue, 23 Nov 2010, Dennis Murphy wrote:

 Interesting. Check this out:
 
 u - sample(c(TRUE, FALSE), 10, replace = TRUE)

  u
 [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
  class(u)
 [1] logical
  u + 0
 [1] 0 0 1 0 0 1 0 0 0 0
  0 + u
 [1] 0 0 1 0 0 1 0 0 0 0
 
 v - rpois(10, 3)

  !duplicated(v)
 [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
  class(!duplicated(v))
 [1] logical
  !duplicated(v) + 0
 [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
  0 + !duplicated(v)
 [1] 1 0 1 1 1 1 0 1 0 1
 
 # Now assign !duplicated(v) to an object:

  w - !duplicated(v)
  class(w)
 [1] logical
  0 + w
 [1] 1 0 1 1 1 1 0 1 0 1
  w + 0
 [1] 1 0 1 1 1 1 0 1 0 1
 
 I can see *what* is going on, but what is the reason for it? I see 
 another

 notebook entry coming :)

See

 ?Arithmetic

and read the paragraph under Details starting 'Logical vectors'


Chuck;

Compare these three, all of which are using binary operators on logical 
vectors which is what is being discussed in ?Arithmetic:



 duplicated(c(a, a, b) ) + 0

[1] 0 1 0

 !duplicated(c(a, a, b) ) + 0

[1]  TRUE FALSE  TRUE

 0 + !duplicated(c(a, a, b) )

[1] 1 0 1

I believe the proper place to go is ?Syntax where operator precedence is 
discussed. I think the precendence rules implicitly do this in the second 
instance, because + has higher precendence than negation:


! ( duplicated(c(a, a, b) ) + 0 )


David,

Thanks.

Both you and David Lorenz are correct in pointing to operator precedence 
as the answer to Dennis' question.


Mea culpa for my not reading Dennis' question carefully enough to 
understand what his question really was!


Best,

Chuck



--
David.


Chuck

 
 Dennis
 
 On Tue, Nov 23, 2010 at 6:12 AM, David Winsemius 
 dwinsem...@comcast.netwrote:
 
  
  On Nov 23, 2010, at 8:33 AM, Joel wrote:
  
  
   Is there any similar function in R to the first. in SAS?
   
   What it dose is:
   
   Lets say we have this table:
   
   a b  c

   1 1  5
   1 0  2
   2 0  2
   2 0 NA
   2 9  2
   3 1  3
   
   
   and then I want do to do one thing the first time the number 1 appers 
   in a

   and something else the secund time 1 appers in a and so on.
   
   so
   
   something similar to:
   
   if first.a {

   a$d-1

}else{

   a$d-0

}
   
   
  The duplicated function which returns a logical vector with those 
  features

  can easily be coerced to numeric.
  
  df$d - as.numeric(!duplicated(df$a))
  
  
  I was a bit puzzled about my failure to get coercion by the method 
  which I

  thought was supposed to work, namely adding 0.
  
  df$e - !duplicated(df$a)+0  # does not coerce
  
  df$e - 0 + !duplicated(df$a) # pre-adding 0 does coerce
  
  Maybe the rules on coercion were amended.
  
  --

  David
  
  
  This would give me
   
   a b  c b

   1 1  5 1
   1 0  2 0
   2 0  2 1
   2 0 NA 0
   2 9  2 0
   3 1  3 1
   
   Is there such a function in R or anything similar?
   
   
   thx
   
   //Joel
   
   --

   View this message in context:
   
http://r.789695.n4.nabble.com/the-first-from-SAS-in-R-tp3055417p3055417.html
   Sent from the R help mailing list archive at Nabble.com.
   
   __

   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
   
  
  David Winsemius, MD

  West Hartford, CT
  
  
  __

  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
 
  [[alternative HTML version deleted]]
 
 __

 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html

 and provide commented, minimal, self-contained, reproducible code.
 

Charles C. BerryDept of Family/Preventive 
Medicine

cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901




David Winsemius, MD
West Hartford, CT




Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] the first. from SAS in R

2010-11-23 Thread Charles C. Berry


On Tue, 23 Nov 2010, seeliger.c...@epamail.epa.gov wrote:


Is there any similar function in R to the first. in SAS?

?duplicated

a$d - ifelse( duplicated( a$a ), 0 , 1 )

a$d.2 - as.numeric( !duplicated( a$a ) )


Actually, duplicated does not duplicate SAS' first. operator, though it
may suffice for the OP's needs.

To illustrate, let's start with a dataframe of 3 key columns and some data
in x:
tt - data.frame(k1 = rep(1:3, each=10), k2 = rep(1:5, each=2, times=3),
k3=rep(1:2, times=15), x = 1:30)

# Try to mimic what the following SAS datastep would do,
# assuming 'tt' is already sorted:
#   data foo;
# set tt;
# by k1, k2;
# put first.k1=, first.k2=;
#   run;

# SAS' first. operations would result in these values:
tt$sas.first.k1 - rep(c(1, rep(0,9)), 3)
tt$sas.first.k2 - rep(1:0, 15)

# R duplicated() returns these values.  You can see they
# are the same for k1, but dissimilar after row 10 for k2.
tt$duplicated.k1 - 0+!duplicated(tt$k1)
tt$duplicated.k2 - 0+!duplicated(tt$k2)


It depends on how you use duplicated()


all.equal( tt$sas.first.k2, 0+!duplicated( tt[, c(k1,k2) ] ) )

[1] TRUE




Chuck



# I've found I need to lag a column to mimic SAS' first.
# operator, thusly, though perhaps someone else knows
# differently.  Note this does not work on unordered
# dataframes!
lag.k1 - c(NA, tt$k1[1:(nrow(tt) - 1)])
tt$r.first.k1 - ifelse(is.na(lag.k1), 1, tt$k1 != lag.k1)

lag.k2 - c(NA, tt$k2[1:(nrow(tt) - 1)])
tt$r.first.k2 - ifelse(is.na(lag.k2), 1, tt$k2 != lag.k2)

Mimicking SAS' last. operation can be done in a similar manner, by
anti-laging the column of interest and changing the comparisons somewhat.

Enjoy the days,
cur
--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] efficient conversion of matrix column rows to list elements

2010-11-17 Thread Charles C. Berry


On Wed, 17 Nov 2010, Chris Carleton wrote:


Hi List,

I'm hoping to get opinions for enhancing the efficiency of the following
code designed to take a vector of probabilities (outcomes) and calculate a
union of the probability space. As part of the union calculation, combn()
must be used, which returns a matrix, and the parallelized version of
lapply() provided in the multicore package requires a list. I've found that
parallelization is very necessary for vectors of outcomes greater in length
than about 10 or 15 elements, which is why I need to make use of multicore
(and, therefore, convert the combn() matrix to a list). It would speed the
process up if there was a more direct way to convert the columns of combn()
to elements of a single list.



I think you are mistaken.

Is this what Rprof() tells you?

On my system, combn() is the culprit


Rprof()
outcomes - 1:25
nada - replicate(200, {apply(combn(outcomes,2),2,column2list);NULL})
Rprof(NULL)
summaryRprof()

$by.self
  self.time self.pct total.time total.pct
combn0.6461.54   0.70 67.31
apply0.2019.23   1.04100.00
FUN  0.10 9.62   1.04100.00
!=   0.04 3.85   0.04  3.85
0.02 1.92   0.02  1.92
-0.02 1.92   0.02  1.92
is.null  0.02 1.92   0.02  1.92


And it hardly takes any time at that!


HTH,

Chuck

p.s. Isn't

as.data.frame( combn( outcomes, 2 ) )
or
combn(outcomes, 2, list )

good enough?


Any constructive suggestions will be greatly

appreciated. Thanks for your consideration,

C

code:

unionIndependant - function(outcomes) {
   intsctn - c()
   column2list - function(x){list(x)}
   pb -
ProgressBar(max=length(outcomes),stepLength=1,newlineWhenDone=TRUE)
   for (i in 2:length(outcomes)){
   increase(pb)
   outcomes_ - apply(combn(outcomes,i),2,column2list)
   for (j in 1:length(outcomes_)){outcomes_[[j]] -
outcomes_[[j]][[1]]}
   outcomes_container - mclapply(outcomes_,prod,mc.cores=3)
   intsctn[i] - sum(unlist(outcomes_container))
   }
   intsctn - intsctn[-1]
   return(sum(outcomes) - sum(intsctn[which(which((intsctn %in% intsctn))
%% 2 == 1)]) + sum(intsctn[which(which((intsctn %in% intsctn)) %% 2 == 0)])
+ ((-1)^length(intsctn) * prod(outcomes)))
}

PS This code has been tested on vectors of up to length(outcomes) == 25 and
it should be noted that ProgressBar() requires the R.utils package.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] as.matrix behaves weird

2010-11-14 Thread Charles C. Berry


On Sun, 14 Nov 2010, Feng Mai wrote:


ncol is not an argument for as.matrix()



Perhaps this is more than Alexx wanted to know, but ...

Things can get a wee bit tricky when the function is a 'generic'.

More precisely:

'ncol' is not an argument for any method for as.matrix() that is on 
your search path.


as.matrix(x,...) is an S3 generic function. try

methods( as.matrix )

and you will see what methods are available for it. And some do have 
arguments other than 'x':



args( as.matrix.data.frame )

function (x, rownames.force = NA, ...)
NULL




If you wanted to (but you don't, this is just didactic) you could make a 
method that accepts the ncol arg for a particular class



as.matrix.columned - function(x,ncol=1) matrix(x,ncol=ncol)
x - 1:3
class(x) - 'columned'
as.matrix(x, ncol=3)

 [,1] [,2] [,3]
[1,]123




HTH,

Chuck



Alexx Hardt mikrowelle1234 at gmx.de writes:



Hi,
can someone tell me why x is still a 2x1-matrix in the last step?



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] count occurrence and distance of characters in string

2010-11-04 Thread Charles C. Berry


On Thu, 4 Nov 2010, Immanuel wrote:


Hello all,

I want to know how often one character occurs in a given string
and the distance from between every two occurences. (distance = other
characters between them).


You should provide commented, minimal, self-contained, reproducible code 
as asked.


And especially for a question like this one with many simple answers that 
RespondeRs will shower you with if only you give them a starting point.


Use tapply, strsplit, seq, nchar, unlist, diff, -, and table for one 
way.


Chuck



thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] count occurrence and distance of characters in string

2010-11-04 Thread Charles C. Berry


On Fri, 5 Nov 2010, Immanuel wrote:


Hey,

thanks for the answer, actually I already typed an example
but deleted it since I thought it's superfluous.
regards

-
string - kjokllokkoadddo

# f1(string, o) should return that o was found 4 times
# f2(string, o) should return that the distances between the o's
found is 3 , 2, 4
-


In that case, I'd use split:


res - split(seq(nchar(string)),unlist(strsplit(string,'')))
length(res[['o']])

[1] 4
## or 
sapply(res,length)

a d j k l o
1 3 1 4 2 4

diff(res[['o']])-1

[1] 3 2 4

# or
sapply(sapply(res,diff),-,1)

$a
numeric(0)

$d
[1] 0 0

$j
numeric(0)

$k
[1] 2 3 0

$l
[1] 0

$o
[1] 3 2 4




Chuck





On 11/05/2010 12:28 AM, Charles C. Berry wrote:

On Thu, 4 Nov 2010, Immanuel wrote:


Hello all,

I want to know how often one character occurs in a given string
and the distance from between every two occurences. (distance = other
characters between them).


You should provide commented, minimal, self-contained, reproducible
code as asked.

And especially for a question like this one with many simple answers
that RespondeRs will shower you with if only you give them a starting
point.

Use tapply, strsplit, seq, nchar, unlist, diff, -, and table for one
way.

Chuck



thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. BerryDept of Family/Preventive
Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego
92093-0901








Charles C. BerryDept of Family/Preventive Medicine
cbe...@tajo.ucsd.eduUC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Merging nested lists

2010-10-28 Thread Charles C. Berry


On Wed, 27 Oct 2010, Charles C. Berry wrote:


On Wed, 27 Oct 2010, Alex P. wrote:


 Hello All,

 I have multiple list of lists in the form of

 Mylist1[[N]][[K]]$Name_i,

 with N=1..6, K=1..3, and i=1..7. Each Name_i is a matrix. I have 30 of
 these objects Mylist1, Mylist2, ...

 I would like to merge these lists by each Name_i using rbind, but I
 couldn't figure out how to do it. What I want at the end is a single list
 of lists, again in the form of Mylist[[N]][[K]]$Name_i. Manually doing it
 is not feasible given the large number of Mylist objects.



Turn them into a single array of mode 'list', then do the rbind'ing, and save 
the result as an array (or see ?relist for imposing the nested structure of 
the orignal lists)


Something like:


I see a few hiccups below.

Maybe Alex P. should include a minimal, self-contained example that 
RespondeRs could try out...




objs - paste('MyList',1:30,sep='')
big.list - lapply( objs, get )
for (i in 1:4) big.list - unlist(big.list,recursive=FALSE)


for (i in 1:3) ...


dim( big.list ) - c(30, 6, 3, 7 )


dim( big.list ) - rev( c(30, 6, 3, 7 ) )


res - apply( big.list, 2:4, rbind )
dim(res) - c( 30, 3, 7 )


res - apply( big.list, 1:3, function(x) list( do.call( rbind, x ) ))


Chuck





HTH,

Chuck


 Thanks in advance,

 Alex

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
   Dept of Family/Preventive 
Medicine

E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Merging nested lists

2010-10-27 Thread Charles C. Berry


On Wed, 27 Oct 2010, Alex P. wrote:


Hello All,

I have multiple list of lists in the form of

Mylist1[[N]][[K]]$Name_i,

with N=1..6, K=1..3, and i=1..7. Each Name_i is a matrix. I have 30 of these 
objects Mylist1, Mylist2, ...

I would like to merge these lists by each Name_i using rbind, but I 
couldn't figure out how to do it. What I want at the end is a single 
list of lists, again in the form of Mylist[[N]][[K]]$Name_i. Manually 
doing it is not feasible given the large number of Mylist objects.




Turn them into a single array of mode 'list', then do the rbind'ing, and 
save the result as an array (or see ?relist for imposing the nested 
structure of the orignal lists)


Something like:

objs - paste('MyList',1:30,sep='')
big.list - lapply( objs, get )
for (i in 1:4) big.list - unlist(big.list,recursive=FALSE)
dim( big.list ) - c(30, 6, 3, 7 )
res - apply( big.list, 2:4, rbind )
dim(res) - c( 30, 3, 7 )


HTH,

Chuck


Thanks in advance,

Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question on passing the subset argument to an lm wrapper

2010-10-25 Thread Charles C. Berry


On Sun, 24 Oct 2010, Erik Iverson wrote:


Hello,

How would you go about handling the following situation?
This is on R 2.12.0 on Ubuntu 32-bit.

I have a wrapper function to lm.  I want to pass in a
subset argument.  First, I just thought I'd use 


The subset arg needs to be unevaluated until lm() - or whatever function 
- is called in your function.


The canonical advice for this kind of question about passing unevaluated 
args is to study the first lines of the function lm noting what it does 
with objects cl - match.call() and mf - match.call( expand.dots = FALSE 
).


Something like this might be what you want:

testlm2 - function(formula, ...) {
mc - match.call()
mc[[1]] - as.name('lm')
eval(mc)
}

testlm2(bmi ~ age, data= df1, subset = age  50)

HTH,

Chuck




## make example reproducible
set.seed(123)
df1 - data.frame(age = rnorm(100, 50, 10),
  bmi = rnorm(100, 30, sd = 2))

## create a wrapper using ...
testlm - function(formula, ...) {
 lm(formula, data = df1, ...)
}


 testlm(bmi ~ age, subset = age  50)


Error in eval(expr, envir, enclos) :
  ..1 used in an incorrect context, no ... to look in

I found some other examples of this error message,
but couldn't piece together how it fits in with this
example.

Next, I tried specifying a subset argument.

testlm2 - function(formula, subset) {
 lm(formula, data = df1, subset = subset)
}


 testlm2(bmi ~ age, subset = age  50)


Error in xj[i] : invalid subscript type 'closure'

I also don't understand this one.

Any pointers on if I'm just missing the easy
solution to do what I want?  Any explanations
as to the above behavior (I know it has to do
with model.frame, but not sure how) would also
be greatly appreciated!

Thanks!
--Erik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Putting the same array into a matrix

2010-10-14 Thread Charles C. Berry


On Thu, 14 Oct 2010, Desmond Lim wrote:


Hi,

I have an array and I want to put in into a matrix x number of times. Currently 
I doing this

matrix - cbind(array, array, array).

Is there a more elegant way of doing this?


Fortunately!

If 'array' really is a matrix (bad choice of names here, Bub!), then

a.matrix - matrix( rep( array, 3 ), nc = ncol( array )* 3 )

But this will work too:

a.matrix - do.call( cbind, rep( list( array ), 3) )

even if 'array' is a data.frame

HTH,

Chuck




I've tried

matrix - cbind(rep(array, times=x)) and matrix - rep(cbind(array), times = 5)

but it didn't work.

Thanks.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [OT] (slightly) - OpenOffice Calc and text files

2010-10-13 Thread Charles C. Berry


On Wed, 13 Oct 2010, Schwab,Wilhelm K wrote:


It will get a good look, as will gnumeric - thanks to all!



emacs org-mode can convert your tab delimited file to a 'table' that you 
can edit either using org-mode functions OR as plain text by switching to 
fundamental mode.


In emacs speak, just put the cursor at the top of a buffer holding your 
file and do


M-x replace-string RET TAB RET | RET

I think, then move your cursor to a line that has a '|' in it and hit TAB, 
and you have a neatly formatted table.


See,

http://orgmode.org/worg/org-tutorials/tables.php

for an intro.

A big advantage in using an org-mode table is you can place an R source 
code block further down in the same file, and it can read in the data in 
the table. Then you can go back to the table to edit, then rerun R, ...


I append an example below.

There is a load of tutorial info at

http://orgmode.org/worg/org-tutorials/index.php

HTH,

Chuck


#+begin_example
#+tblname: simpleDF
| a | b | c |
|---+---+---|
| 1 | 2 | 3 |
| 5 | 4 | 2 |
|---+---+---|
#+end_example

#+begin_src R :var df=simpleDF :results output :colnames yes

   summary( df )

#+end_src

#+results:
:a   b c
:  Min.   :1   Min.   :2.0   Min.   :2.00
:  1st Qu.:2   1st Qu.:2.5   1st Qu.:2.25
:  Median :3   Median :3.0   Median :2.50
:  Mean   :3   Mean   :3.0   Mean   :2.50
:  3rd Qu.:4   3rd Qu.:3.5   3rd Qu.:2.75
:  Max.   :5   Max.   :4.0   Max.   :3.00





Bill




From: Albyn Jones [jo...@reed.edu]
Sent: Wednesday, October 13, 2010 2:14 PM
To: Schwab,Wilhelm K
Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files

emacs shows you exactly what is there, nothing more nor less.
it isn't a spreadsheet, but tabs will align columns.

albyn

On Wed, Oct 13, 2010 at 01:53:46PM -0400, Schwab,Wilhelm K wrote:

Albyn,

I'll look into it.  In fact, I have a small book on it that I bought in my very 
early days of using Linux.  I quickly found TeX Maker (for the obvious), 
Code::Blocks for C/C++ and I would not have started the move without a working 
Smalltalk (http://pharo-project.org/home).

For editing data files, I really just want something that shows data in an 
understandable grid and does not do weird stuff thinking it's being helpful.

Bill



From: Albyn Jones [jo...@reed.edu]
Sent: Wednesday, October 13, 2010 1:39 PM
To: Schwab,Wilhelm K
Cc: r-help@r-project.org
Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files

How about emacs?

albyn

On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab,Wilhelm K wrote:

Hello all,
.
Have any of you found a nice (or at least predictable) way to use OO Calc to 
edit files like this?  If it insists on thinking for me, I wish it would think 
in 24 hour time and 4 digit years :)  I work on Linux, so Excel is off the 
table, but another spreadsheet or text editor would be a viable option, as 
would configuration changes to Calc.

Bill

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Albyn Jones
Reed College
jo...@reed.edu




--
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Poisson Regression

2010-10-13 Thread Charles C. Berry


On Wed, 13 Oct 2010, David Winsemius wrote:



On Oct 13, 2010, at 4:50 PM, Antonio Paredes wrote:


Hello everyone,

I wanted to ask if there is an R-package to fit the following Poisson
regression model

log(\lambda_{ijk}) = \phi_{i} + \alpha_{j} + \beta_{k}
i=1,\cdots,N (subjects)
j=0,1 (two levels)
k=0,1 (two levels)

treating the \phi_{i} as nuinsance parameters.


If I am reading this piece correctly there should be no difference between a 
conditional treatment of phi_i in  that model and results from the 
unconditional model one would get from fitting with


glm(lambda ~ phi + alpha + beta  ,family=poisson).


Right.

But if N is large, the model.matrix will be huge and there may be problems 
with memory and elapsed time.


loglin() and loglm() will fit the same model without need for a 
model.matrix (modulo having enough data to actually fit that model), and 
large values of N are no big deal.


HTH,

Chuck


http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.9679rep=rep1type=pdf

(But I am always looking for corrections to my errors.)

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to convert a list to a ... argument for a function

2010-10-05 Thread Charles C. Berry


On Tue, 5 Oct 2010, Richard R. Liu wrote:


Hi,


I have a function f - function(..., func){ something }, where func is a
function of the form function(...). ??I would like to pass func all the 
arguments
passed to f except the last. ??I know that I can manipulate the variable number
of arguments passed to f by converting ... to a list, i.e., arglist -
list(...). ??But how do I pass func the first n-1 list items of arglist (n -
length(arglist)), as n-1 arguments, not as one list of n-1 items?



One way:

foo - function(...,func){
mc - match.call()
mc[[1]] -mc$func
mc$func - NULL
mc[[ length(mc) ]] - NULL
eval(mc) }

foo(1,2,3,4,func=c)
foo(1,2,4,3,func=sum)


or perhaps you meant like this:

foo - function(...,func){
mc - match.call()
mc[[1]] -mc$func
mc$func - NULL
eval(mc) }

foo(1,2,3,4,func=c)
foo(1,2,4,3,func=sum)


HTH,

Chuck



Regards,
Richard



Richard R. Liu
richard@pueo-owl.ch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question about Reduce

2010-10-01 Thread Charles C. Berry


On Fri, 1 Oct 2010, Dimitri Liakhovitski wrote:


Hello!

In the example below the Reduce() function by default assigns a to
be the last accumulated value and b to be the current value in x.
I could not find this documented anywhere as the default settings for
the Reduce() function.  Does any sort of documentation for this
behavior exist?


Yes.

Under 'Details' on the Reduce help page:

... a left reduce computes l_1 = f(v_1, v_2), l_2 = f(l_1, v_3),
etc...

HTH,

Chuck


x -  c(0,0,0,0,0,1,0,0,0,5,0,0,0,7,0,0,0,8,5,10)
Reduce(function(a,b) b + (a * 0.5),x,accumulate=T,init=0)

[1]  0.000  0.000  0.000  0.000  0.000  0.000
1.000  0.500  0.250
[10]  0.125  5.0625000  2.5312500  1.2656250  0.6328125  7.3164062
3.6582031  1.8291016  0.9145508
[19]  8.4572754  9.2286377 14.6143188


Thanks a lot!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to apply vector value function to a multidimensional array indexed by the remaining dimensions?

2010-10-01 Thread Charles C. Berry


On Fri, 1 Oct 2010, yunjiangster wrote:



Hi,
 I am looking for some generalization of colSums and rowSums for general
vector valued functions, and for arrays of more than 2 dimensions.
 So as a concrete example, suppose I have a 3 dimensional array, given by x
= array(1:100,c(3,4,5)).
and I want to sum the 3rd index of x to obain a 3 by 4 matrix. Using rowSums
would return a vector of length 3 because it treats the last two indices as
a single index.



see the help page for colSums, esp. the 'dims' arg


 Besides summation, let's say if I define a vector valued function
f-function(x){min(x[1],min(x[2],x[3]))}, and I want to apply f to the last
index of x, meaning for each 1= i = 3, 1=j =4, I want to compute
f(x[i,j,]), and then put them in a 3 by 4 matrix. How would I be able to do
that without using a for loop?



?apply


 thanks.

 Sincerely,
 John
--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-apply-vector-value-function-to-a-multidimensional-array-indexed-by-the-remaining-dimensions-tp2937368p2937368.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Sweave and LaTeX beamer class

2010-09-30 Thread Charles C. Berry


On Thu, 30 Sep 2010, Johannes Huesing wrote:


I am failing to uncover Sweave chunks step by step using the LaTeX beamer
class.

The following minimal example:

\documentclass{beamer}
\usepackage{Sweave}
\begin{document}
\begin{frame}[fragile]
 In the year \uncover2-{25}\uncover3-{\Sexpr{5*5}}
\uncover4-{
echo=TRUE, print=TRUE=
5*5*101
@
}
\end{frame}
\end{document}

leads to an error message when running pdflatex over the *.tex file:

[...]
! FancyVerb Error:
 Extraneous input ` 2525 \end {Soutput} \end {Schunk} \bea...@endcovered ' bet
ween \begin{Soutput}[key=value] and line end
.
\...@error ... {FancyVerb Error:
\space \space #1
}



I do not have  enough of an understanding of LaTeX to explain this, but it 
seems \uncover can be tricky.


This works for me:

\documentclass{beamer}
\usepackage{Sweave}
\begin{document}
\begin{frame}[fragile]
  In the year \uncover2-{25}\uncover3-{\Sexpr{5*5}}
\begin{uncoverenv}4-
echo=TRUE, print=TRUE=
5*5*101
@ %def
\end{uncoverenv}
\end{frame}
\end{document}



--
Johannes H??sing   There is something fascinating about science.
 One gets such wholesale returns of conjecture
mailto:johan...@huesing.name  from such a trifling investment of fact.
http://derwisch.wikidot.com (Mark Twain, Life on the Mississippi)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] drawing samples based on a matching variable

2010-09-29 Thread Charles C. Berry


On Tue, 28 Sep 2010, L Brown wrote:


Hi, everyone. I have what I hope will be a simple coding question. It seems
this is a common job, but so far I've had trouble finding the answer in
searches.

I have two matrices (x and y) with a different number of observations in
each. I need to draw a random sample without replacement of observations
from x, and then, using a matching variable, draw a sample of equal size
from y. It is the matching variable that is hanging me up.

For example--


# example matrices. lets assume seed always equals 1. (lets also assume I

have assigned variable names A and B to my columns..)

set.seed(1)
x-cbind(1:10,sample(1:5,10,rep=T))
x

 [A] [B]
[1,]12
[2,]22
[3,]33
[4,]45
[5,]52
[6,]65
[7,]75
[8,]84
[9,]94
[10,]   101



Looks like set.seed(1) was invoked here, too.


y-cbind(1:14,sample(1:5,14,rep=T))
y

 [A] [B]
[1,]12
[2,]22
[3,]33
[4,]45
[5,]52
[6,]65
[7,]75
[8,]84
[9,]94
[10,]   101
[11,]   112
[12,]   121
[13,]   134
[14,]   142


#draw random sample of n=4 without replacement from matrix x.
x.samp-x[sample(10,4,replace=F),]
x.samp

[A] [B]
[1,]33
[2,]45
[3,]52
[4,]75

Next, I would need to draw four observations from matrix y (without
replacement) so that the distribution of y$B is identical to x.samp$B.

I'd appreciate any help, and sorry to post such a basic question!



Break it down like this:


x.levels - sort( unique(x[,2]) )
x.samp.tab - table( factor( x.samp[,2], x.levels ) )
y.rows - split(1:nrow(y),factor( y[,2], x.levels ) )
unlist( mapply( sample, y.rows, x.samp.tab ),use.names=FALSE )


In some cases sample might fail if

length( y.rows[[i]] )  x.samp.tab[ i ]

you can trace which element that was using 'traceback()' or write a 
wrapper for sample() that checks that condition.


HTH,

Chuck



LB

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sample from very large distribution

2010-09-29 Thread Charles C. Berry


On Thu, 30 Sep 2010, Matthew Finkbeiner wrote:


I don't have enough RAM for this problem, so I need a work around.  This is
what I want to do:

y- sample(2^32, 10, replace=FALSE)



y - trunc(runif( 10, 1, 2^32+1))
while( any( dup.y -duplicated(y) ) ) y[dup.y] -
trunc(runif( sum(dup.y), 1, 2^32+1))

HTH,

Chuck


but my machine won't let me do that.  so I now do this:

x- seq(1,2^32, by=100)
y- sample(x, 10, replace=FALSE)

this works fine, but by selecting every 100th item, it introduces a
systematicity that may be problematic.

I've tried this:
x- seq(1,2^32, by=sample(1:200, 1))

but that yields some unpredictable behavior

so, any suggestions?

Thank you kindly,

Matthew

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] OT: Is randomization for targeted cancer therapies ethical?

2010-09-20 Thread Charles C. Berry


On Mon, 20 Sep 2010, Bert Gunter wrote:


Hi Folks:

**Off Topic**

Those interested in clinical trials may find the following of interest:

http://www.nytimes.com/2010/09/19/health/research/19trial.html

It concerns the ethicality of randomizing those with life-threatening
disease to relatively ineffective SOC when new biologically targeted
therapies appear to be more effective. While the context may be new,
the debate, itself, is not: Tukey wrote (or maybe it was talked -- I
can't remember for sure) about this about 30 years ago. I'm sure many
other also have done so.


Anscombe's remarkable (and influential) review of Armitage's 'Sequential 
Medical Trials' back in 1963


http://www.jstor.org/stable/2283272

is worth a look by any statistician who is interested in this topic.

It makes explicit several factors that weigh in the ethical assessment of 
a particular trial design.


He discusses in formal terms the weighing of outcomes for patients 
in the trial at hand aginst those of future patients and the impact that 
this might have on design decisions.


HTH,

Chuck




Cheers,

Bert
--
Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] break the long R code lines automatically

2010-08-24 Thread Charles C. Berry


On Tue, 24 Aug 2010, heyi xiao wrote:






Dear
all,

I have
written some R source program with many thousands of lines. I didn???t insert
line breaks automatically or manually for the long lines. But now I would like
to edit the source code in Emacs/ESS to make it more formal as a package. One of
the major problems here is how to break the long lines automatically. Emacs 
auto-fill-mode
only works for the lines you are typing in currently, and fill commands like M-q
(fill-paragraph) or M-x fill-region (fill-region) mess up the R code lines as
they take a whole function/paragraph as a long line, and remove the original
line breaks.

I find
simple solutions for indenting code regions in Emacs/ESS, but no good ones for
breaking code lines. However, I saw the nice multi-line codes in all
R/Bioconductor packages. Please let me know if you have any ideas on how people
usually break the existent long R code lines automatically. I will really
appreciate your kind help!



Not particualrly elegant, but a combination of parse and print will break 
long lines:




cat(y - ,paste( 1:20,collapse= + ),\n,y2 - ,

+  paste( 1:20,collapse=+),\n,file=testwrap.R)

for (iexpr in parse(testwrap.R)) print(iexpr)

y - 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 +
14 + 15 + 16 + 17 + 18 + 19 + 20
y2 - 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 +
14 + 15 + 16 + 17 + 18 + 19 + 20


nchar(readLines(testwrap.R))

[1] 95 59




and of course you will want 'sink' or some such to save the lines.

HTH,

Chuck




Heyi





[[alternative HTML version deleted]]




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] on abort error, always show call stack?

2010-08-22 Thread Charles C. Berry


On Sun, 22 Aug 2010, Matt Shotwell wrote:


On Sun, 2010-08-22 at 11:41 -0400, ivo welch wrote:

Dear R Wizards---is it possible to get R to show its current call
stack (sys.calls()) upon an error abort?  I don't use ESS for
execution, and it is often not obvious how to locate how I triggered
an error in an R internal function.  Seeing the call stack would make
this easier.  (right now, I sprinkle cat statements everywhere, just
to locate the line where the error appears.)  Of course, I would
really love to see the line in my program that triggered this, but I
have asked this before, and I understand this is too difficult to get
into the R language.


The traceback() function will print out the call stack after an error.
However, you may find the debug() family of functions more useful for
debugging. Also see the browser() function.



Further, you might set

options( error = recover )

which prints the list of current calls, and prompts the user to select 
one of them. And then allows browser()-ing.


Chuck




-Matt



regards,

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] spare matrix replacing values efficiently

2010-08-22 Thread Charles C. Berry


On Sun, 22 Aug 2010, david h shanabrook wrote:


I am working with a large sparse matrix trying replace all 1 values with 0.  
The normal method doesn't work.  Here is a small example:


x - Matrix(0,nrow=10,ncol=10,sparse=TRUE)
x[,1] - 1:2
x

10 x 10 sparse Matrix of class dgCMatrix

[1,] 1 . . . . . . . . .
[2,] 2 . . . . . . . . .
[3,] 1 . . . . . . . . .
[4,] 2 . . . . . . . . .
[5,] 1 . . . . . . . . .
[6,] 2 . . . . . . . . .
[7,] 1 . . . . . . . . .
[8,] 2 . . . . . . . . .
[9,] 1 . . . . . . . . .
[10,] 2 . . . . . . . . .

x[x==1] - 0

Error in .local(x, i, j, ..., value) :
not-yet-implemented 'Matrix[-' method


Suggestions?  Any solution must be memory efficient as my sparse matrix is 
large.


Looks like

attr( x, x )[ attr( x, x ) == 1 ] - 0

should do it, but does not reduce the density of the matrix. In order to 
get that you'll need to do


x - drop0(x)


HTH,

Chuck



dhs
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Making a series of similar, but modified .r files - suggested method(s)?

2010-08-21 Thread Charles C. Berry


On Sat, 21 Aug 2010, Laura S wrote:


Dear all:

Any suggestions are much appreciated. I am looking for a way to make a
series of similar, but slightly modified, .r files.

My issue is automating making 320 .r files that change the for(i in 1:x) in
my base .r file (as well as other elements, e.g., the load(...),
setwd(...)). For smaller jobs running on a single computer with batch files,
I have been manually changing the for(i in 1:x) line, etc..



R is a scripting language (among other things!).

You can read in a template .R file and then substitute pieces of it inside 
a loop that writes the revised peices to a directory from which you can 
later source or BATCH them.


Here is a simple example with just one substitution:

Here is the template file where the pieces to be substituted is #for i#:

,[ template.R ]
| #for i#{
| toupper(i)
| }
|
`


template - readLines(template.R) # read it in!
dir.create(temp) # make a place to save new .Rs
# modify the template
for (i in letters[1:5]){

+  cat( sub( #for i#,
+paste(i - ',i,'\n,sep=''),
+template), sep='\n',
+file=file.path(temp,paste(i,R,sep='.')))
+ }

# look at a.R
readLines(file.path(temp,a.R))

[1] i - 'a'   {  toupper(i) }  

sapply(Sys.glob(temp/*.R),source) # Run them all

temp/a.R temp/b.R temp/c.R temp/d.R temp/e.R
value   A  B  C  D  E
visible TRUE TRUE TRUE TRUE TRUE




HTH,

Chuck



Why does this matter to me? I am planning on running a simulation experiment
on a linux cluster as a serial job. Although not elegant, it has been
suggested I make 320 .r files so qsub runs one .r file and then selects
other jobs. Thus, the manual route I am currently using would take a very
long time (given multiple runs of 320 .r files, given experimental
replication).

Thank you,
Laura


--
 Genius is the summed production of the many with the names of the few
attached for easy recall, unfairly so to other scientists

- E. O. Wilson (The Diversity of Life)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] U value from wilcox.test

2010-08-20 Thread Charles C. Berry



Issues like that in this thread can often be resolved by reading the
help page for the relevant function.

From:
?wilcox.test

Note

The literature is not unanimous about the definitions of the Wilcoxon
rank sum and Mann-Whitney tests. The two most common definitions
correspond to the sum of the ranks of the first sample with the
minimum value subtracted or not: R subtracts and S-PLUS does not,
giving a value which is larger by m(m+1)/2 for a first sample of size
m. (It seems Wilcoxon's original paper used the unadjusted sum of the
ranks but subsequent tables subtracted the minimum.)

R's value can also be computed as the number of all pairs (x[i], y[j])
for which y[j] is not greater than x[i], the most common definition of
the Mann-Whitney test.



HTH,

Chuck

On Fri, 20 Aug 2010, Cedric Laczny wrote:


Hi Chloe,

first of all, I want to note, that you should be careful using the WMW-test.
Even though it is often reported to be some sort of a swiss-army-knife for
comparing two distributions, recent research on this test has revelaed that it
is crucial what hypotheses you consider. Also the assumptions imposed to the
test are critical. For the assumptions, the test basically is a test on
identical distributions. For your sample sizes, this is in my opinion quite
problematic, as you can not really know what the population distributions look
like. Furthermore, the test has shown to be quite strongly influenced by
differing variances in the two groups. All this is more or less valid for not
necessarily small sample sizes, therefore I am not sure how much it might
affect your results. Therefore, caution should be adressed to the
interpretation of the results.

On Friday, 20. August 2010 19:41:55 Chloe wrote:

Dear all,
I want to compare the efficiency of 2 methods in extracting proteins from
algal samples. I collected 6 independant algal samples and I extracted 3 by
the method 1 and 3 others by the method 2.
So I have 2 groups of 3 samples, that are not paired. I would like to know
if the results obtained by these 2 methods are significantly different, I
hope method 2 to be more efficient than method 1. As I have few data I went
for the Mann-whitney test:

method1=c(35,40,56)
method2=c(90,110,115)
wilcox.test(method1,method2,paired=FALSE,alternative=less)

  Wilcoxon rank sum test

data:  method1 and method2
W = 0, p-value = 0.05
alternative hypothesis: true location shift is less than 0

As I have a small number of samples, I would prefer to compare the U value
of the Mann-Whitney test to critical value for table rather than to rely on
the p-value.

Is W value correspond to U value ?


From the help I understand that W=U+m*(m+1)/2, is this true ?


In the case it is, my U values would be U=W-6=-6!! I thought that a U value
could not be neagtive.


Im a little bit puzzled on this one... I would agree with you. I can't really
help you with this one right now, but doing the calculation of U manually is
not really hard for your problem. All the values from method 2 are higher than
the ones from method 1. So the ranking would be:

method1 : 1,2,3
method2: 4,5,6
= W(rank sum)_m,n = 1 + 2 + 3 = 6

If I use the definition of U from http://de.wikipedia.org/wiki/Mann-Whitney-U-
Test
I would calculate U = 0 , which goes with your formula: U = W - 6 = 6 - 6 = 0,
what makes sense because the values of X are never greater than the ones of Y.
(s. link: the formula for U with the two summations )

Thinking of that, the usage of W in R might simply be misleading and it could
indeed represent U.



I would be happy to have any information about how to obtain the U value
from the Mann-Withney test (wilcox.test()) in order to be able to compare
it with table of critical U value commonly found.
Thanks a lot for your time and help
Have a nice day,
Chlo??


For your sample sizes you can nicely use the critical value tables that can be
found easily on the net.

I hope I could help with your problem, if not, please feel free to ask
further.

Best,

Cedric

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] functions and multiple levels

2010-08-18 Thread Charles C. Berry


On Wed, 18 Aug 2010, chris20 wrote:




Hi,
I am trying to write a function;
I want to subtract the mean of each class in level 2 from the mean of each
class in level 1 and square the answer, eg.

  level.1  level.2  observation
 1 1 0.5
 1 1 0.2
 1 2 0.6
 1 2 0.4
 2 3 0.8
 2 3 0.7
 2 4 0.6
 2 4 0.4

(mean(1$level.1) - mean(1$level.2))^2
(mean(1$level.1) - mean(2$level.2))^2

etc.


Chris,

Almost always best to break things into little pieces, like this:


# read data (copy above to clipboard)
dat - read.table(clipboard,head=T, 

+colClasses=c('factor','factor','numeric'))

# means of level.1
m1 - coef(lm(observation~0+level.1,dat))
# means of level.2
m2 - coef(lm(observation~0+level.2,dat))
# all differences
outer( m1, m2, '-')

 level.21 level.22 level.23 level.24
level.110.075   -0.075   -0.325   -0.075
level.120.2750.125   -0.1250.125


# all differences squared
outer( m1, m2, '-')^2

 level.21 level.22 level.23 level.24
level.11 0.005625 0.005625 0.105625 0.005625
level.12 0.075625 0.015625 0.015625 0.015625




HTH,

Chuck



I want to write a function because I have lots of levels and lots of
different observations.
I thought this should be easy (it's my first attempt at writing a function)
but I just can't work it out!

Thanks
Chris
--
View this message in context: 
http://r.789695.n4.nabble.com/functions-and-multiple-levels-tp2329935p2329935.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plot in cartesian plane

2010-08-17 Thread Charles C. Berry


On Wed, 18 Aug 2010, Pablo Cerdeira wrote:


Hi all,

I'm trying to plot this two curves in a single cartesian plane, but when I
plot the first one, the plot appears with no negative y value. When I plot
the second curve, it almost does not apear in the graph.

I was trying the plot.window but with no success.

Can someone help me with this?


The answer is in

?par

but it requires a bit of deduction.  Start with the description of yaxs, 
which refers to xaxs. Notice what the xaxs paragraph says about xlim. If 
you can figure out that there is also a ylim arg, then you are pretty much 
finished. Something like


plot( f, -10, 10, ylim=c(-100,100) )

will reveal both curves.

xlim and ylim also appear briefly in R-Intro ni Appendix A.

FWIW, one can use xlim and ylim to 'zoom in' on a section of a plot.

HTH,

Chuck



If possible, I'd like to plot this curves in a perfect cartesian plane.

f = function(x) {
 x^2
}
f2 = function(x) {
 -x^2
}

plot(f,-10,10)
abline(h=0, v=0, col = gray60)
curve(f2,col=orange, add=T)

Thanks in advanced!


*pablo de camargo cerdeira*
pablo.cerde...@gmail.com | pa...@fgv.br
Phone: +55-(21)-3799-6065
[image: Facebook] http://www.facebook.com/pablo.cerdeira[image:
LinkedIn]http://br.linkedin.com/in/pablocerdeira[image:
Google] http://www.google.com/profiles/pablo.cerdeira
[image: Google Talk/]pablo.cerdeira [image: Skype/]pablocerdeira

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] merge function in R?

2010-08-13 Thread Charles C. Berry


On Fri, 13 Aug 2010, fishkbob wrote:



So I have a bunch of c(start,end) points and want to consolidate them into as
few c(start,end) as possible.

For example:
sample   startend
A  5   10
B  7   18
C  14
D  16  20

I'd want the function to return the two distinct sets (1,4) and (5,20)

Is there an R function that already does this?


Yes.

See the reduce() function in the IRanges package on BioConductor

See pages 11-12 of

http://www.bioconductor.org/packages/2.6/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf


HTH,

Chuck



or should I write my own? (how would I go about that?)
--
View this message in context: 
http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324684.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reading fixed width format data with 2 types of lines

2010-08-12 Thread Charles C. Berry


On Thu, 12 Aug 2010, Tim Gruene wrote:


I don't know if it's elegant enough for you, but you could split the file into
two files with 'grep ^3 file  file_3' and 'grep ^4 file  file_4'
and then read them in separately.



along the same lines, but all in R (untested)

original.lines - readLines( filename )

tcon.3 - textConnection( grep( ^3, original.lines, value=T ))
res.3 - read.fwf( tcon.3, etc )
close(tcon.3)

tcon.4 - textConnection( grep( ^4, original.lines, value=T ))
res.4 - read.fwf( tcon.4, etc )
close(tcon.4)

rm( original.lines )

Or skip the readLines() step and use

tcon.3 - pipe(paste(grep '^3',filename))

...

I think you can use 'findstr.exe' on windows in lieu of grep.

HTH,

Chuck





Tim

On Thu, Aug 12, 2010 at 01:57:19PM -0400, Denis Chabot wrote:

Hi,

I know how to read fixed width format data with read.fwf, but suddenly I need to read in a large 
number of old fwf files with 2 types of lines. Lines that begin with 3 in first column 
carry one set of variables, and lines that begin with 4 carry another set, like this:

???
3A00206546L07004901609004599  1015002  001001008010004002004007003   001
3A00206546L07004900609003099  1029001002001001006014002
3A00206546L07004900229000499  1015001001
3A00206546L070049001692559049033  1015 018036024
3A00206546L07004900229000499  1001   002
4A00176546L06804709001011100060651640015001001501063   065914
4A00176546L068047090010111000407616 1092   095614
4A00196546L098000100010111001706214450151062   065914
4A00176546L068047090010111000505913 1062   065914
4A00196546L09800010001011100260472140002001000201042   046114
4A00196546L0980001000101110025042214501200051042   046114
4A00196546L09800010001011100290372140005001220501032   036214
???

I have searched for tricks to do this but I must not have used the right 
keywords, I found nothing.

I suppose I could read the entire file as a single character variable for each 
line, then subset for lines that begin with 3 and save this in an ascii file 
that will then be reopened with a read.fwf call, and do the same with lines 
that begin with 4. But this does not appear to me to be very elegant nor 
efficient??? Is there a better method?

Thanks in advance,


Denis Chabot
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen

GPG Key ID = A46BEE1A




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] a question regarding updating formulas with coefficients

2010-08-11 Thread Charles C. Berry


On Wed, 11 Aug 2010, David Winsemius wrote:



On Aug 11, 2010, at 6:03 PM, Jarrett Byrnes wrote:

I have formulae with coefficents that I would like to update.  However, I 
get some strange results.  For example, see the following:


For the formula y ~ d+ 3*r+t


Did you really get meaningful results from that formula? Care to provide an 
example? Maybe there is something more for me to learn.




I want to add a variable p, so

 update(y~d+0*r+t, .~.+p)


In formulas the * operator is an interaction creator. so you told R to make 
0 + r + 0:r. Probably not what you thought you were doing. So what were you 
trying to do anyway?




produces

y ~ d + t + p - 1


Which at least explains why you got the -1 (which in R formulas is that same 
as +0).


If the coefficient is not 0, but rather, something else - say, 3,


What do you think you are accomplishing when you put a scalar coefficient in 
the formula?




Maybe he wants

?I

??

Chuck


I get the following:

 update(y~d+3*r+t, .~.+p)

Error in terms.formula(tmp, simplify = TRUE) :
 invalid model formula in ExtractVars
 



Is there a way to do this,



or a different call I should be trying?


What you should be doing depends on what you want to happen.

--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] a question regarding updating formulas with coefficients

2010-08-11 Thread Charles C. Berry


On Wed, 11 Aug 2010, David Winsemius wrote:



On Aug 11, 2010, at 6:45 PM, Charles C. Berry wrote:


On Wed, 11 Aug 2010, David Winsemius wrote:

 
 On Aug 11, 2010, at 6:03 PM, Jarrett Byrnes wrote:
 
  I have formulae with coefficents that I would like to update.  However, 
  I get some strange results.  For example, see the following:

  For the formula y ~ d+ 3*r+t
 
 Did you really get meaningful results from that formula? Care to provide 
 an example? Maybe there is something more for me to learn.
 
 
  I want to add a variable p, so

update(y~d+0*r+t, .~.+p)
 
 In formulas the * operator is an interaction creator. so you told R to 
 make 0 + r + 0:r. Probably not what you thought you were doing. So what 
 were you trying to do anyway?
 
  produces

  y ~ d + t + p - 1
 
 Which at least explains why you got the -1 (which in R formulas is that 
 same as +0).

  If the coefficient is not 0, but rather, something else - say, 3,
 
 What do you think you are accomplishing when you put a scalar coefficient 
 in the formula?
 


Maybe he wants

 ?I


Possibly, but wouldn't that just result in a deflation by a factor of 3 of 
the estimated coefficient if you wrappedI around that rem ...  I(3*r) ?


Quite so. It is just one way (and I am not touting it) of rescaling a 
variable.


 I 

also wondered if he might need to be referred to:

?offset




??


?? back 'atcha.


My '??' was intended for the OP, but your guess seems good to me.

Chuck



--
David.


Chuck

  I get the following:
update(y~d+3*r+t, .~.+p)
  Error in terms.formula(tmp, simplify = TRUE) :
  invalid model formula in ExtractVars
Is there a way to do this,
 
  or a different call I should be trying?
 
 What you should be doing depends on what you want to happen.
 
 -- 
 
 David




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] package for measurement error models

2010-08-07 Thread Charles C. Berry


On Sat, 7 Aug 2010, Carrie Li wrote:


Hi,all,

I posted this question couple of days again, but haven't gotten any answers
back. I would like to post it again, and if you have any ideas, please let
me know. Any helps and suggestions are very much appreciated.


Start by reading the Posting Guide.

It will tell you to

   *) Do RSiteSearch(keyword) with different keywords (at the R prompt)
  to search R functions, contributed packages and R-Help postings.

and

RSiteSearch(measurement error)

gives a number of hits that might be of interest.

If these hits do not pan out, you should let us know why none of the 
available packages that have 'measurement error' in their descriptions 
satisfy your needs.


HTH,

Chuck



The problem is about linear regression with both y and x have measurement,
and the variance of errors are heterogeneous. The estimated regression
coefficient and its variance are of interest. Is any R package doing this
task ?

Thank you

Carrie--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] discrete ECDF

2010-08-04 Thread Charles C. Berry


On Wed, 4 Aug 2010, David Winsemius wrote:


Dear list;

I just created a utility function that replicates what I have done in the 
past with Excel or OO.org by putting a formula of the form =sum($A1:A$1) in 
an upper-corner of a section and then doing a fill procedure by dragging 
the lower-rt corner down and to the right. When divided by the grand sum of 
the entries this function then calculates a 2D-discrete-ECDF.


I keep thinking I am missing the obvious, but I did try searching. Here is my 
effort at creating that functionality:


ecdf.tbl - function (.dat) { .dat - data.matrix(.dat)  #speeds up 
calculations

  .sdat - matrix(0, nrow(.dat), ncol(.dat) )
  .sdat[] - sapply(1:ncol(.dat), function(x)
 sapply(1:nrow(.dat),
   function(y)  sum(.dat[1:y, 1:x], na.rm=TRUE )  ) 
)

return(.sdat) }



ecdf.tbl2 -
function(mat) {
mat[is.na(mat)] - 0
t( apply( apply( mat,2, cumsum ), 1, cumsum ))}

HTH,

Chuck



 tst - read.table(textConnection(NA 5 6

4   5   7
5   6   8
6   7   9
NA 8 NA)   )


 tst

V1 V2 V3
1 NA  5  6
2  4  5  7
3  5  6  8
4  6  7  9
5 NA  8 NA


 ecdf.tbl(tst)

   [,1] [,2] [,3]
[1,]05   11
[2,]4   14   27
[3,]9   25   46
[4,]   15   38   68
[5,]   15   46   76


 ecdf.tbl(tst)/sum(tst, na.rm=TRUE)

 [,1]   [,2]  [,3]
[1,] 0. 0.06578947 0.1447368
[2,] 0.05263158 0.18421053 0.3552632
[3,] 0.11842105 0.32894737 0.6052632
[4,] 0.19736842 0.5000 0.8947368
[5,] 0.19736842 0.60526316 1.000


Did I miss a more compact vectorized or sweep()-ed solution? (I realize this 
is not really a function in the sense that ecdf() is.) I have seen prop.table 
and margin.table, but could not see how they would address this problem.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem with indicators for switch

2010-08-02 Thread Charles C. Berry


On Mon, 2 Aug 2010, Roy Davy wrote:



Hi,

I am having some problems setting up some indicators and would appreciate any 
help.
I have some data called 'lights' with 3 variables called x, a and b.
x - is the date
a - equals 1 to indicate an 'on' button is activated
b - equals 1 to indicate an 'off' button is activated

Essentially i wannt to create 2 new variables c and d
c - will reflect the current state of the light (1 being on)
d - will be a count for how many days the light has been on

here's some data with the date omitted to illustrate what i have and what is 
required.

a b c d
0 0 0 0
0 0 0 0
1 0 1 1
1 0 1 2
0 0 1 3
0 0 1 4
1 0 1 5
0 0 1 6
0 0 1 7
0 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
1 0 1 1
0 0 1 2
0 0 1 3
0 0 1 4
0 1 0 0
0 0 0 0

After some considerable time i have managed to create variable c with a 
loop but it's really slow with the volume of data i have. Could anyone 
please show me how to do this efficiently?


Two solutions:

1) inline - if you are configured to compile source,
   use the inline package to render your loop as C or Fortran

2) findInterval - like this:

on.pos - which(a==1)
off.pos - which(b==1)
last.on - c(0,on.pos)[ 1 + findInterval(1:20,on.pos)]
last.off - c(0,off.pos)[ 1 + findInterval(1:20,off.pos)]
cee - as.integer(last.onlast.off)

Don't use 'c' as a variable name as it is a common function

HTH,

Chuck




I hope this is clear...

Thanks


Roy
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Lognormal distribution - Range Factor

2010-08-01 Thread Charles C. Berry


On Sun, 1 Aug 2010, ted.hard...@manchester.ac.uk wrote:


On 01-Aug-10 10:59:03, Tims Corbett wrote:

Hi, What does it mean to say Lognormal distribution with a mean of
1.03E-6 with a range factor of 100 ? How can I find the lognormal
distribution paramters from this information?

Thanks, Tims


If you can, please say what is meant by range factor. I have seen
it used to denote the ratio maximum/minimum for a sample, but it
might be intended to mean something else in the context of your
question.



Tim, you should learn to use Google to answer this question instead of 
sending postings like this to lots of differnt mailing lists.


Googling

lognormal range factor

easily turns up the definition of range factor as a function of the 
quantiles of the distribution.


FWIW, Range factor looks like a term of art used in certain circles.

RSiteSearch'ing (or just ??lognormal, for crying out loud!) turns up 
everything else needed to answer this, viz the R functions to find the 
quantiles, and even the relation between the mean of the lognormal and its 
parameters.


I think a little bit of algebra makes it possible to solve this directly 
from tables of the normal (or lognormal) even without having to resort to 
uniroot().


Is this homework?

HTH,

Chuck

p.s. can the posting guide advise that OT questions like this be preceeded 
by a suitable amount of Googling before coming here??




If it does mean that, then the context is a sample of values, deemed
to be log-normally distributed, with (presumably) the sample mean
equal to 1.03E-6 (0.0103) and max/min=100. However this would
be really inadequate information to determine what the parameters
of the log-normal distribution might be. At the very least one needs
also the sample size, and even then the determination would not be
reliable.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Aug-10   Time: 12:33:45
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help splitting a data frame

2010-07-29 Thread Charles C. Berry


On Thu, 29 Jul 2010, kayj wrote:



Hi All,

I have a dataset that I would like to split based on : or – ( the data file
is tab delimited) for example:
Ny:23-45AC
BA:88-91DB
KJ:21-13PA

And I would like the data to be splitted and the final results look like
NY  23  45  AC
BA  88  91  DB
KJ  21  13  PA


read.table(textConnection(gsub([-:],\t,readLines(your.data.file.tab

or

read.table(textConnection(chartr(-:,\t\t,readLines(your.data.file.tab

HTH,

Chuck



I would like to have the resulting data as a data frame so each column is a
variable.

Thanks,


--
View this message in context: 
http://r.789695.n4.nabble.com/help-splitting-a-data-frame-tp2306832p2306832.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Survival analysis MLE gives NA or enormous standard errors

2010-07-27 Thread Charles C. Berry


On Tue, 27 Jul 2010, Christopher David Desjardins wrote:


Hi Charles,

On Fri, 2010-07-23 at 14:40 -0700, Charles C. Berry wrote:

On Fri, 23 Jul 2010, Christopher David Desjardins wrote:


Sorry. I should have included some data. I've attached a subset of my
data (50/192) cases in a Rdata file and have pasted it below.

Running anova I get the following:


anova(sr.reg.s4.nore)

  Df Deviance Resid. Df-2*LL P(|Chi|)
NULL   NA   NA45 33.89752NA
as.factor(lifedxm)  2 2.43821143 31.45931 0.2954943

That would just be an omnibus test right and should that first NULL NA
line be worrisome? What if I want to test specifically that CONTROL and
BIPOLAR were different and that MAJOR DEPRESSION and BIPOLAR were
different?




You wrote:


Construct a likelikehood ratio test for each hypothesis by fitting three
models - two containing each term  and one containing both - and comparing
each simpler model to the fuller model.



Would that be recoding lifedxm (presently a dummy variable where 0 -
BIPOLAR, 1 - CONTROL, and 2 - MAJOR DEPRESSED) as three seperate
variables: CONTROL (0 - No, 1 - Yes), BIPOLAR (0 - N0, 1 - Yes), and
MAJOR DEPRESSED (0 - No, 1 - Yes) and then comparing the following
models with anova()?

CONTROL + BIPOLAR to MAJOR
CONTROL + MAJOR to BIPOLAR

I am sorry I am just a little confused. Basically I want to know if
BIPOLAR is a higher risk than MAJOR and CONTROL and if MAJOR is a higher
risk than CONTROL.



It would help communication if you would use a standard notation like R.

The meaning of pseudocodes tends to be a bit fuzzy. And from far below, it 
seems you have a factor called lifedxm not a 'dummy variable' in the 
sense that that term is often used.


Here are three models defined using the formula language:


frm1 - Surv(age_sym4, sym4) ~ I(lifedxm==MAJOR)

frm2 - Surv(age_sym4, sym4) ~ I(lifedxm==BIPOLAR)

frm3 - Surv(age_sym4, sym4) ~ I(lifedxm==MAJOR) +
I(lifedxm==BIPOLAR)

The model of frm1 is nested under that of frm3.

The model of frm2 is nested under that of frm3.

anova( frm1, frm3 ) will report on the effect of adding 
I(lifedxm==BIPOLAR) to frm1. i.e. it will give the increase in 
likelihood associated with that term.


anova( frm2, frm3 ) will report on the effect of 
adding I(lifedxm==MAJOR) to frm2.


Chuck




Thank you very much for all your help,
Chris



I'll look at Hauck-Donner effect.

Thanks,
Chris


bip.surv.s

  age_sym4 sym4 lifedxm
1  16.128680   MAJOR
2  19.326490   MAJOR
3  16.550310 CONTROL
4  19.367560 CONTROL
5  16.090350   MAJOR
6  21.505820   MAJOR
7  16.361400   MAJOR
8  20.572210   MAJOR
9  16.457220 CONTROL
10 19.945240 CONTROL
11 15.791920   MAJOR
12 20.766600   MAJOR
13 16.150580 BIPOLAR
14 19.258040 BIPOLAR
15 17.363450   MAJOR
16 21.180010   MAJOR
17   NA0 BIPOLAR
18   NA0 BIPOLAR
19 16.317591   MAJOR
20 18.297060   MAJOR
21 16.407940   MAJOR
22 19.137580   MAJOR
23 16.194390 CONTROL
24 21.368930 CONTROL
25 15.890490 CONTROL
26 18.997950 CONTROL
27   NA0 BIPOLAR
28 18.904860 BIPOLAR
29 16.364130   MAJOR
30 20.427100   MAJOR
31 16.659820   MAJOR
32 19.457910   MAJOR
33 16.643390 CONTROL
34 19.400410 CONTROL
35 15.373031 BIPOLAR
36 19.838470 BIPOLAR
37 15.422311   MAJOR
38 19.370290   MAJOR
39 15.069130   MAJOR
40 17.815200   MAJOR
41 15.504450 BIPOLAR
42 17.921970 BIPOLAR
43 15.345650 CONTROL
44 18.075290 CONTROL
45 15.594800 CONTROL
46 19.674200 CONTROL
47 14.789870   MAJOR
48 20.054760   MAJOR
49 14.787130   MAJOR
50 19.868580   MAJOR


On Fri, 2010-07-23 at 11:52 -0700, Charles C. Berry wrote:

On Fri, 23 Jul 2010, Christopher David Desjardins wrote:


Hi,
I am trying to fit the following model:

sr.reg.s4.nore - survreg(Surv(age_sym4,sym4), as.factor(lifedxm),
data=bip.surv)


Next time include a reproducible example. i.e. something we can run.

Now, Google Hauck Donner Effect to understand why

anova(sr.reg.s4.nore)

is preferred.

Chuck




Where age_sym4 is the age that a subject develops clinical thought
problems; sym4 is whether they develop clinical thoughts problems (0 or
1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or
CONTROL.

I am interested in whether or not survival differs by this covariate.

When I run my model, I am getting the following output:


summary(sr.reg.s4.nore)


Call:
survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm),
   data = bip.surv)
  Value Std. Error z   p
(Intercept)4.037  0.455  8.86643
0.00755
as.factor(lifedxm)CONTROL 14.844   4707.383  0.00315
0.997484052845082791450
as.factor(lifedxm)MAJOR0.706  0.447  1.58037
0.114022774867277756905
Log(scale)-0.290  0.267 -1.08493
0.277952437474223823521

Scale

Re: [R] Fwd: Questions about templates for R

2010-07-27 Thread Charles C. Berry


On Mon, 26 Jul 2010, Alon Friedman wrote:


Hi
I am looking for R templates to introduce the R to my students at
Seton hall university. The templates are predefined scripts in R that
will retain its primary intent when individually customized with their
own variable data or text.



Well the help pages for many (most?) R functions contain runnable 
examples.


example( image )

is one such.

If you want you students to perform a linear regression (say), having them 
read a bit about it, then start R, run


example( lm )

then read through the input and output and display the help pages for 
functions that are called is not a bad way to start.


There are BTW numerous books listed at

http://www.r-project.org/doc/bib/R-publications.html

which have plenty of runnable code in them.

And there are  as well as online resources 
accessible under 'Documnetation' at


http://www.r-project.org/

HTH,

Chuck

In this case, my students at Seton Hall

University. For example, PSPad editor provides new users predefined
templates before writing their own scripts.

Please let me know if it makes more sense.

Yours
AF
Alon Friedman, PhD
New York, NY 10014
Phone: 212-645-1538



--
Yours
AF
Alon Friedman, PhD
New York, NY 10014
Phone: 212-645-1538

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Sweave and scan()

2010-07-27 Thread Charles C. Berry


On Tue, 27 Jul 2010, Murray Jorgensen wrote:

I am introducing the scan() function to my class. Consider the following file 
(Scanexamp.Rnw )


\documentclass[12pt]{article}

\begin{document}
 =
height = scan()
64 62 66 65 62
69 72 72 70

part = scan(what = character(0))
 Soprano Soprano Soprano
 AltoAltoTenor
 Tenor   BassBass

sh = data.frame(height, part)
sh
@
\end{document}

Now what happens when I attempt to Sweave this is


 Sweave(scanexamp.Rnw)

Writing to file scanexamp.tex
Processing code chunks ...
 1 : echo term verbatim

Error:  chunk 1
Error in parse(text = chunk) : unexpected numeric constant in:
height = scan()
64 62




Right.

Sweave is trying to parse the whole chunk.

It cannot parse 64 62 66 65 62. (And the command line cannot parse it 
either - try typing it at the R prompt.)


If you put each number on a separate line, Sweave will parse it, but when 
scan() runs, it will prompt for input and accept it from STDIN just as 
when run from the command line. Which probably isn't what you want.


I'd guess the path of least resistance is to have a bit of deception.

Use two chunks - one like that above but with eval=F and another with 
eval=T,echo=F with code like this


tcon - textConnection( 64 62 66 65 62
69 72 72 70 )
height = scan(tcon)
close(tcon)
...

If the deception doesn't please you, then use a file as in

example( scan )

to illustrate scan()

HTH,

Chuck



Comments would be appreciated. (And thanks to Ross Darnell for a lot of help 
on another list.)


Cheers,  Murray Jorgensen

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nzFax 7 838 4155
Phone  +64 7 838 4773 wkHome +64 7 825 0441   Mobile 021 0200 8350

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Loading Rdata files in a Package

2010-07-25 Thread Charles C. Berry


On Sun, 25 Jul 2010, Andrew Leeser wrote:


I have an .Rdata file saved in my data folder in a package I created. However,
when I use the data() function it doesn't recognize that there any dataset
exists within the package. Is there anything specific I need to do such that the
data() function will load my dataset?


No reproducible example was provided.

Did you provide a name for the .Rdata file? e.g. mydat.RData

Run

example( package.skeleton )

then edit the 'mypkg' Rd files to put add dummy \title{} entries.

Then INSTALL, then run R,

library(mypkg)
data(d)
str(d)

.
.
.

Emulate what you see in mypkg/*

Or use package.skeleton() to get you started.


HTH,

Chuck



Thanks in advance,
Andrew



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] , Updating Table

2010-07-24 Thread Charles C. Berry


On Fri, 23 Jul 2010, Marcus Liu wrote:


Hi everyone,

Is there any command for updating table withing a loop??


Loops? We don't need no stinking loops!
 (From 'The Good, the Bad, and the Rgly')

tab - table(data.raw, findInterval(seq(along=data.raw), ind+1 ) )
tab %*% upper.tri(tab,diag=T)

or

tab2 - tapply( factor(data.raw), findInterval(seq(along=data.raw), ind+1 ), 
table)
Reduce( +, tab2, accum=TRUE )

HTH,

Chuck

p.s. See the posting guide re including a reproducible example with 
requests like yours.


For instance, at i, I have a table as ZZ = table(data.raw[1:ind[i]]) 
where ind = c(10, 20, 30, ...).?Then , ZZ will be as follow


A B C
?3??? 10?? 2

At (i + 1), ZZ = table(data.raw[(ind[i]+1):ind[i+1]])

A B D
?4 ?? 7??? 8

Is there any command that can update the table ZZ for each time so that in the 
above example, ZZ will be

A B C D
?7??? 17?? 2??? 8

Thanks.

liu




[[alternative HTML version deleted]]




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Survival analysis MLE gives NA or enormous standard errors

2010-07-23 Thread Charles C. Berry


On Fri, 23 Jul 2010, Christopher David Desjardins wrote:


Hi,
I am trying to fit the following model:

sr.reg.s4.nore - survreg(Surv(age_sym4,sym4), as.factor(lifedxm),
data=bip.surv)


Next time include a reproducible example. i.e. something we can run.

Now, Google Hauck Donner Effect to understand why

anova(sr.reg.s4.nore)

is preferred.

Chuck




Where age_sym4 is the age that a subject develops clinical thought
problems; sym4 is whether they develop clinical thoughts problems (0 or
1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or
CONTROL.

I am interested in whether or not survival differs by this covariate.

When I run my model, I am getting the following output:


summary(sr.reg.s4.nore)


Call:
survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm),
   data = bip.surv)
  Value Std. Error z   p
(Intercept)4.037  0.455  8.86643
0.00755
as.factor(lifedxm)CONTROL 14.844   4707.383  0.00315
0.997484052845082791450
as.factor(lifedxm)MAJOR0.706  0.447  1.58037
0.114022774867277756905
Log(scale)-0.290  0.267 -1.08493
0.277952437474223823521

Scale= 0.748

Weibull distribution
Loglik(model)= -76.3   Loglik(intercept only)= -82.6
Chisq= 12.73 on 2 degrees of freedom, p= 0.0017
Number of Newton-Raphson Iterations: 21
n=186 (6 observations deleted due to missingness)


I am concerned about the p-value of 0.997 and the SE of 4707. I am
curious if it has to do with the fact that the CONTROL group doesn't
have a mixed response, meaning that all my subjects do not develop
clinical levels of thought problems and subsequently 'survive'.


table(bip.surv$sym4,bip.surv$lifedxm)


   BIPOLAR CONTROL MAJOR
 0  41  6078
 1   7   0 6

Is there some sort of way that I can overcome this? Is my model
misspecified? Is this better suited to be run as a Bayesian model using
priors to overcome the lack of a mixed response?

Also, please cc me on an email as I am a digest subscriber.
Thanks,
Chris


--
Christopher David Desjardins
PhD student, Quantitative Methods in Education
MS student, Statistics
University of Minnesota
192 Education Sciences Building
http://cddesjardins.wordpress.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Survival analysis MLE gives NA or enormous standard errors

2010-07-23 Thread Charles C. Berry


On Fri, 23 Jul 2010, Christopher David Desjardins wrote:


Sorry. I should have included some data. I've attached a subset of my
data (50/192) cases in a Rdata file and have pasted it below.

Running anova I get the following:


anova(sr.reg.s4.nore)

  Df Deviance Resid. Df-2*LL P(|Chi|)
NULL   NA   NA45 33.89752NA
as.factor(lifedxm)  2 2.43821143 31.45931 0.2954943

That would just be an omnibus test right and should that first NULL NA
line be worrisome? What if I want to test specifically that CONTROL and
BIPOLAR were different and that MAJOR DEPRESSION and BIPOLAR were
different?


Construct a likelikehood ratio test for each hypothesis by fitting three 
models - two containing each term  and one containing both - and comparing 
each simpler model to the fuller model.




I'll look at Hauck-Donner effect.

Thanks,
Chris


bip.surv.s

  age_sym4 sym4 lifedxm
1  16.128680   MAJOR
2  19.326490   MAJOR
3  16.550310 CONTROL
4  19.367560 CONTROL
5  16.090350   MAJOR
6  21.505820   MAJOR
7  16.361400   MAJOR
8  20.572210   MAJOR
9  16.457220 CONTROL
10 19.945240 CONTROL
11 15.791920   MAJOR
12 20.766600   MAJOR
13 16.150580 BIPOLAR
14 19.258040 BIPOLAR
15 17.363450   MAJOR
16 21.180010   MAJOR
17   NA0 BIPOLAR
18   NA0 BIPOLAR
19 16.317591   MAJOR
20 18.297060   MAJOR
21 16.407940   MAJOR
22 19.137580   MAJOR
23 16.194390 CONTROL
24 21.368930 CONTROL
25 15.890490 CONTROL
26 18.997950 CONTROL
27   NA0 BIPOLAR
28 18.904860 BIPOLAR
29 16.364130   MAJOR
30 20.427100   MAJOR
31 16.659820   MAJOR
32 19.457910   MAJOR
33 16.643390 CONTROL
34 19.400410 CONTROL
35 15.373031 BIPOLAR
36 19.838470 BIPOLAR
37 15.422311   MAJOR
38 19.370290   MAJOR
39 15.069130   MAJOR
40 17.815200   MAJOR
41 15.504450 BIPOLAR
42 17.921970 BIPOLAR
43 15.345650 CONTROL
44 18.075290 CONTROL
45 15.594800 CONTROL
46 19.674200 CONTROL
47 14.789870   MAJOR
48 20.054760   MAJOR
49 14.787130   MAJOR
50 19.868580   MAJOR


On Fri, 2010-07-23 at 11:52 -0700, Charles C. Berry wrote:

On Fri, 23 Jul 2010, Christopher David Desjardins wrote:


Hi,
I am trying to fit the following model:

sr.reg.s4.nore - survreg(Surv(age_sym4,sym4), as.factor(lifedxm),
data=bip.surv)


Next time include a reproducible example. i.e. something we can run.

Now, Google Hauck Donner Effect to understand why

anova(sr.reg.s4.nore)

is preferred.

Chuck




Where age_sym4 is the age that a subject develops clinical thought
problems; sym4 is whether they develop clinical thoughts problems (0 or
1); and lifedxm is mother's diagnosis: BIPOLAR, MAJOR DEPRESSION, or
CONTROL.

I am interested in whether or not survival differs by this covariate.

When I run my model, I am getting the following output:


summary(sr.reg.s4.nore)


Call:
survreg(formula = Surv(age_sym4, sym4) ~ as.factor(lifedxm),
   data = bip.surv)
  Value Std. Error z   p
(Intercept)4.037  0.455  8.86643
0.00755
as.factor(lifedxm)CONTROL 14.844   4707.383  0.00315
0.997484052845082791450
as.factor(lifedxm)MAJOR0.706  0.447  1.58037
0.114022774867277756905
Log(scale)-0.290  0.267 -1.08493
0.277952437474223823521

Scale= 0.748

Weibull distribution
Loglik(model)= -76.3   Loglik(intercept only)= -82.6
Chisq= 12.73 on 2 degrees of freedom, p= 0.0017
Number of Newton-Raphson Iterations: 21
n=186 (6 observations deleted due to missingness)


I am concerned about the p-value of 0.997 and the SE of 4707. I am
curious if it has to do with the fact that the CONTROL group doesn't
have a mixed response, meaning that all my subjects do not develop
clinical levels of thought problems and subsequently 'survive'.


table(bip.surv$sym4,bip.surv$lifedxm)


   BIPOLAR CONTROL MAJOR
 0  41  6078
 1   7   0 6

Is there some sort of way that I can overcome this? Is my model
misspecified? Is this better suited to be run as a Bayesian model using
priors to overcome the lack of a mixed response?

Also, please cc me on an email as I am a digest subscriber.
Thanks,
Chris


--
Christopher David Desjardins
PhD student, Quantitative Methods in Education
MS student, Statistics
University of Minnesota
192 Education Sciences Building
http://cddesjardins.wordpress.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http

Re: [R] (no subject)

2010-07-11 Thread Charles C. Berry


On Sun, 11 Jul 2010, li li wrote:


Hi all,
  I want to add the red color under the standard normal curve to the right
of 1.96.
Can anyone give me a hand? Please see the code below.
  Thank you.


x - seq(-4, 4, length=100)
hx - dnorm(x)
par(pty=s)
plot(x, hx, type=l, xlab=z value,
 ylab=Density, main=density of N(0,1))
abline(v=1.96, col=red)



?polygon
example(polygon)





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Weired problem when passing arguments using ...?

2010-07-07 Thread Charles C. Berry


On Wed, 7 Jul 2010, thmsfuller...@gmail.com wrote:


Hello All,

I'm trying to pass the argument col.names to write.csv using '...'.
But I got the following warnings. Maybe it is very simple. But I'm not
sure what I am wrong. Could you please help point to me what the
problem is?


Its not a _problem_, its a _feature_.


read

?write.csv

and use write.table()




#
fun=function(x, ...) {
 fr=parent.frame()
 tmp=get(x, envir=fr)
 write.csv(
 tmp
 , file=paste(x, '.csv', sep='')
 , ...
 )
}

f=data.frame(x=1:10,y=letters[1:10])

fun('f', col.names=F)



fun('f', col.names=F)

Warning message:
In write.csv(tmp, file = paste(x, .csv, sep = ), ...) :
 attempt to set 'col.names' ignored

--
Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] forcing a zero level in contr.sum

2010-07-07 Thread Charles C. Berry

 mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How do I test against a simple null that two regressions coefficients are equal?

2010-07-07 Thread Charles C. Berry


On Wed, 7 Jul 2010, chen jia wrote:


Hi there,

I run two regressions:

y = a1 + b1 * x + e1
y = a2 + b2 * z + e2

I want to test against the null hypothesis: b1 = b2.  How do I design the test?



You are testing a non-nested hypothesis, which requires special handling.

The classical test is due to Hotelling, but see the references (and R code 
snippets) in this posting:


http://markmail.org/message/egnowmdzpzjtahy7

(it is the merest coincidence that the above thread was initiated by Mark 
Leeds and that the URL is 'markmail' :-) )


HTH,

Chuck



I think I can add two equations together and divide both sides by 2:
y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2).
or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3

If I run this new regression, I can test against the null b1 = b2 in
this regression.  Is it an equivalent test as the original one? If
yes, how do I do that in R?

Alternatively, I think I can just test against the null:
correlation(y, x) = correlation(y, z), where correlation(. , .) is the
correlation between two random variables. Is this equivalent too? If
yes, how do I do it in R?

Thanks.

Best,
Jia

--
Ohio State University - Finance
  248 Fisher Hall
   2100 Neil Ave.
 Columbus, Ohio  43210
Telephone: 614-292-2830
  http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plotmath vector problem; full program enclosed

2010-07-06 Thread Charles C. Berry


On Tue, 6 Jul 2010, David Winsemius wrote:



On Jul 6, 2010, at 1:41 PM, Duncan Murdoch wrote:


On 06/07/2010 10:54 AM, Paul Johnson wrote:
 Here's another example of my plotmath whipping boy, the Normal 
 distribution.
 
 A colleague asks for a Normal plotted above a series of axes that

 represent various other distributions (T, etc).
 
 I want to use vectors of equations in plotmath to do this, but have

 run into trouble.  Now I've isolated the problem down to a relatively
 small piece of working example code (below).  If you would please run
 this, I think you will see the problem.  When plotmath meets one
 vector of expressions, it converts all but one to math, so in the
 figure output i get, in LaTeX speak
 
 b1 $\mu-1.0 \sigma$$\mu$
 
 All values except the first come out correctly.
 
 This happens only when I try to use bquote or substitute to get

 variables to fill in where the 1.96, 1.0, and so forth should be.  In
 the figure output, you should see a second axis where all of the
 symbols are resolved correctly.
 
 As usual, thanks in advance for your help, sorry if I've made an

 obvious mistake or overlooked a manual.
 
 ### Filename: plotMathProblem.R

 ### Paul Johnson July 5, 2010
 ### email me paulj...@ku.edu
 
  sigma - 10.0

  mu - 4.0
 
  myx - seq( mu - 3.5*sigma,  mu+ 3.5*sigma, length.out=500)
 
  myDensity - dnorm(myx,mean=mu,sd=sigma)
 
  ### xpd needed to allow writing outside strict box of graph

  ### Need big bottom margin to add several x axes
  par(xpd=TRUE, ps=10, mar=c(18,2,2,2))
 
 plot(myx, myDensity, type=l, xlab=, ylab=Probability Density ,

 main=myTitle1, axes=FALSE)
  axis(2, pos= mu - 3.6*sigma)
  axis(1, pos=0)
 
  lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes
 
 
  addInteriorLine - function(x, m, sd){

for (i in 1:(length(x))){
  lines( c(x[i],x[i]), c(0, dnorm(x[i],m=m,sd=sd)), lty= 14, lwd=.2)
}

}
 
 
  dividers - c(qnorm(0.025), -1, 0, 1, qnorm(0.975))

  addInteriorLine(mu+sigma*dividers, mu,sigma)
 
  # bquote creates an expression that text plotters can use

  t1 -  bquote( mu== .(mu))
  mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)
 
 
  addInteriorLabel - function(pos1, pos2, m, s){

area - abs(100*( pnorm(m+pos1*s,m,s)-pnorm(m+pos2*s, m,s)))
mid - m+0.5*(pos1+pos2)*s
text(mid, 0.5*dnorm(mid,m,s),label=paste(round(area,2),%))

}
 
 
  addInteriorLabel(dividers[1],dividers[2],  mu, sigma)

  addInteriorLabel(dividers[2],dividers[3],  mu, sigma)
  addInteriorLabel(dividers[3],dividers[4],  mu, sigma)
  addInteriorLabel(dividers[4],dividers[5],  mu, sigma)
 
 
 
 
 ### Following is problem point: axis will

 ### end up with correct labels, except for first point,
 ### where we end up with b1 instead of mu - 1.96*sigma.
   b1 - substitute( mu - d*sigma, list(d=round(dividers[1],2)) )
   b2 - substitute( mu - sigma )
   b3 - substitute( mu )
   b4 - substitute( mu + sigma )
   b5 - substitute( mu + d*sigma, list(d=round(dividers[5],2)) )
 ##   plot(-20:50,-20:50,type=n,axes=F)
   axis(1, line=4,at=mu+dividers*sigma,
 labels=c(expression(b1),b2,b3,b4,b5), padj=-1)
 

You want as.expression(b1), not expression(b1).  The latter means the 
expression consisting of the symbol b1.  The former means take the object 
stored in b1, and convert it to an expression..


It's not perfect, because you'll end up with mu - -1.96sigma (i.e. two 
minus signs), but it's closer than what you had.


Easily addressed in this case with ~ instead of -. The value of d 
provides the minus:


b1 - substitute( mu ~ d*sigma, list(d=round(dividers[1],2)) )



But if d = 0, there will be no sign so maybe use

b1 - substitute( mu ~ d*sigma, list(d=sprintf(%+.2f, dividers[1]))) 
b5 - substitute( mu ~ d*sigma, list(d=sprintf(%+.2f, dividers[5])))


etc.

Chuck



Duncan Murdoch

 
 
 
 ### This gets right result but have to hard code the dividers

  b1 - expression( mu - 1.96*sigma )
  b2 - expression( mu - sigma )
  b3 - expression( mu )
  b4 - expression( mu + sigma )
  b5 - expression( mu + 1.96*sigma )
  axis(1, line=8,at=mu+dividers*sigma, labels=c(b1,b2,b3,b4,b5), padj=-1)
 
 
 
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

1 2 3 4 5 6 >

1 - 100 of 581 matches

Mail list logo