Re: [R] mean

2013-08-30 Thread Albyn Jones
It would be easier to diagnose the problem if you included an example
illustrating exactly what you did.  I'll guess:

> a <- list(3,4,5)
> mean(a)
[1] NA
Warning message:
In mean.default(a) : argument is not numeric or logical: returning NA

> mean(as.numeric(a))
[1] 4

But that's just a guess, as I don't know the actual contents of your list!
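
For what it's worth, unlist() does the same coercion; a minimal sketch,
assuming the guessed list above:

> mean(unlist(a))   # flatten the list to a numeric vector, then average
[1] 4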

albyn


On Fri, Aug 30, 2013 at 5:18 AM, agnes69 fes...@gredeg.cnrs.fr wrote:

 When I try to apply mean to a list, I get the answer:

 "argument is not numeric or logical: returning NA"

 Could you help me?

 (I am a complete beginner)



 --
 View this message in context:
 http://r.789695.n4.nabble.com/mean-tp4674999.html
 Sent from the R help mailing list archive at Nabble.com.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sample(c(0, 1)...) vs. rbinom

2013-05-23 Thread Albyn Jones

After a bit of playing around, I discovered that
sample() does something similar in other situations:


> set.seed(105021)
> sample(1:5,1,prob=c(1,1,1,1,1))
[1] 3
> set.seed(105021)
> sample(1:5,1)
[1] 2


> set.seed(105021)
> sample(1:5,5,prob=c(1,1,1,1,1))
[1] 3 4 2 1 5
> set.seed(105021)
> sample(1:5,5)
[1] 2 5 1 4 3

albyn


On 2013-05-22 22:24, peter dalgaard wrote:

On May 23, 2013, at 07:01 , Jeff Newmiller wrote:

You seem to be building an elaborate structure for testing the 
reproducibility of the random number generator. I suspect that rbinom 
is calling the random number generator a different number of times 
when you pass prob=0.5 than otherwise.


Nope. It's switching 0 and 1:

> set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
  set.seed(1); rbinom(10,1,pp)
 [1] 1 1 0 0 1 0 0 0 0 1
 [1] 0 0 1 1 0 1 1 1 1 0

which is curious, but of course has no implication for the
distributional properties. Curiouser, if you drop the prob= in 
sample.


> set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1);
  rbinom(10,1,pp)
 [1] 0 0 1 1 0 1 1 1 1 0
 [1] 0 0 1 1 0 1 1 1 1 0

However, it was never a design goal that two different random
functions (or even two code paths within the same function) should
give exactly the same values, even if they simulate the same
distribution, so this is nothing more than a curiosity.




Appendix A: some R code that exhibits the problem
=

ppp <- seq(0, 1, by = 0.01)

result <- do.call(rbind, lapply(ppp, function(p) {
    set.seed(1)
    sampleRes <- sample(c(0, 1), size = 1, replace = TRUE,
                        prob = c(1-p, p))

    set.seed(1)
    rbinomRes <- rbinom(1, size = 1, prob = p)

    data.frame(prob = p, equivalent = all(sampleRes == rbinomRes))
}))

result


Appendix B: the output from the R code
==

prob equivalent
1   0.00   TRUE
2   0.01   TRUE
3   0.02   TRUE
4   0.03   TRUE
5   0.04   TRUE
6   0.05   TRUE
7   0.06   TRUE
8   0.07   TRUE
9   0.08   TRUE
10  0.09   TRUE
11  0.10   TRUE
12  0.11   TRUE
13  0.12   TRUE
14  0.13   TRUE
15  0.14   TRUE
16  0.15   TRUE
17  0.16   TRUE
18  0.17   TRUE
19  0.18   TRUE
20  0.19   TRUE
21  0.20   TRUE
22  0.21   TRUE
23  0.22   TRUE
24  0.23   TRUE
25  0.24   TRUE
26  0.25   TRUE
27  0.26   TRUE
28  0.27   TRUE
29  0.28   TRUE
30  0.29   TRUE
31  0.30   TRUE
32  0.31   TRUE
33  0.32   TRUE
34  0.33   TRUE
35  0.34   TRUE
36  0.35   TRUE
37  0.36   TRUE
38  0.37   TRUE
39  0.38   TRUE
40  0.39   TRUE
41  0.40   TRUE
42  0.41   TRUE
43  0.42   TRUE
44  0.43   TRUE
45  0.44   TRUE
46  0.45   TRUE
47  0.46   TRUE
48  0.47   TRUE
49  0.48   TRUE
50  0.49   TRUE
51  0.50  FALSE
52  0.51   TRUE
53  0.52   TRUE
54  0.53   TRUE
55  0.54   TRUE
56  0.55   TRUE
57  0.56   TRUE
58  0.57   TRUE
59  0.58   TRUE
60  0.59   TRUE
61  0.60   TRUE
62  0.61   TRUE
63  0.62   TRUE
64  0.63   TRUE
65  0.64   TRUE
66  0.65   TRUE
67  0.66   TRUE
68  0.67   TRUE
69  0.68   TRUE
70  0.69   TRUE
71  0.70   TRUE
72  0.71   TRUE
73  0.72   TRUE
74  0.73   TRUE
75  0.74   TRUE
76  0.75   TRUE
77  0.76   TRUE
78  0.77   TRUE
79  0.78   TRUE
80  0.79   TRUE
81  0.80   TRUE
82  0.81   TRUE
83  0.82   TRUE
84  0.83   TRUE
85  0.84   TRUE
86  0.85   TRUE
87  0.86   TRUE
88  0.87   TRUE
89  0.88   TRUE
90  0.89   TRUE
91  0.90   TRUE
92  0.91   TRUE
93  0.92   TRUE
94  0.93   TRUE
95  0.94   TRUE
96  0.95   TRUE
97  0.96   TRUE
98  0.97   TRUE
99  0.98   TRUE
100 0.99   TRUE
101 1.00   TRUE

Appendix C: Session information
===


sessionInfo()

R version 3.0.0 (2013-04-03)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   
base






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] sample(c(0, 1)...) vs. rbinom

2013-05-23 Thread Albyn Jones
The "something similar" is "return a different result in two
situations where one might expect the same result", i.e. when
a probability vector with equal probabilities is supplied versus
the default of equal probabilities.

And, assuming that by "concerns me" you mean "worries me",
I have no clue why you think it does!  It is a curiosity.
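
A minimal illustration of the point, using the seed from the examples
quoted below: supplying prob= sends sample() down its weighted-sampling
code path, which consumes the random number stream differently than the
equal-probability shortcut, even though the two distributions are
identical.

> set.seed(105021)
> x1 <- sample(1:5, 5, prob = rep(1, 5))  # explicit (equal) weights
> set.seed(105021)
> x2 <- sample(1:5, 5)                    # default equal probabilities
> identical(x1, x2)
[1] FALSE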

albyn

On Thu, May 23, 2013 at 04:38:18PM +, Nordlund, Dan (DSHS/RDA) wrote:
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Albyn Jones
  Sent: Thursday, May 23, 2013 8:30 AM
  To: r-help@r-project.org
  Subject: Re: [R] sample(c(0, 1)...) vs. rbinom
  
  After a bit of playing around, I discovered that
  sample() does something similar in other situations:
  
   > set.seed(105021)
   > sample(1:5,1,prob=c(1,1,1,1,1))
   [1] 3
   > set.seed(105021)
   > sample(1:5,1)
   [1] 2
   
   
   > set.seed(105021)
   > sample(1:5,5,prob=c(1,1,1,1,1))
   [1] 3 4 2 1 5
   > set.seed(105021)
   > sample(1:5,5)
   [1] 2 5 1 4 3
  
  albyn
 
 
 What is the "something similar" you are referring to?  And I guess I still 
 don't understand what it is that concerns you about the sample function.
 
 
 Dan
 
 Daniel J. Nordlund
 Washington State Department of Social and Health Services
 Planning, Performance, and Accountability
 Research and Data Analysis Division
 Olympia, WA 98504-5204
 
 
 
  
  
  On 2013-05-22 22:24, peter dalgaard wrote:
   On May 23, 2013, at 07:01 , Jeff Newmiller wrote:
  
   You seem to be building an elaborate structure for testing the
   reproducibility of the random number generator. I suspect that
  rbinom
   is calling the random number generator a different number of times
   when you pass prob=0.5 than otherwise.
  
   Nope. It's switching 0 and 1:
  
   set.seed(1); sample(0:1,10,replace=TRUE,prob=c(1-pp,pp));
   set.seed(1); rbinom(10,1,pp)
[1] 1 1 0 0 1 0 0 0 0 1
[1] 0 0 1 1 0 1 1 1 1 0
  
   which is curious, but of course has no implication for the
   distributional properties. Curiouser, if you drop the prob= in
   sample.
  
   set.seed(1); sample(0:1,10,replace=TRUE); set.seed(1);
   rbinom(10,1,pp)
[1] 0 0 1 1 0 1 1 1 1 0
[1] 0 0 1 1 0 1 1 1 1 0
  
   However, it was never a design goal that two different random
   functions (or even two code paths within the same function) should
   give exactly the same values, even if they simulate the same
   distribution, so this is nothing more than a curiosity.
  
  
  

Re: [R] the joy of spreadsheets (off-topic)

2013-04-27 Thread Albyn Jones

I once had a discussion with an economist who told me
in almost these exact words:

"I don't care what the data say, the theory is so clear."

albyn

On 2013-04-26 9:30, William Dunlap wrote:

 The prior for the incompetence/malice question is usually best set
 pretty heavily in favour of incompetence ...


The following comment on economic research is from a 2010 article in
the Atlantic
reviewing John Ioannidis' work.

http://www.theatlantic.com/magazine/print/2010/11/lies-damned-lies-and-medical-science/308269/

   "Medical research is not especially plagued with wrongness.
    Other meta-research experts have confirmed that similar issues
    distort research in all fields of science, from physics to economics
    (where the highly regarded economists J. Bradford DeLong and
    Kevin Lang once showed how a remarkably consistent paucity of
    strong evidence in published economics studies made it unlikely
    that any of them were right)."

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf

Of S Ellison
Sent: Friday, April 26, 2013 9:08 AM
To: Thomas Adams; peter dalgaard
Cc: r-help
Subject: Re: [R] the joy of spreadsheets (off-topic)



 One might wonder if the Excel error was indeed THAT or
 perhaps a way to get the desired results, given the other
 issues in their analysis?

The prior for the incompetence/malice question is usually best set
pretty heavily in favour of incompetence ...

S


***
This email and any attachments are confidential. Any 
use...{{dropped:8}}




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.test vs hand calculated confidence interval

2013-04-03 Thread Albyn Jones
It might help to look at the documentation.  Typing

   ?prop.test

at the command line reveals various items of
interest, including a section labeled "Details".  A close
reading of that section turns up the explanation:

  "The confidence interval is computed by inverting the score test."

There are also journal references given.
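
For the record, inverting the score test yields the Wilson interval,
which is easy to compute by hand; a sketch, using the numbers from the
original post:

> x <- 17; n <- 40
> phat <- x/n
> z <- qnorm(0.995)   # for a 99% interval
> center <- phat + z^2/(2*n)
> halfwidth <- z * sqrt(phat*(1-phat)/n + z^2/(4*n^2))
> (center + c(-1,1)*halfwidth) / (1 + z^2/n)
[1] 0.2489036 0.6224374

which matches prop.test(17, 40, conf.level=0.99, correct=FALSE).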

albyn

On Wed, Apr 03, 2013 at 06:17:56PM -0400, Sarah Goslee wrote:
 One of the joys of R is that it's open source: you can read the code
 for prop.test yourself and see what's happening.
 
 In this case, simply typing
 prop.test
 at the command line will provide it, although without comments.
 
 Sarah
 
 
 
 On Wed, Apr 3, 2013 at 6:10 PM, David Arnold dwarnol...@suddenlink.net 
 wrote:
  Hi,
 
  This code:
 
  n=40
  x=17
  phat=x/n
  SE=sqrt(phat*(1-phat)/n)
  zstar=qnorm(0.995)
  E=zstar*SE
  phat+c(-E,E)
 
  Gives this result:
 
  [1] 0.2236668 0.6263332
 
  The TI Graphing calculator gives the same result.
 
  Whereas this test:
 
  prop.test(x,n,conf.level=0.99,correct=FALSE)
 
  Give this result:
 
  0.2489036 0.6224374
 
  I'm wondering why there is a difference.
 
  D.
 
 
 
 --
 Sarah Goslee
 http://www.functionaldiversity.org
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.test correct true and false gives same answer

2013-03-27 Thread Albyn Jones
?prop.test is helpful:

  "Continuity correction is used only if it does not exceed the
   difference between sample and null proportions in absolute value."
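
A quick check with the numbers from the original post: here the sample
and null proportions are identical, so the correction (1/(2n)) necessarily
exceeds their difference and is dropped.

> x <- 118; n <- 236; p0 <- 0.5
> abs(x/n - p0)   # the difference the correction may not exceed
[1] 0
> 1/(2*n)         # the continuity correction
[1] 0.002118644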

albyn

On Wed, Mar 27, 2013 at 02:04:51PM -0700, David Arnold wrote:
 All,
 
 How come both of these are the same?  Both say "1-sample proportions test
 without continuity correction". I would suspect one would say "without" and
 one would say "with".
 
 
  prop.test(118,236,.5,correct=FALSE,conf.level=0.95)
 
   1-sample proportions test without continuity correction
 
 data:  118 out of 236, null probability 0.5 
 X-squared = 0, df = 1, p-value = 1
 alternative hypothesis: true p is not equal to 0.5 
 95 percent confidence interval:
  0.4367215 0.5632785 
 sample estimates:
   p 
 0.5 
 
  prop.test(118,236,.5,correct=TRUE,conf.level=0.95)
 
   1-sample proportions test without continuity correction
 
 data:  118 out of 236, null probability 0.5 
 X-squared = 0, df = 1, p-value = 1
 alternative hypothesis: true p is not equal to 0.5 
 95 percent confidence interval:
  0.4367215 0.5632785 
 sample estimates:
   p 
 0.5 
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/prop-test-correct-true-and-false-gives-same-answer-tp4662659.html
 Sent from the R help mailing list archive at Nabble.com.
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] order statistic of multivariate normal

2013-03-21 Thread Albyn Jones
R^n for n > 1 is not an ordered set.
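
That said, if what's wanted is the distribution of the sorted components
of a single multivariate normal draw, simulation is straightforward; a
sketch using the mvtnorm package David mentions below, with a made-up
covariance:

> library(mvtnorm)
> Sigma <- diag(3) + 0.5*(1 - diag(3))     # equicorrelated, rho = 0.5
> draws <- rmvnorm(10000, mean = rep(0,3), sigma = Sigma)
> maxima <- apply(draws, 1, max)           # the largest order statistic
> quantile(maxima, c(0.025, 0.5, 0.975))   # Monte Carlo quantiles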

albyn

On Thu, Mar 21, 2013 at 02:32:44PM -0700, David Winsemius wrote:
 
 On Mar 21, 2013, at 1:44 PM, li li wrote:
 
  Hi all,
  Is there an R function that computes the probabilty or quantiles of
  order statistics of multivariate normal?
   Thank you.
 
 There is an mvtnorm package. I don't know what you mean by "quantiles of 
 order statistics of multivariate normal", though. 
 -- 
 David Winsemius
 Alameda, CA, USA
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exponentiate very large numbers

2013-02-05 Thread Albyn Jones

I stayed out of this one thinking it was probably a homework exercise.
After others have responded, I'll go ahead with my gloss on
Bill's function...

The specific problem is really of the form

 exp(a) - exp(a+eps) = exp(a)*(1-exp(eps))

So even though we can't compute exp(1347), we can
compute 1-exp(eps) for eps of the desired size.
The given eps was


> 1351 + log(.1) - 1347
[1] 1.697415

So the answer is exp(1347)*(1-exp(1.697415)),
and as Bill noted, you can compute the log
of the absolute value of that quantity:

> 1347 + log(abs(1-exp(1.697415)))
[1] 1348.495
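
Wrapped up as a small helper (a sketch following the identity above;
logdiff is a name I'm introducing here):

> # log|exp(a) - exp(b)|, computed without overflow via
> # exp(a) - exp(b) = exp(a) * (1 - exp(b - a))
> logdiff <- function(a, b) a + log(abs(1 - exp(b - a)))
> logdiff(1347, 1351 + log(.1))
[1] 1348.495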

albyn






On 2013-02-05 10:01, William Dunlap wrote:

Are there any tricks I can use to get a real result
for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or
exponential form?


  log.a <- 1347
  log.b <- 1351
  f0 <- function(log.a, log.b) exp(log.a) - exp(log(0.1) + log.b)

will not work because f0(1347,1351) is too big to represent as
a double precision number (abs(result) > 10^308).
If you are satisfied with computing log(result) you can do it
with some helper functions:

   addOnLogScale <- function (e1, e2)
   {
       # log(exp(e1) + exp(e2))
       small <- pmin(e1, e2)
       big <- pmax(e1, e2)
       log1p(exp(small - big)) + big
   }
   subtractOnLogScale <- function (e1, e2)
   {
       # log(abs(exp(e1) - exp(e2)))
       small <- pmin(e1, e2)
       big <- pmax(e1, e2)
       structure(log1p(-exp(small - big)) + big, isPositive = e1 > e2)
   }

as

   f1 <- function(log.a, log.b) {
       # log(abs(exp(log.a) - exp(log(0.1) + log.b))), avoiding overflow
       subtractOnLogScale(log.a, log(0.1) + log.b)
   }

E.g.,
   > f0(7,3)
   [1] 1094.625
   > exp(f1(7,3))
   [1] 1094.625
   attr(,"isPositive")
   [1] TRUE

With your log.a and log.b we get
   > f1(1347, 1351)
   [1] 1348.495
   attr(,"isPositive")
   [1] FALSE

You can use the Rmpfr package to compute the results to any desired
precision.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf

Of francesca casalino
Sent: Monday, February 04, 2013 8:00 AM
To: r-help@r-project.org
Subject: Re: [R] Exponentiate very large numbers

I am sorry I have confused you, the logs are all base e:

ln(a) = 1347
ln(b) = 1351

And I am trying to solve this expression:

exp( ln(a) ) - exp( ln(0.1) + ln(b) )


Thank you.

2013/2/4 francesca casalino francy.casal...@gmail.com:
 Dear R experts,

 I have the logarithms of 2 values:

 log(a) = 1347
 log(b) = 1351

 And I am trying to solve this expression:

 exp( ln(a) ) - exp( ln(0.1) + ln(b) )

 But of course every time I try to exponentiate the log(a) or log(b)
 values I get Inf. Are there any tricks I can use to get a real result
 for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or
 exponential form?


 Thank you very much for the help



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing linear regression coefficients to a slope of 1

2012-11-24 Thread Albyn Jones
Is this homework?  
 
PLEASE do read the posting guide 
   http://www.R-project.org/posting-guide.html

albyn

On Sat, Nov 24, 2012 at 07:27:25PM -0500, Catriona Hendry wrote:
 Hi!
 
 I have a question that is probably very basic, but I cannot figure out how
 to do it. I simply need to compare the significance of a regression slope
 against a slope of 1, instead of the default of zero.
 
 I know this topic has been posted before, and I have tried to use the
 advice given to others to fix my problem. I tried the offset command based
 on one of these advice threads as follows:
 
 Regression - lm(y~x+offset(1*x))
 
 but this resulted in a regression line that was plotted perpendicular to
 the data when added with the abline function.
 
 I would be extremely grateful for your help!!
 
 Thanks!!
 
 Cat
 
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing linear regression coefficients to a slope of 1

2012-11-24 Thread Albyn Jones

Dear Cat

My apologies for presuming...

Here's a primitive solution: compute a t-statistic or CI.

   t = (beta-hat - 1)/SE(beta-hat), compared to qt(.975, res.df)

Or better, compute the 95% confidence interval:

   beta-hat + c(-1,1)*qt(.975, res.df)*SE(beta-hat)
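
In R that might look like the following sketch, on simulated data with
true slope 1:

> set.seed(1)
> x <- 1:20
> y <- x + rnorm(20)
> fit <- lm(y ~ x)
> b  <- coef(summary(fit))["x", "Estimate"]
> se <- coef(summary(fit))["x", "Std. Error"]
> 2*pt(-abs((b - 1)/se), df.residual(fit))    # p-value for H0: slope = 1
> b + c(-1,1)*qt(.975, df.residual(fit))*se   # 95% CI for the slope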

albyn

On 2012-11-24 18:05, Catriona Hendry wrote:

Hi,

@ Albyn, David... No, it's not homework. It's basic groundwork for testing
allometric relationships for a graduate project I am working on. I read the
guide before posting; I spent half the day trying to understand how I am
going wrong based on the advice given to others.

@ Bert, David... I apologise for the lack of code; I wasn't sure how to
explain my problem and I guess I went about it the wrong way.

I do think this is what I need to be doing: I am testing allometric
relationships of body size against a predicted isometric (1:1)
relationship. So I would like to know if the relationship between my
variables deviates from that.

Hopefully the information below will be what is needed.

Here is the part of the code relevant to the regression and plot:

plot(Contrast_log_MTL_ALL, Contrast_log_FTL_ALL)

Regression_PhyloContrasts_ALL <- lm(Contrast_log_FTL_ALL ~
    Contrast_log_MTL_ALL, offset = 1*Contrast_log_MTL_ALL)
abline(Regression_PhyloContrasts_ALL)

The plot that resulted is attached as an image file.


Below are the vectors of my variables. They are converted from other values
imported and indexed from a csv file, so unfortunately I don't have a matrix
set up for them.

     Contrast_log_FTL_ALL  Contrast_Log_MTL_ALL
83    0.226593   0.284521
84    0.165517   0.084462
85   -0.1902    -0.0055
86    0.585176   0.639916
87   -0.01078    0.118011
88    0.161142   0.073762
89   -0.08566   -0.04788
90   -0.13818   -0.0524
91   -0.02504   -0.21099
92   -0.05027   -0.07594
93   -0.11399   -0.07251
94   -0.07299   -0.08247
95   -0.09507   -0.04817
96    0.207591   0.151695
97   -0.14224   -0.05097
98    0.06375   -0.0229
99    0.04607    0.06246
100   0.257389   0.190531
101  -0.0612    -0.10902
102  -0.1981    -0.24698
103  -0.12328   -0.36942
104   0.269877   0.341989
105   0.125377   0.227183
106   0.087038  -0.05962
107   0.114929   0.096112
108   0.252807   0.305583
109  -0.0895    -0.08586
110  -0.38483   -0.20671
111  -0.72506   -0.63785
112  -0.37212   -0.21458
113   0.010348   0.117577
114  -0.09625   -0.0059
115  -0.26291   -0.25986
116   0.056922   0.064041
117   0.051472  -0.09747
118  -0.05691    0.075005
119   0.117095  -0.15497
120  -0.01329   -0.12473
121   0.098725   0.020522
122  -0.0019    -0.01998
123  -0.12446   -0.02312
124   0.019234   0.031391
125   0.385366   0.391766
126   0.495518   0.468946
127  -0.09251   -0.08045
128   0.147965   0.139117
129  -0.03143   -0.02319
130  -0.19801   -0.14924
131   0.014104  -0.01917
132   0.031872  -0.01381
133  -0.01412   -0.04381
134  -0.12864   -0.08527
135  -0.07179   -0.03525
136   0.31003    0.29553
137  -0.09347   -0.11903
138  -0.10706   -0.16654
139   0.078655   0.065509
140   0.08279   -0.00766
141   0.181885   0.001414
142   0.345818   0.496323
143   0.235044   0.095073
144  -0.03022    0.039918
145   0.042577   0.136586
146   0.064208   0.001379
147  -0.02237   -0.03009
148  -3.55E-05   0.040197
149   0.011168   0.087116
150   0.019964   0.071822
151  -0.04602   -0.06616
152   0.083087   0.038592
153   0.032078   0.107237
154  -0.21108   -0.22347
155   0.122959   0.297917
156  -0.05898    0.012547
157  -0.07584   -0.21588
158  -0.00929   -0.06864
159  -0.01211   -0.04559
160   0.090948   0.136582
161   0.016974   0.018259
162  -0.04083    0.016245
163  -0.20328   -0.31678






On Sat, Nov 24, 2012 at 8:22 PM, Bert Gunter gunter.ber...@gene.com 
wrote:



1. The model is correct:  lm(y ~ x + offset(x))
(AFAICS)

2. Read the posting guide, please: Code? I do not know what you mean by:

 this resulted in a regression line that was plotted perpendicular to
 the data when added with the abline function.

Of course, maybe someone else will grok this.

3. I wonder if you really want to do what you are doing, anyway. For
example, in comparing two assays to see whether they give similar
results, you would **not** do what you are doing. If you care to follow up
on this, I suggest you post complete context to a statistical mailing list,
not here, like stats.stackexchange.com.  Also, feel free to ignore me, of
course. I'm just guessing.

Cheers,
Bert




Re: [R] How to do an infinite sum in R

2012-11-16 Thread Albyn Jones
Perhaps it would help to think before you compute.
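
The point being: sum(1/(1+n)) is the harmonic series shifted by one term,
so it diverges; there is no infinite sum to compute.  A quick numerical
hint (a sketch):

> N <- 10^(1:6)
> partial <- sapply(N, function(k) sum(1/(1 + seq_len(k))))
> partial - log(N)   # settles near a constant: partial sums grow like log(N)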

albyn

On Fri, Nov 16, 2012 at 09:30:32AM -0800, Simon Bolivar wrote:
 I'm having trouble doing an infinite sum in R.
 
 I want to do the infinite sum of 1/(1+n)
 
 how would I do this in R?
 
 Thank You
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-do-an-infinite-sum-in-R-tp4649770.html
 Sent from the R help mailing list archive at Nabble.com.
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to generate a set of random numbers that sum to 1 with uniform distribution of elements

2012-11-07 Thread Albyn Jones
What uniform distribution do you want for the columns?  The average
value of X_k given \sum X_i = 1 must be 1/n.  If you start with 
X_i ~ U(0,1), the result must be skewed.

If you generate X_i uniformly distributed on (0, 2/n), the conditional
distribution given the sum is 1 will be less skewed.

  U <- matrix(runif(1000*100)/50, nrow=1000)
  s <- apply(U, 1, sum)
  V <- U/s
  qqplot(U[,1], V[,1])

albyn

On Wed, Nov 07, 2012 at 05:02:10AM -0800, Bärbel wrote:
 Hi,
 I am looking for a way to generate a matrix of random numbers in a way that
 each row of the matrix would sum to 1 and that the numbers in the columns of
 the matrix would have a uniform distribution. So far I have found several
 ways to create random numbers that would sum to 1, but then the distribution
 of the individual elements is more or less skewed - there are many more
 small numbers than large ones. 
 Any help and suggestions?
 - Bärbel
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-generate-a-set-of-random-numbers-that-sum-to-1-with-uniform-distribution-of-elements-tp4648695.html
 Sent from the R help mailing list archive at Nabble.com.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] peer-reviewed (or not) publications on R

2012-10-30 Thread Albyn Jones

More links on reproducible research:

Opinion: Open and Free: Software and Scientific Reproducibility
Seismological Research Letters
Volume 83 · Number 5 · September/October 2012

Reproducible Research in Computational Science
Roger D. Peng
Science 2 December 2011: 1226-1227.

albyn

On 2012-10-30 9:05, R. Michael Weylandt wrote:
On Tue, Oct 30, 2012 at 2:22 PM, Paul Artes 
paul_h_ar...@yahoo.co.uk wrote:

Dear Friends,

I'm contributing to a paper on a new R package for a clinical (medicine,
ophthalmology) audience, and part of the mission is to encourage people who
might be occasional users of Excel or SPSS to become more familiar with R.
I'd really appreciate any pointers to more recent papers that describe R,
its growth (statistics on user base, number of packages, volume of help
list traffic) and application in many diverse fields. Published
peer-reviewed papers of course would be best, but I'd appreciate any
pointers to other resources and compilations that might float around
somewhere. Is there anything bibliometric (number of citations)?  I will
happily send something back to the list...

Best wishes

Paul



Two possible starting points would be the Journal of Statistical
Software or the R Journal.

There's also this interesting paper -- http://arxiv.org/abs/1210.0530
-- which doesn't touch R to the best of my memory, but explains why
FOSS + Science is a good idea and sketches (one group's ideas of) best
practices.

Michael



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mac Text editors

2012-09-26 Thread Albyn Jones

Have you looked at aquamacs? (emacs for the mac).
its at aquamacs.org.

albyn

On 2012-09-26 17:48, Steven Wolf wrote:

Hi everyone,

I've recently moved from using a windows machine to a Mac (some might
call it an upgrade, others not…I'll let you be the judge).  Once I
started using Notepad++ on my windows machine, I really began to like
it.  Unfortunately, I'm not sure what the free text editor options are
for the Mac (Notepad++ is windows only).  I've dabbled with Linux
before and used Emacs/ESS there.  However, I seem to remember fighting
pretty hard to make that work, and the OSX file structure isn't that
intuitive to me yet.  (For example, where is my .emacs file?)

What text editors are best for the Mac, keeping in mind that I'm
probably going to use them via the command line interface (e.g. X11 or
Terminal)?

Thanks!
-Steve

___
Steven Wolf
Research Associate
CREATE for STEM Institute
115 H Erickson Hall
Michigan State University



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Course

2012-08-28 Thread Albyn Jones
Dear Pedro 

In your R session, enter the commands

   license()
   RShowDoc("COPYING")

   R is free software and comes with ABSOLUTELY NO WARRANTY.
   You are welcome to redistribute it under certain conditions.

Those imply no restriction on charging a fee for presenting courses in R.

albyn  

On Tue, Aug 28, 2012 at 03:42:54PM +0200, Pedro Henrique Lamarão Souza wrote:
  
 
 Hello, 
 
 I am a student of Materials Engineering and I want to offer an
 introductory course on R at the university where I study, here in Brazil.
 I know R is free software, but I just want to know if I need special
 authorization to do it. The course will be totally free and I will not
 receive any money for it. The idea is just to show the program. 
 
 --
 
 
 Best regards,
 Pedro Lamarão
 ITEC/UFPA/PPGEM/GPEMAT
  
 



-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova of lme objects (model1, model2) gives different results depending on order of models

2012-05-31 Thread Albyn Jones
No, both yield the same result: reject the null hypothesis,
which always corresponds to the restricted (smaller) model.
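
The invariance is easy to see from the output below: the likelihood-ratio
statistic depends only on the absolute difference of the log-likelihoods,
so the order of the models cannot change it.  A sketch with Chris's
numbers:

> ll1 <- -24384.28; ll2 <- -24133.34
> 2*abs(ll1 - ll2)   # the L.Ratio reported in both tables
[1] 501.88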

albyn

On Thu, May 31, 2012 at 12:47:30PM +0100, Chris Beeley wrote:
 Hello-
 
 I understand that it's convention, when comparing two models using
 the anova function anova(model1, model2), to put the more
 complicated (for want of a better word) model as the second model.
 However, I'm using lme in the nlme package and I've found that the
 order of the models actually gives opposite results. I'm not sure if
 this is supposed to be the case or if I have missed something
 important, and I can't find anything in the Pinheiro and Bates book
 or in ?anova, or in Google for that matter which unfortunately only
 returns results about ANOVA which isn't much help. I'm using the
 latest version of R and nlme, just checked both.
 
 Here is the code and output:
 
  PHQmodel1=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal,
 random=~1|Case, na.action=na.omit)
 
  PHQmodel2=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal,
 random=~1|Case, na.action=na.omit,
 +  correlation=corAR1(form=~Date|Case))
 
  anova(PHQmodel1, PHQmodel2) # accept model 2
            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
  PHQmodel1     1  8 48784.57 48840.43 -24384.28
  PHQmodel2     2  9 48284.68 48347.51 -24133.34 1 vs 2 501.8926  <.0001
 
  PHQmodel1=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal,
 random=~1|Case, na.action=na.omit,
 +  correlation=corAR1(form=~Date|Case))
 
  PHQmodel2=lme(PHQ~Age+Gender+Date*Treatment, data=compfinal,
 random=~1|Case, na.action=na.omit)
 
  anova(PHQmodel1, PHQmodel2) # accept model 2
            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
  PHQmodel1     1  9 48284.68 48347.51 -24133.34
  PHQmodel2     2  8 48784.57 48840.43 -24384.28 1 vs 2 501.8926  <.0001
 
 In both cases I am led to accept model 2 even though they are
 opposite models. Is it really just that you have to put them in the
 right order? It just seems like if there were say four models you
 wouldn't necessarily be able to determine the correct order.
 
 Many thanks,
 Chris Beeley, Institute of Mental Health, UK
 
 ...session info follows
 
  sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: i386-pc-mingw32/i386 (32-bit)
 
 locale:
 [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
 Kingdom.1252
 [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
 [5] LC_TIME=English_United Kingdom.1252
 
 attached base packages:
 [1] grid  stats graphics  grDevices utils datasets
 methods   base
 
 other attached packages:
  [1] gridExtra_0.9  RColorBrewer_1.0-5 car_2.0-12
 nnet_7.3-1 MASS_7.3-17
  [6] xtable_1.7-0   psych_1.2.4languageR_1.4
 nlme_3.1-104   ggplot2_0.9.1
 
 loaded via a namespace (and not attached):
  [1] colorspace_1.1-1 dichromat_1.2-4  digest_0.5.2 labeling_0.1
 lattice_0.20-6   memoise_0.1
  [7] munsell_0.3  plyr_1.7.1   proto_0.3-9.2
 reshape2_1.2.1   scales_0.2.1 stringr_0.6
 [13] tools_2.15.0
 
  packageDescription(nlme)
 Package: nlme
 Version: 3.1-104
 Date: 2012-05-21
 Priority: recommended
 Title: Linear and Nonlinear Mixed Effects Models
 Authors@R: c(person("Jose", "Pinheiro", comment = "S version"),
        person("Douglas", "Bates", comment = "up to 2007"),
        person("Saikat", "DebRoy", comment = "up to 2002"),
        person("Deepayan", "Sarkar", comment = "up to 2005"),
        person("R-core", email = "r-c...@r-project.org", role = c("aut", "cre")))
 Author: Jose Pinheiro (S version), Douglas Bates (up to 2007),
 Saikat DebRoy (up to 2002), Deepayan
Sarkar (up to 2005), the R Core team.
 Maintainer: R-core r-c...@r-project.org
 Description: Fit and compare Gaussian linear and nonlinear
 mixed-effects models.
 Depends: graphics, stats, R (>= 2.13)
 Imports: lattice
 Suggests: Hmisc, MASS
 LazyLoad: yes
 LazyData: yes
 License: GPL (>= 2)
 BugReports: http://bugs.r-project.org
 Packaged: 2012-05-23 07:28:59 UTC; ripley
 Repository: CRAN
 Date/Publication: 2012-05-23 07:37:45
 Built: R 2.15.0; x86_64-pc-mingw32; 2012-05-29 12:36:01 UTC; windows
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Study design question; MLB; pay and performance.

2012-04-19 Thread Albyn Jones
There is an ASA section on statistics in sports, you might start
looking there...

http://www.amstat.org/sections/sis/

albyn

On Thu, Apr 19, 2012 at 10:05:39AM -0500, N. S. Miceli, Ph.D. wrote:
 Dear List Members,
 
 I am in the process of designing a study examining pay and
 performance in Major League Baseball across several seasons, and
 before I get too deeply into it, I'd like to ask whether the group
 members think that performance across seasons is independent, or if
 it needs to be treated like a time-series variable so that lack of
 independence can be controlled.
 
 Any ideas or considerations that need to be taken into account would
 be appreciated.
 
 Regards,
 
 Nick Miceli
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a loop with an indefinite end term

2012-04-10 Thread Albyn Jones
Here are a couple of constructions that work.  

albyn
===

num <- rep(0,10)
for (i in 2:10)   { 
  num[i] <- num[i-1] + 5 
  if(num[i] > 20) break
} 
> num
 [1]  0  5 10 15 20 25  0  0  0  0

or

num <- rep(0,10)
done <- FALSE
i <- 2
while(!done){
 num[i] <- num[i-1] + 5
 if(num[i] > 20) done <- TRUE
 i <- i + 1
}
> num
 [1]  0  5 10 15 20 25  0  0  0  0


On Tue, Apr 10, 2012 at 10:48:34AM -0400, Steve Lavrenz wrote:
 Everyone, 
 
 I'm very new to R, especially when it comes to loops and functions, so
 please bear with me if this is an elementary question. I cannot seem to
 figure out how to construct a loop which runs a function until a certain
 value is computed. For example, say I have the following: 
 
 num = numeric (10) 
 num [1] = 0 
 for (i in 2:10)   { 
   num [i] = num [i-1] + 5 
 } 
 
 This adds 5 to the preceding spot of a vector of length 10 to get the value
 in the current spot. However, say I don't just want to run this for 10
 spots; rather I want to run it until a certain value (say, 100) is computed.
 How do I construct my loop to do this? 
 
 Thanks!
 
 
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a loop with an indefinite end term

2012-04-10 Thread Albyn Jones
What's wrong with

   num <- rep(0,10)
   done <- FALSE
   i <- 2
   while(!done){
       num[i] <- num[i-1] + 5
       if(num[i] > 20) done <- TRUE
       i <- i + 1
   }
   num <- num[1:(i-1)]

You can delete the unused tail when you finish.  Iteratively
reassigning to a vector of undefined length is possible too, but that
is much slower:

 > x <- numeric()
 > system.time(for(i in 1:10){ x[i] <- i})
    user  system elapsed 
  29.641  27.509  57.163 

 > y <- numeric(10)
 > system.time(for(i in 1:10){ y[i] <- i})
    user  system elapsed 
   0.537   0.000   0.532 
 
 

albyn


On Tue, Apr 10, 2012 at 11:53:08AM -0400, Steve Lavrenz wrote:
 Albyn,
 
 Thanks for your help. This however, still assumes that I have to define an
 array of length 10. Is there a way that I can construct this so that my
 array is exactly as long as the number of spots I need to reach my threshold
 value?
 
 Thanks,
 
 -Steve
 
 -Original Message-
 From: Albyn Jones [mailto:jo...@reed.edu] 
 Sent: Tuesday, April 10, 2012 11:46 AM
 To: Steve Lavrenz
 Cc: r-help@r-project.org
 Subject: Re: [R] Creating a loop with an indefinite end term
 
 Here are a couple of constructions that work.  
 
 albyn
 ===
 
 num <- rep(0,10)
 for (i in 2:10)   { 
   num[i] <- num[i-1] + 5 
   if(num[i] > 20) break
 } 
 > num
  [1]  0  5 10 15 20 25  0  0  0  0
 
 or
 
 num <- rep(0,10)
 done <- FALSE
 i <- 2
 while(!done){
  num[i] <- num[i-1] + 5
  if(num[i] > 20) done <- TRUE
  i <- i + 1
 }
 > num
  [1]  0  5 10 15 20 25  0  0  0  0
 
 
 On Tue, Apr 10, 2012 at 10:48:34AM -0400, Steve Lavrenz wrote:
  Everyone,
  
  I'm very new to R, especially when it comes to loops and functions, so 
  please bear with me if this is an elementary question. I cannot seem 
  to figure out how to construct a loop which runs a function until a 
  certain value is computed. For example, say I have the following:
  
  num = numeric (10)
  num [1] = 0 
  for (i in 2:10)   { 
num [i] = num [i-1] + 5
  }
  
  This adds 5 to the preceding spot of a vector of length 10 to get the 
  value in the current spot. However, say I don't just want to run this 
  for 10 spots; rather I want to run it until a certain value (say, 100) is
 computed.
  How I construct my loop to do this? 
  
  Thanks!
  
  
  [[alternative HTML version deleted]]
  
  
 
 --
 Albyn Jones
 Reed College
 jo...@reed.edu
 
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prediciting sports team scores

2012-01-17 Thread Albyn Jones

Robin Lock at St Lawrence has done this for hockey, see
http://it.stlawu.edu/~chodr/faq.html

As I recall, he has a poisson regression model with parameters for
offense and defense, and perhaps home 'field' advantage.

I confess I am skeptical that this is the right approach for football:
teams adjust their strategy and tactics as a function of the
opponent and the current match score.  Teams are trying to maximize
the probability of getting a result, not the probability of scoring
goals.  The poisson model corresponds to a constant rate for scoring.
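
That caveat aside, once you have fitted Poisson rates for the two teams,
score-line probabilities under independence are just products; a sketch
with made-up rates (lambda.home and lambda.away are hypothetical fitted
means, not from any real model):

> lambda.home <- 1.4; lambda.away <- 1.1
> dpois(1, lambda.home) * dpois(1, lambda.away)   # P(match ends 1-1)
[1] 0.1264109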


albyn

Quoting kerry1912 kerry1...@hotmail.com:


I am working on predicting the scores for a day's worth of matches of team
sports. I have already collected data for the teams for the season we are
concentrating on.

I have been fitting poisson models for football games and have worked out
what model is best and which predictor variables are most important.

We would now like to predict the probability distribution of the scores for
each team, e.g. what is the probability of Manchester United vs Chelsea
ending 1-1?

--
View this message in context:  
http://r.789695.n4.nabble.com/Prediciting-sports-team-scores-tp4303708p4303708.html

Sent from the R help mailing list archive at Nabble.com.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate quantiles of a custom function

2012-01-04 Thread Albyn Jones
FWIW, the integral of a mixture density is the same mixture of the
CDFs, so you can use the pbeta functions:

   pcustom <- function(x) (pbeta(x,2,6) + pbeta(x,6,2))/2
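
With the CDF in hand, quantiles come from inverting it numerically; a
sketch:

> qcustom <- function(P) uniroot(function(x) pcustom(x) - P, c(0, 1))$root
> qcustom(0.5)   # returns (numerically) 0.5: the median of this symmetric mixture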

albyn

Quoting Gerhard felds...@gmx.net:


On Tuesday, 3 January 2012, 19:51:36, Prof. Dr. Matthias Kohl wrote:

D <- AbscontDistribution(d = function(x) dbeta(x, 2, 6) + dbeta(x,6,2),
low = 0, up = 1, withStand = TRUE)


Dear all,

thank you all again for your help.

So, summing up, (in case this might be useful to other beginners - like me)
this is how it can be done:


library(distr)

dcustom <- function(x) {
  (dbeta(x,2,6) + dbeta(x,6,2))/2  # divide by 2 to get 1 as the result of integration
}

pcustom <- function(x) {
  integrate(dcustom, 0, x)$value
}

D <- AbscontDistribution(d = dcustom, low = 0, up = 1, withStand = TRUE)

qcustom <- function(x){
  q(D)(x)
}


Best,

Gerhard





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate quantiles of a custom function

2012-01-03 Thread Albyn Jones

What do quantiles mean here?  If you have a mixture density, say

   myf <- function(x,p0) p0*dbeta(x,2,6) + (1-p0)*dbeta(x,6,2)

then I know what quantiles mean.  To find the Pth quantile use uniroot
to solve for the x such that myf(x,p0) - P = 0.


albyn

Quoting VictorDelgado victor.m...@fjp.mg.gov.br:



Gerhard wrote



Suppose I create a custom function, consisting of two beta-distributions:

myfunction <- function(x) {
  dbeta(x,2,6) + dbeta(x,6,2)
}

How can I calculate the quantiles of myfunction?

Thank you in advance,

Gerhard




Gerhard, if you want to know the quantiles of the new distribution created
by myfunction, maybe you can also do:

x <- seq(0,1,.01) # insert your 'x'
q <- myfunction(x)
# And:
quantile(q)

      0%      25%      50%      75%     100%
0.000000 1.476177 2.045389 2.581226 2.817425

# This gives the sample quantiles. You can also look to simulations
(like Bert Gunter suggested) to better understand the properties of
distribution quantiles obtained from 'myfunction'.



-
Victor Delgado
cedeplar.ufmg.br P.H.D. student
www.fjp.mg.gov.br reseacher
--
View this message in context:  
http://r.789695.n4.nabble.com/calculate-quantiles-of-a-custom-function-tp4256887p4257551.html

Sent from the R help mailing list archive at Nabble.com.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculate quantiles of a custom function

2012-01-03 Thread Albyn Jones

Right.  Replace the dbetas with pbetas.

albyn

Quoting Duncan Murdoch murdoch.dun...@gmail.com:


On 03/01/2012 1:33 PM, Albyn Jones wrote:

What do quantiles mean here?  If you have a mixture density, say

myf <- function(x,p0) p0*dbeta(x,2,6) + (1-p0)*dbeta(x,6,2)

then I know what quantiles mean.  To find the Pth quantile use uniroot
to solve for the x such that myf(x,p0) - P = 0.


You would want to integrate first...

Duncan Murdoch








__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding all triangles in a graph

2011-12-27 Thread Albyn Jones

Taral

The general problem of finding subgraphs with a given structure (motifs)
is hard (i.e. computationally expensive).  There is some literature...

Have you looked at the package igraph, function graph.motifs()?
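
If counting triangles (rather than listing them) is enough, there is also
the classic adjacency-matrix identity trace(A^3)/6; a sketch on a toy
graph:

> A <- matrix(c(0,1,1, 1,0,1, 1,1,0), 3, 3)   # K3: a single triangle
> sum(diag(A %*% A %*% A)) / 6                # each triangle is counted 6 times
[1] 1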

albyn

Quoting Taral tar...@gmail.com:


I have the adjacency matrix of a graph. I'm trying to find all
triangles (embeddings of C_3). This doesn't work:

index = function(l) seq(l)[l]
pairs = do.call(rbind, lapply(seq(nrow(adj)), function(x) cbind(x,
    index(adj[x,]))))
triangles = do.call(rbind, apply(pairs, 1, function(x) cbind(x,
    index(adj[x[1],] & adj[x[2],]))))

I'm absolutely certain I've gone down the wrong path here. O great R
gurus, your guidance would be most welcome.

--
Taral tar...@gmail.com
Please let me know if there's any further trouble I can give you.
    -- Unknown





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Statistical tests and measures for cone-like distributions?

2011-12-21 Thread Albyn Jones
Have you tried plotting log(y) vs log(x)?
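
The thought behind that suggestion: multiplicative noise produces exactly
this cone-from-the-origin look on the raw scale, and roughly constant
spread after logging.  A sketch on made-up data:

> set.seed(42)
> x <- runif(500, 1, 100)
> y <- 2*x*exp(rnorm(500, sd = 0.3))   # hypothetical multiplicative model
> par(mfrow = c(1,2))
> plot(x, y)             # fans out like a cone
> plot(log(x), log(y))   # roughly constant spread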

albyn

On Wed, Dec 21, 2011 at 02:18:56PM +, Matthew Gwynne wrote:
 Hi,
 
 Are there any special statistical tests, or functions in R I could use
 to measure cone-like distributions?
 
 I have several data-sets, which I've been plotting parts of as 2D
 plots,  where I get a cone-like distribution of the data points.
 That is, the data appears to be bounded by two non-parallel lines,
 starting at the origin, giving rise to a cone-like appearance.
 
 I guess I could work out its bounding lines fairly easily, but this
 seems perhaps a fairly naive thing to do, and I was wondering if there
 are any standard tests for this? I guess one might consider the angle of
 the cone, how much of the cone is filled out, density of points
 within the cone, and so on.
 
 I don't even know if thinking of a cone makes any sense, what it
 would mean or whether I should simply be thinking of a linear
 regression line with some non-constant variance?
 
 An example plot is at
 http://cs.swan.ac.uk/~csmg/aes/plots/canon_1_3_4_r1_v_cfs.pdf if
 anyone is interested.
 
 Thanks in advance!
 
 Matthew Gwynne
 http://cs.swan.ac.uk/~csmg/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] map at fips level using multiple variables

2011-12-08 Thread Albyn Jones
Since each county gets a single color, you would need to combine two
color schemes.  I am skeptical that this will work well, but you could
try using rgb(): for example, code one variable using the red
component and the other using the blue component.
albyn

On Wed, Dec 07, 2011 at 11:14:27PM -0500, bby2...@columbia.edu wrote:
 Hi David,
 
 Sorry it sounds vague.
 
 Here is my current code, which gives the distribution of family size
 at US county level. You will see on a US map family size
 distribution represented by different colors.
 
 Now if I have another variable, income, which has 3 categories (<50k,
 50k-80k, >80k), how do I show 5x3 categories on the map? I guess I
 could always create a third variable to do this. Just wondering:
 maybe there is a function to do this readily?
 
 Thank you!
 Bonnie Yuan
 
 y=data1$size
 x11()
 hist(y,nclass=15) # histogram of y
 fivenum(y)
 # specify the cut-off point of y
 y.colorBuckets=as.numeric(cut(y, c(1,2, 3,6)))
 # legend showing the cut-off points.
 legend.txt=c("0-1","1-2","2-3","3-6",">6")
 colorsmatched=y.colorBuckets[match(county.fips$fips,fips[,1])]
 x11()
 map("county", col = colors[colorsmatched], fill = TRUE, resolution = 0)
 map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2)
 title("Family Size")
 legend("bottom", legend.txt, horiz = TRUE, fill = colors, cex=0.7)
 
 
 
 Quoting David Winsemius dwinsem...@comcast.net:
 
 
 On Dec 7, 2011, at 6:12 PM, bby2...@columbia.edu wrote:
 
 Hi, I just started playing with county FIPS feature in maps
 package  which allows geospatial visualization of variables on
 US county  level. Pretty cool.
 
 Got code?
 
 
 I did some search but couldn't find answer to this question--how
 can I map more than 2 variables on US map?
 
 2 variables is a bit on the vague side for programming purposes.
 
 For example, you can map by the breakdown of income or family
 size.  How do you further breakdown based on the values of both
 variables  and show them on the county FIPS level?
 
 breakdown suggests a factor construct. If so, then :
 
 ?interaction
 
 But the show part of the question remains very vague.
 
  Can't you be a bit more specific? What DO you want?
 
 -- 
 David Winsemius, MD
 West Hartford, CT
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Discrepancy with p.value from t.test

2011-11-01 Thread Albyn Jones
The print method is the issue:

 t.out <- t.test(b1,b2,var.equal=T)
 t.out$p.value
[1] 4.108001e-38
 t.out$statistic
t 
-15.93656 
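That summary line comes from format.pval(), which print.htest uses: any
p-value below .Machine$double.eps is printed as a bound rather than a
number, e.g.

format.pval(4.108001e-38)
[1] "< 2.2204e-16"

(the familiar "< 2.2e-16" is the same mechanism with fewer digits).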

albyn


On Tue, Nov 01, 2011 at 02:40:15PM -0500, Jonathan Thayn wrote:
 Sometimes the p.value returned by t.test() is the same that I calculate using 
 pt() and sometimes it's not. I don't understand the difference. I'm sure 
 there is a simple explanation but I haven't been able to find it, even after 
 looking at the code for t.test.default. I apologize if this is a basic and 
 obvious question. For example:
 
  data(sleep)
  t.test(extra~group,data=sleep,var.equal=T)
 
 # the p.value returned is 0.07939
 
  2*pt(-1.8608,18)   # using the t.statistic and the df returned above
 [1] 0.0791887
 
 These p.values are the same. However, they are different when I use a 
 different dataset:
 
  data(beavers)
  b1 <- beaver1$temp
  b2 <- beaver2$temp
  t.test(b1,b2,var.equal=T)
 
 # the p.value returned is < 2.2e-16
 
  2*pt(-15.9366,212)   # using the t.statistic and the df returned above
 [1] 4.10686e-38
 
 
 Jonathan B. Thayn, Ph.D.
 Illinois State University
 Department of Geography and Geology
 200A Felmley Hall
 Normal, Illinois 61790
 
 (309) 438-8112
 jth...@ilstu.edu
 my.ilstu.edu/~jthayn
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cumVar and cumSkew

2011-09-15 Thread Albyn Jones
Wow, this takes me back a ways :-)

As I recall, BMDP used updating algorithms to compute sample means,
variances, and covariances in one pass through the dataset.

You need the updating algorithm for means:

  \bar{X}_{n} =  \frac{n-1}{n}\bar{X}_{n-1}+\frac{X_n}{n}

You can work out the algorithm for variances using the same idea,
or consult:

   Algorithms for Computing the Sample Variance: Analysis and Recommendations
   Tony F. Chan, Gene H. Golub, Randall J. LeVeque
   The American Statistician, Vol. 37, No. 3 (Aug., 1983), pp. 242-247

I think the updating and other algorithms are described therein.
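For what it's worth, a one-pass sketch of the updating recurrences in R
(the daily vector is simulated here as a stand-in; only the running
mean and the running sum of squared deviations are stored):

daily <- rnorm(1e6)                   # stand-in for the real series
n <- length(daily)
cumvar <- numeric(n); cumvar[1] <- NA # var() of a single value is NA
m <- daily[1]; s <- 0
for (i in 2:n) {
  d <- daily[i] - m
  m <- m + d / i                      # updated running mean
  s <- s + d * (daily[i] - m)         # updated sum of squared deviations
  cumvar[i] <- s / (i - 1)            # matches var(daily[1:i])
}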

albyn

On Thu, Sep 15, 2011 at 02:57:40PM -0700, D_Tomas wrote:
 Hi there, 
 
 I need to do the same thing as cumsum but with the variance and skewness. I
 have tried to do a loop for like this: 
var.value <- vector(mode = "numeric", length = length(daily))
 for (i in (1:length(daily))) { 
    var.value[i] <- var(daily[1:i])
 }
 
 But because my dataset is so huge, I run out of memory.
 
 
 Any ideas?!?!
 
 Much appreciate it!
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/cumVar-and-cumSkew-tp3816899p3816899.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Density function: Area under density plot is not equal to 1. Why?

2011-09-08 Thread Albyn Jones
Look at

area <- sum(a$y)*(a$x[1]-a$y[2])

The problem appears to be a$x[1]-a$y[2]; that is not the length of
the base of an approximating rectangle, whatever it is :-)
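Since density() returns an evenly spaced grid, the base of each
rectangle is just the grid spacing, so the intended Riemann sum is:

data <- runif(58)
a <- density(data, from = 0, to = 1)
sum(a$y) * diff(a$x[1:2])  # close to 1; kernel mass smoothed outside
                           # [0,1] can still leave it a little short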

albyn

On Thu, Sep 08, 2011 at 11:36:23AM -0400, Gonçalo Ferraz wrote:
 Hi, I have a vector 'data' of 58 probability values (bounded between 0 and 1) 
 and want to draw a probability density function of these values. For this, I 
 used the commands:
 
 data <- runif(58)
 
 a <- density(data, from=0, to=1)
 plot(a, type="l", lwd=3)
 
 But then, when I try to approximate the area under the plotted curve with the 
 command:
 
 area <- sum(a$y)*(a$x[1]-a$y[2])
 
 I get an area that is clearly smaller than 1.
 
 Strangely, if I don't bound the density function with 'from=0, to=1' (which is 
 against my purpose because it extends the pdf beyond the limits of a 
 probability value), I get an area of 1.000. This suggests that I am computing 
 the area well, but using the density function improperly.
 
 Why is this happening? Does anyone know how to constrain the density function 
 while still getting a true pdf (summing to 1 under the curve) at the end? 
 Should I use a different function? I read through the density function notes 
 but could not figure out a solution.
 
 Thank you!
 
 Gonçalo
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating p-value for 1-tailed test in a linear model

2011-08-22 Thread Albyn Jones
For H_0: beta >= 0, the correct p-value is

 pt(tvalue,df)

regardless of the sign of tvalue.  Negative tvalues of large magnitude
will yield small p-values.
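For example, with a built-in dataset (not the original poster's model):

fit <- lm(dist ~ speed, data = cars)
tv  <- summary(fit)$coefficients["speed", "t value"]
df  <- fit$df.residual
pt(tv, df)                      # one-sided p-value for H_0: beta >= 0
pt(tv, df, lower.tail = FALSE)  # one-sided p-value for H_0: beta <= 0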

albyn

On Mon, Aug 22, 2011 at 05:22:06PM +, Ben Bolker wrote:
 Campomizzi, Andrew J acampomizzi at neo.tamu.edu writes:
 
  On 20/08/11 10:20, Andrew Campomizzi wrote:
   Hello,
  
   I'm having trouble figuring out how to calculate a p-value for a 1-tailed
   test of beta_1 in a linear model fit using command lm.  My model has only 
   1
   continuous predictor variable.  I want to test the null hypothesis beta_1
   is >= 0.  I can calculate the p-value for a 2-tailed test using the code
   2*pt(-abs(t-value), df=degrees.freedom), where t-value and 
   degrees.freedom
   are values provided in the summary of the lm.  The resulting p-value is 
   the
   same as provided by the summary of the lm for beta_1.  I'm unsure how to
   change my calculation of the p-value for a 1-tailed test.
  
 
   Isn't it just 
 
 pt(tvalue,df=degrees.freedom,lower.tail=FALSE)
 
 if the value is positive (and expected to be positive) or
 
 pt(tvalue,df=degrees.freedom)
 
 if the value is negative (and expected to be negative)?
 
   In fact, if the value is in the expected direction, I think you
 can just leave out the multiplication by 2 and get the right answer ...
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generate two sets of random numbers that are correlated

2011-08-11 Thread Albyn Jones
Here is a simple method

z1 <- rpois(n,mu1)
z2 <- rpois(n,mu1)
z3 <- rpois(n,mu2)

Y <- z1 + z3
X <- z2 + z3

Cov(X,Y) = mu2, so Cor(X,Y) = mu2/(mu1+mu2)
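A quick check of the construction (n, mu1, and mu2 are illustrative):

n <- 1e5; mu1 <- 3; mu2 <- 2
z1 <- rpois(n, mu1); z2 <- rpois(n, mu1); z3 <- rpois(n, mu2)
Y <- z1 + z3; X <- z2 + z3
cor(X, Y)   # should be near mu2/(mu1+mu2) = 0.4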

albyn

On Thu, Aug 11, 2011 at 09:01:50AM -0700, Kathie wrote:
 almost forgot. In fact, I want to generate correlated Poisson random vectors.
 Thank you anyway
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/generate-two-sets-of-random-numbers-that-are-correlated-tp3736161p3736287.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] useR! 2011 T-shirt competition

2011-05-27 Thread Albyn Jones
Me too, or three!

albyn

On Fri, May 27, 2011 at 12:47:45PM -0400, Carl Witthoft wrote:
 More important question:  can us non-attendees buy one?
 
 I made myself a UseR t-shirt a couple yrs ago and wouldn't mind
 adding to my collection.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with 2-D plot of k-mean clustering analysis

2011-05-18 Thread Albyn Jones
One idea: pick the three largest clusters; their centers determine a plane.
Project your data into that plane.
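A sketch of that projection (toy data; the dimensions and k = 4 are
illustrative, not from the original post):

set.seed(1)
X  <- matrix(rnorm(33 * 50), 33, 50)  # stand-in for the 33 x ~1000 data
km <- kmeans(X, centers = 4)
big3 <- order(km$size, decreasing = TRUE)[1:3]      # three largest clusters
C  <- km$centers[big3, ]
B  <- qr.Q(qr(t(C[2:3, ] - rbind(C[1, ], C[1, ])))) # orthonormal basis of the plane
P  <- sweep(X, 2, C[1, ]) %*% B                     # 2-D coordinates in the plane
plot(P, col = km$cluster, pch = 19)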

albyn

On Wed, May 18, 2011 at 06:55:39PM +0200, Claudia Beleites wrote:
 Hi Meng,
 
   I would like to use R to perform k-means clustering on my data which
 included 33 samples measured with ~1000 variables. I have already used
 kmeans package for this analysis, and showed that there are 4 clusters in my
 data. However, it's really difficult to plot this cluster in 2-D format
 since the huge number of variables. One possible way is to project the
 multidimensional space into 2-D platform, but I could not find any good way
 to do that. Any suggestions or comments will be really helpful!
 For suggestions it would be extremely helpful to tell us what kind
 of variables your 1000 variables are.
 
 Parallel coordinate plots plot values over (many) variables. Whether
 this is useful, depends very much on your variables: E.g. I have
 spectral channels, they have an intrinsic order and the values have
 physically the same meaning (and almost the same range), so the
 parallel coordinate plot comes naturally (it produces in fact the
 spectra).
 
 Claudia
 
 
 
 Thanks,
 
 Meng
 
  [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 -- 
 Claudia Beleites
 Spectroscopy/Imaging
 Institute of Photonic Technology
 Albert-Einstein-Str. 9
 07745 Jena
 Germany
 
 email: claudia.belei...@ipht-jena.de
 phone: +49 3641 206-133
 fax:   +49 2641 206-399
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power Analysis

2011-04-18 Thread Albyn Jones
First, note that you are doing two separate power calculations,
one with n=2 and sd = 1.19, the other with n=3 and sd = 4.35.
I will assume this was on purpose.  Now...

 power.t.test(n = 2, delta = 13.5, sd = 1.19, sig.level = 0.05)

 Two-sample t test power calculation 

  n = 2
  delta = 13.5
 sd = 1.19
  sig.level = 0.05
  power = 0.9982097
alternative = two.sided

Now, with n=2, the power is already .99.  With n=1, there are zero df.
So, what n corresponds to a power of .8?  

 power.t.test(n = 1.6305, delta = 13.5, sd = 1.19, sig.level = 0.05)

 Two-sample t test power calculation 

  n = 1.6305
  delta = 13.5
 sd = 1.19
  sig.level = 0.05
  power = 0.8003734
alternative = two.sided

It looks like 1.63 subjects will do the job :-)

Finally, look at the power.t.test function, there is a line that explains
your error message:

 else if (is.null(n)) 
    n <- uniroot(function(n) eval(p.body) - power, c(2, 1e+07))$root

power.t.test() is making the sensible assumption that we only care about 
sample sizes of at least n = 2

albyn

On Mon, Apr 18, 2011 at 02:31:19PM -0700, Schatzi wrote:
 I am trying to do a power analysis to get the number of replicas per
 treatment.
 
 If I try to get the power it works just fine:
 setn=c(2,3)
 sdx=c(1.19,4.35)
 power.t.test(n = setn, delta = 13.5, sd = sdx, sig.level = 0.05,power =
 NULL)
 
 If I go the other way to obtain the n I have problems.
 sdx=c(1.19,4.35)
 pow=c(.8,.8)
 power.t.test(n = NULL, delta = 13.5, sd = sdx, sig.level = 0.05, power =
 0.8)
 
 Is there any way to do this? Thank you.
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Power-Analysis-tp3458786p3458786.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power Analysis

2011-04-18 Thread Albyn Jones
Yes, Richard Savage used to call this "interocular" data;
the answer should leap up and strike you right between the eyes...

albyn

On Mon, Apr 18, 2011 at 05:23:05PM -0500, David Cross wrote:
 It seems to me, with deltas this large (relative to the SD), that a 
 significance test is a moot point!
 
 David Cross
 d.cr...@tcu.edu
 www.davidcross.us
 
 
 
 
 On Apr 18, 2011, at 5:14 PM, Albyn Jones wrote:
 
  First, note that you are doing two separate power calculations,
  one with n=2 and sd = 1.19, the other with n=3 and sd = 4.35.
  I will assume this was on purpose.  Now...
  
  power.t.test(n = 2, delta = 13.5, sd = 1.19, sig.level = 0.05)
  
  Two-sample t test power calculation 
  
   n = 2
   delta = 13.5
  sd = 1.19
   sig.level = 0.05
   power = 0.9982097
 alternative = two.sided
  
  Now, with n=2, the power is already .99.  With n=1, there are zero df.
  So, what n corresponds to a power of .8?  
  
  power.t.test(n = 1.6305, delta = 13.5, sd = 1.19, sig.level = 0.05)
  
  Two-sample t test power calculation 
  
   n = 1.6305
   delta = 13.5
  sd = 1.19
   sig.level = 0.05
   power = 0.8003734
 alternative = two.sided
  
  It looks like 1.63 subjects will do the job :-)
  
  Finally, look at the power.t.test function, there is a line that explains
  your error message:
  
   else if (is.null(n)) 
      n <- uniroot(function(n) eval(p.body) - power, c(2, 1e+07))$root
  
  power.t.test() is making the sensible assumption that we only care about 
  sample sizes of at least n = 2
  
  albyn
  
  On Mon, Apr 18, 2011 at 02:31:19PM -0700, Schatzi wrote:
  I am trying to do a power analysis to get the number of replicas per
  treatment.
  
  If I try to get the power it works just fine:
  setn=c(2,3)
  sdx=c(1.19,4.35)
  power.t.test(n = setn, delta = 13.5, sd = sdx, sig.level = 0.05,power =
  NULL)
  
  If I go the other way to obtain the n I have problems.
  sdx=c(1.19,4.35)
  pow=c(.8,.8)
  power.t.test(n = NULL, delta = 13.5, sd = sdx, sig.level = 0.05, power =
  0.8)
  
  Is there any way to do this? Thank you.
  
  --
  View this message in context: 
  http://r.789695.n4.nabble.com/Power-Analysis-tp3458786p3458786.html
  Sent from the R help mailing list archive at Nabble.com.
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
  
  
  -- 
  Albyn Jones
  Reed College
  jo...@reed.edu
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MLE where loglikelihood function is a function of numerical solutions

2011-04-10 Thread Albyn Jones

Hi Kristian

The obvious approach is to treat it like any other MLE problem:  evaluation
of the log-likelihood is done as often as necessary for the optimizer  
you are using: eg a call to optim(psi,LL,...)  where LL(psi) evaluates  
the log likelihood at psi.  There may be computational shortcuts that  
would work if you knew that LL(psi+eps) were well approximated by  
LL(psi), for the values of eps used to evaluate numerical derivatives  
of LL.  Of course, then you might need to write your own custom  
optimizer.
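In outline, something like the following (a runnable toy: solve.odes()
and solve.roots() are dummy stand-ins for your deSolve and BB steps,
and the normal log-likelihood is only a placeholder):

solve.odes  <- function(psi) exp(psi[1])      # dummy ODE step: plays the sd
solve.roots <- function(psi, dat, s) psi[2]   # dummy root step: plays the mean
LL <- function(psi, dat) {
  s <- solve.odes(psi)        # re-solved at every psi the optimizer visits
  m <- solve.roots(psi, dat, s)
  sum(dnorm(dat, mean = m, sd = s, log = TRUE))
}
dat <- rnorm(100, mean = 1.5, sd = 2)
optim(c(0, 0), LL, dat = dat, control = list(fnscale = -1))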


albyn

Quoting Kristian Lind kristian.langgaard.l...@gmail.com:


Hi there,

I'm trying to solve a ML problem where the likelihood function is a function
of two numerical procedures and I'm having some problems figuring out how to
do this.

The log-likelihood function is of the form L(c,psi) = 1/T sum [log (f(c,
psi)) - log(g(c,psi))], where c is a 2xT matrix of data and psi is the
parameter vector. f(c, psi) is the transition density which can be
approximated. The problem is that in order to approximate this we need to
first numerically solve 3 ODEs. Second, numerically solve 2 non-linear
equations in two unknowns wrt the data. The g(c,psi) function is known, but
dependent on the numerical solutions.
I have solved the ODEs using the deSolve package and the 2 non-linear
equations using the BB package, but the results are dependent on the
parameters.

How can I write a program that will maximise this log-likelihood function,
taking into account that the numerical procedures needs to be updated for
each iteration in the maximization procedure?

Any help will be much appreciated.


Kristian

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MLE where loglikelihood function is a function of numerical solutions

2011-04-10 Thread Albyn Jones
to clarify: by "if you knew that LL(psi+eps) were well approximated
by LL(psi), for the values of eps used to evaluate numerical
derivatives of LL,"

I mean the derivatives of LL(psi+eps) are close to the derivatives of LL(psi),
and perhaps you would want the hessian to be close as well.

albyn

Quoting Albyn Jones jo...@reed.edu:


Hi Kristian

The obvious approach is to treat it like any other MLE problem:  evaluation
of the log-likelihood is done as often as necessary for the  
optimizer you are using: eg a call to optim(psi,LL,...)  where  
LL(psi) evaluates the log likelihood at psi.  There may be  
computational shortcuts that would work if you knew that LL(psi+eps)  
were well approximated by LL(psi), for the values of eps used to  
evaluate numerical derivatives of LL.  Of course, then you might  
need to write your own custom optimizer.


albyn

Quoting Kristian Lind kristian.langgaard.l...@gmail.com:


Hi there,

I'm trying to solve a ML problem where the likelihood function is a function
of two numerical procedures and I'm having some problems figuring out how to
do this.

The log-likelihood function is of the form L(c,psi) = 1/T sum [log (f(c,
psi)) - log(g(c,psi))], where c is a 2xT matrix of data and psi is the
parameter vector. f(c, psi) is the transition density which can be
approximated. The problem is that in order to approximate this we need to
first numerically solve 3 ODEs. Second, numerically solve 2 non-linear
equations in two unknowns wrt the data. The g(c,psi) function is known, but
dependent on the numerical solutions.
I have solved the ODEs using the deSolve package and the 2 non-linear
equations using the BB package, but the results are dependent on the
parameters.

How can I write a program that will maximise this log-likelihood function,
taking into account that the numerical procedures needs to be updated for
each iteration in the maximization procedure?

Any help will be much appreciated.


Kristian

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there an exact binomial k-sample test of equivalence?

2011-03-10 Thread Albyn Jones

Presumably the null hypothesis is that at least one of the differences
is larger in absolute magnitude than the chosen epsilon.  I expect that
your procedure would be conservative: if it rejects the null  
hypothesis, then you are ok, but presumably what you really want would  
be based on a joint confidence region for all the proportions.


albyn

Quoting Łukasz Ręcławowicz lukasz.reclawow...@gmail.com:


Hi, I've got one silly question for the evening.

I don't know if this is reasonable, but could a test using the two most
extreme proportions from the samples be good enough evidence for testing
equivalence, or should I look for something else...?

--
Miłego dnia (have a nice day)

[[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for equivalence

2011-02-14 Thread Albyn Jones
Reading the original post it was clear to me that the poster was looking for
a test of equivalence, but obviously there was room for interpretation!

albyn

On Mon, Feb 14, 2011 at 09:46:13AM -0700, Greg Snow wrote:
 Reading the original post it is fairly clear that the original poster's 
 question does not match with the traditional test of equivalence, but rather 
 is trying to determine distinguishable or indistinguishable.  If the test 
 in my suggestion is statistically significant (and note I did not suggest 
 only testing the interaction) then that meets one possible interpretation of 
 distinguishable, a non-significant result could mean either equivalence or 
 low power, the combination of which could be an interpretation of 
 indistinguishable.
 
 I phrased my response as a question in hopes that the original poster would 
 think through what they really wanted to test and get back to us with further 
 details.  It could very well be that my response is very different from what 
 they were thinking, but explaining how it does not fit will better help us 
 understand the real problem.
 
 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111
 
 
  -Original Message-
  From: Albyn Jones [mailto:jo...@reed.edu]
  Sent: Sunday, February 13, 2011 9:53 PM
  To: Greg Snow
  Cc: syrvn; r-help@r-project.org
  Subject: Re: [R] Test for equivalence
  
  testing the null hypothesis of no interaction is not the same as a
  test of equivalence for the two differences.  There is a literature on
  tests of equivalence.  First you must develop a definition of
  equivalence, for example the difference is in the interval (-a,a).
  Then, for example,  you test the null hypothesis that the difference
  is in [a,inf) or (-inf,-a] (a TOST, or two one sided tests).  One
  simple procedure: check to see if the 90% CI for the difference
  (difference of the differences or the interaction effect) is contained
  in the interval (-a,a).
  
  albyn
  
  Quoting Greg Snow greg.s...@imail.org:
  
   Does it make sense for you to combine the 2 data sets and do a 2-way
   anova with treatment vs. control as one factor and experiment number
   as the other factor?  Then you could test the interaction and
   treatment number factor to see if they make a difference.
  
   --
   Gregory (Greg) L. Snow Ph.D.
   Statistical Data Center
   Intermountain Healthcare
   greg.s...@imail.org
   801.408.8111
  
  
   -Original Message-
   From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
   project.org] On Behalf Of syrvn
   Sent: Saturday, February 12, 2011 7:30 AM
   To: r-help@r-project.org
   Subject: [R] Test for equivalence
  
  
   Hi!
  
   is there a way in R to check whether the outcome of two different
   experiments is statistically distinguishable or indistinguishable?
  More
   preciously, I used the wilcoxon test to determine the differences
   between
   controls and treated subjects for two different experiments. Now I
   would
   like to check whether the two lists of analytes obtained are
   statistically
   distinguishable or indistinguishable
  
   I tried to use a equivalence test from the 'equivalence' package in
  R
   but it
   seems that this test is not applicable to my problem. The test in
  the
   'equivalence' package just determines similarity between two
  conditions
   but
   I need to compare the outcome of two different experiments.
  
   My experiments are constructed as follows:
  
   Exp1:
   8 control samples
   8 treated samples
   - determine significantly changes (List A)
  
   Exp2:
   8 control samples
   8 treated samples
   - determine significantly changes (List B)
  
  
   Now i would like to check whether List A and List B are
  distinguishable
   or
   indistinguishable.
  
   Any advice is very much appreciated!
  
   Best,
   beginner
   --
   View this message in context: http://r.789695.n4.nabble.com/Test-
  for-
   equivalence-tp3302739p3302739.html
   Sent from the R help mailing list archive at Nabble.com.
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide http://www.R-project.org/posting-
   guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide http://www.R-project.org/posting-
  guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
  
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible

Re: [R] Test for equivalence

2011-02-13 Thread Albyn Jones
testing the null hypothesis of no interaction is not the same as a  
test of equivalence for the two differences.  There is a literature on  
tests of equivalence.  First you must develop a definition of  
equivalence, for example the difference is in the interval (-a,a).   
Then, for example,  you test the null hypothesis that the difference  
is in [a,inf) or (-inf,-a] (a TOST, or two one sided tests).  One  
simple procedure: check to see if the 90% CI for the difference  
(difference of the differences or the interaction effect) is contained  
in the interval (-a,a).
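A minimal sketch of that check for a difference of means, with an
illustrative margin a = 1 (not from the thread):

set.seed(1)
x <- rnorm(20); y <- rnorm(20, 0.2)
a  <- 1
ci <- t.test(x, y, conf.level = 0.90)$conf.int
all(ci > -a & ci < a)  # TRUE means the 90% CI lies inside (-a, a)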


albyn

Quoting Greg Snow greg.s...@imail.org:

Does it make sense for you to combine the 2 data sets and do a 2-way  
anova with treatment vs. control as one factor and experiment number  
as the other factor?  Then you could test the interaction and  
treatment number factor to see if they make a difference.


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
project.org] On Behalf Of syrvn
Sent: Saturday, February 12, 2011 7:30 AM
To: r-help@r-project.org
Subject: [R] Test for equivalence


Hi!

is there a way in R to check whether the outcome of two different
experiments is statistically distinguishable or indistinguishable? More
preciously, I used the wilcoxon test to determine the differences
between
controls and treated subjects for two different experiments. Now I
would
like to check whether the two lists of analytes obtained are
statistically
distinguishable or indistinguishable

I tried to use a equivalence test from the 'equivalence' package in R
but it
seems that this test is not applicable to my problem. The test in the
'equivalence' package just determines similarity between two conditions
but
I need to compare the outcome of two different experiments.

My experiments are constructed as follows:

Exp1:
8 control samples
8 treated samples
- determine significantly changes (List A)

Exp2:
8 control samples
8 treated samples
- determine significantly changes (List B)


Now i would like to check whether List A and List B are distinguishable
or
indistinguishable.

Any advice is very much appreciated!

Best,
beginner
--
View this message in context: http://r.789695.n4.nabble.com/Test-for-
equivalence-tp3302739p3302739.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Integral of PDF

2010-12-02 Thread Albyn Jones
To really understand it you will have to look at the Fortran code
underlying integrate.  I tracked it back through a couple of layers
(dqagi, dqagie, ...  just use Google; these are old netlib
subroutines), then decided I ought to get back to grading papers :-)

It looks like the integral is split into the sum of two integrals,
(-Inf,0] and [0,Inf).  


 integrate(function(x) dnorm(x, 350,50), 0, Inf)
1 with absolute error < 1.5e-05
 integrate(function(x) dnorm(x, 400,50), 0, Inf)
1.068444e-05 with absolute error < 2.1e-05
 integrate(function(x) dnorm(x, 500,50), 0, Inf)
8.410947e-11 with absolute error < 1.6e-10
 integrate(function(x) dnorm(x, 500,50), 0, 1000)
1 with absolute error < 7.4e-05

The integral from 0 to Inf is the lim_{x -> Inf} of the integral
over [0,x].  I suspect that the algorithm is picking an interval
[0,x], evaluating the integral over that interval, and then performing
some test to decide whether to expand the interval.  When the initial
interval doesn't contain much, expanding a little won't add enough to
trigger another iteration.  

albyn

On Thu, Dec 02, 2010 at 04:21:34PM -0500, Doran, Harold wrote:
 The integral of any probability density from -Inf to Inf should equal 1, 
 correct? I don't understand last result below.
 
  integrate(function(x) dnorm(x, 0,1), -Inf, Inf)
 1 with absolute error < 9.4e-05
 
  integrate(function(x) dnorm(x, 100,10), -Inf, Inf)
 1 with absolute error < 0.00012
 
  integrate(function(x) dnorm(x, 500,50), -Inf, Inf)
 8.410947e-11 with absolute error < 1.6e-10
 
  all.equal(integrate(function(x) dnorm(x, 500,50), -Inf, Inf)$value, 0)
 [1] TRUE
 
  sessionInfo()
 R version 2.10.1 (2009-12-14)
 i386-pc-mingw32
 
 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 other attached packages:
 [1] statmod_1.4.6  mlmRev_0.99875-1   lme4_0.999375-35   
 Matrix_0.999375-33 lattice_0.17-26
 
 loaded via a namespace (and not attached):
 [1] grid_2.10.1   nlme_3.1-96   stats4_2.10.1 tools_2.10.1
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Integral of PDF

2010-12-02 Thread Albyn Jones
On Thu, Dec 02, 2010 at 06:23:45PM -0500, Ravi Varadhan wrote:

 A simple solution is to locate the mode of the integrand, which should be
 quite easy to do, and then do a coordinate shift to that point and then
 integrate the mean-shifted integrand using `integrate'.
 
 Ravi.

Translation:  think before you compute!
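Concretely, for the failing example above, shifting the integrand so
its mode sits at 0:

integrate(function(u) dnorm(u + 500, 500, 50), -Inf, Inf)
# returns 1, with a small absolute error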

albyn

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Albyn Jones
do you have factors (categorical variables) in the model?  it could be
just a parameterization difference.

albyn

On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
 Dear R developers,
 
 I have noticed a discrepancy between the coefficients returned by R's glm()
 for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
 and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
 Fisher's scoring, but the difference is something like 100 SE on the
 intercept.  What accounts for such a huge difference?
 
 Thank you for your time.
 
 Ben Godlove
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to detect if a vector is FP constant?

2010-11-08 Thread Albyn Jones
how about 

   all.equal(x,rep(mean(x),length(x)))
or

   all.equal(x, rep(mean(x), length(x)), tolerance=...)
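A quick illustration:

x <- 1 + c(0, 1e-12, -1e-12)
isTRUE(all.equal(x, rep(mean(x), length(x))))  # TRUE: constant up to fp noise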

albyn

On Mon, Nov 08, 2010 at 06:45:00PM -0600, Hadley Wickham wrote:
 Hi all,
 
 What's the equivalent to length(unique(x)) == 1 if want to ignore
 small floating point differences?  Should I look at diff(range(x)) or
 sd(x) or something else?  What cut off should I use?
 
 If it helps to be explicit, I'm interested in detecting when a vector
 is constant for the purpose of visual display.  In other words, if I
 rescale x to [0, 1] do I have enough precision to get at least 100
 unique values.
 
 Thanks!
 
 Hadley
 
 -- 
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clustering

2010-10-28 Thread Albyn Jones
I have worked with seismic data measured at 100hz, and had no trouble
locating events in long records (several times the size of your
dataset).  20 minutes is high frequency?  what kind of waves are
these?  what is the wavelength? some details would help.

albyn

On Thu, Oct 28, 2010 at 05:00:10AM -0700, dpender wrote:
 
 I am looking to use R in order to determine the number of extreme events for
 a high frequency (20 minutes) dataset of wave heights that spans 25 years
 (657,432) data points.
 
 I require the number, spacing and duration of the extreme events as an
 output.
 
 I have briefly used the clusters function in evd package.
 
 Can anyone suggest a more appropriate package to use for such a large
 dataset?
 
 Thanks,
 
 Doug
 
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Clustering-tp3017056p3017056.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT] (slightly) - OpenOffice Calc and text files

2010-10-13 Thread Albyn Jones
How about emacs?

albyn

On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab,Wilhelm K wrote:
 Hello all,
 .
 Have any of you found a nice (or at least predictable) way to use OO Calc to 
 edit files like this?  If it insists on thinking for me, I wish it would 
 think in 24 hour time and 4 digit years :)  I work on Linux, so Excel is off 
 the table, but another spreadsheet or text editor would be a viable option, 
 as would configuration changes to Calc.
 
 Bill
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The future of R - Ross Ihaka stirs discussions around the web

2010-09-13 Thread Albyn Jones
There was an award session at the Vancouver JSM where both Ross Ihaka
and Robert Gentleman spoke.  The simply-start-over post sounds very
much like Ross's JSM talk.  Robert's response was much more positive -
coming from the perspective of developing BioConductor I think.  I
don't recall the details, but the cartoon version is something like "R
isn't really broken."

albyn

On Mon, Sep 13, 2010 at 09:51:39PM +0200, Tal Galili wrote:
 Hello all,
 
 There is currently a (very !) lively discussions happening around the
 web, surrounding the following topics:
 1) Is R efficient? (scripting wise, and performance wise)
 2) Should R be written from scratch?
 3) What should be the license of R (if it was made a new)?
 
 Very serious people have taken part in the debates so far.  I hope to let
 you know of the places I came by, so you might be able to follow/participate
 in these (IMHO) important discussions.
 
 The discussions started in the response for the following blog post on
 Xi'An's blog:
 http://xianblog.wordpress.com/2010/09/06/insane/
 Followed by the (short) response post by Ross Ihaka:
 http://xianblog.wordpress.com/2010/09/13/simply-start-over-and-build-something-better/
 Other discussions started to appear on Andrew Gelman's blog:
 http://www.stat.columbia.edu/~cook/movabletype/archives/2010/09/ross_ihaka_to_r.html
 And (many) more responses started to appear in the hackers news website:
 http://news.ycombinator.com/item?id=1687054
 
 
 I hope these discussions will have fruitful results for our community,
 Tal
 
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Albyn Jones
Reed College
jo...@reed.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mathematica and R

2010-07-14 Thread Albyn Jones
Take a look at Sage, which is an open source alternative.  It already  
integrates R  (http://www.sagemath.org)


albyn

Quoting David Bickel davidbickel.com+rh...@gmail.com:

What are some effective ways to leverage the strengths of R and  
Mathematica for the analysis of a single data set?


More specifically, are there any functions that can assist with any  
of the following?

1. Calling an R function from Mathematica.
2. Calling a Mathematica function from R.
3. Using XML or another reliable data format to pass vectors,  
matrices, and/or lists from one environment to the other.


Any advice would be appreciated.

David

--
David R. Bickel, PhD
Associate Professor
Ottawa Institute of Systems Biology
Biochem., Micro. and I. Department
Mathematics and Statistics Department
University of Ottawa
451 Smyth Road
Ottawa, Ontario K1H 8M5

http://www.statomics.com

Office Tel: (613) 562-5800 ext. 8670
Office Fax: (613) 562-5185

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Entropy of a Markov chain

2010-07-14 Thread Albyn Jones
The transition matrix is a collection of conditional distributions.
It would seem natural to compute the entropy of the stationary
distribution.
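For instance, with a toy 2-state chain (P is illustrative, assumed
row-stochastic):

P <- matrix(c(0.9, 0.1,
              0.4, 0.6), 2, 2, byrow = TRUE)
p <- Re(eigen(t(P))$vectors[, 1])  # left eigenvector for eigenvalue 1
p <- p / sum(p)                    # stationary distribution
-sum(p * log(p))                   # its Shannon entropy (in nats)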


albyn

Quoting Wilson, Andrew a.wil...@lancaster.ac.uk:


Does anyone have any R code for computing the entropy of a simple
first or second order Markov chain, given a transition matrix something
like the following (or the symbol vector from which it is computed)?

        AGRe  ARIe  CSRe  DIRe  DSCe  eos   HRMe  SPTe  TOBe
   AGRe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
   ARIe 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
   CSRe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
   DIRe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
   DSCe 0.167 0.167 0.000 0.000 0.167 0.000 0.167 0.167 0.167
   eos  0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
   HRMe 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
   NMSe 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
   TOBe 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000

[The second order matrix would have column names incorporating both
prior states - e.g. SPTe.TOBe.]

I looked around at the various simple entropy functions but couldn't
find anything for this specific problem - they seem mostly to assume a
single numerical vector as input.

Many thanks in advance for any help,

Andrew Wilson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very OT: World Cup Statistics

2010-06-07 Thread Albyn Jones
Paul

The FIFA database doesn't have times that goals are scored either.
The best I have found is at http://www.worldcup-history.com/, but you
have to check individual match reports for the times that goals are
scored.

albyn

On Mon, Jun 07, 2010 at 09:50:34PM +0100, Paul wrote:
 Hello,

 Sorry for the very Off TOpic post, but I need some data on past football 
 (soccer) world cups.  I'd like to find (or calculate) the time to the first 
 goal of each match (where a goal was scored).

 I''ve looked at the uefa website and can't find what I want, maybe someone 
 here can help ?

 Thanks

 Paul.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random walk

2010-05-10 Thread Albyn Jones
Sums of correlated increments have the same correlation as the original
variables...

 library(mvtnorm)
 X <- matrix(0, nrow=1000, ncol=2)
 for(i in 1:1000){
 Y <- rmvnorm(1000, mean=mu, sigma=S)  # mu, S assumed defined earlier
 X[i,] <- apply(Y, 2, sum)
 }
 cor(Y)
  [,1]  [,2]
[1,] 1.000 0.4909281
[2,] 0.4909281 1.000

So, unless you meant that you want the _sample_ correlation to be
pre-specified, you are all set.

albyn

On Sun, May 09, 2010 at 09:20:25PM -0400, Sergio Andrés Estay Cabrera wrote:
 Hi everybody,


 I am trying to generate two random walks with an specific correlation, for 
 example, two random walks of 200 time steps with a correlation 0.7.

 I built the random walks with:

 x <- cumsum(rnorm(200, mean=0, sd=1))
 y <- cumsum(rnorm(200, mean=0, sd=1))

 but I don't know how to fix the correlation between them.

 With white noise is easy to fix the correlation using the function rmvnorm 
 in the package mvtnorm

 I surfed in the web in the searchable mail archives in the R web site but 
 no references appears.

 If you have some advices to solve this problems I would be very thankful.

 Thanks in advance.

 Sergio A. Estay
 *CASEB *
 Departamento de Ecología
 Universidad Catolica de Chile

 -- 
 “Discipline has no merit under ideal circumstances.” – Habor Mallow

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cluster with mahalanobis distance

2010-03-03 Thread Albyn Jones
Note: this procedure assumes that all clusters have the same covariance matrix.

albyn

On Wed, Mar 03, 2010 at 01:23:37PM -0800, Phil Spector wrote:
 The Manhattan distance and the Mahalanobis distance are quite different.  
 One of the main differences is that a covariance matrix is necessary to 
 calculate the Mahalanobis
 distance, so it's not easily accommodated by dist.  There is a function in 
 base R which does calculate the Mahalanobis
 distance -- mahalanobis().  So if you pass a distance matrix
 calculated by mahalanobis() to the clustering function, you'll
 get what you want.
   - Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu


 On Wed, 3 Mar 2010, Tal Galili wrote:

 when you create the distance function to put into the hclust, use:

 dist(x, method = "manhattan")


 Tal



 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




 On Wed, Mar 3, 2010 at 9:14 PM, naama nw...@technion.ac.il wrote:


 How can I perform cluster analysis using the mahalanobis distance instead
 of
 the euclidean distance?
 thank you
 Naama Wolf

 --
 View this message in context:
 http://n4.nabble.com/cluster-with-mahalanobis-distance-tp1577038p1577038.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


  [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optimization challenge

2010-01-13 Thread Albyn Jones
The key idea is that you are building a matrix that contains the
solutions to smaller problems which are sub-problems of the big
problem.  The first row of the matrix SSQ contains the solution for no
splits, ie SSQ[1,j] is just the sum of squares about the overall mean
for reading chapters1 through j in one day.  The iteration then uses
row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
chapters in m-1 days) is part of the overall optimal solution, you
have already computed it, and so don't ever need to recompute it.

   TS = SSQ[m-1,j]+(SSQ1[j+1])

computes the vector of possible solutions for SSQ[m,n] (n chapters in m days), 
breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1 to
n in 1 day.  j is a vector in the function, and min(TS) is the minimum
over choices of j, ie SSQ[m,n].

At the end, SSQ[128,239] is the optimal value for reading all 239
chapters in 128 days.  That's just the objective function, so the rest
involves constructing the list of optimal cuts, ie which chapters are
grouped together for each day's reading.  That code uses the same
idea... constructing a list of lists of cutpoints.

statisticians should study a bit of data structures and algorithms!

albyn

On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
 WOW, your results give about half the variance of my best optim run (possibly 
 due to my suboptimal use of optim).
 
 Can you describe a little what the algorithm is doing?
 
 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111
 
 
  -Original Message-
  From: Albyn Jones [mailto:jo...@reed.edu]
  Sent: Tuesday, January 12, 2010 5:31 PM
  To: Greg Snow
  Cc: r-help@r-project.org
  Subject: Re: [R] optimization challenge
  
  Greg
  
  Nice problem: I wasted my whole day on it :-)
  
  I was explaining my plan for a solution to a colleague who is a
  computer scientist, he pointed out that I was trying to re-invent the
  wheel known as dynamic programming.  here is my code, apparently it is
  called bottom up dynamic programming.  It runs pretty quickly, and
  returns (what I hope is :-) the optimal sum of squares and the
  cut-points.
  
  function(X=bom3$Verses,days=128){
  # find optimal BOM reading schedule for Greg Snow
  # minimize variance of quantity to read per day over 128 days
  #
  N = length(X)
  Nm1 = N-1
   SSQ <- matrix(NA,nrow=days,ncol=N)
   Cuts <- list()
  #
  #  SSQ[i,j]: the ssqs about the overall mean for the optimal partition
  #   for i days on the chapters 1 to j
  #
  M = sum(X)/days
  CS = cumsum(X)
  SSQ[1,]= (CS-M)^2
  Cuts[[1]]= as.list(1:N)
  #
  for(m in 2:days){
  Cuts[[m]]=list()
  #for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
  for(n in m:N){
CS = cumsum(X[n:1])[n:1]
SSQ1 = (CS-M)^2
j = (m-1):(n-1)
TS = SSQ[m-1,j]+(SSQ1[j+1])
SSQ[m,n] = min(TS)
k = min(which((min(TS)== TS)))+m-1
Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
  }
  }
  list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
  }
  
  $SSQ
  [1] 11241.05
  
  $Cuts
[1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31
  34  37
   [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73
  75  77
   [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103
  105 106
   [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135
  137 138
   [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163
  164 166
   [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194
  196 199
  [109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228
  234 236
  [127] 238 239
  
  
  
  
  On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
   I have a challenge that I want to share with the group.
  
   This is not homework (but I may assign it as such if I teach the
  appropriate class again) and I have found one solution, so don't need
  anything urgent.  This is more for fun to see if others can find a
  better solution than I did.
  
   The challenge:
  
   I want to read a book in a given number of days.  I want to read an
  integer number of chapters each day (there are more chapters than
  days), no stopping part way through a chapter, and at least 1 chapter
  each day.  The chapters are very non uniform in length (some very
  short, a few very long, many in between) so I would like to come up
  with a reading schedule that minimizes the variance of the length of
  the days readings (read multiple short chapters on the same day, long
  chapters are the only one read that day).  I also want to read through
  the book in order (no skipping ahead to combine short chapters that are
  not naturally next to each other).
  
  My thought was that the optim function with method="SANN" would be an
  appropriate approach, but my first couple of tries did

Re: [R] optimization challenge

2010-01-13 Thread Albyn Jones
Hi Aaron!  It's always nice to see a former student doing well.

Thanks for the notes and references, too!

albyn

On Wed, Jan 13, 2010 at 07:29:57PM -0500, Aaron Mackey wrote:
 FYI, in bioinformatics, we use dynamic programming algorithms in similar
 ways to solve similar problems of finding guaranteed-optimal partitions in
 streams of data (usually DNA or protein sequence, but sometimes numerical
 data from chip-arrays).  These path optimization algorithms are often
 called Viterbi algorithms, a web search for which should provide multiple
 references.
 
 The solutions are not necessarily unique (there may be multiple
 paths/partitions with identical integer maxima in some systems) and there is
 much research on whether the optimal solution is actually the one you want
 to work with (for example, there may be a fair amount of probability mass
 within an area/ensemble of suboptimal solutions that overall have greater
 posterior probabilities than does the optimal solution singleton).  See
 Chip Lawrence's PNAS paper for more erudite discussion, and references
 therein: www.pnas.org/content/105/9/3209.abstract
 
 -Aaron
 
 P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at
 Reed back in 1993, which started me down a somewhat windy road to
 statistical genomics!
 
 --
 Aaron J. Mackey, PhD
 Assistant Professor
 Center for Public Health Genomics
 University of Virginia
 amac...@virginia.edu
 
 
 On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan rvarad...@jhmi.edu wrote:
 
  Greg - thanks for posting this interesting problem.
 
  Albyn - thanks for posting a solution.  Now, I have some questions: (1) is
  the algorithm guaranteed to find a best solution? (2) can there be
  multiple solutions (it seems like there can be more than 1 solution
  depending on the data)?, and (3) is there a good reference for this and
  similar algorithms?
 
  Thanks & Best,
  Ravi.
 
 
  
  ---
 
  Ravi Varadhan, Ph.D.
 
  Assistant Professor, The Center on Aging and Health
 
  Division of Geriatric Medicine and Gerontology
 
  Johns Hopkins University
 
  Ph: (410) 502-2619
 
  Fax: (410) 614-9625
 
  Email: rvarad...@jhmi.edu
 
  Webpage:
 
  http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
 
 
 
 
  
  
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On
  Behalf Of Albyn Jones
  Sent: Wednesday, January 13, 2010 1:19 PM
  To: Greg Snow
  Cc: r-help@r-project.org
  Subject: Re: [R] optimization challenge
 
  The key idea is that you are building a matrix that contains the
  solutions to smaller problems which are sub-problems of the big
  problem.  The first row of the matrix SSQ contains the solution for no
  splits, ie SSQ[1,j] is just the sum of squares about the overall mean
  for reading chapters1 through j in one day.  The iteration then uses
  row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
  chapters in m-1 days) is part of the overall optimal solution, you
  have already computed it, and so don't ever need to recompute it.
 
TS = SSQ[m-1,j]+(SSQ1[j+1])
 
  computes the vector of possible solutions for SSQ[m,n] (n chapters in m
  days)
  breaking it into two pieces: chapters 1 to j in m-1 days, and chapters j+1
  to
  n in 1 day.  j is a vector in the function, and min(TS) is the minimum
  over choices of j, ie SSQ[m,n].
 
  At the end, SSQ[128,239] is the optimal value for reading all 239
  chapters in 128 days.  That's just the objective function, so the rest
  involves constructing the list of optimal cuts, ie which chapters are
  grouped together for each day's reading.  That code uses the same
  idea... constructing a list of lists of cutpoints.
 
  statisticians should study a bit of data structures and algorithms!
 
  albyn
 
  On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote:
   WOW, your results give about half the variance of my best optim run
  (possibly due to my suboptimal use of optim).
  
   Can you describe a little what the algorithm is doing?
  
   --
   Gregory (Greg) L. Snow Ph.D.
   Statistical Data Center
   Intermountain Healthcare
   greg.s...@imail.org
   801.408.8111
  
  
-Original Message-
From: Albyn Jones [mailto:jo...@reed.edu]
Sent: Tuesday, January 12, 2010 5:31 PM
To: Greg Snow
Cc: r-help@r-project.org
Subject: Re: [R] optimization challenge
   
Greg
   
Nice problem: I wasted my whole day on it :-)
   
I was explaining my plan for a solution to a colleague who is a
computer scientist, he pointed out that I was trying to re-invent the
wheel known as dynamic programming.  here is my code, apparently it is
called bottom up dynamic

Re: [R] optimization challenge

2010-01-12 Thread Albyn Jones
Greg

Nice problem: I wasted my whole day on it :-)

I was explaining my plan for a solution to a colleague who is a
computer scientist, he pointed out that I was trying to re-invent the
wheel known as dynamic programming.  here is my code, apparently it is
called bottom up dynamic programming.  It runs pretty quickly, and
returns (what I hope is :-) the optimal sum of squares and the
cut-points.

function(X=bom3$Verses,days=128){
# find optimal BOM reading schedule for Greg Snow
# minimize variance of quantity to read per day over 128 days
#
N = length(X)
Nm1 = N-1
SSQ <- matrix(NA,nrow=days,ncol=N)
Cuts <- list()
#
#  SSQ[i,j]: the ssqs about the overall mean for the optimal partition 
#   for i days on the chapters 1 to j
#
M = sum(X)/days
CS = cumsum(X)
SSQ[1,]= (CS-M)^2
Cuts[[1]]= as.list(1:N)
#
for(m in 2:days){
Cuts[[m]]=list()
#for(i in 1:(m-1)) Cuts[[m]][[i]] = Cuts[[m-1]][[i]]
for(n in m:N){
  CS = cumsum(X[n:1])[n:1]
  SSQ1 = (CS-M)^2
  j = (m-1):(n-1)
  TS = SSQ[m-1,j]+(SSQ1[j+1])
  SSQ[m,n] = min(TS)
  k = min(which((min(TS)== TS)))+m-1
  Cuts[[m]][[n]] = c(Cuts[[m-1]][[k-1]],n)
}
}
list(SSQ=SSQ[days,N],Cuts=Cuts[[days]][[N]])
}

$SSQ
[1] 11241.05

$Cuts
  [1]   2   4   7   9  11  13  15  16  17  19  21  23  25  27  30  31  34  37
 [19]  39  41  44  46  48  50  53  56  59  60  62  64  66  68  70  73  75  77
 [37]  78  80  82  84  86  88  89  91  92  94  95  96  97  99 100 103 105 106
 [55] 108 110 112 113 115 117 119 121 124 125 126 127 129 131 132 135 137 138
 [73] 140 141 142 144 145 146 148 150 151 152 154 156 157 160 162 163 164 166
 [91] 167 169 171 173 175 177 179 181 183 185 186 188 190 192 193 194 196 199
[109] 201 204 205 207 209 211 213 214 215 217 220 222 223 225 226 228 234 236
[127] 238 239




On Tue, Jan 12, 2010 at 11:33:36AM -0700, Greg Snow wrote:
 I have a challenge that I want to share with the group.
 
 This is not homework (but I may assign it as such if I teach the appropriate 
 class again) and I have found one solution, so don't need anything urgent.  
 This is more for fun to see if others can find a better solution than I did.
 
 The challenge:
 
 I want to read a book in a given number of days.  I want to read an integer 
 number of chapters each day (there are more chapters than days), no stopping 
 part way through a chapter, and at least 1 chapter each day.  The chapters 
 are very non uniform in length (some very short, a few very long, many in 
 between) so I would like to come up with a reading schedule that minimizes 
 the variance of the length of the days readings (read multiple short chapters 
 on the same day, long chapters are the only one read that day).  I also want 
 to read through the book in order (no skipping ahead to combine short 
 chapters that are not naturally next to each other).
 
 My thought was that the optim function with method="SANN" would be an 
 appropriate approach, but my first couple of tries did not give very good 
 results.  I have since come up with an optim with SANN solution that gives 
 what I consider good results (but I accept that better is possible).
 
 Below is a data frame with the lengths of the chapters for the book that 
 originally sparked the challenge for me (but the general idea should work for 
 any book).  Each row represents a chapter (in order) with 3 different 
 measures of the length of the chapter.
 
 For this challenge I want to read the book in 128 days (there are 239 
 chapters).
 
 I will post my solutions in a few days, but I want to wait so that my 
 direction does not influence people from trying other approaches (if there is 
 something better than optim, that is fine).
 
 Good luck for anyone interested in the challenge,
 
 The data frame:
 
 bom3 <- structure(list(Chapter = structure(1:239, .Label = c("1 Nephi 1", 
 "1 Nephi 2", "1 Nephi 3", "1 Nephi 4", "1 Nephi 5", "1 Nephi 6", 
 "1 Nephi 7", "1 Nephi 8", "1 Nephi 9", "1 Nephi 10", "1 Nephi 11", 
 "1 Nephi 12", "1 Nephi 13", "1 Nephi 14", "1 Nephi 15", "1 Nephi 16", 
 "1 Nephi 17", "1 Nephi 18", "1 Nephi 19", "1 Nephi 20", "1 Nephi 21", 
 "1 Nephi 22", "2 Nephi 1", "2 Nephi 2", "2 Nephi 3", "2 Nephi 4", 
 "2 Nephi 5", "2 Nephi 6", "2 Nephi 7", "2 Nephi 8", "2 Nephi 9", 
 "2 Nephi 10", "2 Nephi 11", "2 Nephi 12", "2 Nephi 13", "2 Nephi 14", 
 "2 Nephi 15", "2 Nephi 16", "2 Nephi 17", "2 Nephi 18", "2 Nephi 19", 
 "2 Nephi 20", "2 Nephi 21", "2 Nephi 22", "2 Nephi 23", "2 Nephi 24", 
 "2 Nephi 25", "2 Nephi 26", "2 Nephi 27", "2 Nephi 28", "2 Nephi 29", 
 "2 Nephi 30", "2 Nephi 31", "2 Nephi 32", "2 Nephi 33", "Jacob 1", 
 "Jacob 2", "Jacob 3", "Jacob 4", "Jacob 5", "Jacob 6", "Jacob 7", 
 "Enos 1", "Jarom 1", "Omni 1", "Words of Mormon 1", "Mosiah 1", 
 "Mosiah 2", "Mosiah 3", "Mosiah 4", "Mosiah 5", "Mosiah 6", "Mosiah 7", 
 "Mosiah 8", "Mosiah 9", "Mosiah 10", "Mosiah 11", "Mosiah 12", 
 "Mosiah 13", "Mosiah 14", "Mosiah 15", "Mosiah 16", "Mosiah 17", 
 "Mosiah 18", "Mosiah 19", "Mosiah 20", "Mosiah 21", "Mosiah 22", 
 "Mosiah 23", "Mosiah 24", "Mosiah 25", "Mosiah 26", "Mosiah 27", 
 "Mosiah 28", "Mosiah 29", "Alma 

Re: [R] physical distributions

2009-12-21 Thread Albyn Jones
Is the Lorentz distribution another name for the Cauchy distribution 
with density

  f(x) = \frac{1}{\pi} \frac{1}{1+x^2}

possibly with location and scale parameters?

albyn

On Mon, Dec 21, 2009 at 05:04:05PM +0100, Markus Häge wrote:
 Hi there,
 
 I was looking for a Lorentz-distribution in R but I didn't find any
 physical distribution. Is there something I can use for fitting?
 
 thanks
 
 Markus
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MLE for a t distribution

2009-12-10 Thread Albyn Jones
k -> infinity gives the normal distribution.  You probably don't care
much about the difference between k=1000 and k=10, so you might
try reparametrizing df on [1,infinity) to a parameter on [0,1]...
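
For instance, a minimal sketch of that idea (my own construction, not
code from this thread): optimize over mu, log(sigma), and p = 1/df, so
that df stays in (1, infinity):

negll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2])
  k <- 1/plogis(par[3])               # df = 1/p with p in (0,1), so df > 1
  -sum(dt((x - mu)/sigma, df = k, log = TRUE) - log(sigma))
}
x <- 10 + 2*rt(200, df = 5)           # simulated data for illustration
fit <- optim(c(mean(x), log(sd(x)), 0), negll, x = x)
c(mu = fit$par[1], sigma = exp(fit$par[2]), df = 1/plogis(fit$par[3]))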

albyn

On Thu, Dec 10, 2009 at 02:14:26PM -0600, Barbara Gonzalez wrote:
 Given X1,...,Xn ~ t_k(mu,sigma) student t distribution with k degrees
 of freedom, mean mu and standard deviation sigma, I want to obtain the
 MLEs of the three parameters (mu, sigma and k). When I try traditional
 optimization techniques I don't find the MLEs. Usually I just get
 k-infty. Does anybody know of any algorithms/functions in R that can
 help me obtain the MLEs? I am especially interested in the MLE for k,
 the degrees of freedom.
 
 Thank you!
 
 Barbara
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plots with k-means

2009-10-31 Thread Albyn Jones

what is the dimension of your data?

you might try projecting the points into planes defined by 3 cluster centers.

plot, for each cluster, a density plot or histogram of distances to  
the cluster center, and perhaps overlay the density curve for points  
not in the cluster.
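
A rough sketch of the distance idea (the simulated stand-in data and
the kmeans() call are my assumptions, not from the original post):

set.seed(1)
X <- scale(matrix(rnorm(300*15), ncol = 15))    # stand-in for 15-variable data
km <- kmeans(X, centers = 6)
k <- 1                                          # cluster to inspect
d <- sqrt(colSums((t(X) - km$centers[k, ])^2))  # distance of each point to center k
plot(density(d[km$cluster == k]), main = "distances to center 1")
lines(density(d[km$cluster != k]), lty = 2)     # overlay: points outside cluster 1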


albyn

Quoting Iuri Gavronski i...@proxima.adm.br:


Hi,

I'm doing a k-means cluster with 6 clusters and 15 variables. Any
suggestions on how to plot the results?

I've tried the standard xy plot, but couldn't get much of it.

Thanks in advance,

Iuri.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plots with k-means

2009-10-31 Thread Albyn Jones

oops, I see the answer in your question: 15...

Quoting Albyn Jones jo...@reed.edu:


what is the dimension of your data?

you might try projecting the points into planes defined by 3 cluster centers.

plot, for each cluster, a density plot or histogram of distances to  
the cluster center, and perhaps overlay the density curve for points  
not in the cluster.


albyn

Quoting Iuri Gavronski i...@proxima.adm.br:


Hi,

I'm doing a k-means cluster with 6 clusters and 15 variables. Any
suggestions on how to plot the results?

I've tried the standard xy plot, but couldn't get much of it.

Thanks in advance,

Iuri.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fast cumulative matrix multiplication

2009-10-30 Thread Albyn Jones
If the matrices are not all the same size, then the order of  
computation will make a difference.  Simple example: A is 1xn, B is  
nx1, C is 1xn.

A(BC) takes 2n^2 multiplies, while (AB)C requires only 2n.
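
For the cumulative products themselves, an idiomatic alternative to the
for() loop is Reduce() with accumulate=TRUE (a sketch with made-up
matrices; whether it beats a preallocated loop is worth benchmarking):

mats <- replicate(4, matrix(rnorm(9), 3, 3), simplify = FALSE)
cumprods <- Reduce(`%*%`, mats, accumulate = TRUE)
# cumprods[[k]] equals mats[[1]] %*% ... %*% mats[[k]]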

albyn

Quoting Todd Schneider todd.w.schnei...@gmail.com:


Hi all,

I am looking for a function like cumprod() that works for matrix
multiplication.

In other words, I have matrices [M1, M2, ..., Mn], and I want to calculate
[M1, M1%*%M2, M1%*%M2%*%M3, ..., M1%*%...%*%Mn] as quickly as possible.
Right now I'm using a for() loop but it seems like there should be a faster
way.

Any help is appreciated!

Thanks,

Todd Schneider
todd.w.schnei...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] what's the R code for wavelet decomposition (Haar transformation)?

2009-10-19 Thread Albyn Jones
There is a dwt() in package:waveslim, reading the help file:

 dwt(x, wf="la8", n.levels=4, boundary="periodic")

wf: Name of the wavelet filter to use in the decomposition.  By
  default this is set to 'la8', the Daubechies orthonormal
  compactly supported wavelet of length L=8 (Daubechies, 1992),
  least asymmetric family. 

I don't see a list of wavelet filter names in the documentation, but
as I recall, there is one in one of the functions: wave.filter().
Look at the code for wave.filter; it includes the following code:

switch(name, haar = select.haar(), d4 = select.d4(), mb4 = select.mb4(), 
w4 = select.w4(), bs3.1 = select.bs3.1(), fk4 = select.fk4(), 
d6 = select.d6(), fk6 = select.fk6(), d8 = select.d8(), 
fk8 = select.fk8(), la8 = select.la8(), mb8 = select.mb8(), 
bl14 = select.bl14(), fk14 = select.fk14(), d16 = select.d16(), 
la16 = select.la16(), mb16 = select.mb16(), la20 = select.la20(), 
bl20 = select.bl20(), fk22 = select.fk22(), mb24 = select.mb24(), 
stop("Invalid selection for wave.filter"))

try

dwt(x, wf="haar")
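
A slightly fuller sketch (the simulated series and level choice are
mine; dwt() wants a series whose length is a multiple of 2^n.levels):

library(waveslim)
x <- rnorm(64)                       # dyadic length keeps dwt() happy
x.haar <- dwt(x, wf = "haar", n.levels = 4)
names(x.haar)                        # d1 ... d4 detail plus s4 smooth coefficients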

albyn

On Mon, Oct 19, 2009 at 02:41:56PM -0500, stephen sefick wrote:
 What package are you using?  There are quite a few functions that do
 wavelet decomposition.  Have you tried an R site search?
 
 Stephen
 
 On Fri, Oct 16, 2009 at 3:43 PM, Zhen Li z...@bios.unc.edu wrote:
  Dear all,
 
  Using R function dwt, it seems that I cannot specify the wavelet
  transformation like Haar. What's the R code for wavelet decomposition which
  allows me to specify Haar wavelet transformation? Of course, if it can
  include db2, that is even better. In general, I want an R function like
  matlab code dwt. Thanks in advance!
 
  Zhen Li
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 -- 
 Stephen Sefick
 
 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods.  We are mammals, and have not exhausted the
 annoying little problems of being mammals.
 
   -K. Mullis
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating a stochastic matrix with a specified second dominant eigenvalue

2009-10-16 Thread Albyn Jones
ooops!  I tested it on a 3x3, and a 4x4, and it worked both times.
Then I ran it once more and pasted it in without looking carefully...

Here's another half-baked idea.  Generate a random graph with given
degree distribution, turn the incidence matrix into a stochastic
matrix (ie the transition probability is 1/d where d is the degree of
the vertex).  Now all we need is the connection between the degree
distribution and the second eigenvalue :-)  If you want lambda
close to one, perhaps you could generate two graphs, then 
add vertices joining the two graphs until the second eigenvalue
is the right size  of course, this will give a random lambda,
not exactly your chosen value.   

albyn

On Fri, Oct 16, 2009 at 10:32:48AM -0400, Ravi Varadhan wrote:
 A valiant attempt, Albyn!  
 
 Unfortunately, the matrix B is not guaranteed to be a stochastic matrix.  In
 fact, it is not even guaranteed to be a real matrix.  Your procedure can
 generate a B that contains negative elements or even complex elements.  
 
   M = matrix(runif(9),nrow=3)
   M = M/apply(M,1,sum)
   e=eigen(M)
   e$values[2]= .7  
  Q = e$vectors  
  Qi = solve(Q)  
  B = Q %*% diag(e$values) %*% Qi
  
  eigen(B)$values
 [1]  1.00000000  0.70000000 -0.04436574
  apply(B,1,sum)
 [1] 1 1 1
  B
[,1]  [,2]   [,3]
 [1,] 0.77737077 0.3340768 -0.1114476
 [2,] 0.20606226 0.2601840  0.5337537
 [3,] 0.08326022 0.2986603  0.6180794
 
 
 Note that B[1,3] is negative.
 
 Another example:
 
   M = matrix(runif(9),nrow=3)
   M = M/apply(M,1,sum)
   e=eigen(M)
   e$values[2]= .95  
  Q = e$vectors  
  Qi = solve(Q)  
  B = Q %*% diag(e$values) %*% Qi
  
  eigen(B)$values
 [1]  1.00000000-0.00000000i  0.95000000+0.00000000i -0.09348883-0.02904173i
  apply(B,1,sum)
 [1] 1+0i 1-0i 1+0i
  B
 [,1] [,2] [,3]
 [1,] 0.6558652-0.550613i 0.2408879+0.2212234i 0.1032469+0.3293896i
 [2,] 0.1683119+1.594515i 0.6954317-0.7378503i 0.1362564-0.8566647i
 [3,] 0.2812210-2.462385i 0.2135648+1.2029636i 0.5052143+1.2594216i
 
 
 Note that B has complex elements.
 
 So, I took your algorithm and embedded it in an iterative procedure to keep
 repeating your steps until it found a B matrix that is real and
 non-negative.  Here is that function:
 
 e2stochMat <- function(N, e2, maxiter) {
     iter <- 0
     while (iter <= maxiter) {
         iter <- iter + 1
         M <- matrix(runif(N*N), nrow=N)
         M <- M / apply(M,1,sum)
         e <- eigen(M)
         e$values[2] <- e2
         Q <- e$vectors
         B <- Q %*% diag(e$values) %*% solve(Q)
         real <- all(abs(Im(B)) < 1.e-16)
         positive <- all(Re(B) > 0)
         if (real && positive) break
     }
     list(stochMat=B, iter=iter)
 }
 
   e2stochMat(N=3, e2=0.95, maxiter=10000)  # works
   e2stochMat(N=5, e2=0.95, maxiter=10000)  # fails
 
 This works for very small N, say, N = 3, but it fails for larger N.  The
 probability of success is a decreasing function of N and e2.  So, the
 algorithm fails for large N and for values of e2 close to 1.
 
 Thanks for trying.
 
 Best,
 Ravi.
 
 
 ---
 
 Ravi Varadhan, Ph.D.
 
 Assistant Professor, The Center on Aging and Health
 
 Division of Geriatric Medicine and Gerontology 
 
 Johns Hopkins University
 
 Ph: (410) 502-2619
 
 Fax: (410) 614-9625
 
 Email: rvarad...@jhmi.edu
 
 Webpage:
 http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
 tml
 
  
 
 
 
 
 
 -Original Message-
 From: Albyn Jones [mailto:jo...@reed.edu] 
 Sent: Thursday, October 15, 2009 6:56 PM
 To: Ravi Varadhan
 Cc: r-help@r-project.org
 Subject: Re: [R] Generating a stochastic matrix with a specified second
 dominant eigenvalue
 
 I just tried the following shot in the dark:
 
 generate an N by N stochastic matrix, M.  I used
 
  M = matrix(runif(9),nrow=3)
  M = M/apply(M,1,sum)
  e=eigen(M)
  e$values[2]= .7  (pick your favorite lambda, you may need to fiddle 
with the others to guarantee this is second largest.)
  Q = e$vectors
  Qi = solve(Q)
  B = Q %*% diag(e$values) %*% Qi
 
  eigen(B)$values
 [1]  1.00000000  0.70000000 -0.08518772
  apply(B,1,sum)
 [1] 1 1 1
 
 I haven't proven that this must work, but it seems to.  Since you can
 verify that it worked afterwards, perhaps the proof is in the pudding.
 
 albyn
 
 
 
 On Thu, Oct 15, 2009 at 06:24:20PM -0400, Ravi Varadhan wrote:
  Hi,
  
   
  
  Given a positive integer N, and a real number \lambda such that 0 <
  \lambda < 1,  I would like to generate an N by N stochastic matrix (a matrix with
  all the rows summing to 1), such that it has the second largest eigenvalue
  equal to \lambda (Note: the dominant eigenvalue of a stochastic matrix is
  1).  
  
   
  
  I don't care what the other eigenvalues are.  The second eigenvalue is
  important in that it governs the rate at which the random process given by
  the stochastic matrix converges to its

Re: [R] Generating a stochastic matrix with a specified second dominant eigenvalue

2009-10-15 Thread Albyn Jones
I just tried the following shot in the dark:

generate an N by N stochastic matrix, M.  I used

 M = matrix(runif(9),nrow=3)
 M = M/apply(M,1,sum)
 e=eigen(M)
 e$values[2]= .7  (pick your favorite lambda, you may need to fiddle 
   with the others to guarantee this is second largest.)
 Q = e$vectors
 Qi = solve(Q)
 B = Q %*% diag(e$values) %*% Qi

 eigen(B)$values
[1]  1.00000000  0.70000000 -0.08518772
 apply(B,1,sum)
[1] 1 1 1

I haven't proven that this must work, but it seems to.  Since you can
verify that it worked afterwards, perhaps the proof is in the pudding.

albyn



On Thu, Oct 15, 2009 at 06:24:20PM -0400, Ravi Varadhan wrote:
 Hi,
 
  
 
 Given a positive integer N, and a real number \lambda such that 0 < \lambda
 < 1,  I would like to generate an N by N stochastic matrix (a matrix with
 all the rows summing to 1), such that it has the second largest eigenvalue
 equal to \lambda (Note: the dominant eigenvalue of a stochastic matrix is
 1).  
 
  
 
 I don't care what the other eigenvalues are.  The second eigenvalue is
 important in that it governs the rate at which the random process given by
 the stochastic matrix converges to its stationary distribution.
 
  
 
 Does anyone know of an algorithm to do this?
 
  
 
 Thanks for any help,
 
 Ravi.
 
 
 ---
 
 Ravi Varadhan, Ph.D.
 
 Assistant Professor, The Center on Aging and Health
 
 Division of Geriatric Medicine and Gerontology 
 
 Johns Hopkins University
 
 Ph: (410) 502-2619
 
 Fax: (410) 614-9625
 
 Email: rvarad...@jhmi.edu
 
 Webpage:
 http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
 
  
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parameters of Beta distribution

2009-10-08 Thread Albyn Jones

Maithili

I find it really hard to believe that a beta distribution would be a  
reasonable probability model for loss data.  There would have to be an  
upper bound on the size of losses.   What is the process that  
generates the data?  Is there any natural upper bound?  Why is there a  
lower bound greater than zero?


That said, the MLE's would be the min and max, but those will  
underestimate the range of a beta.  It is an elementary exercise to  
see why with the uniform[0,B] (ie beta(1,1)), for which the expected  
value of the max of a sample of size n is B*n/(n+1).  If you have a  
lot of data, this may not bother you.  For an arbitrary beta  
distribution you would have 4 parameters to estimate...probably a  
Bayes estimator would be easiest.
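
A quick simulation of the uniform case makes the bias visible (the
numbers are invented for illustration):

B <- 10; n <- 20
mean(replicate(10000, max(runif(n, 0, B))))  # about B*n/(n+1) = 9.52, not 10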


I'll put this one away for an exercise my next math stats course...

albyn

Quoting Maithili Shiva maithili_sh...@yahoo.com:


Dear Albyn,
 Thanks for your reply.
 Yes A and B are unknown. I was just thinking to assign -
 A = min(amounts) and B = max(amounts).
 The actual loss data I am dealing with is large. I am trying to fit  
some statistical distributions to this data. I already have done  
with R code pertaining to many other distributions like Normal,  
Weibull, Pareto, Generalized extreme Value distribution etc. So just  
want to know how to estimate the parameters if I need to check  
whether the Beta distribution fits the loss data.
 Is it possible for you to guide me how to estimate A and B or can I  
assume A = min(amounts) and B = max(Amounts)

 Regards
 Maithili
   --- On Wed, 7/10/09, Albyn Jones jo...@reed.edu wrote:


From: Albyn Jones jo...@reed.edu
Subject: Re: [R] Parameters of Beta distribution
To: jlu...@ria.buffalo.edu
Cc: Maithili Shiva maithili_sh...@yahoo.com,  
r-help@r-project.org, r-help-boun...@r-project.org

Date: Wednesday, 7 October, 2009, 3:30 PM


Are A and B known?  That is, are there known upper and lower bounds
for this credit loss data?  If not, you need to think about how to
estimate those bounds.  Why do you believe the data have a beta distribution?

albyn


On Wed, Oct 07, 2009 at 09:03:31AM -0400, jlu...@ria.buffalo.edu wrote:

Rescale your data x to  (x-A)/(B-A).




Maithili Shiva maithili_sh...@yahoo.com
Sent by: r-help-boun...@r-project.org
10/07/2009 08:39 AM

To
r-help@r-project.org
cc

Subject
[R] Parameters of Beta distribution






 Supose I have a data pertaining to credit loss as
 amounts <-
c(46839.50,31177.12,35696.69,21192.57,29200.91,42049.64,42422.19,
44976.18, 32135.36,47936.57,27322.91,37359.09,43179.60, 48381.02,
45872.38, 28057.30,44643.83,36156.33,16037.62, 45432.28)
 I am trying to fit Beta distribution (two parameters distribution but
where lower bound and upper bounds are NOT  0 and 1 respectively). For
this I need to estimate the two parameters of Beta distribution. I found
some code in VGAM pacakge but it deals with standard Beta distribution
i.e. lower bound (say A) = 0 and upper bound (say B) = 1.
 How do I estimate the parameters of the Beta distribution for above data
where A and B are not 0's?
 Please guide.
 Thanking you in advance
 Maithili


   Add whatever you love to the Yahoo! India homepage. Try now!
http://in.yahoo.com/trynew
  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





  Now, send attachments up to 25MB with Yahoo! India Mail. Learn  
how. http://in.overview.mail.yahoo.com/photos


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parameters of Beta distribution

2009-10-08 Thread Albyn Jones

Quoting David Winsemius dwinsem...@comcast.net:


In insurance situation there is typically a cap on the covered  
losses and there is also typically an amount below which it would  
not make sense to offer a policy. So a minimum and a maximum are  
sensible assumptions about loss distributions in may real modeling  
situations.


--
David.


is that cap the same for every policy, or are there different types of  
policies with different bounds?  If there are caps, then other  
probability models mentioned like the Weibull don't seem reasonable,  
unless they are truncated distribution models.


albyn

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parameters of Beta distribution

2009-10-07 Thread Albyn Jones
Are A and B known?  That is, are there known upper and lower bounds
for this credit loss data?  If not, you need to think about how to
estimate those bounds.  Why do you believe the data have a beta distribution?
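
If you do take the bounds as known (or pad the sample range so the
rescaled data stay strictly inside (0,1)), a minimal fitting sketch
might be (MASS::fitdistr, the 1% padding, and the starting values are
my choices, not from this thread):

A <- 0.99*min(amounts); B <- 1.01*max(amounts)
y <- (amounts - A)/(B - A)                     # strictly inside (0,1)
library(MASS)
fitdistr(y, "beta", start = list(shape1 = 2, shape2 = 2))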

albyn


On Wed, Oct 07, 2009 at 09:03:31AM -0400, jlu...@ria.buffalo.edu wrote:
 Rescale your data x to  (x-A)/(B-A).
 
 
 
 
 Maithili Shiva maithili_sh...@yahoo.com 
 Sent by: r-help-boun...@r-project.org
 10/07/2009 08:39 AM
 
 To
 r-help@r-project.org
 cc
 
 Subject
 [R] Parameters of Beta distribution
 
 
 
 
 
 
  
 Supose I have a data pertaining to credit loss as
  
 amounts <- 
 c(46839.50,31177.12,35696.69,21192.57,29200.91,42049.64,42422.19, 
 44976.18, 32135.36,47936.57,27322.91,37359.09,43179.60, 48381.02, 
 45872.38, 28057.30,44643.83,36156.33,16037.62, 45432.28)
  
 I am trying to fit Beta distribution (two parameters distribution but 
 where lower bound and upper bounds are NOT  0 and 1 respectively). For 
 this I need to estimate the two parameters of Beta distribution. I found 
 some code in VGAM pacakge but it deals with standard Beta distribution 
 i.e. lower bound (say A) = 0 and upper bound (say B) = 1.
  
 How do I estimate the parameters of the Beta distribution for above data 
 where A and B are not 0's?
  
 Please guide.
  
 Thanking you in advance
  
 Maithili 
 
 
   Add whatever you love to the Yahoo! India homepage. Try now! 
 http://in.yahoo.com/trynew
  [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scale or not to scale that is the question - prcomp

2009-08-19 Thread Albyn Jones
scaling changes the metric, ie which things are close to each other.
there is no reason to expect the picture to look the same when you 
change the metric.
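
A tiny artificial example of the metric change (mine, not your data):
when the variances differ wildly, the unscaled PC1 is essentially the
high-variance variable alone, while scaling weights both variables:

x <- cbind(a = rnorm(50, sd = 100), b = rnorm(50, sd = 0.1))
prcomp(x)$rotation                  # PC1 is essentially 'a'
prcomp(x, scale. = TRUE)$rotation   # PC1 mixes 'a' and 'b'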

On the other hand, your two pictures don't look so different to me.
It appears that the scaled plot is similar to the unscaled plot, with
the roles of the second and third pc reversed, ie the scaled plot is
similar but rotated and distorted.  For example, the observations
forming the strip across the bottom of the first plot form a vertical
strip on the right hand side of the second plot.

albyn

On Wed, Aug 19, 2009 at 02:31:23PM +0200, Petr PIKAL wrote:
 Dear all
 
 here is my data called rglp
 
 structure(list(vzorek = structure(1:17, .Label = c("179/1/1", 
 "179/2/1", "180/1", "181/1", "182/1", "183/1", "184/1", "185/1", 
 "186/1", "187/1", "188/1", "189/1", "190/1", "191/1", "192/1", 
 "R310", "R610L"), class = "factor"), iep = c(7.51, 7.79, 5.14, 
 6.35, 5.82, 7.13, 5.95, 7.27, 6.29, 7.5, 7.3, 7.27, 6.46, 6.95, 
 6.32, 6.32, 6.34), skupina = c(7.34, 7.34, 5.14, 6.23, 6.23, 
 7.34, 6.23, 7.34, 6.23, 7.34, 7.34, 7.34, 6.23, 7.34, 6.23, 6.23, 
 6.23), sio2 = c(0.023, 0.011, 0.88, 0.028, 0.031, 0.029, 0.863, 
 0.898, 0.95, 0.913, 0.933, 0.888, 0.922, 0.882, 0.923, 1, 1), 
 p2o5 = c(0.78, 0.784, 1.834, 1.906, 1.915, 0.806, 1.863, 
 0.775, 0.817, 0.742, 0.783, 0.759, 0.787, 0.758, 0.783, 3, 
 2), al2o3 = c(5.812, 5.819, 3.938, 5.621, 3.928, 3.901, 5.621, 
 5.828, 4.038, 5.657, 3.993, 5.735, 4.002, 5.728, 4.042, 6, 
 5), dus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("ano", "ne"), class = 
 "factor")), .Names = c("vzorek", 
 "iep", "skupina", "sio2", "p2o5", "al2o3", "dus"), class = "data.frame", 
 row.names = c(NA, 
 -17L))
 
 and I try to do principal component analysis. Here is one without scaling
 
 fit <- prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, factors=2)
 biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)
 
 you can see that data make 3 groups according to variables sio2 and dus 
 which seems to be reasonable as lowest group has different value of dus = 
 ano while highest group has low value of sio2.
 
 But when I do the same with scale=T
 
 fit <- prcomp(~iep+sio2+al2o3+p2o5+as.numeric(dus), data=rglp, factors=2, 
 scale=T)
 biplot(fit, choices=2:3,xlabs=rglp$vzorek, cex=.8)
 
 I get completely different picture which is not possible to interpret in 
 such an easy way.
 
 So if anybody can advise me whether I shall follow the recommendation from the help 
 page (which says
 "The default is FALSE for consistency with S, but in general scaling is 
 advisable.")
 or if I shall stay with scale = FALSE and with a simply interpretable 
 result?
  
 Thank you.
 
 Petr Pikal
 petr.pi...@precheza.cz
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] truncating values into separate categories

2009-07-31 Thread Albyn Jones

It appears that your difficulty lies in miscounting the number of intervals.


cut(NP, breaks=c(0,1,2,3,4,max(NP)))
 [1] (0,1] (0,1] (1,2] (0,1] (0,1] (1,2] (1,2] (0,1] (3,4] (0,1] <NA>  (4,6] (2,3] (2,3] (0,1]
[16] (4,6] (2,3] (4,6] (0,1] (4,6] (0,1] (1,2] (1,2] (1,2] (3,4] (3,4] (0,1] (1,2] (0,1] (2,3]
[31] (2,3] (0,1] (1,2] (1,2] (0,1] (1,2] (0,1] (1,2] (1,2] (2,3] (0,1] (0,1] (3,4] (3,4] (0,1]
[46] (0,1] (0,1] (1,2] (1,2] (1,2]
Levels: (0,1] (1,2] (2,3] (3,4] (4,6]


cut(NP, breaks=c(0,1,2,3,max(NP)), labels=c("1","2","3","4+"))
 [1] 1    1    2    1    1    2    2    1    4+   1    <NA> 4+   3    3    1    4+   3    4+
[19] 1    4+   1    2    2    2    4+   4+   1    2    1    3    3    1    2    2    1    2
[37] 1    2    2    3    1    1    4+   4+   1    1    1    2    2    2
Levels: 1 2 3 4+


a = cut(NP, breaks=c(0,1,2,3,max(NP)), labels=c("1","2","3","4+"))
table(a, exclude=NULL)

a
   1    2    3   4+ <NA> 
  19   15    6    9    1 

Generally it is better to let R keep track of the NA's for you.

albyn


Quoting PDXRugger j_r...@hotmail.com:



Hi all,
  Simple question which I thought I had the answer to, but it isn't so simple for
some reason.  I am sure someone can easily help.  I would like to categorize
the values in NP into 1 of the five values in Per, with the last
category ("4") representing values >= 4 (hence 4:max(NP)).  The problem is that
R is reading max(NP) as multiple values instead of range so the lengths of
the labels and the breaks are not matching.  Suggestions?

Per <- c(NA, 1, 2, 3, 4)

NP=c(1 ,1 ,2 ,1, 1 ,2 ,2 ,1 ,4 ,1 ,0 ,5 ,3 ,3 ,1 ,5 ,3, 5, 1, 6, 1, 2, 2, 2,
4, 4, 1, 2, 1, 3, 3, 1 ,2 ,2 ,1 ,2, 1, 2,
2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)

Person_CAT <- cut(NP, breaks=c(0,1,2,3,4:max(NP)), labels=Per)

--
View this message in context:  
http://www.nabble.com/truncating-values-into-separate-categories-tp24749046p24749046.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to calculate growth rate of a variable

2009-07-24 Thread Albyn Jones
Your data will have all sorts of patterns (diurnal, seasonal) in
addition to long term trend.  I'd start by smoothing out the cyclic
patterns with loess or gam, then use a secant approximation to the
slope on the smoothed series.  
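
A sketch of that recipe (the column names follow your posted data; the
span and the assumption that the rows are ordered by time are mine):

fit <- loess(CO2 ~ time, data = d, span = 0.3)  # d: your data frame
sm <- predict(fit)
rate <- diff(sm)/diff(d$time)                   # secant slope, ppm per year
plot(d$time[-1], rate, type = "l", xlab = "year", ylab = "CO2 growth (ppm/yr)")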

albyn

On Fri, Jul 24, 2009 at 06:13:00PM +0530, Yogesh Tiwari wrote:
 Dear R Users,
 If a variable, say CO2 (ppm), is varying with time, then how does one calculate
 the CO2 growth rate (ppm per year)?
 I have CO2 time series (1991-2000), as:
 
 time, year, month, day, hour, min, sec, lat, long, height, CO2
 1991.476722 1991 6 24 0 5 0 -38.93 145.15 4270 353.680
 1991.476741 1991 6 24 0 15 0 -39.20 145.22 4270 353.950
 1991.476747 1991 6 24 0 18 0 -39.43 145.28 4270 353.510
 ---
 2000.740788 2000 9 28 3 5 0 -38.00 145.00 2280 366.750
 2000.740794 2000 9 28 3 8 0 -38.00 145.00 1830 366.550
 2000.740803 2000 9 28 3 13 0 -38.00 145.00 1220 370.550
 
 -- 
 Yogesh K. Tiwari (Dr.rer.nat),
 Scientist,
 Indian Institute of Tropical Meteorology,
 Homi Bhabha Road,
 Pashan,
 Pune-411008
 INDIA
 
 Phone: 0091-99 2273 9513 (Cell)
 : 0091-20-25904350 (O)
 Fax: 0091-20-258 93 825
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assign question

2009-07-20 Thread Albyn Jones
I don't think you want assign() here.

 x1 = rnorm(20)
 min(x1)
[1] -0.9723398

 min(eval(paste("x", 1, sep="")))  # not the solution
[1] "x1"

 min(eval(as.name(paste("x", 1, sep=""))))  # a solution
[1] -0.9723398

try:

for(i in 1:27) {
   xener[i] <- min(eval(as.name(paste("sa", i, sep=""))))
   }
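
An equivalent one-liner, for what it's worth, uses get() instead of
eval(as.name()):

xener <- sapply(1:27, function(i) min(get(paste("sa", i, sep = ""))))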

albyn

On Mon, Jul 20, 2009 at 01:26:05PM -0500, Erin Hodgess wrote:
 Dear R People:
 
 I have several vectors, sa1, sa2,...sa27 of varying lengths.
 
 I want to produce one vector xener[1:27] which has the minimum of each sa[i].
 
 I'm trying to set up a loop and use the assign statement, but here are
 my results:
 
  for(i in 1:27) {
 + xener[i] <- min(assign(paste("sa", i, sep="")))
 + }
 Error in assign(paste("sa", i, sep = "")) :
   element 2 is empty;
the part of the args list of '.Internal' being evaluated was:
(x, value, envir, inherits)
 
 
 Any suggestions would be most welcome.
 
 Thanks in advance,
 Erin
 
 
 -- 
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] putting circles/buffers arround points on a 2D graph/plot

2009-07-16 Thread Albyn Jones
It sounds like you might want to draw the convex hull for each group
of points.  There is a package called chplot which appears to
do this, though I haven't used it...
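
Base R's chull() can also do it without an extra package; a sketch
using the names from your code (untested against your data):

pts <- cbind(percent.forest.250, metric.250)
h <- chull(pts)                                 # hull vertices, in order
polygon(pts[c(h, h[1]), ], border = "skyblue")  # outline the 250-buffer group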

albyn

On Thu, Jul 16, 2009 at 06:23:54PM +0100, Ahmed, Sadia wrote:
 Hi, 
 
 I'll try to keep my question brief and to the point, to anyone who helps 
 'THANK YOU!'
 
 I want to get a circle/ring/buffer or some other form of drawn line around 
 certain points on a x,y plot, the points usually don't form a circle, often 
 they form a 'wobbly' shape, so ideally I'd like something that follows the 
 shape the points make.
 
 I'll start by explaining what I've got (I'm sorry if it's a little long, but 
 it's all important for explaining what I want to do):
 
 I have a simple 2D plot of an x variable, called 'percent.forest'', and a y 
 variable called 'metric'. the data for this plot is in a large data frame 
 with a third variable called 'Buffer', there are 5 buffer sizes (250, 500, 
 1000, 3000, 1).
 
 to create the plot I simply used 
 plot(percent.forest,metric)
 
 to add in the data about buffer size, I have extracted the percent forest and 
 metric data by:
 buff250-subset(data1,data1$Buffer==250)
 buff500-subset(data1,data1$Buffer==500)
 buff1000-subset(data1,data1$Buffer==1000)
 buff3000-subset(data1,data1$Buffer==3000)
 buff1-subset(data1,data1$Buffer==1)
 
 i then used the 'points' function to plot the percent forest, metric data 
 points in different colours according to buffer size, using this code (first 
 2 lines divides the data appropriately, 3rd line is points):
 metric.250-(log(buff250$metric))
 percent.forest.250-(buff250$percent.forest)
 points(percent.forest.250,metric.250,col='skyblue')
 
 I repeated this for all 5 buffer sizes, so on the plot 
 (percent.forest,metric) the points are coloured differently according to the 
 size of the buffer used.
 This works fine (the next bit is the problem):
 
 Does anyone know how I can get R to draw around the different coloured 
 points, so that i get 5 rings/wiggly lines on the plot that clearly shows the 
 extent of each of the buffers, you can sort of see a pattern in the colours 
 but its not as clear as it would be if I could draw around it). 
 
 If anyone wants to see a picture of the plot (if my explanation isn't clear 
 enough) I'd be happy to email one directly to you (the R instructions says 
 that the mailing list scraps attachments so I've not put one on here)
 
 Thank you for reading this, 
 Any help will be greatly appreciated,
 sadia  
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix inversion-different answers from LAPACK and LINPACK

2009-06-17 Thread Albyn Jones
As you seem to be aware, the matrix is poorly conditioned:

 kappa(PLLH,exact=TRUE)
[1] 115868900869

It might be worth your while to think about reparametrizing.
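
One cheap stopgap is diagonal rescaling (equilibration) before
inverting; a sketch using the posted matrix (the rescaling idea is
mine, not a full reparametrization):

PLLH <- matrix(c(1144.6262175141619082, -0.01290205604828788,
                 -0.01290205604828788,  0.00155437688485563), 2, 2)
d <- 1/sqrt(diag(PLLH))
S <- diag(d) %*% PLLH %*% diag(d)          # unit diagonal, kappa near 1
Ainv <- diag(d) %*% solve(S) %*% diag(d)   # inverse of PLLH, computed stably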

albyn

On Wed, Jun 17, 2009 at 11:37:48AM -0400, avraham.ad...@guycarp.com wrote:
 
 Hello.
 
 I am trying to invert a matrix, and I am finding that I can get different
 answers depending on whether I set LAPACK true or false using qr. I had
 understood that LAPACK is, in general more robust and faster than LINPACK,
 so I am confused as to why I am getting what seems to be invalid answers.
 The matrix is ostensibly the Hessian for a function I am optimizing. I want
 to get the parameter correlations, so I need to invert the matrix. There
 are times where the standard solve(X) returns an error, but solve(qr(X,
 LAPACK=TRUE)) returns values. However, there are times, where the latter
 returns what seems to be bizarre results.
 
 For example, an example matrix is PLLH (Pareto LogLikelihood Hessian)
 
                         alpha                 theta
 alpha 1144.6262175141619082 -0.01290205604828788
 theta   -0.012902056048  0.00155437688485563
 
 Running plain solve (PLLH) or solve (qr(PLLH)) returns:
 
[,1]  [,2]
 alpha    0.0137466171688024      1141.53956787721
 theta 1141.5395678772131305 101228592.41439932585
 
 However, running solve(qr(PLLH, LAPACK=TRUE)) returns:
 
   [,1]  [,2]
 [1,]0.01374661716880240.0137466171688024
 [2,] 1141.5395678772131305 1141.5395678772131305
 
 which seems wrong as the original matrix had identical entries on the off
 diagonals.
 
 I am neither a programmer nor an expert in matrix calculus, so I do not
 understand why I should be getting different answers using different
 libraries to perform the ostensibly same function. I understand the
 extremely small value of d²X/d(theta)² (PLLH[2,2]) may be contributing to
 the error, but I would appreciate confirmation, or correction, from the
 experts on this list.
 
 Thank you very much,
 
 --Avraham Adler
 
 
 
 PS: For completeness, the QR decompositions per R under LINPACK and
 LAPACK are shown below:
 
  qr(PLLH)
 $qr
   alpha theta
 alpha -1144.6262175869414932095 0.0129020653695122277
 theta    -0.112768491646264     0.987863187747112
 
 $rank
 [1] 2
 
 $qraux
 [1] 1.993641619511209 0.987863187747112
 
 $pivot
 [1] 1 2
 
 attr(,class)
 [1] qr
 
 
  qr(PLLH, LAPACK=TRUE)
 $qr
alpha theta
 alpha -1144.62621758694149320945 0.0129020653695122277
 theta    -0.0563842458249248     0.987863187747112
 
 $rank
 [1] 2
 
 $qraux
 [1] 1.993642 0.000000
 
 $pivot
 [1] 1 2
 
 attr(,useLAPACK)
 [1] TRUE
 attr(,class)
 [1] qr
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SAGE: was large factorials

2009-04-23 Thread Albyn Jones
On Wed, Apr 22, 2009 at 08:26:51PM -0700, Ben Bolker wrote:

 
   ???  octave is a Matlab clone, not a Mathematica clone (I would be
 interested in an open source Mathematica clone ...) ???
 

You might take a look at Sage.  It is not a mathematica clone, but 
open source mathematical software which has a development community 
similar to that of R.   

http://www.sagemath.org/

albyn

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] non-positive definite matrix remedies?

2009-03-11 Thread Albyn Jones
That's an interesting problem.  

My first thought was to choose the closest positive definite matrix to
the given matrix, say in the least squares sense.  However, the
2x2 diagonal matrix with diagonal (1,0) makes it clear that there
isn't a closest pd symmetric matrix.

Perhaps multiple imputation would work: impute a complete data
matrix X, compute polycor(X), and repeat.  Will the average of these
positive definite matrices be positive definite?  I think it would if
you were computing pearson correlations, but I am not sure about the
polychoric case.
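
For the nearest-matrix idea, Matrix::nearPD() computes the nearest
positive semidefinite (optionally correlation) matrix; a sketch, where
Rhat stands for the estimated 18x18 matrix:

library(Matrix)
Rpd <- nearPD(Rhat, corr = TRUE)$mat   # nearest correlation matrix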

   albyn

On Wed, Mar 11, 2009 at 04:20:27PM -0600, Matthew Keller wrote:
 Hi all,
 
 For computational reasons, I need to estimate an 18x18 polychoric
 correlation matrix two variables at a time (rather than trying to
 estimate them all simultaneously using ML). The resulting polychoric
 correlation matrix I am getting is non-positive definite, which is
 problematic because I'm using this matrix later on as if it were a
 legitimately estimated correlation matrix (in order to fit an SEM
 model). I could add to the diagonal I suppose until it becomes
 positive definite. Does anyone have any other ideas on how to deal
 with this problem, and what the strengths and weaknesses of different
 approaches are?
 
 Thanks in advance,
 
 Matt
 
 
 -- 
 Matthew C Keller
 Asst. Professor of Psychology
 University of Colorado at Boulder
 www.matthewckeller.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cholesky Decomposition in R

2009-03-10 Thread Albyn Jones
try Cholesky() in package Matrix.
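
Note that Cholesky() in the Matrix package is defined for sparse
symmetric matrices, which is why it balks at a plain matrix below. For
a small dense A you can also build LDL' from chol() directly; a sketch:

A <- matrix(c(1,1,1,1,5,5,1,5,14), nrow = 3)
R <- chol(A)             # A = R'R with R upper triangular
D <- diag(R)^2           # the D of A = LDL'
L <- t(R/diag(R))        # unit lower triangular L
L %*% diag(D) %*% t(L)   # reconstructs A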

albyn

On Tue, Mar 10, 2009 at 02:33:01PM -0700, Manli Yan wrote:
   Hi everyone:
  I try to use R to do the Cholesky decomposition, which is A=LDL'. So far I
 only found how to decompose A into LL' by using chol(A); the function
 Cholesky(A) doesn't work. Does anyone know another command to decompose A into
 LDL'?
 
   My r code is:
 library(Matrix)
 A=matrix(c(1,1,1,1,5,5,1,5,14),nrow=3)
 
  chol(A)
      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    0    2    2
 [3,]    0    0    3
 
  Cholesky(A)
 Error in function (classes, fdef, mtable)  :
   unable to find an inherited method for function "Cholesky", for signature
 "matrix"
 
 what's wrong???
 thanks~
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extract statistics from lm()

2009-02-28 Thread Albyn Jones

Look at the data structure produced by summary()


names(summary(lm.D9))

 [1] "call"          "terms"         "residuals"     "coefficients" 
 [5] "aliased"       "sigma"         "df"            "r.squared"    
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled" 

Now look at the data structure for the coefficients in the summary:


summary(lm.D9)$coefficients

              Estimate Std. Error   t value     Pr(>|t|)
(Intercept)      5.032  0.2202177 22.850117 9.547128e-15
groupTrt        -0.371  0.3114349 -1.191260 2.490232e-01


class(summary(lm.D9)$coefficients)

[1] "matrix"


summary(lm.D9)$coefficients[,3]

(Intercept)groupTrt
  22.850117   -1.191260
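
Equivalently, by column name (coef() applied to the summary returns the
same matrix):

coef(summary(lm.D9))[, "t value"]
coef(summary(lm.D9))[, "Pr(>|t|)"]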

albyn

Quoting Bogaso bogaso.christo...@gmail.com:



Hi, perhaps this question was answered previously; however, I could not find
the answer. My problem is how to extract a particular statistic from the
result given by lm(). For e.g.

ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)

summary(lm.D9 <- lm(weight ~ group))


Call:
lm(formula = weight ~ group)

Residuals:
    Min      1Q  Median      3Q     Max
-1.0710 -0.4938  0.0685  0.2462  1.3690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320 0.2202  22.850 9.55e-15 ***
groupTrt -0.3710 0.3114  -1.1910.249
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249


Here I want to extract the values of t-stat, Pr(>|t|) individually. Can
anyone please guide me how to get them?
--
View this message in context:  
http://www.nabble.com/Extract-statistics-from-lm%28%29-tp22265770p22265770.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Incorrect p value for binom.test?

2009-02-05 Thread Albyn Jones
The computation 2*sum(dbinom(c(10:25),25,0.061)) does not correspond
to any reasonable definition of p-value.  For a symmetric
distribution, it is fine to use 2 times the tail area of one tail.
For an asymmetric distribution, this is silly.

The standard definition given in elementary texts is usually something like

 the probability of observing a test statistic at least as 
  extreme as the observed value

or more formally as 

  the smallest significance level at which the observed result would
  lead to rejection of the null hypothesis

Either definition requires further decisions (what does "at least as
extreme" mean?).  In an asymmetric distribution, "at least as far from
E(X|H0)" is not a good interpretation, since deviations in one direction 
may be much less probable than deviations in the other direction.  

Peter's interpretation corresponds both to the interpretation of "at
least as extreme" as "at least as improbable", and also to the
"smallest significance level" interpretation for the test implemented
in binom.test, ie the Clopper-Pearson exact test.  2 times the upper
tail area corresponds to neither.  The fact that it is implemented in
SAS and appears in a text does not rescue it from that fundamental
failure to make sense.
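
The p-value binom.test reports can be reproduced directly from that
definition: sum the probabilities of all outcomes whose null
probability does not exceed that of the observed count:

p0 <- dbinom(0:25, 25, 0.061)
sum(p0[p0 <= dbinom(10, 25, 0.061)])   # 9.94e-07, matching binom.test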

albyn

On Thu, Feb 05, 2009 at 09:48:11PM +0100, Peter Dalgaard wrote:
 Michael Grant wrote:
 I believe the binom.test procedure is producing one tailed p values
 rather than the two tailed value implied by the alternative hypothesis
 language.  A textbook and SAS both show 2*9.94e-07 = 1.988e-06 as the
 two tailed value.  As does the R summation syntax from R below.  It
 looks to me like the alternative hypothesis language should be revised
 to something like  ... greater than or equal to ...  Am I mistaken?


 Yes. Or maybe, it is a matter of definition. The problem is that

  (0:25)[dbinom(0:25,25,.061) <= dbinom(10,25,.061)]
  [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

 so with R's definition of more extreme, all such values are in the upper 
 tail.

 Actually, if you look at the actual distribution, I think you'll agree that 
 it is rather difficult to define a lower tail with positive probability 
 that corresponds to X >= 10.

  round(dbinom(0:25,25,.061),6)
  [1] 0.207319 0.336701 0.262476 0.130726 0.046708 0.012744
  [7] 0.002760 0.000487 0.000071 0.000009 0.000001 0.000000
 [13] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 [19] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 [25] 0.000000 0.000000

 In any case, you would be hard pressed to find a subset of 0:25 that has
 the probability that SAS and your textbook claim as the p-value.



  
 M.C.Grant

  
 2*sum(dbinom(c(10:25),25,0.061))

 [1] 1.987976e-06

  
 binom.test(10,25,0.061)

  
 Exact binomial test

  
 data:  10 and 25 
 number of successes = 10, number of trials = 25, p-value = 9.94e-07

 alternative hypothesis: true probability of success is not equal to
 0.061 
 95 percent confidence interval:

  0.2112548 0.6133465 
 sample estimates:

 probability of success 
0.4


  [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 -- 
O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
 ~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] testing for bimodal distribution

2009-02-03 Thread Albyn Jones
dip {diptest} is Hartigan's dip test.
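
A minimal sketch of its use, assuming the diptest package is installed
(current versions provide dip.test(), which reports a p-value; older
versions expose only the statistic via dip()):

  library(diptest)
  x <- c(rnorm(100, mean = -2), rnorm(100, mean = 2))  # clearly bimodal
  dip.test(x)   # a small p-value suggests departure from unimodality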

albyn

On Tue, Feb 03, 2009 at 05:42:34PM -0500, Andrew Yee wrote:
 I'm not sure where to begin with this, but I was wondering if someone could
 refer me to an R package that would test to see if a distribution fits a
 bimodal distribution better than a unimodal distribution.
 Thanks,
 Andrew
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Goodness of fit for gamma distributions

2009-01-29 Thread Albyn Jones

It is easy to make a QQ plot for the gamma; suppose the estimated
parameters are 1.101 and 2.49, with the data in x:

 plot(qgamma(ppoints(x),1.101,2.49),sort(x))

see also lattice::qqmath
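
A fuller sketch, using simulated stand-in data and a reference line (the
shape and rate here are just the assumed estimates from above):

  x <- rgamma(200, shape = 1.101, rate = 2.49)   # stand-in for the real data
  plot(qgamma(ppoints(x), shape = 1.101, rate = 2.49), sort(x),
       xlab = "theoretical gamma quantiles", ylab = "sample quantiles")
  abline(0, 1)   # points close to this line indicate a good fit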

albyn

Quoting Dan31415 d.m.mitch...@reading.ac.uk:



Ah yes, that does produce a nice plot.  Can I just ask what exactly it is
showing?  It seems to me to be a sort of Q-Q plot but with a different set of
axes.  Is this correct, and if so, do the same interpretation rules apply for
this plot, i.e. departures at either end of the curve show poor fitting of the
extreme data?

Thanks for your help, Remko; it's been very helpful.

Dann



Remko Duursma-2 wrote:


It sounds like you just want to graph it, though.  For gammas, it's nice
to graph the log of the density, because the tail is so thin and long
that you don't see much otherwise:

mydata <- rgamma(10000, shape=1.1, rate=2.5)

# now suppose you fit a gamma distribution, and get these estimated
# parameters:
shapeest <- 1.101
rateest <- 2.49

h <- hist(mydata, breaks=50, plot=FALSE)
plot(h$mids, log(h$density))
curve(log(dgamma(x, shape=shapeest, rate=rateest)), add=TRUE)


#Remko


-
Remko Duursma
Post-Doctoral Fellow

Centre for Plant and Food Science
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753

Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia

Mobile: +61 (0)422 096908



On Wed, Jan 28, 2009 at 1:13 AM, Dan31415 d.m.mitch...@reading.ac.uk
wrote:


Thanks for that, Remko, but I'm slightly confused, because isn't this
testing the goodness of fit of two slightly different gamma distributions,
not how well a gamma distribution represents the data?

e.g.

data.vec <- as.vector(data)

(do some MLE to find the parameters of a gamma distribution for data.vec)

xrarea <- seq(-2, 9, 0.05)
yrarea <- dgamma(xrarea, shape=7.9862, rate=2.6621)

so now yrarea is the gamma density and I want to compare it with
data.vec to see how well it fits.

regards,
Dann


Remko Duursma-2 wrote:


Hi Dann,

there is probably a better way to do this, but this works anyway:

# your data
gamdat <- rgamma(10000, shape=1, rate=0.5)

# comparison to gamma:
gamsam <- rgamma(10000, shape=1, rate=0.6)

qqplot(gamsam,gamdat)
abline(0,1)


greetings
Remko


-
Remko Duursma
Post-Doctoral Fellow

Centre for Plant and Food Science
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753

Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia

Mobile: +61 (0)422 096908



On Tue, Jan 27, 2009 at 3:38 AM, Dan31415 d.m.mitch...@reading.ac.uk
wrote:


I'm looking for goodness-of-fit tests for gamma distributions with large
data sizes.  I have a matrix with around 10,000 data values in it, and I
have fitted a gamma distribution over a histogram of the data.

The problem is testing how well that distribution fits.  Chi-squared seems
to be used more for discrete distributions, and for Kolmogorov-Smirnov it
seems that large sample sizes make it hard to evaluate the D statistic.
Also I haven't found a QQ plot for the gamma, although I think this might
be an appropriate test.

In summary:
- is there a gamma goodness-of-fit test that doesn't depend on the sample
  size?
- is there a way of using qqplot for gamma distributions, and if so, how
  would you calculate it from a matrix of data values?

regards,
Dann
--
View this message in context:
http://www.nabble.com/Goodness-of-fit-for-gamma-distributions-tp21668711p21668711.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
View this message in context:
http://www.nabble.com/Goodness-of-fit-for-gamma-distributions-tp21668711p21686095.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
View this message in context:  
http://www.nabble.com/Goodness-of-fit-for-gamma-distributions-tp21668711p21725468.html

Sent from the R help mailing list 

Re: [R] Precision in R

2009-01-14 Thread Albyn Jones

Yes, computing WB %*% t(WB) may be the problem, by either method.

If the goal is to compute the inverse of WB %*% t(WB), you should
consider computing the singular value or QR decomposition of the
matrix WB.

If t(WB) = Q %*% R, where Q is orthogonal, then WB %*% t(WB) = t(R) %*% R,
and so the inverse of WB %*% t(WB) is solve(R) %*% t(solve(R)).

Computing WB %*% t(WB) squares the condition number of the matrix.
This is similar to the loss of precision that occurs when you compute
the variance via mean(X^2) - mean(X)^2.
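
Here is a sketch of that approach, with simulated data standing in for WB
(3 rows and many columns, as in the output posted below):

  set.seed(1)
  WB <- matrix(rnorm(3 * 1000), nrow = 3)

  ## t(WB) = Q %*% R, so WB %*% t(WB) = t(R) %*% R
  R    <- qr.R(qr(t(WB)))
  Rinv <- backsolve(R, diag(nrow(R)))   # R^{-1} by a triangular solve
  inv1 <- Rinv %*% t(Rinv)              # inverse of WB %*% t(WB)

  inv2 <- solve(WB %*% t(WB))           # direct inverse, for comparison
  max(abs(inv1 - inv2))                 # small here; the QR route stays
                                        # accurate as WB grows ill-conditioned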


albyn


Quoting dos Reis, Marlon marlon.dosr...@agresearch.co.nz:


Dear All,
I'm preparing a simple algorithm for matrix multiplication for a
specific purpose, but I'm getting some unexpected results.
If anyone could give a clue, I would really appreciate it.
Basically what I want to do is a simple matrix multiplication:
(WB) %*% t(WB).
WB is on disk, so I compared two approaches:

-   Load WB using read.table (put it in WB.tmp) and then do the
    simple matrix multiplication WB.tmp %*% t(WB.tmp).

-   Scan each row of WB and form the cross products sum(WB.i*WB.i)
    and sum(WB.i*WB.j), which with proper arrangement leads to WBtWB.

Comparing these two matrices, I get very similar values; however,
when I try their inverses, WBtWB leads to a singular system.
I've tried different tests, and my conclusion is that my precision
problem is related to the cross products sum(WB.i*WB.i) and
sum(WB.i*WB.j).
Does that make sense?
Thanks,
Marlon



WB.tmp%*%t(WB.tmp)

   WB.i   WB.i   WB.i
WB.i 1916061939 2281366606  678696067
WB.i 2281366606 3098975504 1092911209
WB.i  678696067 1092911209  452399849


WBtWB

   [,1]   [,2]   [,3]
[1,] 1916061939 2281366606  678696067
[2,] 2281366606 3098975504 1092911209
[3,]  678696067 1092911209  452399849



WBtWB - WB.tmp %*% t(WB.tmp)

 WB.i  WB.i  WB.i
WB.i 2.861023e-06  4.768372e-07  4.768372e-07
WB.i 4.768372e-07  3.814697e-06 -2.622604e-06
WB.i 4.768372e-07 -2.622604e-06  5.960464e-08


solve(WB.tmp%*%t(WB.tmp))

  WB.i  WB.i   WB.i
WB.i -41692.80  58330.89  -78368.17
WB.i  58330.89 -81608.66  109642.09
WB.i -78368.17 109642.09 -147305.32


solve(WBtWB)

Error in solve.default(WBtWB) :
  system is computationally singular: reciprocal condition number =
2.17737e-17




 WB.tmp <- NULL
 WBtWB <- matrix(NA, n, n)
 for (i in 1:n) {
   setwd(Home.dir)
   WB.i <- scan("WB.dat", skip = i - 1, nlines = 1)
   WB.tmp <- rbind(WB.tmp, WB.i)
   WBtWB[i, i] <- sum(WB.i * WB.i)
   if (i < n) {
     for (j in (i + 1):n) {
       setwd(Home.dir)
       WB.j <- scan("WB.dat", skip = j - 1, nlines = 1)
       WBtWB[i, j] <- sum(WB.i * WB.j)
       WBtWB[j, i] <- sum(WB.i * WB.j)
     }
   }
 }



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to plot histogram plot and fitted distributions on the same graph

2009-01-08 Thread Albyn Jones
You are plotting the histogram on the frequency scale.  A quick look
at the doc page for hist() would reveal the freq option:


  hist(x,freq=FALSE)

Then you can add the densities with lines():
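
For example, a minimal sketch with simulated stand-in data:

  x <- rgamma(500, shape = 2, rate = 1)   # stand-in data
  hist(x, freq = FALSE, breaks = 30)      # density scale, not counts
  xg <- seq(min(x), max(x), length.out = 200)
  lines(xg, dgamma(xg, shape = 2, rate = 1), lwd = 2)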

albyn

Quoting Xin Shi jasonshi...@hotmail.com:


Dear:

I am trying to plot a histogram of my observed data, and then plot fitted
distributions on the same graph as the histogram in R:

1.  histogram plot of y;
2.  based on 1, plotting y1 v. x;
3.  based on 1, plotting y2 v. x;
4.  based on 1, plotting y3 v. x.

All four plots must be on the same graph.

However, the difficulty is that the y-axis and x-axis for the histogram
plot and the fitted distribution plots are different.  For the histogram
plot, y represents frequency and x represents events; for the fitted
distribution plots, y represents probability and x represents another
variable.

Also, I found out that I need a histogram plot rather than a barplot; this
is the major problem in this work.

The code I used:



par(font=1, font.lab=10, font.axis=6)

pts18 <- barplot(y, ylim=c(0,0.2), xlim=c(2,52), axes=FALSE,
                 border=TRUE, names.arg=x, col="white")

axis(2, las=1)

lines(spline(pts18, y1, n=300, method="natural"), type="l", lty=1)
lines(spline(pts18, y2, n=300, method="natural"), type="l", lty=2)
lines(spline(pts18, y3, n=300, method="natural"), type="l", lty=5)



The data are:



The observed data:



y <- c(0.098441926, 0.166430595, 0.121813031, 0.104815864, 0.074362606,

0.075779037, 0.055949008, 0.040368272, 0.03470255, 0.029745042,

0.032577904, 0.02266289, 0.014872521, 0.014872521, 0.010623229,

0.01203966, 0.01203966, 0.008498584, 0.009206799, 0.009915014,

0.006373938, 0.003541076, 0.001416431, 0.001416431, 0.005665722,

0.002124646, 0.000708215, 0.001416431, 0.004249292, 0.002832861,

0.004957507, 0.002124646, 0.000708215, 0, 0.000708215, 0.002124646,

0.001416431, 0.001416431, 0.001416431, 0, 0.000708215)



Fitted distribution 1:



y1 <- c(0.03419162, 0.154201321, 0.129581481, 0.108892454, 0.091506645,

0.07689666, 0.064619311, 0.054302168, 0.045632264, 0.0383466,

0.032224168, 0.027079245, 0.022755763, 0.01912257, 0.016069453,

0.013503798, 0.01134, 0.009535987, 0.008013468, 0.006734034,

0.005658876, 0.004755378, 0.003996132, 0.003358108, 0.002821952,

0.002371398, 0.00199278, 0.001674612, 0.001407243, 0.001182562,

0.000993753, 0.00083509, 0.00070176, 0.000589716, 0.000495562,

0.00041644, 0.000349951, 0.000294078, 0.000247125, 0.000207669,

0.000174513)



Fitted distribution 2:



y2 <- c(0.078909441, 0.188048499, 0.117871979, 0.089827482, 0.072368317,

0.059928019, 0.050453301, 0.042948906, 0.036851702, 0.031809247,

0.027584779, 0.024010745, 0.020963795, 0.01835029, 0.016097393,

0.014147335, 0.012453559, 0.010978051, 0.009689433, 0.008561564,

0.007572497, 0.006703683, 0.005939358, 0.005266055, 0.00467,

0.004147912, 0.003684531, 0.003274633, 0.002911751, 0.00259025,

0.002305216, 0.002052353, 0.001827898, 0.001628552, 0.001451415,

0.001293939, 0.001153881, 0.001029262, 0.000918338, 0.000819567,

0.000731589)



Fitted distribution 3:



y3 <- c(0.09844545, 0.174856171, 0.1190666, 0.093021492, 0.075639902,

0.062740817, 0.052668044, 0.044568247, 0.037931599, 0.032423244,

0.027808545, 0.023915327, 0.020612892, 0.01779946, 0.015394205,

0.013331948, 0.011559483, 0.010032949, 0.008715898, 0.007577845,

0.006593146, 0.005740133, 0.005000424, 0.004358371, 0.003800615,

0.003315725, 0.002893892, 0.002526689, 0.002206859, 0.001928146,

0.001685148, 0.001473194, 0.001288243, 0.001126794, 0.00098581,

0.000862657, 0.000755047, 0.000660991, 0.000578759, 0.000506847,

0.000443945)



x <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55,

60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115,

120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170,

175, 180, 185, 190, 200)



Many Thanks!



Xin




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Drawing from an empirical distribution

2009-01-06 Thread Albyn Jones
The empirical distribution gives probability 1/n to each of the n
observations, so rather than sampling the unit interval, just resample the
dataset.  If x is your dataset and you want an independent sample of size k:

sample(x, size = k, replace = TRUE)
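
If you do want to go via a number between 0 and 1, inverting the empirical
CDF gives draws with the same distribution; a sketch:

  u <- runif(5)
  quantile(x, probs = u, type = 1)   # type = 1 is the inverse of the ECDF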

albyn

On Tue, Jan 06, 2009 at 02:39:17PM -0800, culpritNr1 wrote:
 
 Hi All,
 
 Does anybody know if there is a simple way to draw numbers from an empirical
 distribution?
 
 I know that I can plot the empirical cumulative distribution function this
 easily:
 plot(ecdf(x))
 
 Now I want to pick a number between 0 and 1 and go back to the domain of x.
 Sounds simple to me.
 
 Any suggestion?
 
 Thank you,
 
 Your culprit
 (everybody needs a culprit)
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/Drawing-from-an-empirical-distribution-tp21320810p21320810.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] uniroot() problem

2008-12-30 Thread Albyn Jones
One can't tell for sure without seeing the function, but I'd guess
that you have a numerical issue.  Here is an example to reflect upon:



f=function(x) (exp(x)-exp(50))*(exp(x)+exp(50))
uniroot(f,c(0,100))

$root
[1] 49.7

$f.root
[1] -1.640646e+39

$iter
[1] 4

$estim.prec
[1] 6.103516e-05


.Machine$double.eps^0.25/2

[1] 6.103516e-05

uniroot thinks it has converged, at least in relative terms.  Note
that the estimated precision is related to the machine epsilon, used
in the default value for tol.  Try fiddling with the tol argument:



uniroot(f,c(0,100),tol=1/10^12)

$root
[1] 50

$f.root
[1] 1.337393e+31

$iter
[1] 4

$estim.prec
[1] 5.186962e-13

albyn


Quoting megh megh700...@yahoo.com:



I have a strange problem with uniroot() function. Here is the result :


uniroot(th, c(-20, 20))

$root
[1] 4.216521e-05

$f.root
[1] 16.66423

$iter
[1] 27

$estim.prec
[1] 6.103516e-05

Please forgive me for not reproducing the whole code; my question is how
f.root can be 16.66423.  As it is finding a root of the function, it should
be near zero.  Am I missing something?

--
View this message in context:  
http://www.nabble.com/uniroot%28%29-problem-tp21227702p21227702.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] comparing distributions

2008-12-19 Thread Albyn Jones
On Fri, Dec 19, 2008 at 05:44:47AM -0800, kdebusk wrote:
 I have data sets from three different sites.  None of the data sets is
 normally distributed, and they cannot be log-transformed to normality.
 I would like to compare the data sets to see if any of the sites are
 similar to each other, and would like something more powerful than a
 non-parametric Wilcoxon test.  Does anyone have any suggestions on more
 powerful ways of comparing non-normal data sets?  If it helps, the data
 sets follow a Gamma distribution.  Thanks for all input.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

If the data are gamma distributed, then a glm with family Gamma
should be a good option.
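
A sketch, assuming a hypothetical data frame dat with a response column
value and a factor site:

  fit0 <- glm(value ~ 1,    family = Gamma(link = "log"), data = dat)
  fit1 <- glm(value ~ site, family = Gamma(link = "log"), data = dat)
  anova(fit0, fit1, test = "F")   # F test for differences among sites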

albyn

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.