Re: [R] 0 x 0 matrix

2009-09-04 Thread Ted Harding
On 04-Sep-09 10:45:27, Markku Karhunen wrote:
 True. Should have read ?diag.
 
 However, this provokes a more general question: Is there some way I  
 can declare some scalar and _all its functions_ as matrices?
 
 For instance, I would like to
 
 A = as.matrix(0.98)
 B = f(A)             # f() standing for some function of A
 C = diag(sqrt(B))
 
 so that all scalars are explicitly [1,1] matrices.
 BR, Markku

Hmmm, it might be a good idea to explain why you want to do this.
For instance:

  M <- matrix(c(1,2,3,4), nrow=2)
  c <- matrix(2, nrow=1)
  c %*% M
  # Error in c %*% M : non-conformable arguments
  c * M
  # Error in c * M : non-conformable arrays
  c + M
  # Error in c + M : non-conformable arrays

So what would you want to use the [1,1]-matrix scalars for, that
cannot be done just using them as numbers?
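(Incidentally -- an aside of mine, not part of the original exchange --
if the aim is simply to stop results collapsing from matrices to plain
numbers, one standard device is indexing with drop = FALSE:

  M <- matrix(1:4, nrow=2)
  M[1,1]                  # a plain number: 1
  M[1,1, drop = FALSE]    # stays a [1,1] matrix
)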

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-09   Time: 14:51:52
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Function for all 2^N subsets of N

2009-09-01 Thread Ted Harding
Greetings all!
I have been searching the Site for a function, say "subsets",
such that for instance

  subsets(10)

would return a (say) matrix of indices to the 2^N subsets of
N items -- perhaps in the form of 2^10 rows each of which is
10 entries each either TRUE or FALSE. Or 1 or 0. Or ...

I can of course write my own, using good old looping technology
or similar, but it would be good to find one which did it quick
and snappy, at the compiled level.

A Site Search in "Function" on "all subsets" didn't seem to yield
anything of the kind, which surprised me. Maybe I overlooked
something ...

(This is prompted by the recent OT discussion on HT vs. HH,
to which I want to respond later).

With thanks,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09   Time: 09:09:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Function for all 2^N subsets of N

2009-09-01 Thread Ted Harding

On 01-Sep-09 08:33:41, Gerrit Eichner wrote:
 Maybe
 
 expand.grid( rep( list( 0:1), 10))
 
 does what you want.
   Best regards  --  Gerrit

Thanks! That does seem to do the job. I hadn't thought of expand.grid().
Ted.
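
By way of illustration (my addition, not in the original message),
each row below picks out one of the 2^3 subsets of 3 items:

  subs <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), 3)))
  dim(subs)     # 8 rows (subsets) x 3 columns (items)
  subs[1:3, ]   # FFF, TFF, FTF: the first column varies fastest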

 On Tue, 1 Sep 2009, 
 ted.hard...@manchester.ac.uk wrote:
 
 Greetings all!
 I have been searching the Site for a function, say "subsets",
 such that for instance

  subsets(10)

 would return a (say) matrix of indices to the 2^N subsets of
 N items -- perhaps in the form of 2^10 rows each of which is
 10 entries each either TRUE or FALSE. Or 1 or 0. Or ...

 I can of course write my own, using good old looping technology
 or similar, but it would be good to find one which did it quick
 and snappy, at the compiled level.

 A Site Search in "Function" on "all subsets" didn't seem to yield
 anything of the kind, which surprised me. Maybe I overlooked
 something ...

 (This is prompted by the recent OT discussion on HT vs. HH,
 to which I want to respond later).

 With thanks,
 Ted.

 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 01-Sep-09   Time: 09:09:09
 -- XFMail --

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09   Time: 09:42:18
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Offtopic, HT vs. HH in coin flips

2009-09-01 Thread Ted Harding
On 31-Aug-09 19:16:33, Erik Iverson wrote:
 Dear R-help, 
 Could someone please try to explain this paradox to me? What is
 more likely to show up first in a string of coin tosses, Heads
 then Tails, or Heads then Heads?  
 
 ## generate 2500 strings of random coin flips
 ht <- replicate(2500,
                 paste(sample(c("H", "T"), 100, replace = TRUE),
                       collapse = ""))

 ## find first occurrence of "HT"
 mean(regexpr("HT", ht)) + 1   # mean of HT position, 4

 ## find first occurrence of "HH"
 mean(regexpr("HH", ht)) + 1   # mean of HH position, 6
 
 FYI, this is not homework, I have not been in school in years.
 I saw a similar problem posed in a blog post on the Revolutions R
 blog, and although I believe the answer, I'm having a hard time
 figuring out why this should be? 
 
 Thanks,
 Erik Iverson

Be very careful about the statement of the problem!

[1] The probability that HH will occur first (i.e. before HT)
is the same as the probability that HT will occur first (i.e.
before HH).

[2] However, the probability that the first occurrence of HT
has its H at a given position is generally not the same
as the probability that the first occurrence of HH has its
first H at that same position.

[1]: At the first occurrence of (either HH or HT), there is
an initial string S, ending in an H, followed by either an H
(for HH) or a T (for HT). Both are equally likely.

So the probability that the first occurrence of (either HH or HT)
is an HH is the same as the probability that it is an HT.

[2]: (A) The first occurrence of an HH comes at the end of a
sequence of any collection of H and T, provided there is no HH
earlier in the sequence, and the last element is H, followed by H.
However, HT is allowed to occur in the sequence.

But (B) the first occurrence of an HT comes at the end of a sequence
of (zero or more T) followed by (1 or more H), followed by T.
This is the only pattern in which HT does not occur prior to
the final HT.
Similarly, HH is allowed to occur in the sequence.

The reason that, in general, the probability of HH first occurring
at a given position is different from the probability of HT first
occurring at that position lies in the differences between the
number of possible sequences satisfying (A), and the number of
possible sequences satisfying (B).

The first few cases (HH or HT first occurring at (k+1), so
that the position of the first H in HH or HT is at k) are,
with their probabilities:

k=1:    HH      HT
        1/4     1/4

k=2:    THH     HHT
                THT
        1/8     2/8

k=3:    TTHH    HHHT
        HTHH    THHT
                TTHT
        2/16    3/16

k=4:    TTTHH   HHHHT
        THTHH   THHHT
        HTTHH   TTHHT
                TTTHT
        3/32    4/32

The HT case is simple:
  P.HT[k] = Prob(1st HT at (k+1)) = k/(2^(k+1))
Exercise for the reader: Sum(P.HT) = 1
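(A quick numerical check of that exercise, my addition; the terms
beyond k=50 are negligible:

  k <- 1:50
  sum(k/2^(k+1))   # 1, to within rounding
)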

The HH case is more interesting. Experimental scribblings on
paper threw up a hypothesis, which I decided to explore in R.
Thanks to Gerrit Eichner for suggesting the use of expand.grid()!

  ## Function to count sequences giving 1st HH on throw k+1
  countHH <- function(k){
    M <- as.matrix(expand.grid(rep(list(0:1), k)))
    ix <- (M[,k]==1) ## throw k must be an H (then k+1 will be H)
    for(i in (1:(k-1))){ ix <- ix & ( !((M[,i]==1)&(M[,i+1]==1)) ) }
    sum(ix)
    ## list(Count=sum(ix), Which=M[ix,])
  }

Now, ignoring the case k=1:

  HHcounts <- NULL
  for(i in (2:12)){ HHcounts <- c(HHcounts, countHH(i)) }
  rbind((3:13), HHcounts)

  #          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
  #             3    4    5    6    7    8    9   10   11    12    13
  # HHcounts    1    2    3    5    8   13   21   34   55    89   144

Lo and Behold, we have a Fibonacci sequence! Another exercise for
the reader ...
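
As a quick cross-check (my addition), the same counts come straight
from the Fibonacci recurrence count(k) = count(k-1) + count(k-2):

  f <- c(1, 2)                        # counts for k = 2 and k = 3
  for (k in 4:12) f <- c(f, f[k-2] + f[k-3])
  f                                   # 1 2 3 5 8 13 21 34 55 89 144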

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09   Time: 10:38:58
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Google style

2009-09-01 Thread Ted Harding
On 01-Sep-09 10:25:53, Duncan Murdoch wrote:
 Jim Lemon wrote:
 Duncan Murdoch wrote:
 On 8/31/2009 11:50 AM, Mark Knecht wrote:
 On Mon, Aug 31, 2009 at 6:36 AM, Terry Therneau thern...@mayo.edu
 wrote:
 SNIP
 The authors borrowed so much else from C, the semicolon would have
 been good too.
 
 Something I have thought myself.
   
 I know real R coders will chuckle
   
 I'd say cringe, rather than chuckle.  This is going to make you waste
 a lot of time some day, when you stare and stare at code like Terry's
 and can't figure out what's wrong with it:

 zed <- function(x,y,z) {
    x + y
  +z;
   }

 The value of the function is +z, not x+y+z, even though the C part of
 your brain made you type it that way and reads it as one statement in
 the body, not two.
 
 This is getting interesting. One habit I have developed in R to 
 emphasize a line continuation is to always write the above as:

 zed <- function(x,y,z) {
  x+y+
  z
 }
   
 
 That's a good habit.  An alternative is to put parentheses around the 
 expression:
 
  (x + y
   + z)
 
 will work.
 The trailing operator signalling to me and the interpreter that
 there's more to come. A semicolon after the z would be innocuous.
 Now I know that this marks me as a crabby old fart who learned
 to program on Hollerith cards where there had to be firm
 conventions on when a line of code ended. Still, given the moiety
 of global warming attributable to endless discussions about how
 many spaces should be used for indentation, I think the use of
 the semicolon as a personal aid to interpretation is at worst a
 harmless affectation.
 
 I think it's worse.  To me, it's like putting in a comment that is 
 wrong, or writing code like this:
 
   one <- 2
   x <- x + one
 
 Code has meaning, it's not just a bunch of binary instructions to the 
 computer.  If the meaning and the look of the code clash, it is going
 to lead to problems.
 
 Duncan Murdoch

And surely that is precisely the point of Jim's use of ";"!
It is, in effect, ignored by R; but to Jim it means "this marks the
end of a command". Surely useful, and surely not in the same league
as "a comment that is wrong". You may see it as noise, but then
you can filter it out.

As one COF to another, I have to say that Jim's posting took me
back to the early days of my own evolution. That was dandy!
(Dinosaurs are not dead yet).

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-09   Time: 11:37:52
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bootstrap inference for the sample median?

2009-08-31 Thread Ted Harding
On 30-Aug-09 16:24:08, Emmanuel Charpentier wrote:
 On Sunday 30 August 2009 at 18:43 +0530, Ajay Shah wrote:
 Folks,
 I have this code fragment:
   set.seed(1001)
   x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
          0.82529998629531, 0.973451006822,   0.92378802164835,
          0.996679563355802, 0.943347739494445, 0.992873542980045,
          0.870624707845108, 0.935917364493788)
   range(x)
   # from 0.79 to 0.996

   library(boot)   # (needed for boot() and boot.ci())
   e <- function(x,d) {
     median(x[d])
   }

   b <- boot(x, e, R=1000)
   boot.ci(b)
 
 The 95% confidence interval, as seen with `Normal' and `Basic'
 calculations, has an upper bound of 1.0028 and 1.0121.
 
 How is this possible? If I sample with replacement from values which
 are all lower than 1, then any sample median of these bootstrap
 samples should be lower than 1. The upper cutoff of the 95% confidence
 interval should also be below 1.
 
 Nope. "Normal" and "Basic" try to adjust (some form of) normal
 distribution to the sample of your statistic. But the median of such a
 small sample is quite far from normal (hint: it is a discrete
 distribution with only very few possible values, at most as many
 values as the sample. Exercise: derive the distribution of median(x)...).
 
 To convince yourself, look at the histogram of the bootstrap
 distribution of median(x). Contrast with the bootstrap distribution of
 mean(x). Meditate. Conclude...
 HTH,
   Emmanuel Charpentier

Ajay,
You have not said why you are adopting the bootstrap approach.
Maybe you specifically want to study the behaviour of bootstrap
methods, in which case my response below is irrelevant. But if,
on the other hand, you are simply looking for a confidence interval
for the median, you might consider a non-parametric approach.

This -- which does not depend on distributional assumptions about
the data -- is based on the fact that if med denotes the median
of the distribution of a continuous variable X,

  Prob(X < med) = Prob(X > med) = 1/2

Hence, if X[1], X[2], ... , X[n] denote the values of a random
sample of X-values in increasing order, then Prob(X[1] > med) = 1/2^n,
and for r > 1:

  Prob( X[r] > med )
  = Prob( X[1] > med ) +
    Prob( (X[1] < med)&(X[2] > med) ) + ... +
    Prob( (X[1] < med)&(X[2] < med)&...&(X[r-1] < med)&(X[r] > med) )

i.e. terms summed over all the r disjoint ways in which X[r] > med
can occur. These terms are all straightforward binomial probabilities,
with p = 1/2, so

  Prob( X[r] > med )
  = (1/2^n) + n*(1/2^n) + ... + choose(n,(r-1))*(1/2^n)
  = pbinom((r-1), n, 1/2)

Hence if, for instance, you find the maximum value of r such that,
for given n:

  Prob( X[r] > med ) <= 0.025

then the probability is at least 0.975 that X[r] < med, whatever
the value of med may be. Hence the element of the sorted sample
in the r-th position is a lower 97.5% one-sided confidence limit
for med. Because you are looking at the median, the situation
is symmetrical, i.e. Prob(X < med) = Prob(X > med) = 1/2, so a
corresponding 97.5% upper one-sided confidence limit for med is
the element of the sorted sample in the (n+1-r)-th position.
Hence the pair of these two limits is a 95% confidence interval
for the median.
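
For convenience, the rule can be wrapped into a small function
(my own wrapping, not from the original exchange):

  median.ci <- function(x, conf = 0.95) {
    x <- sort(x); n <- length(x)
    alpha <- (1 - conf)/2
    ## largest r with Prob(X[r] > med) = pbinom(r-1, n, 1/2) <= alpha
    r <- max(which(pbinom(0:(n-1), n, 0.5) <= alpha))
    c(lower = x[r], upper = x[n + 1 - r])
  }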

Now, since your (admittedly small) sample size is n=11, you can
get at it as follows:

Ps <- pbinom(q=(0:11), size=11, prob=1/2)
round(rbind((0:11), Ps, rev(Ps)), 3)

    [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
       0     1     2     3     4     5     6     7     8     9    10    11
Ps 0.000 0.006 0.033 0.113 0.274 0.500 0.726 0.887 0.967 0.994 1.000 1.000
   1.000 1.000 0.994 0.967 0.887 0.726 0.500 0.274 0.113 0.033 0.006 0.000

Hence the maximum value of r (reading along the Ps line, which
tabulates pbinom(r-1, 11, 1/2) against r-1 = 0,1,2,... in the top
line) is r=2, so the second of 11 sorted observations is the lower
limit, and the 10th is the upper limit, for a 95% confidence interval.
In the case of your data,

  x <- c(0.79211363702017, 0.940536712079832, 0.859757602692931,
         0.82529998629531, 0.973451006822,   0.92378802164835,
         0.996679563355802, 0.943347739494445, 0.992873542980045,
         0.870624707845108, 0.935917364493788)
  print(sort(x)[c(2,10)], 15)
  # [1] 0.825299986295310 0.992873542980045

Note that this is a conservative confidence interval, in that it
is guaranteed that the confidence level

  Prob( LowerLimit < median < UpperLimit ) > 0.95

and it is the shortest such interval with symmetry. As it happens,
with n=11, this probability is

  1 - 2*pbinom(1,11,1/2) =  0.9882812

so it is very conservative (a consequence of small n).

You could get a less conservative interval by using an asymmetrical
interval, e.g. the 2nd and 9th, or the 3rd and 10th, when the
probability would be

  1 - pbinom(1,11,1/2) - pbinom(2,11,1/2) = 0.9614258

which is pretty close to the target confidence level while still
not being less than the target, but then you have to have a reason
for preferring one (2nd and 9th) to the other (3rd and 10th)! Or you
could choose one of them at random ...

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861

Re: [R] write file to date-stamped folder

2009-08-31 Thread Ted Harding
On 31-Aug-09 13:18:38, Henrique Dallazuanna wrote:
 Try this:
 write.table(test.table, file.path(outputDir, "test.table.txt"),
             sep="\t")

That may not work (depending on the platform OS) if the directory
'outputDir' does not already exist (it will not work on Linux).

In that case, first:

  system(paste("mkdir", outputDir))

Then Henrique's suggestion should work (untested), or simply the
bare-hands construction (tested!):

  write.table(test.table,
              paste(outputDir, "/", "test.table.txt", sep=""),
              sep="\t")
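
(Or, an aside of mine: dir.create() creates the directory portably,
without the system() call:

  dir.create(outputDir, showWarnings = FALSE)  # quiet if it already exists
)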

Ted.

 On Mon, Aug 31, 2009 at 4:45 AM, suzylee c.me...@sms.ed.ac.uk wrote:
 

 Hello,

 I would like to be able to write all files produced on one day to an
 output
 directory with that date stamp, or alternatively stamp the date in the
 filename. So that if i run the same code the next day the files will
 not be
 overwritten.

 here's what i have to start with:
 baseDir = getwd()
 outputDir = paste(baseDir, "/OutputData-", Sys.Date(), sep="")

 and let's say i want to write the table test.table to a file:

 write.table(test.table, "test.table.txt", sep="\t")

 How do i make this write to outputDir?

 Thanks.
 --
 View this message in context:
 http://www.nabble.com/write-file-to-date-stamped-folder-tp25219504p25219504.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 
 
 -- 
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40" S 49° 16' 22" O
 
   [[alternative HTML version deleted]]
 


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-09   Time: 14:42:51
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Google's R Style Guide

2009-08-29 Thread Ted Harding
On 29-Aug-09 17:51:54, diegol wrote:
 Max Kuhn wrote:
 Perhaps this is obvious, but I've never understood why this is the
 general convention:
 
 An opening curly brace should never go on its own line;
 
 I tend to do this:
 
 f <- function()
 {
   if (TRUE)
   {
     cat("TRUE!!\n")
   } else {
     cat("FALSE!!\n")
   }
 }
 
 I favor your approach. BUT I add one more level of indentation.
 Your function would look like:
 
 f <- function()
   {
     if (TRUE)
       {
         cat("TRUE!!\n")
       } else {
         cat("FALSE!!\n")
       }
   }
 
 This way I quickly identify the beginning of the function, which is
 the one line at the top of the expression AND sticking to the left
 margin.
 In your code you use this same indentation in the if/else construct.
 I find it also useful for the function itself.

When I want to rely on indentation and vertical alignments to keep
track of program structure, I would tend to write the above like

  f <-
  function()
  { if (TRUE)
    {
      cat("TRUE!!\n")
    } else
    {
      cat("FALSE!!\n")
    }
  }

so that an opening { is aligned with the keyword it is associated
with, and then at the end of the block so also is the closing }.

However, in this case (if I keep all the {...} for the sake of
structure) I would also tend to save on lines with

  f <-
  function()
  { if (TRUE)
    { cat("TRUE!!\n")  } else
    { cat("FALSE!!\n") }
  }

which is still clear enough for me. This probably breaks most
guidelines! But in practice it depends on what it is, and on
how readily I find I can read it.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Aug-09   Time: 19:26:51
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] breaking multi-modal histograms down into combinations o

2009-08-28 Thread Ted Harding
On 28-Aug-09 10:34:46, Thomas Groen wrote:
 Dear All,
 
 Does anybody know if there is a functionality in R to break histograms
 that show a clear bi-modal (or multi-modal) distribution into a series
 of unimodal histograms that added up result in the original histogram?
 I was thinking of using QQ-plots (for which tools are available in R),
 and then observing the number of times the observed quantiles cross
 the 1:1 line, but this only gives an indication of  how many peaks
 the current histogram has. Therefore I was wondering whether other
 approaches exist.
 
 Thanks in advance for any suggestions.
 
 p.s. also thanks to those who helped me on my previous question on
 Modelling different combinations of explanatory variables. The leaps
 package and the regsubsets command worked really well!

There are a number of points of information which would help us to
be more specific about suggestions.

1: Do you have the raw data from which the histogram was constructed?
Decomposition of a multimodal sample into constituent unimodal
components is best done by adopting a generic distribution type
(e.g. Normal) for each component, and then estimating the parameters
of each component from the data. There is more information (and
thereby better estimation) in the raw data than in the histogram.

2: Do you have a preferred generic distribution type (e.g. Normal)
for the component distributions?
(If not, and you don't care what distribution you adopt, then
what is to stop you drawing arbitrary dividing lines between the
peaks, and asserting that what lies between two consecutive
divisions is one component of the mixture? Then you would end up
with a set of disjoint histograms, one for each component, chosen
in a somewhat arbitrary way. Since you presumably don't intend
that to happen, you presumably have reasons why it should not
happen, which would amount to a preference for a generic
distribution type.)

Once the generic type is chosen, a specific method is indicated.
For example, do an R Site Search on "normal mixture" in Functions
at:

  http://finzi.psych.upenn.edu/nmz.html

You may want to look at

http://finzi.psych.upenn.edu/R/library/mclust/html/00Index.html
(Model-Based Clustering / Normal Mixture Modeling).
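
To make that concrete, a minimal sketch (mine, assuming the mclust
package is installed):

  library(mclust)
  x <- c(rnorm(200, 0, 1), rnorm(200, 5, 1))  # toy bimodal sample
  fit <- Mclust(x)   # fits normal mixtures, choosing the number of
                     # components by BIC
  summary(fit)       # the selected model, component sizes, etc.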

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-09   Time: 12:12:36
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Google's R Style Guide

2009-08-28 Thread Ted Harding
On 28-Aug-09 12:59:24, Esmail wrote:
 Perhaps most of you have already seen this?
   
 http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
 
 Comments/Critiques?
 
 Thanks,
 Esmail
 
 ps: Reminds me of PEP 8 for Python
 
  http://www.python.org/dev/peps/pep-0008/
 
 Maybe not that surprising since Python is also one of the main
 languages used by Google.

I think it is grossly over-prescriptive. For example:
  "function names have initial capital letters and no dots"
is violated throughout R itself.

Ted.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-09   Time: 14:21:58
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Statistical question about logistic regression simulatio

2009-08-26 Thread Ted Harding
On 26-Aug-09 14:17:40, Denis Aydin wrote:
 Hi R help list
 I'm simulating logistic regression data with a specified odds ratio 
 (beta) and have a problem/unexpected behaviour that occurs.
 
 The datasets includes a lognormal exposure and diseased and healthy 
 subjects.
 
 Here is my loop:
 
 ors <- vector()
 for(i in 1:200){

# First, I create a vector with a lognormally distributed exposure:
 n <- 10000 # number of study subjects
 mean <- 6
 sd <- 1
 expo <- rlnorm(n, mean, sd)

# Then I assign each study subject a probability of disease with a
# specified Odds ratio (or beta coefficient) according to a logistic
# model:
 inter <- 0.01 # intercept
 or <- log(1.5) # an odds ratio of 1.5 or a beta of ln(1.5)
 p <- exp(inter + or * expo)/(1 + exp(inter + or * expo))

# Then I use the probability to decide who is having the disease and who
# is not:
 disease <- rbinom(length(p), 1, p) # 1 = disease, 0 = healthy

# Then I calculate the logistic regression and extract the odds ratio
 model <- glm(disease ~ expo, family = binomial)
 ors[i] <- exp(summary(model)$coef[2]) # exponentiated beta = OR
 }
 
 Now to my questions:
 
 1. I was expecting the mean of the odds ratios over all simulations to 
 be close to the specified one (1.5 in this case). This is not the case 
 if the mean of the lognormal distribution is, say 6.
 If I reduce the mean of the exposure distribution to say 3, the mean of
 the simulated ORs is very close to the specified one. So the simulation
 seems to be quite sensitive to the parameters of the exposure
 distribution.
 
 2. Is it somehow possible to stabilize the simulation so that it is 
 not that sensitive to the parameters of the lognormal exposure 
 distribution? I can't make up the parameters of the exposure 
 distribution, they are estimations from real data.
 
 3. Are there general flaws or errors in my approach?
 
 
 Thanks a lot for any help on this!
 
 All the best,
 Denis
 
 -- 
 Denis Aydin

You need to look at the probabilities 'p' being generated by your code.

Taking first your case mean <- 6 (and sorting 'expo', and using
whatever seed my system had current at the time):

  n <- 10000 # number of study subjects
  mean <- 6
  sd <- 1
  expo <- sort(rlnorm(n, mean, sd))
  p <- exp(inter + or * expo)/(1 + exp(inter + or * expo))

  p[1:20]
  #  [1] 0.9763438 0.9918962 0.9924002 0.9965314 0.9980887 0.9984698
  #  [7] 0.9993116 0.9993167 0.9994007 0.9994243 0.9996288 0.9997037
  # [13] 0.9998728 0.9998832 0.9999284 0.9999346 0.9999446 0.9999528
  # [19] 0.9999561 0.9999645

so that almost all of your 'p's are very close to 1.0, which means
that almost all (or even all) of your responses will be 1. Indeed,
continuing from the above:

  disease <- rbinom(length(p), 1, p)
  disease[1:20]
  #  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  sum(disease)
  # [1] NaN
  sum(is.nan(disease))
  # [1] 710

What has happened here is that the higher values of 'expo' are so
large (in the 1000s) that the calculation of 'p' gives NA, because
the value of exp(inter + or * expo) is +Inf, so the calculation of
'p' is in terms of (+Inf)/(+Inf), which is NA.
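
(An aside of mine, not in the original reply: the overflow can be
sidestepped by computing the probabilities with plogis(), the
logistic distribution function, which returns 1 rather than NaN for
very large arguments:

  p <- plogis(inter + or * expo)  # same as exp(.)/(1+exp(.)), but safe
)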

Now compare with what happens when mean - 3:

  mean <- 3
  sd <- 1
  expo <- sort(rlnorm(n, mean, sd))
  p <- exp(inter + or * expo)/(1 + exp(inter + or * expo))
  p[1:20]
  #  [1] 0.5514112 0.5543155 0.5702318 0.5830025 0.5885994 0.5889078
  #  [7] 0.5908860 0.6004657 0.6029123 0.6042805 0.6048688 0.6122290
  # [13] 0.6123407 0.6135233 0.6137499 0.6139299 0.6153900 0.6181017
  # [19] 0.6184093 0.6203757
  sum(is.na(p))
  #  [1] 0
  max(expo)
  #  [1] 728.0519

So now no NAs (max(expo), though large, is now not large enough to
make the calculation of 'p' yield NA).

These smaller probabilities are now well away from 1.0, so a good mix
of 0 and 1 responses can be expected, although a good number of
the 'p's will still be very close to 1 or will be set equal to 1.

  disease <- rbinom(length(p), 1, p)
  disease[1:20]
  #  [1] 0 1 0 0 1 0 1 1 1 0 1 0 0 1 0 1 1 0 0 1

(as expected), and

  sum(disease)
  # [1] 9740

As well as the problem with p = NA (whence disease = NaN), when you
have all the probabilities close to 1 and (as above) get all the
'disease' outcomes = 1, the resulting attempt to fit the glm will
yield nonsense.

In summary: do not use silly parameter values for the model you
are simulating. It will almost always not work (for reasons
illustrated above), and even if it appears to work the result
will be highly unreliable. If in doubt, have a look at what you
are getting, along the line, as illustrated above!

The above reasons almost certainly underlie your finding that the
mean of simulated OR estimates is markedly different from the value
which you set when you run the case mean <- 6, and the much
better finding when you run the case mean <- 3.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 26-Aug-09

Re: [R] robust method to obtain a correlation coeff?

2009-08-24 Thread Ted Harding
On 24-Aug-09 14:47:02, Christian Meesters wrote:
 Hi,
 Being a R-newbie I am wondering how to calculate a correlation
 coefficient (preferably with an associated p-value) for data like:
 
 d[,1]
  [1] 25.5 25.3 25.1   NA 23.3 21.5 23.8 23.2 24.2 22.7 27.6 24.2 ...
 d[,2]
 [1]  0.0 11.1  0.0   NA  0.0 10.1 10.6  9.5  0.0 57.9  0.0  0.0  ...
 
 Apparently corr(d) from the boot-library fails with NAs in the data,

Yes, apparently corr() has no option for dealing with NAs.

 also cor.test cannot cope with a different number of NAs.

On the other hand, cor.test() does have an option na.action
which, by default, is the same as what is in getOption("na.action").

In my R installation, this, by default, is "na.omit". This has the
effect that, for any pair in (x,y) where at least one of the pair
is NA, that pair will be omitted from the calculation. For example,
basing two vectors x,y on your data above, and a third z which is y
with an extra NA:

  x <- c(25.5,25.3,25.1,NA,23.3,21.5,23.8,23.2,24.2,22.7,27.6,24.2)
  y <- c( 0.0,11.1, 0.0,NA, 0.0,10.1,10.6, 9.5, 0.0,57.9, 0.0, 0.0)
  z <- y; z[8] <- NA

I get
  cor.test(x,y)
  # Pearson's product-moment correlation
  # data:  x and y 
  # t = -1.3986, df = 9, p-value = 0.1954
  # alternative hypothesis: true correlation is not equal to 0 
  # 95 percent confidence interval:
  #  -0.8156678  0.2375438 
  # sample estimates:
  #   cor 
  # -0.422542 
  cor.test(x,z)
  # Pearson's product-moment correlation
  # data:  x and z 
  # t = -1.3466, df = 8, p-value = 0.215
  # alternative hypothesis: true correlation is not equal to 0 
  # 95 percent confidence interval:
  #  -0.8338184  0.2738824 
  # sample estimates:
  #cor 
  # -0.4298726 

So it has worked in both cases (see the difference in 'df'), despite
the different numbers of NAs in x and z.

For functions such as corr() which do not have provision for omitting
NAs, you can fix it up for yourself before calling the function.
In the case of your two series d[,1], d[,2] you could use an index
variable to select cases:

  ix <- (!is.na(d[,1])) & (!is.na(d[,2]))
  corr(d[ix,])

With my variables x,y,z I get

  ix.1 <- (!is.na(x)) & (!is.na(y))
  ix.2 <- (!is.na(x)) & (!is.na(z))
  d.1  <- cbind(x,y)
  corr(d.1[ix.1,])
  # [1] -0.422542  ## (and -0.422542 from cor.test above as well)
  d.2  <- cbind(x,z)
  corr(d.2[ix.2,])
  # [1] -0.4298726 ## (and -0.4298726 from cor.test above as well)
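
Another option worth noting (my addition, not in the original reply):
cor() itself has a 'use' argument which does this selection for you:

  cor(x, y, use = "complete.obs")   # -0.422542, matching the above
  cor(x, z, use = "complete.obs")   # -0.4298726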

Hoping this helps,
Ted.

 Is there a
 solution to this problem (calculating a correlation coefficient and
 ignoring different number of NAs), e.g. Pearson's corr coeff?
 
 If so, please point me to the relevant piece of documentation.
 
 TIA
 Christian
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-09   Time: 16:26:53
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unix like commands in R?

2009-08-24 Thread Ted Harding
On 24-Aug-09 21:56:06, zrl wrote:
 Dear List:
 I am trying to find a command in R which is like the unix command
 "less" or "more" to show the data in an object of R.
 Can anyone help me on this?
 
 Is there a collection of such unix-like commands in R?
 
 Thanks.
 
 -ZRL

There is a page() command in R -- see '?page'. From your query I take
it you are in fact using either Linux or Unix (or possibly a Mac BSD
type OS).

Have a look at options("pager") to see what it currently uses.

I have modified my R setup so that it uses 'less' (on Linux) as
the pager program.

I did this in my .Rprofile (in home directory) by putting in the
lines:


.xthelp <- function() {
    tdir <- tempdir()
    pgr <- paste(tdir, "/pgr", sep="")
    con <- file(pgr, "w")
    cat("#! /bin/bash\n", file=con)
    cat("export HLPFIL=`mktemp ", tdir, "/R_hlp.XXXXXX`\n",
        sep="", file=con)
    cat("cat > $HLPFIL\nxterm -e less $HLPFIL &\n", file=con)
    close(con)
    system(paste("chmod 755 ", pgr, sep=""))
    options(pager=pgr)
}
.xthelp()
rm(.xthelp)


This opens anything which might be paged in a separate xterm,
so that, for instance, all responses to "?..." come up in a new
xterm being paged by 'less'. Likewise, if you page a large object
(such as a matrix M with 500 rows) using 'page(M)', you will
again see the result paged in 'less' in a separate window.

This code was suggested by Roger Bivand in response to a query
of mine back in 2003. The URL is

  http://finzi.psych.upenn.edu/R/Rhelp02/archive/21642.html

This was an improvement on a previous solution of my own, which
is also quoted in the above URL.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Aug-09   Time: 23:32:18
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A matrix calculation

2009-08-23 Thread Ted Harding
The problem with David's proposal is revealed by:

  mat[7:1,]
  #      [,1] [,2] [,3]
  # [1,]    7   14   21
  # [2,]    6   13   20
  # [3,]    5   12   19
  # [4,]    4   11   18
  # [5,]    3   10   17
  # [6,]    2    9   16
  # [7,]    1    8   15

which simply reverses the rows. Then:

  c(mat[7:1,])
  # [1]  7  6  5  4  3  2  1 14 13 12 11 10  9  8 21 20 19 18 17 16 15

since a matrix is stored by columns -- i.e. as a vector with its
elements ordered down each column, col1 first, then col2, then ...

And the following won't work either:

  mat[,1:3]
  #      [,1] [,2] [,3]
  # [1,]    1    8   15
  # [2,]    2    9   16
  # [3,]    3   10   17
  # [4,]    4   11   18
  # [5,]    5   12   19
  # [6,]    6   13   20
  # [7,]    7   14   21

which is simply the original matrix, and hence:

  c(mat[,1:3])
  # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21

for the same reason as before. What you need to do is based on the
following:

  t(mat[7:1,])
  #      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
  # [1,]    7    6    5    4    3    2    1
  # [2,]   14   13   12   11   10    9    8
  # [3,]   21   20   19   18   17   16   15

which now has the elements (down the columns) in the order you want.
So:

  c(t(mat[7:1,]))
  # [1]  7 14 21  6 13 20  5 12 19  4 11 18  3 10 17  2  9 16  1  8 15

As desired.
Ted.


On 23-Aug-09 18:53:38, Bogaso wrote:
 
 No no, I actually want following result :
 
 7, 14, 21, 6, 13, 20, 5, 12, 19, ...
 
 
 
 
 David Winsemius wrote:
 
 
 On Aug 23, 2009, at 2:37 PM, Bogaso wrote:
 

 I have suppose a matrix like that

 mat <- matrix(1:21, 7)
 mat
      [,1] [,2] [,3]
 [1,]    1    8   15
 [2,]    2    9   16
 [3,]    3   10   17
 [4,]    4   11   18
 [5,]    5   12   19
 [6,]    6   13   20
 [7,]    7   14   21

 From this matrix, I want to create a vector like that:

 c(mat[7,], mat[6,], mat[5,], ..., mat[1,])

 Can anyone please guide me, how to do that?
 
   c( mat[7:1,] )
 
 # [1]  7  6  5  4  3  2  1 14 13 12 11 10  9  8 21 20 19 18 17 16 15
 
 -- 
 
 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 View this message in context:
 http://www.nabble.com/A-matrix-calculation-tp25106048p25106224.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-09   Time: 20:17:58
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on comparing two matrices

2009-08-22 Thread Ted Harding
Steve,
I don't know for sure whether this will help to solve your problem,
but you may be interested to read about the algorithm devised by
David Kendall for sorting 0-1 matrices, as described in

  Incidence matrices, interval graphs and seriation in archeology.
Pacific J. Math. Volume 28, Number 3 (1969), 565-570

which is available on Open Access at Project Euclid:

http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.pjm/1102983306

[also = http://tinyurl.com/mw2oap ]



On 22-Aug-09 19:38:47, Steve Lianoglou wrote:
 On Sat, Aug 22, 2009 at 2:45 PM, Michael Kogan michael.ko...@gmx.net
 wrote:
 

 1. Sort the rows by their row sums (greater sums first).
 2. Sort the columns on the first row (columns with ones in the
    first row go left, columns with zeros go right).
 3. Save the left part (all columns with ones in the first row) and
    the right part in separate matrices.
 4. Repeat steps 2 and 3 with both of the created matrices (now taking
    the second row for sorting); repeat until all fragments consist of
    a single column.
 5. Compose the columns to a sorted matrix.
 
 If you want to go about this by implementing the algo you described, I
 think you'd be best suited via some divide-and-conquer/recursion
 route:
 
 Starting from step 2, that is.
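
 A rough sketch of that recursion (an untested illustration, not
 Steve's code; it assumes a 0-1 matrix M):

   sortMatrix <- function(M) {
     ## step 1: rows with greater sums first
     M <- M[order(rowSums(M), decreasing = TRUE), , drop = FALSE]
     sortCols <- function(M, r) {
       ## a single column, or rows exhausted: nothing left to split
       if (ncol(M) <= 1 || r > nrow(M)) return(M)
       ones  <- M[, M[r, ] == 1, drop = FALSE]  # steps 2-3: split on row r
       zeros <- M[, M[r, ] == 0, drop = FALSE]
       ## step 4: recurse on both halves with the next row;
       ## cbind() performs step 5, composing the sorted fragments
       cbind(sortCols(ones, r + 1), sortCols(zeros, r + 1))
     }
     sortCols(M, 1)
   }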
 
 -steve
 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-09   Time: 22:24:49
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Special LS estimation problem

2009-08-21 Thread Ted Harding
On 21-Aug-09 16:28:26, megh wrote:
 Hi, I have following kind of model : Y = X1 * a1 + X2 * a2 + error
 
 Here sampled data for Y, X1, X2 are like that :
 
  Y <- replicate(10, matrix(rnorm(2),2), simplify = F)
  X1 <- replicate(10, matrix(rnorm(4),2), simplify = F)
  X2 <- replicate(10, matrix(rnorm(4),2), simplify = F)
 
 My goal is to calculate LS estimates of vectors a1 and a2.
 Can anyone please guide me how to do that in R? Is there any
 special function to handle this kind of problem?
 
 Thanks

In your example, a1 and a2 cannot be vectors, since Y, X1 and X2
are all 2x2 matrices. On the other hand, either or both of a1, a2
could be either a scalar or a 2x2 matrix, any of which would make
X1*a1 + X2*a2 (provided you interpret * as %*% in R) into a
2x2 matrix. But if (say) a1 is a vector it can only be a 2x1
vector (for multiplication conformity), and then X1*a1 (or X1%*%a1
in R) would be a 2x1 vector (in the linear algebra sense, i.e.
a 2x1 matrix in the R sense).

So what is it that you want in the model?

Next, if in fact a1 and a2 are scalars, then in effect you are
looking at

  Y1[1,1] = a1*X1[1,1] + a2*X2[1,1] + error[1,1]
  Y1[1,2] = a1*X1[1,2] + a2*X2[1,2] + error[1,2]
  Y1[2,1] = a1*X1[2,1] + a2*X2[2,1] + error[2,1]
  Y1[2,2] = a1*X1[2,2] + a2*X2[2,2] + error[2,2]

10 times over. The relevant question here is: Are you wanting
to include possible covariance between the 4 elements of 'error'?

One possibility in this case is to treat this as a multilevel
or longitudinal model, with 4 observations per subject, and
correlated error within-subject.

But you really need to spell out more detail of the kind of
model you are seeking to fit. What you described is not enough!

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-Aug-09   Time: 18:48:31
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Special LS estimation problem

2009-08-21 Thread Ted Harding
On 21-Aug-09 18:12:45, megh wrote:
 
 Y is a 2-dimensional vector; X1 and X2 are 2x2 matrices.

Sorry -- I mis-read your code! I will think about your problem.
Apologies.
Ted.
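
In the meantime, one line of attack (a sketch of my own, not a tested
solution from this thread): if a1 and a2 are taken to be 2x1 vectors,
each case Y_i = X1_i %*% a1 + X2_i %*% a2 + error_i contributes two
linear equations, so the 10 cases can be stacked and fitted by
ordinary least squares:

  set.seed(1)
  Y  <- replicate(10, matrix(rnorm(2),2), simplify = FALSE)
  X1 <- replicate(10, matrix(rnorm(4),2), simplify = FALSE)
  X2 <- replicate(10, matrix(rnorm(4),2), simplify = FALSE)
  yy <- as.vector(do.call(rbind, Y))          # 20 stacked responses
  XX <- do.call(rbind, mapply(cbind, X1, X2,  # 20 x 4 design matrix
                              SIMPLIFY = FALSE))
  coef(lm(yy ~ XX - 1))   # 4 coefficients: c(a1, a2), no intercept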

 Ted.Harding-2 wrote:
 
 On 21-Aug-09 16:28:26, megh wrote:
 Hi, I have following kind of model : Y = X1 * a1 + X2 * a2 + error
 
 Here sampled data for Y, X1, X2 are like that :
 
  Y <- replicate(10, matrix(rnorm(2),2), simplify = F)
  X1 <- replicate(10, matrix(rnorm(4),2), simplify = F)
  X2 <- replicate(10, matrix(rnorm(4),2), simplify = F)
 
 My goal is to calculate LS estimates of vectors a1 and a2.
 Can anyone please guide me how to do that in R? Is there any
 special function to handle this kind of problem?
 
 Thanks
 
 In your example, a1 and a2 cannot be vectors, since Y, X1 and X2
 are all 2x2 matrices. On the other hand, either or both of a1, a2
 could be either a scalar or a 2x2 matrix, any of which would make
 X1*a1 + X2*a2 (provided you interpret * as %*% in R) into a
 2x2 matrix. But if (say) a1 is a vector it can only be a 2x1
 vector (for multiplication conformity), and then X1*a1 (or X1%*%a1
 in R) would be a 2x1 vector (in the linear algebra sense, i.e.
 a 2x1 matrix in the R sense).
 
 So what is it that you want in the model?
 
 Next, if in fact a1 and a2 are scalars, then in effect you are
 looking at
 
   Y1[1,1] = a1*X1[1,1] + a2*X2[1,1] + error[1,1]
   Y1[1,2] = a1*X1[1,2] + a2*X2[1,2] + error[1,2]
   Y1[2,1] = a1*X1[2,1] + a2*X2[2,1] + error[2,1]
   Y1[2,2] = a1*X1[2,2] + a2*X2[2,2] + error[2,2]
 
 10 times over. The relevant question here is: Are you wanting
 to include possible covariance between the 4 elements of 'error'?
 
 One possibility in this case is to treat this as a multilevel
 or longitudinal model, with 4 observations per subject, and
 correlated error within-subject.
 
 But you really need to spell out more detail of the kind of
 model you are seeking to fit. What you described is not enough!
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 21-Aug-09   Time: 18:48:31
 -- XFMail --
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 View this message in context:
 http://www.nabble.com/%22Special%22-LS-estimation-problem-tp25083103p25084866.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-Aug-09   Time: 18:59:54
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PowerCut Killed R - is my code retrievable?

2009-08-19 Thread Ted Harding
On 19-Aug-09 22:35:01, Barry Rowlingson wrote:
 On Wed, Aug 19, 2009 at 10:57 PM, Polwart Calum (County Durham and
 Darlington NHS Foundation Trust) calum.polw...@nhs.net wrote:
 I've been tweaking code for several days on and off in R, cut and
 pasting in from a text editor (I just leave them open all the time).
 I think I got something that was usable but then a powersurge tripped
 the fuses and unfortunately the machine I was working on doesn't have
 a UPS.
 
  So you were just using the text editor as a scratch pad, and not
 saving it? A half-decent text editor should be saving things to disk
 as you go. For example, in emacs, if your emacs dies while editing a
 file then when you restart it, it will notice and offer to restore it
 from its restore file. If you were using emacs you might have files
 like #foo.R# which are emacs' auto-restore files. Other editors might
 do other things - possibly leaving files in /tmp on a Linux system
 (but they might get zapped by a reboot).

To follow up on what Barry wrote above: Don't be put off by the
recommendation to use EMACS (which is not to everyone's taste).
I use vim, which does much the same sort of thing: there is a hidden
.whatever.swp file which contains a back-trackable history of
the editing changes that have been made to the file. So, if you get
a crash, on re-opening the file in vim you are asked if you want to
recover. The .swp file is *not* zapped by a reboot!

But, in any case, it is very wise to frequently and deliberately
save the current state of your file. In vim's command mode, all you
need to do is to type :w and it is saved. And to get into command
mode (or make sure that you are in it), just press ESC.
So ESC:w does it. Takes 0.5 seconds.

Again, what I routinely do (in Linux) when developing R code is to
have two terminal windows open. In one I am running R. In the other,
beside it, I am editing a file of R code. To run code in R that has
been entered in the code window, I just highlight it with the
mouse in the code window, and then paste it into the R window.

This method also has the advantage that it is easy to scroll back
to code entered earlier, and either paste it into the R window,
or copy it down in the code window if you want to modify it.
(The code window does not have to be a .R file to be sourced
by R -- though it may later become one. Its function is to preserve
a trace of what you have done, and make it easy to modify things
until you get what you want).

Ted.

 Also - is there a better way for the future? I know some people use
 IDEs but is that for serious programming or for building a small
 function and tweaking it?
 
  Tip #1 is save your text file scratchpads!
  Tip #2 is save your R session regularly (just do 'save.image()' and
 it will save your current R session objects in a .RData file)
  Tip #3 is you could use emacs + ESS as an IDE - you run R within
 emacs, giving you cut n paste of code, syntax highlighting, session
 transcripts, and emacs' protection from data loss on a crash!
 
 Barry
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 20-Aug-09   Time: 00:21:42
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding logical vectors

2009-08-13 Thread Ted Harding
On 13-Aug-09 19:21:36, David Huffer wrote:
 When adding several logical vectors I expect each vector will be
 coerced to integers and these vectors will then be added.
 
 That doesn't always seem to be the case.
 
 For example:
 
   ( f1 <- as.factor ( sample ( "x" , 25 , rep = T ) ) )
   [1] x x x x x x x x x x x x x x x x x x x x x x x x x
   Levels: x
   ( f2 <- as.factor ( sample ( "y" , 25 , rep = T ) ) )
   [1] y y y y y y y y y y y y y y y y y y y y y y y y y
   Levels: y
   ( f3 <- as.factor ( sample ( "z" , 25 , rep = T ) ) )
   [1] z z z z z z z z z z z z z z z z z z z z z z z z z
   Levels: z

   is.na ( f1 [ sample ( 1:25 , 4 ) ] ) <-
     is.na ( f2 [ sample ( 1:25 , 4 ) ] ) <-
     is.na ( f3 [ sample ( 1:25 , 4 ) ] ) <- TRUE
   
##  this returns a numeric vector:
   
is.na ( f1 ) + is.na ( f2 ) + is.na ( f3 )
[1] 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 1 2 2 0 1 0 1
   
##  but this returns a logical vector
   
!is.na ( f1 ) + !is.na ( f2 ) + !is.na ( f3 )
[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[9]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
   [17] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
   [25] FALSE
   
 
 Can someone please explain why the returned value is a logical
 vector when I use the not operator but a numeric vector when I
 don't.
 
 What is special about !is.na? It returns an object of class
 "logical" just like the is.na function:
 
all.equal ( class ( !is.na ( f1 ) ) , class ( is.na ( f1 ) ) )
   [1] TRUE
   
 
 Thanks!

I think you need to compare:
[1]
  is.na ( f1 ) + !is.na ( f2 )
  #  [1] 2 0 1 1 2 1 0 1 1
  # [10] 1 0 2 1 1 0 1 1 2
  # [19] 1 1 1 1 1 1 1

with
[2]
  !is.na ( f1 ) + !is.na ( f2 )
  #  [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
  # [10] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
  # [19] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

and with
[3]
  (!is.na ( f1 )) + (!is.na ( f2 ))
  # [1] 1 1 2 2 1 2 1 2 2 2 1 1 2 2 1 2 2 1 2 2 2 2 2 2 2

In other words, I think you have been trapped by Precedence:
see '?Syntax'. What seems to happen is that

  is.na ( f1 ) + !is.na ( f2 )

evaluates is.na(f1) as a logical vector, !is.na(f2) as a logical
vector, and adds them, getting a numerical result. See [1].

On the other hand, apparently

  !is.na ( f1 ) + !is.na ( f2 )

negates the result of [1] (compare the outputs of [1] and [2])
and hence produces a logical vector (because of the first !).
In other words, the first ! is applied to the result of
is.na ( f1 ) + !is.na ( f2 ).

In the form given in [3], the parentheses ensure that the logical
negations ! are applied before the + is applied, so two logical
vectors are added, with a numerical result.
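
One way to see the parser's grouping directly (my addition, using
quote() to show how the expression is parsed):

  quote(!is.na(f1) + !is.na(f2))
  # !(is.na(f1) + !is.na(f2))

i.e. the leading ! applies to the whole sum, exactly as described.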

(I hope I've got this right)!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-Aug-09   Time: 20:50:51
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm coefficients output confusing

2009-08-13 Thread Ted Harding
 and the base level treatment.
Alternative systems of contrasts give different representations.

 If i can get this to work correctly
 can I use the p value to determine which of the hours is significantly
 different to the others - so in this example hour 5 is significantly
 different? Or is it just a case of using the p value from the
anova
 to determine that there is a significant difference between hours
 (in this case) and use a plot to determine which hour(s) are likely
 to be the cause?

Don't trust the individual P-values in the first instance, since with
several levels some are bound to have more extreme P-values than
others.

However, your anova() test is an overall test for whether one or more
of the means jointly depart significantly from the Null Hypothesis
that they are all equal. Here, your anova() has given a significant
result, so the hour means are not all equal. *Now* you can look at
the individual P-values to see where this may be coming from.
In the list of coefficients, there is only one candidate for
being different from Intercept, and this is Hour 5. Indeed the
P-value for HR5 and the P-value for the anova() are nearly equal
(the latter being slightly larger, allowing for the fact that it
is 1 out of 7).

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-Aug-09   Time: 22:53:22
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random sampling while keeping distribution of nearest ne

2009-08-12 Thread Ted Harding
On 12-Aug-09 22:05:24, Emmanuel Levy wrote:
 Dear All,
 I cannot find a solution to the following problem although I imagine
 that it is a classic, hence my email.
 
 I have a vector V of X values comprised between 1 and N.
 
 I would like to get random samples of X values also comprised between
 1 and N, but the important point is:
 * I would like to keep the same distribution of distances between the X
 values *
 
 For example let's say N=10 and I have V = c(3,4,5,6)
 then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6, or
 4,5,6,7 etc..
 so that the distribution of distances (3 - 4, 3 - 5, 3 - 6, 4 - 5,
 4 - 6, etc.) is kept constant.
 
 I couldn't find a package that help me with this, but it looks like it
 should be a classic problem so there should be something!
 
 Many thanks in advance for any help or hint you could provide,
 All the best,
 Emmanuel

If I've understood you right, you are basically putting a sequence
with given spacings in a random position amongst the available
positions. In your example, you would randomly choose between
1,2,3,4/2,3,4,5/3,4,5,6/4,5,6,7/5,6,7,8/6,7,8,9/7,8,9,10/

Hence a result Y could be:

  A <- min(V)
  L <- max(V) - A + 1
  M <- (0:(N-L))
  Y <- 1 + (V-A) + sample(M,1)

I think this does it!
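
A quick check (my addition) that every draw preserves the gaps:

  all(diff(Y) == diff(V))   # TRUE: Y is just V shifted by a constant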


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 12-Aug-09   Time: 23:49:22
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix power

2009-08-10 Thread Ted Harding
On 10-Aug-09 21:31:30, cindy Guo wrote:
 Hi, All,
 If I  have a symmetric matrix, how can I get the negative square root
 of the matrx, ie. X^(-1/2) ?
 
 Thanks,
 
 Cindy

  X <- matrix(c(2,1,1,2), nrow=2)
  X
  #      [,1] [,2]
  # [1,]    2    1
  # [2,]    1    2

  E <- eigen(X)
  V <- E$values
  Q <- E$vectors
  Y <- Q %*% diag(1/sqrt(V)) %*% t(Q)
  Y
  #            [,1]       [,2]
  # [1,]  0.7886751 -0.2113249
  # [2,] -0.2113249  0.7886751

  solve(Y%*%Y)   ## i.e. find its inverse
  #      [,1] [,2]
  # [1,]    2    1
  # [2,]    1    2
Hence (Y%*%Y)^(-1) = X, or Y = X^(-1/2)
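
A direct check (my addition) that Y really is X^(-1/2):

  round(Y %*% Y %*% X, 10)
  #      [,1] [,2]
  # [1,]    1    0
  # [2,]    0    1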

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 10-Aug-09   Time: 22:53:25
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix power

2009-08-10 Thread Ted Harding
On 10-Aug-09 22:36:03, cindy Guo wrote:
 Hi, Ted,
 Thanks for the sample code. It is exactly what I want. But can
 I ask another question? The matrix for which I want the negative
 square root is a covariance matrix. I suppose it should be positive
 definite, so I can do 1/sqrt(V) as you wrote. But the covariance
 matrix I got in R using the function cov has a lot of negative
 eigenvalues, like -5.338634e-17, so 1/sqrt(V) generates NA's. Can
 you tell what's the problem here.
 
 Thanks,
 Cindy

Cindy,
If that -5.338634e-17 is typical of the lot of negative eigenvalues,
then what you are seeing is the result of R's attempt to calculate
zero eigenvalues, but defeated by the inevitable rounding errors.
In other words, your covariance matrix is singular, and the variables
involved are not linearly independent.

The only thing that is guaranteed about a covariance matrix is that
it is positive semi-definite (not positive definite); in other words
all eigenvalues are positive or zero (mathematically).

For example, if Y=X, var(X) = var(Y) = 1, then the covariance
matrix of (X,Y) is

    1  1
    1  1

which is singular (eigenvalues = 2, 0).

The result of attempting to compute them is subject to rounding errors,
which (for zero eigenvalues) can be slightly negative.

So the covariance matrix in your case would not have an inverse,
still less a negative square root!

The basic problem is that you have linear dependence between the
variables. To make progress, you would need to find a maximal linearly
independent set (or possibly find the principal components with
nonzero weights).
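
If it makes sense to work within the subspace actually spanned by
the data, one workaround (my sketch, not from the original thread)
is to drop the near-zero eigenvalues and form a pseudo
inverse-square-root from the rest:

  ## C is your covariance matrix (the name is hypothetical)
  E   <- eigen(C, symmetric=TRUE)
  tol <- 1e-12 * max(E$values)
  ok  <- E$values > tol
  Q   <- E$vectors[, ok, drop=FALSE]
  Y   <- Q %*% diag(1/sqrt(E$values[ok]), sum(ok)) %*% t(Q)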

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 10-Aug-09   Time: 23:58:00
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How '.' is used?

2009-08-09 Thread Ted Harding
On 09-Aug-09 16:06:52, Peng Yu wrote:
 Hi,
 I know '.' is not a separator in R as in C++. I am wondering where it
 discusses the detailed usage of '.' in R. Can somebody point me a
 webpage, a manual or a book that discuss this?
 
 Regards,
 Peng

To the best of my knowledge, apart from its specific use as a separator
between the integer and fractional parts of a number, . has no specific
use in R, and you can, for instance, use it just as you would use an
alphanumeric character in a name.

For instance, you could do

  . <- 1.2345
  .
  # [1] 1.2345

  . <- function(x) x^2
  .(12)
  # [1] 144

So, unless there is something I don't know about, there is hardly
anything to discuss about the detailed usage of '.' in R!

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-Aug-09   Time: 17:32:54
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How '.' is used?

2009-08-09 Thread Ted Harding
On 09-Aug-09 16:53:32, Douglas Bates wrote:
 On Sun, Aug 9, 2009 at 11:32 AM, Ted
 Hardingted.hard...@manchester.ac.uk wrote:
 On 09-Aug-09 16:06:52, Peng Yu wrote:
 Hi,
 I know '.' is not a separator in R as in C++. I am wondering where it
 discusses the detailed usage of '.' in R. Can somebody point me a
 webpage, a manual or a book that discuss this?

 Regards,
 Peng

 To the best of my knowledge, apart from its specific use as a
 separator
 between the integer and fractional parts of a number, . has no
 specific
 use in R, and you can, for instance, use it just as you would use an
 alphanumeric character in a name.

 For instance, you could do

 . <- 1.2345
 .
 # [1] 1.2345

 . <- function(x) x^2
 .(12)
 # [1] 144

 So, unless there is something I don't know about, there is hardly
 anything to discuss about the detailed usage of '.' in R!
 
 The '.' character is one of the characters allowed in names, hence it
 can be used as you have suggested.  There are (at least) two special
 usages of the '.' in names.  Following the time-honoured Unix
 convention, names that begin with '.' are considered hidden names
 and not listed by ls() or objects() unless you set all.names = TRUE in
 the call.  Because of this convention it is inadvisable to use names
 starting with '.' except when you wish to avoid potential name
 conflicts.  The second special use of '.' in a name is in the
 construction of the names of S3 method functions.  The method for
 generic function foo applied to class bar is named foo.bar.

As in summary.glm, I suppose! However, this prompts a question.
To begin with, the construction of summary.glm from summary
and glm is simply using . in its basic
role as a permissible character in a name. Correct?

Next -- and this is the real question -- how does R parse the name
summary.glm? In my naivety, I simply suppose that it looks for
an available function whose name is summary.glm in just the
same way as it looks for stopifnot, or for that matter data.matrix
which is not (as far as I know) a compound of a generic function
data applied to a class matrix. Then . would not have a special
(parseable) role in the name -- it is simply another letter.

But when you do have such a function, like summary.glm, does R
in fact parse it as summary then glm (i.e. look out for the
generic function summary and then specialise it to handle glm).

As I say, I suppose not. And, if not, then the special use of
the character . is simply a programmer's convention for the
construction of the name, and once the name exists the . does
not have a special (parseable) significance for R.

Just seeking clarification ... !
Thanks,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-Aug-09   Time: 19:58:32
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How '.' is used?

2009-08-09 Thread Ted Harding
On 09-Aug-09 19:31:47, Duncan Murdoch wrote:
 (Ted Harding) wrote:
 [...]
 Next -- and this is the real question -- how does R parse the name
 summary.glm? In my naivety, I simply suppose that it looks for
 an available function whose name is summary.glm in just the
 same way as it looks for stopifnot, or for that matter data.matrix
 which is not (as far as I know) a compound of a generic function
 data applied to a class matrix. Then . would not have a special
 (parseable) role in the name -- it is simply another letter.
   
 
 It doesn't do anything special when parsing.  The special sauce comes 
 when the generic summary() function executes UseMethod("summary").
 At that point, we know the class of the object, and we know the name
 of the generic, so it goes looking for a summary.glm method.
 
 There are some subtleties to how it does that lookup (see the
 discussion in Writing R Extensions about NAMESPACES), but if you had
 a generic function calling UseMethod(data) and it was passed an
 object of class matrix, data.matrix() would be called, even though
 that doesn't make sense.  This is a flaw in the S3 system, and one of
 the motivations for the development of the S4 system.

Many thanks, Duncan. I think that makes it clear! You've prompted
me to read ?UseMethod':

  When a function calling 'UseMethod("fun")' is applied to an object
  with class attribute 'c("first", "second")', the system searches
  for a function called 'fun.first' and, if it finds it, applies it
  to the object.  If no such function is found a function called
  'fun.second' is tried.  If no class name produces a suitable
  function, the function 'fun.default' is used, if it exists, or an
  error results.

which is pretty explicit that the role of the . is simply to
construct a name for the system to look for.
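
For instance (a minimal sketch of my own to illustrate the
convention; the function names are invented):

  summarise2 <- function(x, ...) UseMethod("summarise2")
  summarise2.default <- function(x, ...) cat("default method\n")
  summarise2.glm     <- function(x, ...) cat("glm method\n")

  g <- glm(c(1,0,1,1) ~ 1, family=binomial)
  summarise2(g)     # glm method -- found via the constructed name
  summarise2(1:3)   # default method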

One learns ... Thanks.
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-Aug-09   Time: 21:22:23
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Seeing negative numbers to zero

2009-08-07 Thread Ted Harding
On 07-Aug-09 20:29:16, DebbieMB wrote:
 Hi,
 I am also new to R and I have a related question.  I am trying
 to set negative values in a single column of a dataframe to zero
 and I can't seem to do it.
 
 I have tried:
 KN1 <- subset(KN, select=c(5))
 # Here I am selecting a column of the dataframe KN and assigning it
 # the name KN1 - this step works
 KN2 <- ifelse(KN1 <= 0, 0, KN1)
 # Here I am trying to set negative numbers to zero and leave all other
 # numbers the same - this doesn't work
 
 Any help would be appreciated.
 
 Thanks,
 Debbie

How about something on the lines of:

  KN1 <- data.frame(c1=c(1,-2,3,-4,5),c2=c(-1,2,-3,4,-5),c3=c(-2,-1,0,1,2))
  KN1
  #   c1 c2 c3
  # 1  1 -1 -2
  # 2 -2  2 -1
  # 3  3 -3  0
  # 4 -4  4  1
  # 5  5 -5  2
  KN2 <- KN1
  KN2$c2 <- (KN2$c2>0)*KN2$c2  ## ** See below
  KN2
  #   c1 c2 c3
  # 1  1  0 -2
  # 2 -2  2 -1
  # 3  3  0  0
  # 4 -4  4  1
  # 5  5  0  2

The logic of the ** line is that:
1: (KN2$c2>0) is FALSE wherever (KN2$c2 <= 0), otherwise TRUE.
2: If x is a number, then TRUE*x = x and FALSE*x = 0
3: Hence (KN2$c2>0)*KN2$c2 is 0 wherever (KN2$c2 <= 0), and
   is the same as KN2$c2 wherever (KN2$c2 > 0)
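
Incidentally, the same effect can be had in one step with pmax(),
which takes parallel maxima:

  KN2$c2 <- pmax(KN2$c2, 0)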

Does this help?
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-09   Time: 22:59:42
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange column shifting with read.table

2009-08-02 Thread Ted Harding
On 02-Aug-09 21:10:12, Noah Silverman wrote:
 Hi,
 I am reading in a dataframe from a CSV file. It has 70 columns.
 I do not have any kind of unique row id.
 
 rawdata <- read.table("r_work/train_data.csv", header=T, sep=",",
   na.strings="0")
 
 When training an svm, I keep getting an error
 So, as an experiment, I wrote the data back out to a new file
 so that I could see what the svm function sees.
 
 write.table(rawdata, file="r_work/output_data.csv",
 quote=FALSE, sep=",")
 
 It appears as if R has added a column for me with id numbers
 for each row.  That would be fine, except that R SHIFTS ALL MY
 COLUMN LABELS OVER ONE.  That causes several problems:
  1) The header names are now wrong for each column
  2) My last column has no label
  3) The SVM complains about the unlabeled column
 
 Would someone please help me sort this out.
 Thanks!
 -N

Note that the default for row.names in write.table() is TRUE.
So, in your command, that is what you get: write.table() then
*creates* row-names (by default the row numbers). Compare:

  D <- rbind(c(1.1,1.2,1.3),c(2.1,2.2,2.3),c(3.1,3.2,3.3))
  D
  #      [,1] [,2] [,3]
  # [1,]  1.1  1.2  1.3
  # [2,]  2.1  2.2  2.3
  # [3,]  3.1  3.2  3.3

  write.table(D,file="withTRUE.csv",quote=FALSE,sep=",")
  # withTRUE.csv:
  # V1,V2,V3
  # 1,1.1,1.2,1.3
  # 2,2.1,2.2,2.3
  # 3,3.1,3.2,3.3

  write.table(D,file="withFALSE.csv",row.names=FALSE,quote=FALSE,sep=",")
  # withFALSE.csv:
  # V1,V2,V3
  # 1.1,1.2,1.3
  # 2.1,2.2,2.3
  # 3.1,3.2,3.3
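
An aside of my own: if you later read such a file back, read.table()
can be told to treat the first column as row names:

  D2 <- read.table("withTRUE.csv", header=TRUE, sep=",", row.names=1)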

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Aug-09   Time: 22:35:04
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tests for Two Independent Samples

2009-07-31 Thread Ted Harding
On 31-Jul-09 13:38:10, tedzzx wrote:
 Dear R users,
 I have got two samples: 
 sample A with observation of 223:
sample A has five categories: 1,2,3,4,5 (I use the numer
1,2,3,4,5 to define the five differen categories)
there are 5 observations in category 1; 81 observations in
category 2;110 observations in category 3; 27 observations
in category 4; 0 observations in category 5;
 To present the sample in R: a-rep(1:5, c(5,81,110,27,0))
 
 sample B with observation of 504:
sample B also has the same five categories: 1,2,3,4,5 
there are 6 observations in category 1; 127 observations in
category 2;297 observations in category 3; 72 observations
in category 4; 2 observations in category 5;
 To present the sample in R: b-rep(1:5, c(6,127,297,72,2))
 
 I want to test weather these two samples have significant difference
 in distribution ( or Tests for Two Independent Samples). 
 
 I find a webside in:
 http://faculty.chass.ncsu.edu/garson/PA765/mann.htm
 
 This page shows four nonparametric tests. Bust I can only find the test
 Kolmogorov-Smirnov Z Test. 
 res-ks.test(a,b)
 
 Can any one tell me which package has the other 3 tests? or Is there
 any other test for my question?
 Thanks advance
 Ted

If your 1,2,3,4,5 are simply nominal codes for the categories,
then you may be satisfied with a Fisher test or simply a chi-squared
test (using simulated P-values in view of the low frequencies in
some cells).

Taking your data:

  A <- c(5,81,110,27,0)
  B <- c(6,127,297,72,2)
  M <- cbind(A,B)
  D <- colSums(M)
  P <- M%*%(diag(1/D))
  P
  #            [,1]        [,2]
  # [1,] 0.02242152 0.011904762
  # [2,] 0.36322870 0.251984127  ## So the main differences between
  # [3,] 0.49327354 0.589285714  ## A and B are in these two categories
  # [4,] 0.12107623 0.142857143
  # [5,] 0. 0.003968254

  fisher.test(M,simulate.p.value=TRUE,B=1e5)
  #  Fisher's Exact Test for Count Data with simulated p-value
  #  (based on 1e+05 replicates)
  #  data:  M 
  #  p-value = 0.01594

  chisq.test(M,simulate.p.value=TRUE,B=1e5)
  #  Pearson's Chi-squared test with simulated p-value
  #  (based on 1e+05 replicates)
  #  data:  M 
  #  X-squared = 11.7862, df = NA, p-value = 0.01501

So the P-values are similar in both tests.
(Another) Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-Jul-09   Time: 17:53:58
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] string

2009-07-31 Thread Ted Harding
On 31-Jul-09 22:10:46, Mohsen Jafarikia wrote:
 Hello All:
 I am wondering how I can have dat1 and dat2 in the following
 loop where 'dat' and 'i' stick together to make dat1 and dat2 :
 
 ifn <- "MyData"
 dat <- read.table(ifn)
 
 MyData:
 01  0.40
 02  0.40
 03  0.40
 04  0.35
 05  0.34
 06  0.33
 
 names(dat) <- c("Code","M")
 for(i in 1:2)
 {
 dati <- dat[i:i+2,]
 }

Try using paste():

  for(i in (1:2)){ print(paste("dat", i, sep="")) }

See '?paste'
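
If the aim really is to create objects named dat1 and dat2, assign()
combines with paste() -- my sketch, not in the original reply. Note
too that i:i+2 parses as (i:i)+2, so the row range needs parentheses:

  for(i in 1:2){
    assign(paste("dat", i, sep=""), dat[i:(i+2), ])
  }

Keeping the pieces in a list is usually cleaner, though:

  dats <- lapply(1:2, function(i) dat[i:(i+2), ])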

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-Jul-09   Time: 23:31:17
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to do poisson distribution test like this?

2009-07-29 Thread Ted Harding
On 29-Jul-09 03:28:24, glen_b wrote:
 Corrected links (the originals somehow acquired an extra space, sorry)
 
 Paper: http://www.plantphysiol.org/cgi/content/full/148/3/1189
 Table I: http://www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1

I think I've cracked it (using the information in Table 1).

What they seem to have done is:
1: Sum the observed numbers N[i] of F-Box Genes: total = N
2: Sum the lengths L[i] of the chromosomes: total = L
3: Calculate expected number Exp[i] for chromosome i as
   Exp[i] = N*L[i]/L
4: Use the Poisson distribution as:
     ppois(N[i], Exp[i])        if N[i] < Exp[i]
     1 - ppois(N[i], Exp[i])    if N[i] > Exp[i]   (equality does not occur)

I have implemented this in R, using the Table 1 data, as follows.
The variables Exp0 and P0 are the values of Exp and P from the Table;
the variables Exp1 and P1 are as calculated according to the above.

Len  <- c(32.16,23.44,17.45,15.08,17.04,17.68,11.90,
          15.43,12.41,19.21,13.17,13.03,11.50,13.68,
          10.19,12.82,5.44,12.44,10.23)
Obs  <- c(36,25,13,10,19,19,12,15,12,15,17,13,13,13,12,9,2,12,2)
Exp0 <- c(30,22,17,14,16,17,11,15,12,18,13,12,11,13,10,12,5,12,10)
P0   <- c(0.137,0.235,0.235,0.159,0.196,0.242,0.340,
          0.391,0.395,0.273,0.082,0.353,0.207,0.421,
          0.176,0.231,0.112,0.398,0.004)

N <- sum(Obs) ; L <- sum(Len)
Exp1 <- N*Len/L
Low  <- (Obs < Exp1)
P1   <- Low*ppois(Obs,Exp1) + (1-Low)*(1-ppois(Obs,Exp1))

cbind(Len,Obs,Exp0,Exp1,P0,P1)
#         Len Obs Exp0      Exp1    P0          P1
#  [1,] 32.16  36   30 30.429265 0.137 0.136529344
#  [2,] 23.44  25   22 22.178544 0.235 0.234714001
#  [3,] 17.45  13   17 16.510904 0.235 0.234942347
#  [4,] 15.08  10   14 14.268449 0.159 0.158563376
#  [5,] 17.04  19   16 16.122969 0.196 0.196445135
#  [6,] 17.68  19   17 16.728526 0.242 0.241956811
#  [7,] 11.90  12   11 11.259585 0.340 0.340015730
#  [8,] 15.43  15   15 14.599613 0.391 0.390970369
#  [9,] 12.41  12   12 11.742139 0.395 0.394571188
# [10,] 19.21  15   18 18.176187 0.273 0.273013405
# [11,] 13.17  17   13 12.461238 0.082 0.082372362
# [12,] 13.03  13   12 12.328772 0.353 0.353596077
# [13,] 11.50  13   11 10.881112 0.207 0.207821286
# [14,] 13.68  13   13 12.943792 0.421 0.420776167
# [15,] 10.19  12   10  9.641611 0.176 0.175748117
# [16,] 12.82   9   12 12.130074 0.231 0.231213063
# [17,]  5.44   2    5  5.147239 0.112 0.112786229
# [18,] 12.44  12   12 11.770524 0.398 0.397809438
# [19,] 10.23   2   10  9.679458 0.004 0.003598524

There is now agreement between P0 (from the article) and P1
(as calculated above) -- subject to the rounded third decimal
place of P0 being 1 out in a few cases ([12], [13], [17]).

The puzzle clearly arose in the first place because the authors
reported their Expected Numbers as integers, not fractions!

Well, well, well!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Jul-09   Time: 11:59:11
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to do poisson distribution test like this?

2009-07-29 Thread Ted Harding
Follow-up:

On 29-Jul-09 03:28:24, glen_b wrote:
 Corrected links (the originals somehow acquired an extra space, sorry)
 
 Paper: http://www.plantphysiol.org/cgi/content/full/148/3/1189
 Table I: http://www.plantphysiol.org/cgi/content/full/148/3/1189/TBL1
 --

I have now scanned through the Paper, and find the explanation of
their calculation of Expected Number (which agrees with the method
I described in my previous email) in a somewhat garbled description
on page 1198 (page 10/12 of the PDF file):

  The expected gene number lambda_i on chromosome i would be
   a sample from a Poisson distribution lambda_i = m*L_i/Sum L_i
   where m is the total number of genes detected within the
   assembled sequences and L_i is the length of chromosome i.

I agree with your earlier comment, Glen, that the paper would have
benefited from more careful scrutiny by referees/editor!

Best wishes,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Jul-09   Time: 12:43:26
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to do poisson distribution test like this?

2009-07-28 Thread Ted Harding
On 28-Jul-09 10:03:41, Mao Jianfeng wrote:
 Dear R-listers,
 I want to reperfrom a poisson distribution test that presented
 in a recent-published biological research paper (Plant Physiology
 2008, vol 148, pp. 1189-1200). That test is about the occurrence
 number of a kind of gene in separate chromosomes.
 
 For instance:
 
 The observed gene number in chromosome A is 36.
 The expected gene number in chromosome A is 30.
 
 Then, the authors got a probability 0.137 by distribution test on this
 trial. In this test, a Poisson distribution was used to determine the
 significance of the gene distribution.
 
 Questions:
 How can I reperform this test in R?
 
 Thank you in advance.
 Mao Jian-Feng
 Institue of Botany,
 CAS, China

Since it is not clear what test procedure they used, I have done
a couple of numerical experiments in R:

1. Compare the upper-tail probability of the Poisson distribution
   with mean mu = 30 of the event that at least 36 are observed:

   1-ppois(36,30)
   # [1] 0.1196266

   Not quite the 0.137 that they got.

2. A similar comparison, using a Normal approximation to the Poisson
   (mean mu = 30, SD = sqrt(mu)):

   1 - pnorm(6/sqrt(30))
   # [1] 0.1366608

   which, after rounding, is exactly the 0.137 that they got.

So it seems they have used an upper-tail test based on the Normal
approximation to the Poisson distribution.

Method 1 (using the exact Poisson distribution) is preferable, since
it is accurate (given the assumption of Poisson distribution).
So that would, in principle, be the best way to do it in R (as
illustrated).

Possibly their adoption of Method 2 is based on a naive acceptance
of the rule-of-thumb from some textbook; or maybe their available
software does not offer ready access to the exact Poisson distribution
(which wouldn't happen if they used R -- see Method 1). As stated,
it is inaccurate compared with Method 1, so is not to be preferred.

However, if you need to reproduce their method (regardless of merit),
then use Method 2.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09   Time: 11:42:08
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] check for new files in a given directory

2009-07-28 Thread Ted Harding
On 28-Jul-09 11:04:39, Andreas Posch wrote:
 I am trying to continuously evaluate online created data files using
 R-algorithms. Is there any simple way to let R iteratively check for
 new files in a given directory, load them and process them?
 
 Any help would be highly appreciated.
 Best, A.

If in Linux/Unix, the following sort of thing will work:

  LastList <- system("ls", intern=TRUE)

then, later,

  NewList <- system("ls", intern=TRUE)

and then

  NewList[!(NewList %in% LastList)]

is a character vector of the names in NewList which are not in LastList
(i.e. the ones which have come in since LastList was created). Then you
can do what you like with these names.

Finally:

  LastList <- NewList

means you can repeat the check on the same basis.
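
Putting the pieces together into a polling loop (my sketch;
process_file() is a hypothetical stand-in for whatever you do
with each new file):

  LastList <- system("ls", intern=TRUE)
  repeat {
    Sys.sleep(60)    # check once a minute
    NewList <- system("ls", intern=TRUE)
    for (f in NewList[!(NewList %in% LastList)]) {
      process_file(f)
    }
    LastList <- NewList
  }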

Don't ask me how to do this in Windows ...

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09   Time: 12:24:42
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] check for new files in a given directory

2009-07-28 Thread Ted Harding
On 28-Jul-09 11:23:21, Barry Rowlingson wrote:
 On Tue, Jul 28, 2009 at 12:04 PM, Andreas
 Poschandreas.po...@tugraz.at wrote:
 I am trying to continuously evaluate online created data files using
 R-algorithms. Is there any simple way to let R iteratively check for
 new files in a given directory, load them and process them?

 Any help would be highly appreciated.
 
  list.files(dir) will tell you what files are in a directory.
 
  file.info(filepath) will tell you things about a file (modification
 time etc).
 
  Sys.sleep(n) will put R to sleep for n seconds so you don't have a
 tight loop checking the directory every millisecond.
 
 That's probably all the functionality you need, except maybe to keep
 track of what files you consider 'new', and some way of deciding if
 something has already been processed. But that's a bit
 application-specific!
 
 Barry

Snap, Baz! (Except that we played different cards -- and in view of
Andreas's follow-up, yours would be the solution for him).

However, this got me looking into '?list.files', and I see there
(R version 2.9.0 (2009-04-17)):

  recursive: logical. Should the listing recurse into directories?

But:

  Directories are included only if 'recursive = FALSE'.

Surely the latter is the wrong way round, and should be

  Directories are included only if 'recursive = TRUE'.

(that's how it worked when I just tried it).

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09   Time: 12:36:45
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Looking for example of usage of function unz

2009-07-28 Thread Ted Harding
On 28-Jul-09 12:15:33, mau...@alice.it wrote:
 I would greatly appreciate some example of correct usage of
 function unz.
 I have to download and  uncompress the following web compressef file:
 ftp://ftp.sanger.ac.uk/pub/mirbase/targets/v5/arch.v5.txt.homo_sapiens.z
 ip
 
 I tried the following command that does not work:
 
 Targets.rec - readLines(zz -
 unz(ftp://ftp.sanger.ac.uk/pub/mirbase/targets/v5/arch.v5.txt.homo_sapi
 ens.zip))
 
 Thank you in advance,
 Maura

Maura, unz() is insisting that you give the filename argument,
which, according to '?unz', is:

  filename: a filename within a zip file.

This means that you would first need to know the name[s] of the
file[s] archived in the .zip file, as far as I can see.

So, in that case, a first step would be to download the .zip
file anyway, to your local system, and then, in whatever way is
appropriate for your system (unzip -l  in Linux), find
out what the files in it are called.

But once you have got that far, you may prefer to handle the .zip
file outside of R ...

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09   Time: 13:50:45
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] check for new files in a given directory

2009-07-28 Thread Ted Harding
On 28-Jul-09 13:40:31, Barry Rowlingson wrote:
 On Tue, Jul 28, 2009 at 12:36 PM, Ted
 Hardingted.hard...@manchester.ac.uk wrote:
 However, this got me looking into '?list.files', and I see there
 (R version 2.9.0 (2009-04-17)):

  recursive: logical. Should the listing recurse into directories?

 But:

  Directories are included only if 'recursive = FALSE'.

 Surely the latter is the wrong way round, and should be

  Directories are included only if 'recursive = TRUE'.
 
  by 'included' it means 'returned'. If you do 'recursive=TRUE' it
 scans recursively for files and only files. If you do
 recursive=FALSE it returns files and directories in the specified
 directory.
 
 Makes it tricky to figure out a complete directory tree since empty
 directories won't appear at all if recursive=TRUE. You'd have to
 implement your own recursive search based on
 list.files(d,recursive=FALSE) and then testing for directoriness
 
  Sucky, unless there's a better way...
 Barry

Thanks for the clarification, Barry! I hadn't appreciated all that
(I think the wording of ?list.files is misleading here).

I agree that it can leave you falling between two stools in the
sort of situation you describe. Which, as it happens, leads me back
to the use of system(...) in Unix/Linux systems. For example, to
list all the directories (without their files) in the current
directory you could do:

  system("find . -type d -print")

which finds (recursively) everything in and under the current
directory (.) which is of type directory (d).

Likewise, the 'ls' command with suitable options (perhaps combined
with judicious 'grep'ping) can enable you to select what you want.
For example, if something is planting data files with extension
.dat from time to time,

  system("ls -tr *.dat")

would list all files with extension .dat in time (-t) order
reversed (r) (most recent last). Again, find could be useful.
Suppose the last one of such files you accessed was lastfile.dat;
then:

  system("find . -maxdepth 1 -newer lastfile.dat -name '*.dat' -print")

would find, in the current directory only (-maxdepth 1) all files
with extension .dat (-name '*.dat') which are newer than the
given file lastfile.dat.

It is for reasons of flexibility like this that I use system() for
this kind of thing anyway (with 'intern=TRUE' if the results are to
be saved in a variable), so I hadn't looked into list.files() before!

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09   Time: 15:24:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help

2009-07-28 Thread Ted Harding
On 28-Jul-09 08:28:22, Inchallah Yarab wrote:
 How can I vary the parameters of a function?
 
 I have a function with 5 parameters. I want to run the function over
 a range of numbers for one of these parameters, and have in the
 end the value of the function for each case of that parameter
 (the other parameters are fixed). Thank you for your help!

It depends on the internals of the function. In R, most functions
(including what you write yourself, if written in the right way)
allow you to give a vector of numbers to a numerical parameter.
The calculations are then vectorised in a single pass, and the
results for each value are returned as a vector.

For example, the pnorm() function for several different standard
deviations:

  SD = c(1,1.5,2,2.5,3)
  cbind(SD,pnorm(q=0.5, mean=0, sd=SD))
  # [1,] 1.0 0.6914625
  # [2,] 1.5 0.6305587
  # [3,] 2.0 0.5987063
  # [4,] 2.5 0.5792597
  # [5,] 3.0 0.5661838

Likewise, if your function is

  my.fun <- function(par1,par2,par3,par4,par5){
(par1 + par2*par3 + (par3^2)*(par4 + par5))
  }

then you could have

  par1 <- 1.1 ; par2 <- 1.2 ; par4 <- 1.4; par5 <- 1.5
  par3 <- SD  # (as above)
  cbind(SD,my.fun(par1,par2,par3,par4,par5))
  #       SD
  # [1,] 1.0  5.200
  # [2,] 1.5  9.425
  # [3,] 2.0 15.100
  # [4,] 2.5 22.225
  # [5,] 3.0 30.800
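
If a function does not vectorise in the parameter of interest,
sapply() over the values gives the same effect (a sketch of mine,
re-using my.fun from above):

  res <- sapply(SD, function(s) my.fun(par1, par2, s, par4, par5))
  cbind(SD, res)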

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jul-09   Time: 15:39:27
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error message: .Random.seed is not an integer vector but

2009-07-23 Thread Ted Harding
On 23-Jul-09 15:30:05, Jim Bouldin wrote:
 I'm trying to run this simple random sample procedure and keep
 getting the error message shown. I don't understand this; I've
 Thanks.
 
 x = as.vector(c(1:12));x
  [1]  1  2  3  4  5  6  7  8  9 10 11 12
 mode(x)
 [1] numeric
 sample(x, 3)
 Error in sample(x, 3) : 
   .Random.seed is not an integer vector but of type 'list'
 
 Jim Bouldin, PhD

I think this error is not arising from the code you have given above.
I get:

  x=as.vector(c(1:12))
  sample(x,3)
  # [1]  8 12  2

which is fine. '.Random.seed' should be an integer vector to start with.
See '?set.seed'. Therefore it should not be a list. Try entering

  .Random.seed

and see what you get -- when I do it it is a vector of 626 assorted
integers. Also try:

  is.vector(.Random.seed)

and, if that is not TRUE, try

  str(.Random.seed)
  typeof(.Random.seed)

If .Random.seed turns out to be a list, then something has gone
seriously wrong somewhere. It may be that somewhere in your code
there is a command that meddles with .Random.seed (or perhaps it
was saved, in .Rdata, from a previous session where something
went wrong). Or, worst scenario, your R installation is broken ...

And, while I'm at it, your x=as.vector(c(1:12)) is overkill!
Simply

  x <- (1:12) ; sample(x,3)

should be enough, or even

  sample((1:12),3)

since (1:12) is a vector (of integers).

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09   Time: 17:13:55
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error message: .Random.seed is not an integer vector but

2009-07-23 Thread Ted Harding
On 23-Jul-09 16:08:17, Jim Bouldin wrote:
 Jim Bouldin wrote:
  Thank you.  However, when I tried that, I got this message:
 
  Warning message:
  In rm(.Random.seed) : variable .Random.seed was not found
 
 In that case, have you attached some package that has its own
 .Random.seed?
 Try to find where the current .Random.seed that R complains
 about comes from.
 
 Uwe Ligges
 
 No, there are no attached packages, just the ones that load
 automatically. The R commander has some type of RNG but it is not
loaded.  Completely stumped.

Follow-up to my previous reply (just posted). Having read the other
responses and your reactions, try the following:

  rm(.Random.seed)
  set.seed(54321) ## (Or your favourite magic number) [*]
  x = as.vector(c(1:12))  ## To reproduce your original code ... !
  sample(x,3)

[*] When you did rm(.Random.seed) as suggested by Uwe, the variable
.Random.seed was lost, so you have to create it again.

If, after the above, you still get the problem, then something is
very seriously wrong.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09   Time: 17:23:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random # generator accuracy

2009-07-23 Thread Ted Harding
On 23-Jul-09 17:59:56, Jim Bouldin wrote:
 Dan Nordlund wrote:
 It would be necessary to see the code for your 'brief test'
 before anyone could meaningfully comment on your results.
 But your results for a single test could have been a valid
 random result.
 
 I've re-created what I did below.  The problem appears to be
 with the weighting process: the unweighted sample came out much
 closer to the actual than the weighted sample (1% error) did. 
 Comments?
 Jim
 
 x
  [1]  1  2  3  4  5  6  7  8  9 10 11 12
 weights
  [1] 1 1 1 1 1 1 2 2 2 2 2 2
 
 a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a
 # (1 million samples from x, of size 3, weighted by weights;
 #  the mean should be 7.50)
 [1] 7.406977
 7.406977/7.5
 [1] 0.987597
 
 b = mean(replicate(1000000,(sample(x, 3))));b
 # (1 million samples from x, of size 3, not weighted this time;
 #  the mean should be 6.50)
 [1] 6.501477
 6.501477/6.5
 [1] 1.000227
 
 Jim Bouldin, PhD

To obtain the result you expected, you would need to explicitly
specify replace=TRUE, since the default for replace is FALSE.
(though probably what you really intended was sampling without
replacement).

Read carefully what is said about prob in '?sample' -- when
replace=FALSE, the probability of inclusion of an element is not
proportional to its weight in 'prob'.

The reason is that elements with higher weights are more likely
to be chosen early on. This then knocks that higher weight out
of the contest, making it more likely that elements with smaller
weights will be sampled subsequently. Hence the mean of the sample
will be biased slightly downwards, relative to the weighted mean
of the values in x.

  table(replicate(1000000,(sample(x, 3))))
  #  1  2  3  4  5  6
  # 250235 250743 249603 250561 249828 249777
  #  7  8  9 10 11 12
  # 249780 250478 249591 249182 249625 250597

(so all nice equal frequencies)

  table(replicate(1000000,(sample(x, 3, prob=weights))))
  #  1  2  3  4  5  6
  # 174873 175398 174196 174445 173240 174110
  #  7  8  9 10 11 12
  # 325820 326140 325289 325098 325475 325916

Note that the frequencies of the values with weight=2 are a bit
less than twice the frequencies of the values with weight=1:

  (325820+326140+325289+325098+325475+325916)/
(174873+175398+174196+174445+173240+174110)
  # [1] 


In fact this is fairly easily calculated. The possible combinations
(in order of sampling) of the two weights, with their probabilities,
are:
                                     1s  2s
-------------------------------------------
1 1 1   P =  6/18 *  5/17 *  4/16     3   0
1 1 2   P =  6/18 *  5/17 * 12/16     2   1
1 2 1   P =  6/18 * 12/17 *  5/15     2   1
1 2 2   P =  6/18 * 12/17 * 10/15     1   2
2 1 1   P = 12/18 *  6/16 *  5/15     2   1
2 1 2   P = 12/18 *  6/16 * 10/15     1   2
2 2 1   P = 12/18 * 10/16 *  6/14     1   2
2 2 2   P = 12/18 * 10/16 *  8/14     0   3

So the expected number of weight=1 in the sample is

  3*(6/18 *  5/17 *  4/16)  + 2*(6/18 *  5/17 * 12/16) +
  2*(6/18 * 12/17 *  5/15)  + 1*(6/18 * 12/17 * 10/15) +
  2*(12/18 *  6/16 *  5/15) + 1*(12/18 *  6/16 * 10/15) +
  1*(12/18 * 10/16 *  6/14) + 0
  = 1.046218

Hence the expected number of weight=2 in the sample is

  3 - 1.046218 = 1.953782

and their ratio 1.953782/1.046218 = 1.867471

Compare this with the value 1.867351 (above) obtained by simulation!

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09   Time: 21:05:07
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random # generator accuracy

2009-07-23 Thread Ted Harding
OOPS! The result of a calculation below somehow got omitted!

  (325820+326140+325289+325098+325475+325916)/
(174873+175398+174196+174445+173240+174110)
  # [1] 1.867351

to be compared (as at the end) with the ratio 1.867471 of the
expected number of weight=2 to expected number of weight=1.
Sorry.
Ted.

On 23-Jul-09 20:05:09, Ted Harding wrote:
 On 23-Jul-09 17:59:56, Jim Bouldin wrote:
 Dan Nordlund wrote:
 It would be necessary to see the code for your 'brief test'
 before anyone could meaningfully comment on your results.
 But your results for a single test could have been a valid
 random result.
 
 I've re-created what I did below.  The problem appears to be
 with the weighting process: the unweighted sample came out much
 closer to the actual than the weighted sample (1% error) did. 
 Comments?
 Jim
 
 x
  [1]  1  2  3  4  5  6  7  8  9 10 11 12
 weights
  [1] 1 1 1 1 1 1 2 2 2 2 2 2
 
  a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a
  # (1 million samples from x, of size 3, weighted by weights;
  #  the mean should be 7.50)
 [1] 7.406977
 7.406977/7.5
 [1] 0.987597
 
  b = mean(replicate(1000000,(sample(x, 3))));b
  # (1 million samples from x, of size 3, not weighted this time;
  #  the mean should be 6.50)
 [1] 6.501477
 6.501477/6.5
 [1] 1.000227
 
 Jim Bouldin, PhD
 
 To obtain the result you expected, you would need to explicitly
 specify replace=TRUE, since the default for replace is FALSE.
 (though probably what you really intended was sampling without
 replacement).
 
 Read carefully what is said about prob in '?sample' -- when
 replace=FALSE, the probability of inclusion of an element is not
 proportional to its weight in 'prob'.
 
 The reason is that elements with higher weights are more likely
 to be chosen early on. This then knocks that higher weight out
 of the contest, making it more likely that elements with smaller
 weights will be sampled subsequently. Hence the mean of the sample
 will be biased slightly downwards, relative to the weighted mean
 of the values in x.
 
   table(replicate(1000000,(sample(x, 3))))
   #  1  2  3  4  5  6
   # 250235 250743 249603 250561 249828 249777
   #  7  8  9 10 11 12
   # 249780 250478 249591 249182 249625 250597
 
 (so all nice equal frequencies)
 
   table(replicate(1000000,(sample(x, 3, prob=weights))))
   #  1  2  3  4  5  6
   # 174873 175398 174196 174445 173240 174110
   #  7  8  9 10 11 12
   # 325820 326140 325289 325098 325475 325916
 
 Note that the frequencies of the values with weight=2 are a bit
 less than twice the frequencies of the values with weight=1:
 
   (325820+326140+325289+325098+325475+325916)/
 (174873+175398+174196+174445+173240+174110)
   # [1] 
 
 
 In fact this is fairly easily calculated. The possible combinations
 (in order of sampling) of the two weights, with their probabilities,
 are:
                                      1s  2s
 -------------------------------------------
 1 1 1   P =  6/18 *  5/17 *  4/16     3   0
 1 1 2   P =  6/18 *  5/17 * 12/16     2   1
 1 2 1   P =  6/18 * 12/17 *  5/15     2   1
 1 2 2   P =  6/18 * 12/17 * 10/15     1   2
 2 1 1   P = 12/18 *  6/16 *  5/15     2   1
 2 1 2   P = 12/18 *  6/16 * 10/15     1   2
 2 2 1   P = 12/18 * 10/16 *  6/14     1   2
 2 2 2   P = 12/18 * 10/16 *  8/14     0   3
 
 So the expected number of weight=1 in the sample is
 
   3*(6/18 *  5/17 *  4/16)  + 2*(6/18 *  5/17 * 12/16) +
   2*(6/18 * 12/17 *  5/15)  + 1*(6/18 * 12/17 * 10/15) +
   2*(12/18 *  6/16 *  5/15) + 1*(12/18 *  6/16 * 10/15) +
   1*(12/18 * 10/16 *  6/14) + 0
   = 1.046218
 
 Hence the expected number of weight=2 in the sample is
 
   3 - 1.046218 = 1.953782
 
 and their ratio 1.953782/1.046218 = 1.867471
 
 Compare this with the value 1.867351 (above) obtained by simulation!
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 23-Jul-09   Time: 21:05:07
 -- XFMail --


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09   Time: 21:15:52
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random # generator accuracy

2009-07-23 Thread Ted Harding
!
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 23-Jul-09   Time: 21:05:07
 -- XFMail --
 
 
 Jim Bouldin, PhD
 Research Ecologist
 Department of Plant Sciences, UC Davis
 Davis CA, 95616
 530-554-1740
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09   Time: 23:00:56
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random # generator accuracy

2009-07-23 Thread Ted Harding
On 23-Jul-09 22:16:39, Thomas Lumley wrote:
 On Thu, 23 Jul 2009 ted.hard...@manchester.ac.uk wrote:
 
 The general problem, of sampling without replacement in such a way
 that for each item the probability that it is included in the sample
 is proportional to a pre-assigned weight (sampling with probability
 proportional to size) is quite tricky and, for certain choices
 of weights, impossible. For a glimpse of what's inside the can of
 worms, have a look at the reference manual for the 'sampfling'
 package, in particular the function samprop():

 
 The 'sampling' package also provides a nice selection of ways to draw
 PPS samples without replacement.
 
 -thomas

Thanks, Thomas. There is of course also the package 'pps'.
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Jul-09   Time: 00:00:35
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correction.

2009-07-21 Thread Ted Harding
On 21-Jul-09 01:24:04, Rolf Turner wrote:
 It has been pointed out to me that I erred in an earlier post.
 ``Go stick your head in a pig.'' is not the motto of the (entire)
 Sirius Cybernetics Corporation.  It is the motto if the Sirius
 Cybernetics Corporation ***Complaints Division***.
 
 My apologies for the misinformation.
   cheers,
   Rolf Turner

Don't worry, Rolf. It wasn't your fault. You have provided a valid
response to a Randomised Conceptualisation Trial (which involves
applying Thought Interventions to participating subjects). If you
had the impression that the Intervention was not random in your
case, be assured that this was as intended.

The Sirius Cybernetics Corporation (SCC) will record your response
in advanced computers equipped with the SCC's groundbreaking WOM
(Write-Only Memory). SCC take very seriously the problem of disposal
of computers which cease to function after a certain amount of use,
and are working to develop the enhanced EWOM (Erasable Write-Only
Memory) module. SCC are proud to have been associated with the
development of the unsinkable submarine.[*]

Yours cosmically,
Z.B.

[*] Such things are not fiction. They occur in real life:

  The prime minister responded: 'We have done everything
   that we can to increase the number of helicopters and
   there will be more Merlin helicopters on the ground.'

http://news.bbc.co.uk/1/hi/uk/8151244.stm


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-Jul-09   Time: 10:41:38
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Zinb for Non-interger data

2009-07-19 Thread Ted Harding
On 18-Jul-09 17:26:36, JPS2009 wrote:
 Sorry bit of a Newbie question, and I promise I have searched the
 forum already, but I'm getting a bit desperate!
 
 I have over-dispersed, zero inflated data, with variance greater
 than the mean, suggesting Zero-Inflated Negative Binomial - which
 I attempted in R with the pscl package suggested on
 http://www.ats.ucla.edu/stat/R/dae/zinbreg.htm
 
 However my data is non-integer with some pesky decimals (i.e. 33.12)
 and zinb / pscl doesn't like that - not surprising as zinb is for
 count data, normally whole integers etc.
 
 Does anyone know of a different zinb package that will allow
 non-integers or and equivalent test/ model to zinb for non-integer
 data? Or should I try something else like a quasi-Poisson GLM?
 
 Apologies for the Newbie question! Any help much appreciated!
 Thanks!

The presence of decimals suggests that those data values are records
of quantities which ought to be modelled as continuous variables.
For instance, in answer to a survey question "How much did you spend
on alcoholic drinks yesterday?", the answer would be either a positive
sum of money (with decimals), or zero, depending on whether the
person spent anything at all on alcohol.

So:
With probability p, the amount spent was positive and, conditional
on being positive, has a distribution which can be modelled by a
particular continuous distribution (maybe Log-normal?).

With probability (1-p), the amount spent was zero.

So a correct approach first requires you to face the question of
how to model the positive part of the distribution.

Once you have settled that question, it is then possible to see
whether that particular class of problem is covered by some package
in R, or whether you need to develop an approach yourself.
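
For instance, if a Log-normal were adopted for the positive part, a
minimal two-part ("hurdle") sketch might look like this (df, y, x1,
x2 are hypothetical names for your data frame and variables):

  df$pos <- df$y > 0
  fit.p  <- glm(pos ~ x1 + x2, family=binomial, data=df)  # P(y > 0)
  fit.m  <- lm(log(y) ~ x1 + x2, data=df, subset=pos)     # positive part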

In any case, if I am barking up the right tree above, neither negative
binomial nor Poisson would, in principle, be correct for such data
since, as you observe, these are intended for count data, not for
data which is essentially continuous.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 19-Jul-09   Time: 12:25:39
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (-8)^(1/3) == NaN?

2009-07-18 Thread Ted Harding
On 18-Jul-09 22:04:57, Dave DeBarr wrote:
 Why does the expression (-8)^(1/3) return NaN, instead of -2?
 
 This is not answered by
 http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-are-powers-of-negative-
 numbers-wrong_003f
 
 Thanks,
 Dave

Because R does not try to evaluate (-8)^(1/3), but (-8)^x, where x is
a very close approximation to 1/3 but is not exactly 1/3 (which is
impossible in a finite binary representation).

But even if it could exactly represent 1/3, R would still need to
have a special look-up for certain fractional powers (1/3, 1/5, ... )
to enable it to recognise that these are odd-integer-roots of negative
numbers, and therefore can be evaluated as -(nth_root(abs(x))).

It doesn't help, either, to try to do it in complex numbers, since
(-8) will then be seen as 8*exp(i*pi) whose cube root will be found as

2*exp(i*pi/3) = 2*(cos(pi/3) + i*sin(pi/3)) = 2*(1/2 + i*sqrt(3)/2):

  (complex(1,-8,0))
  # [1] -8+0i

  complex(1,-8,0)^(1/3)
  # [1] 1+1.732051i

  (8*exp(complex(1,0,pi)))^(1/3)
  # [1] 1+1.732051i

  sqrt(3)
  # [1] 1.732051

I'm not sure what best to suggest for your situation. Basically, if it
is in a context where it can only be

  (negative number)^(1/(odd integer))

then you are better off modifying the logic of your program so as
to ensure the result you want.
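
For real arguments, one common workaround is a small helper that
handles the sign separately (my sketch, not part of base R):

  cbrt <- function(x) sign(x)*abs(x)^(1/3)
  cbrt(-8)
  # [1] -2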

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jul-09   Time: 23:54:11
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package norm has been removed. What to use for Maximum L

2009-07-17 Thread Ted Harding
On 17-Jul-09 12:12:01, Uwe Ligges wrote:
 Achim Zeileis wrote:
 On Fri, 17 Jul 2009, Daniel Abbott wrote:
 Hello,
 I apologize if an answer to my questions is available, or if I
 submitted this question incorrectly. I have read the mailing lists,
 as well as the R Project and CRAN homepages. However, I may have
 missed something.

 I noticed the package 'norm' has been removed. Its page
 http://cran.r-project.org/web/packages/norm/index.html now reads:

 Package 'norm' was removed from the CRAN repository.

 Formerly available versions can be obtained from the archive.

 My questions are:

 1. Why was norm removed? I used the package and found it to be very
 useful. Is there a serious error in the package or another problem?
 
 What do you consider to be a serious error?
 
 Typical reasons for removing a package from the active CRAN repository
 are that the package does not pass CRAN checks and/or that the package
 maintainer is irresponsive. (norm, specifically, hadn't been updated 
 since 2002.) This is serious enough to make all automatic CRAN
 features such as daily package checks and building of binary packages
 too cumbersome.
 
 But if you do not consider this to be serious, you can keep on using 
 norm, it is still in CRAN's package archives. Or, even better, you 
 could adopt it, fix it, and release a new version to CRAN (although it
 is not quite clear to me whether the norm's license allows this).
 
 
 Oh yes, indeed.
 
 Please ignore my former message, I haven't looked at the license 
 carefully enough. And since the license is rather special (not
 GPL'ed as I assumed in my message), you cannot take over
 maintainership without negotiation with the former maintainer,
 I fear.
 Best,
 Uwe Ligges

On a point of information: The licence in question:

License: The software may be distributed free of charge and used
 by anyone if credit is given. It has been tested fairly
 well, but it comes with no guarantees and the authors
 assume no liability for its use or misuse.

is verbatim from Joe Shafer's original licence for his NORM.
He used exactly the same wording for his packages CAT, MIX and PAN.
These were originally written in S, with FORTRAN code for many of
the functions. The various people who have ported these to R have
simply copied these words into the R packages. See (if you have
the packages installed)
  library(help=cat)
  library(help=norm)
  library(help=mix)

I may have some comments about this removed from CRAN issue later,
but I need to think about them first ...

Best wishes to all,
Ted.



 (Note that CRAN runs no checks whether the methods implemented are 
 reasonable or well-suited for the problem they are trying to address
 etc.)
 
 2. If norm should no longer be removed, what is another good package
 for ML (maximum likelihood) missing data imputation? I have seen
 several recommendations online, but I wonder which one is currently
 the reigning champ.
 
 The Multivariate and the SocialSciences task views have sections 
 about missing data
   http://CRAN.R-project.org/view=Multivariate
   http://CRAN.R-project.org/view=SocialSciences
 
 Additionally, I can recall that there is the Amelia II package:
   http://CRAN.R-project.org/package=Amelia
 and potentially others.
 
 hth,
 Z
 
 Thank you very much. I appreciate your time!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-Jul-09   Time: 13:38:23
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package norm has been removed. What to use for Maximum L

2009-07-17 Thread Ted Harding
Follow-up:

On 17-Jul-09 12:38:27, Ted Harding wrote:
 On a point of information: The licence in question:
 
 License: The software may be distributed free of charge and used
  by anyone if credit is given. It has been tested fairly
  well, but it comes with no guarantees and the authors
  assume no liability for its use or misuse.
 
 is verbatim from Joe Shafer's original licence for his NORM.
 He used exactly the same wording for his packages CAT, MIX and PAN.
 These were originally written in S, with FORTRAN code for many of
 the functions. The various people who have ported these to R have
 simply copied these words into the R packages. See (if you have
 the packages installed)
   library(help=cat)
   library(help=norm)
   library(help=mix)
 
 I may have some comments about this removed from CRAN issue later,
 but I need to think about them first ...
 
 Best wishes to all,
 Ted.

While Joe Shafer's MI software web page

  http://www.stat.psu.edu/~jls/misoftwa.html

as cited in the R help pages, still exists (though apparently still
dating from 1999), only the Windows versions of NORM, CAT, MIX and PAN
are still accessible. The link to the Unix versions:

S-PLUS for Unix:  http://www.stat.psu.edu/~jls/splunix.html

no longer leads anywhere (though it still did only a few years ago).
I don't know whether it has been moved, and can be found elsewhere,
or whether Shafer has withdrawn it.

The Windows versions would be compiled code. The Unix versions were
source code and needed to be compiled. If Shafer has withdrawn the
source-code versions, this might have implications for use (these
underlie the S-Plus missing library).

Does anyone have information about this?
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-Jul-09   Time: 13:54:45
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package norm has been removed. What to use for Maximum L

2009-07-17 Thread Ted Harding
On 17-Jul-09 13:01:46, Gabor Grothendieck wrote:
 On Fri, Jul 17, 2009 at 8:54 AM, Ted
 Hardingted.hard...@manchester.ac.uk wrote:
 Follow-up:

 On 17-Jul-09 12:38:27, Ted Harding wrote:
 On a point of information: The licence in question:

 License: The software may be distributed free of charge and used
           by anyone if credit is given. It has been tested fairly
           well, but it comes with no guarantees and the authors
           assume no liability for its use or misuse.

 is verbatim from Joe Shafer's original licence for his NORM.
 He used exactly the same wording for his packages CAT, MIX and PAN.
 These were originally written in S, with FORTRAN code for many of
 the functions. The various people who have ported these to R have
 simply copied these words into the R packages. See (if you have
 the packages installed)
    library(help=cat)
    library(help=norm)
    library(help=mix)

 I may have some comments about this removed from CRAN issue later,
 but I need to think about them first ...

 Best wishes to all,
 Ted.

 While Joe Shafer's MI software web page

  http://www.stat.psu.edu/~jls/misoftwa.html

 as cited in the R help pages, still exists (though apparently still
 dating from 1999), only the Windows versions of NORM, CAT, MIX and PAN
 are still accessible. The link to the Unix versions:

 S-PLUS for Unix:  http://www.stat.psu.edu/~jls/splunix.html

 no longer leads anywhere (though it still did only a few years ago).
 
 Its archived here:
 http://web.archive.org/web/20050212143644/http://www.stat.psu.edu/~jls/splunix.html

Thanks, Gabor!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-Jul-09   Time: 14:27:26
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with Conditional Logit

2009-07-16 Thread Ted Harding
On 16-Jul-09 19:40:20, Noah Silverman wrote:
 Hello,
 I'm brand new to using R.  (I've been using Rapid Miner, but would
 like to move over to R since it gives me much more functionality.)
 
 I'm trying to learn how to do a conditional logit model.
 
 My data has one dependent variable, 2 independent variables and a 
 group variable.
 
 example:
 
  class   v1    v2    group
  sick    .3    .7    1
  well    .2    .9    1
  sick    .25   .4    1
  sick    .32   .6    2
  well    .21   .81   2
  sick    .5    .3    2
 
 Can someone help me formulate the proper command for the clogit()
 function?
 Thanks!

Have a look at:

http://finzi.psych.upenn.edu/R/library/survival/html/clogit.html

You would need to install the 'survival' package.

To search for this sort of thing in future, go to R Site Search at:

  http://finzi.psych.upenn.edu/nmz.html

and search according to the available options. For your particular
query, I unchecked R-help 2008- and Task views, leaving only
Functions, and entered clogit into the search box. Up came
various hits, amongst which the one suggested above.
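
For your data layout, the call might look something like this (an
untested sketch, assuming your data are in a data frame called 'dat'
with the columns shown above, and that sick/well is the outcome):

  library(survival)
  dat$sick <- as.integer(dat$class == "sick")   # binary response: 1 = sick
  fit <- clogit(sick ~ v1 + v2 + strata(group), data = dat)
  summary(fit)

Here strata(group) is what makes the logit conditional on your
group variable.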

Welcome to R, and happy hunting in the future!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-09   Time: 20:54:18
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predictive punishment module (was: Re: Simple cat statem

2009-07-15 Thread Ted Harding
On 15-Jul-09 14:45:26, S Ellison wrote:
 
 Duncan Murdoch murd...@stats.uwo.ca 15/07/2009 15:04:29 
 * R has a new predictive punishment module.  It punishes you for
 things it knows you will do later.
 Dang! Does that mean it'll return errors for the statistical idiocy
 I haven't yet got around to committing? 
 
 ;-)
 
 Steve E

It is even more twisted and sadistic than that.

Being an Expert System on Human Psychology (how else could it succeed
so well in putting the user on the wrong foot), it knows that the user
will interpret the punishment as being related to the correct thing
that the user just did, even though the punishment is for the thing
that will be done later. Users will therefore develop an inhibition
in respect of correct actions, and will be driven into incorrect
behaviour.

The incorrect thing that will be done later will be punished anyway.
Since the complement to {an incorrect thing} is
  {correct thing} union {all the other incorrect things}
the user will end up
a) Confused
b) Increasingly likely to do incorrect things, therefore increasingly
   likely to be punished, finally being punished on almost all
   occasions of action (predictively or not).

So, now we know ... :(
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jul-09   Time: 16:21:53
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mann-whitney U-test - U-value

2009-07-11 Thread Ted Harding
On 11-Jul-09 18:13:29, Martin Batholdy wrote:
 Hi,
 I know that I can perform a Mann-Whitney U test with wilcox.test(x,y)
 But instead of an U-value I get a W-Value.
 Is it possible to change this?

The usual definition of the Mann-Whitney U test for two samples x
(of size m) and y (of size n) is that

  U = number of pairs (x[i],y[j]) such that x[i] < y[j]

For R's wilcox.test(X,Y), '?wilcox.test' says:

  R's value can also be computed as the number of all pairs '(x[i],
  y[j])' for which 'y[j]' is not greater than 'x[i]', the most
  common definition of the Mann-Whitney test.

In other words, it is the complement of the definition of U above.
Hence the standard U statistic can be obtained from R's W as

  U = m*n - W

since m*n is the total number of possible pairs.
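
A quick check of that relationship (a sketch with made-up data;
the values are continuous, so there are no ties and the relation
is exact):

  set.seed(1)
  x <- rnorm(8) ; y <- rnorm(6)       # m = 8, n = 6
  W <- wilcox.test(x, y)$statistic    # R's W
  U <- sum(outer(x, y, "<"))          # pairs with x[i] < y[j]
  W + U                               # equals m*n = 48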

(In my view, R is being a bit idiosyncratic there).
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 11-Jul-09   Time: 20:04:40
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] linear regression and testing the slope

2009-07-08 Thread Ted Harding
On 08-Jul-09 12:29:40, evrim akar wrote:
 Dear All,
 First of all I would like to say I do not have much knowledge
 about this subject, so most of you can find it really easy.
 I am doing a linear regression and I want to test if the slope
 of the curve is 0. R gives the summary statistics:
 
 Call:
 lm(formula = x ~ s)
 
 Residuals:
       Min        1Q    Median        3Q       Max
 -0.025096 -0.020316 -0.001203  0.011658  0.044970
 
 Coefficients:
              Estimate Std. Error t value Pr(>|t|)
 (Intercept)  0.005567   0.016950   0.328    0.750
 s           -0.001599   0.002499  -0.640    0.538
 
 Residual standard error: 0.02621 on 9 degrees of freedom
 Multiple R-squared: 0.04352,    Adjusted R-squared: -0.06276
 F-statistic: 0.4095 on 1 and 9 DF,  p-value: 0.5382
 
 what is this t-value for? The explanation in the help file was
 unfortunately not clear to me. How can I test my hypothesis
 that the slope is 0?
 
 Thank you in advance,
 regards,
 Evrim

The quantity 't' is the estimated value (-0.001599 for the slope 's')
divided by its estimated standard error (0.002499). Taking the values
as reported by the summary:

  t = -0.001599/0.002499 = -0.639856

which R has reported (to 3 significant figures) as -0.640

The Pr(>|t|) is the probability, assuming the null hypothesis that
the slope (coefficient of 's') is zero, that data could arise at random
giving rise to a t-value which, in absolute value, would exceed the
absolute value |t| = |-0.639856| = 0.639856 which you got from your
data.

The relevance of this for testing the hypothesis that the slope is 0
is that, if the slope really is 0, then large values (either way) of
the coefficient of 's' (reported by R as Estimate) are unlikely.
So if you get a small value of Pr(>|t|) (conventionally less than
0.05, or 0.01, etc.), then your data have produced an estimate so
deviant that one at least as large (in absolute value) would rarely
arise if the null hypothesis were true. It is then more plausible
that the null hypothesis is false.

In your case, the P-value Pr(>|t|) = 0.538, so you would be more
likely than not to get an estimate at least as deviant from 0 as the
one you did get, if the null hypothesis were true. Hence the data do
not provide grounds for rejecting the null hypothesis.
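
You can reproduce both numbers from the summary itself (a sketch;
9 is the residual degrees of freedom reported above):

  t <- -0.001599/0.002499                    # Estimate / Std. Error
  t
  # [1] -0.6398559
  2*pt(abs(t), df = 9, lower.tail = FALSE)   # two-sided Pr(>|t|)
  # approximately 0.538, as reported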

Note that not having grounds for rejection does not mean that you
must accept it: a non-significant t-value is not proof that the
null hypothesis is true.

There is a good basic outline of the t-test in the Wikipedia article
Student's t-test:

  http://en.wikipedia.org/wiki/Student%27s_t-test

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jul-09   Time: 14:17:52
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Uncorrelated random vectors

2009-07-07 Thread Ted Harding
Be careful to be clear what you are referring to when you say
correlation is zero.

The commands
  x <- rnorm(100)
  y <- rnorm(100)
will produce two vectors of given length (100) which (to within the
effectively ignorable limitations of the random number generator)
will have been produced independently. Hence the *theoretical*
correlation is zero. If that is what you meant, then it is already
answered. When you compute cor(x,y), however, the answer will in
general be non-zero (though only rarely significantly so), because
a second independent run of rnorm() has essentially zero probability
of producing a vector for which the computed cor(x,y) is exactly 0.

However, possibly you mean that you want two vectors for which the
result of cor(x,y) = 0. One way to achieve this is along the following
lines:

  set.seed(54321)
  x  <- rnorm(100)
  y0 <- rnorm(100)
  My <- mean(y0)
  Sy <- sd(y0)
  y1 <- lm(y0 ~ x)$res ; y1 <- y1/sd(y1)
  y  <- My + y1*Sy

  mean(y0)
  # [1] 0.04497584
  mean(y)
  # [1] 0.04497584
  sd(y0)
  # [1] 0.848231
  sd(y)
  # [1] 0.848231

  cor(x,y0)
  # [1] 0.05556468
  cor(x,y)
  # [1] 6.451072e-18 [effectively 0, to within rounding error]

In this case, however, note that
[A]: The 100 elements of y, given the values of x, are NOT independent
of each other, since they satisfy the linear constraint

  x[1]*y[1] + x[2]*y[2] + ... + x[100]*y[100]
    - (sum(x))*(y[1] + y[2] + ... + y[100])/100 = 0

and therefore can vary only in 99 dimensions, not 100. Nor are they
independent of the values of x (even though the numerical correlation
is 0).
On the other hand, the values of y0 are independent of the values
of x, and of each other.

You need to be very clear why you want to have two vectors x,y
such that cor(x,y) = 0, since otherwise you are at risk of carrying
out an invalid analysis.

Hoping this helps,
Ted.



On 07-Jul-09 14:26:02, Stein, Luba (AIM SE) wrote:
 Thank you for your help!
 But is it possible to produce two vectors x and y with a given length
 such that their correlation is zero?
 
 For me it is not enough just to simulate two vectors, whatever their
 correlation happens to be.
 
 Thank you,
 Luba 
 
 
 
 
 -----Original Message-----
 From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be]
 Sent: Tuesday, 7 July 2009 15:51
 To: Stein, Luba (AIM SE); r-help@r-project.org
 Subject: RE: [R] Uncorrelated random vectors
 
 cor.test(rnorm(1), rnorm(1)) 
 
 
 
 
 ir. Thierry Onkelinx
 
 The combination of some data and an aching desire for an answer does
 not ensure that a reasonable answer can be extracted from a given
 body of data.
 ~ John Tukey
 
 -----Original Message-----
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On behalf of Stein, Luba (AIM SE)
 Sent: Tuesday, 7 July 2009 15:46
 To: r-help@r-project.org
 Subject: [R] Uncorrelated random vectors
 
 Hello,
 
 is it possible to create two uncorrelated random vectors for a given
 distribution.
 
 In fact, I would like to have something like the function rnorm or
 rlogis with the extra property that they are uncorrelated.
 
 Thanks for your help,
 Luba


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jul-09   Time: 16:48:06
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test for X=1 fails, test for 0 works, data in text file

2009-07-07 Thread Ted Harding
On 07-Jul-09 19:06:59, Mark Knecht wrote:
 Hi,
I am apparently not understanding some nuance about either the use
 of subset or more likely my ability to test for a numerical match
 using '='. Which is it? Thanks in advance.

It looks as though you have tripped over the distinction between =
and ==. The first is (in effect) an assignment operator (like <-).
The second is a comparison operator: X==Y is TRUE if X equals Y.

So, for example, to select rows from MyResults according to your
criteria below, you could do something like:

  MyResultsNeg <- MyResults[(MyResults$PosType==(-1)),]
  MyResultsPos <- MyResults[(MyResults$PosType==1   ),]

[Note: the parentheses (...) are in fact logically superfluous,
but I like to use them (a) to make it absolutely clear, visually,
how things are being evaluated; (b) (not relevant in this particular
context) to avoid falling into traps like N <- 10 ; X <- 1:N-1,
which results in X being (1:10) - 1, i.e. 0:9. The correct syntax
for the latter would be X <- 1:(N-1).]
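
Equivalently, keeping your subset() calls, the only change needed
is the doubled equals sign:

  MyResults.GroupA <- subset(MyResults, PosType == 1)
  MyResults.GroupB <- subset(MyResults, PosType == -1)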

Hoping this helps,
Ted.


I've read a data file, reshaped it and then created MyResults by
 keeping only lines where the value column is greater than 0. So far so
 good. The data in MyResults looks good to me by eye.
 
The problem comes in when I try to further subset MyResults into
 two files, one with PosType=1 and the other with PosType=-1. Looking
 at the dimension of the results there's no change. It didn't work
 which is verified by eye. However if I test for PosType=1 by actually
 testing for PosType>0 then it does work.
 
Is this the general case in R that if I've read data that was
 written into a csv file as 1 - it is 1 if I look into the file with a
 text editor - that I cannot test for that? Or is some case where I
 need to use maybe as.numeric or something else first to ensure R sees
 the number the way I'm thinking about the number?
 
 Cheers,
 Mark
 
 dim(X)
 [1]  25 425

 ReShapeX <- melt(X, id = c("Trade", "PosType", "EnDate", "EnTime",
                  "ExDate", "ExTime", "PL_Pos", "Costs", "Save2"))

 dim(ReShapeX)
 [1] 10400    11

 MyResults <- subset(ReShapeX, value > 0)

 dim(MyResults)
 [1] 1105   11

 MyResults.GroupA <- subset(MyResults, PosType = 1)

 dim(MyResults.GroupA)
 [1] 1105   11

 MyResults.GroupB <- subset(MyResults, PosType = -1)

 dim(MyResults.GroupB)
 [1] 1105   11

 MyResults.GroupA <- subset(MyResults, PosType > 0)

 dim(MyResults.GroupA)
 [1] 432  11

 MyResults.GroupB <- subset(MyResults, PosType < 0)

 dim(MyResults.GroupB)
 [1] 673  11

 


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jul-09   Time: 20:28:44
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] r-project.org address blacklisted by anti-spam software

2009-07-07 Thread Ted Harding
On 07-Jul-09 21:59:41, Hans W Borchers wrote:
 Dear List:
 An e-mail mentioning the r-project.org address and sent to a friend
 at a German university was considered spam by the local spam filter.
 
 Its reasoning: the URL r-project.org is blacklisted at
 uribl.swinog.ch resp.
 at antispam.imp.ch. I checked the list

 http://antispam.imp.ch/swinog-uri-rbl.txt  [caution: long list]
 
 and indeed, there it was. Can anybody explain how or why the R project
 made its way onto the list? Is it reasonable to file a protest?
 --Hans Werner

This is clearly a very undiscriminating blacklist. My own ISP
(zen.co.uk, highly respected) is listed! I am reminded of the
scandalous blacklist maintained by SORBS ( http://www.us.sorbs.net )
-- now happily, it seems, about to die -- whose motive seemed to be
to blacklist on the slightest pretext in order to demand exSORBitant
fees for removal ($25 per *reported* spam). At one time, the UK
academic community subscribed to SORBS, with the effect that I was
unable to email to any UK academic institution because my IP address,
dynamically allocated on dial-up by British Telecom, was blacklisted
by SORBS. I found a work-round, but ... . It seems the UK academic
community dropped SORBS, and now similar abandonment (see at above URL)
may prompt the welcome closure of SORBS.

General information about this blacklist can be found by going one up
on your URL to: http://antispam.imp.ch/

Your German will be better than mine, so you will be better placed
to try to find out what may be involved in getting r-project.org
de-listed.

Another suggestion is to ask your friend to try to get his University
to abandon antispam.imp.ch altogether, on the grounds that it is
preventing a perfectly honest domain (with importance for the work
of the University) from communicating with people in the University.

And the best of luck. In my experience, sometimes the administrators
of computing resources can be less interested in supporting their users
than in ticking the boxes handed down by their management -- who,
I hope, may evolve to thinking outside the box, going forward.[*]

Good luck!
Ted.
[*] With acknowledgements to Lucy Kellaway:
http://www.ft.com/cms/s/0/3cb5f0fa-8965-11dc-b52e-779fd2ac.html

See also:
http://www.ft.com/cms/s/4238cace-b6f2-11dc-aa38-779fd2ac.html
http://www.ft.com/cms/s/48e4569a-cb56-11dc-97ff-77b07658.html
and, last but not least:
http://www.ft.com/cms/s/7880e774-99f1-11dc-ad70-779fd2ac.html




E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jul-09   Time: 00:18:25
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hartigan's Dip test

2009-07-06 Thread Ted Harding
On 06-Jul-09 12:39:29, James Allsopp wrote:
 Hi,
 I just got a value for the dip test out of my data of 0.074 for a
 sample size of 33. I'm trying to work out what this actually means
 though?
 Could someone help me relate this to a p-value?
 
 Thanks
 James

Have a look at

  http://finzi.psych.upenn.edu/R/library/diptest/html/qDiptab.html
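
Recent versions of the 'diptest' package also provide dip.test(),
which returns an (interpolated) P-value directly -- a sketch, not
run on your data:

  library(diptest)
  set.seed(1)
  x <- rnorm(33)   # a unimodal sample of your size
  dip.test(x)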

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jul-09   Time: 13:50:25
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Computer Modern

2009-07-02 Thread Ted Harding
On 02-Jul-09 09:06:44, Mr Derik wrote:
 I am trying to use computer modern fonts in postscript files
 for a latex document. Ultimately I want to automate this through
 sweave. I've read the documentation ans have tried the following
 code to use lattice to produce a graph using computer modern:
 
 library(lattice)
 library(grid)
 testPlot = (
   xyplot(seq(1:10) ~ seq(1:10),
     main="one to ten",
     xlab="the quick fox",
     ylab="jumped over the lazy brown dog",
     xlim=c(0,1),
     ylim=c(0,1),
     col="black",
     type="l",
     lwd=2
   )
 )
 setwd("C:\\R_folder\\CMtests")
 postscript("cm_test.eps", width = 4.0, height = 3.0,
            horizontal = FALSE, onefile = FALSE, paper = "special",
            family = "ComputerModern", encoding = "TeXtext.enc")
 print(testPlot)
 dev.off()
 
 This produces a plot with courier.
 
 I am using R 2.9.0 on a windows XP machine. I did manage to produce
 one plot with CM as the font so I know it's possible with my set up.
 I can't get back to that. Please help me with the code.
 Thank You

I think you may also need to use the 'fonts' parameter to postscript().
See in '?postscript':

  fonts: a character vector specifying additional R graphics font
family names for font families whose declarations will be
included in the PostScript file and are available for use
with the device. See 'Families' below.  Defaults to 'NULL'.

Since the Computer Modern family is most probably not built in
to your printer, the PostScript file will need to include font
definitions for these fonts. If I understand aright, this is what
would be achieved by appropriate use of the fonts parameter.
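
So something along these lines might work (an untested sketch;
whether it cures the Courier substitution will depend on your
viewer/printer setup):

  postscript("cm_test.eps", width = 4.0, height = 3.0,
             horizontal = FALSE, onefile = FALSE, paper = "special",
             family = "ComputerModern", encoding = "TeXtext.enc",
             fonts = "ComputerModern")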

If the font definitions are not included, the calls for them will
not be recognised by the printer which may then substitute a default
(likely to be Courier).

See also the section TeX fonts in '?postscript'.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-09   Time: 11:59:29
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)^2

2009-07-02 Thread Ted Harding
Putting your two queries together [see revised Subject ... ]:

[R] (no subject)^1:

  Could you please help me?
  I am trying to load a csv-file in R, but it comes out wrong!
  My data is

   0,0127
  -0,0016
   0,0113
   0,0037
  -0,0025

   
   Ret
      X0 X0127
   1   0    16
   2   0   113
   3   0    37
   4   0    25

In this case, you need to use the header=FALSE option to read.csv():
  Ret <- read.csv("Ret.csv", header=FALSE)
since the default for read.csv() is header=TRUE, so it assigns
names to the variables according to what is in the first line.
Since your first line had numeric data, it prefixes the values
with X (giving X0 and X0127). Note, too, that your numbers use
a comma as the decimal separator, so you will probably also need
read.csv2() (which assumes sep=";" and dec=",") or an explicit
dec="," argument, so that 0,0127 is read as the single number
0.0127 rather than being split at the comma. Read what you get
from

  ?read.csv

[R] (no subject)^2

  Hi Guys,
  It is a very simple question, but I can't find the answer! Please
  help me.
  I use R and such a simple function as length() doesn't work. The
  result is always 1 even if my data have more than 1 observation!
  Do I have to load any additional library?
   length(Ret_1)
[1] 1
   length
 function (x)  .Primitive("length")
  Thank you!!!

I surmise that Ret_1 is the result of a command similar to your
  Ret <- read.csv("Ret.csv")
Hence it is a dataframe.

If I use your Ret <- read.csv("Ret.csv") command with your Ret.csv
data, I get, as you did:

  Ret
  #   X0 X0127
  # 1  0    16
  # 2  0   113
  # 3  0    37
  # 4  0    25

Here, Ret is a dataframe with two columns. In fact, a dataframe
is a special kind of list, one element for each column:

  str(Ret)
  # 'data.frame':   4 obs. of  2 variables:
  #  $ X0   : int  0 0 0 0
  #  $ X0127: int  16 113 37 25

and each $ is one element of the list: $X0 and $X0127. Each element
is a vector of numbers in the case of your Ret.

You can see what they are separately by using the $ operator
to extract them:

  Ret$X0
  # [1] 0 0 0 0
  Ret$X0127
  # [1]  16 113  37  25

So Ret is a list with 2 elements. Hence:

  length(Ret)
  # [1] 2

Therefore, in the case of your Ret_1, I surmise that your Ret_1
is a list with one element (as Sarah Goslee surmised also).

In other words, if you constructed Ret_1 by reading from a CSV
file, as in
  Ret <- read.csv("Ret.csv")
you will have a dataframe with 1 column, namely a list with one
element.
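
If what you were really after was the number of observations rather
than the number of columns, use nrow() on the dataframe (or length()
on a single column) -- illustrated with the 4-row Ret above:

  nrow(Ret)
  # [1] 4
  length(Ret$X0127)
  # [1] 4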

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-09   Time: 15:49:56
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Name of data.frame as a text string?

2009-07-02 Thread Ted Harding
On 02-Jul-09 19:00:44, Mark Knecht wrote:
 I've passed a data.frame as an input into a function which does some
 plotting on a 4x4 matrix of certain parameters within in the
 data.frame. I'd like to add a small header on top of each plot with
 the name of the data.frame so that it's clear as I compare these 16
 things where each on came from.
 
 So far I haven't found the right way to get the name of the data.frame
 as a string which I can use in something like mtext. Is there one? If
 I put dummy text in, or take the time to pass in the name by hand,
 then I do get titles just as I'd like, but I'd far rather let the name
 of the data.frame speak for itself.
 
 Thanks,
 Mark

One way to do this (which is how I usually do it) is to set up the
dataframe name as a string variable, and then use this as required.

For instance:

  datafr <- "My1stDF"
  DF <- read.csv(paste(datafr, ".csv", sep=""))
  {do things}
  plot(whatever, main=paste("Data from", datafr), ...)

which will read the CSV file whose name corresponds to what you have
set 'datafr' to, and then put a corresponding header into the plot.

This is a particularly useful technique if you want to loop through
several datasets, on the lines of

  DataFrs <- c("My1stDF", "My2ndDF", "My3rdDF", "My4thDF")
  for( datafr in DataFrs ) {
    {stuff like the above}
  }

Several variants of this kind of approach are possible!
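
Another standard idiom, since your dataframe is being passed into a
function, is deparse(substitute()), which turns the argument as the
caller typed it into a string (a minimal sketch; the function name
and the columns plotted are invented):

  plotDF <- function(df) {
    dfname <- deparse(substitute(df))  # caller's expression, as a string
    plot(df[[1]], df[[2]], main = dfname)
  }
  plotDF(My1stDF)  # the title will read "My1stDF"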

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jul-09   Time: 20:21:43
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting in logistic model

2009-06-30 Thread Ted Harding
 yield identical results.

In fact, a bit more probing to GLM shows that there is already
a discrepancy at the score level:

  S0 <- GLM$linear.predictors
  max(abs(S0-S))
  # [1] 8.881784e-16

so S0 has not been calculated by applying GLM$coef to X. Also, if
we apply the inverse link to S0, then:

  max(abs(Fit0 - exp(S0)/(1+exp(S0))))
  # [1] 1.110223e-16

which is the same discrepancy as between Fit1 and Fit0.

  max(abs(Fit1 - exp(S0)/(1+exp(S0))))
  # [1] 1.110223e-16

the same again!! 

Hence, if I have understood him aright, Fabrizio's question.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jun-09   Time: 16:44:56
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I change which R Graphics Device is active?

2009-06-30 Thread Ted Harding
On 30-Jun-09 16:48:15, Barry Rowlingson wrote:
 To keep track, call dev.cur() after creating a new plot
 and assign it to something memorable. Here's how to keep
 a histogram and an xy plot:
 
 dev.new();histPlot = dev.cur()
 dev.new();xyPlot = dev.cur()
 dev.set(histPlot);hist(runif(100))
 X11cairo
2
 dev.set(xyPlot);plot(1:10)
 X11cairo
3
 
 You just have to remember to call dev.set before your particular plot.
 
 Barry

Barry, spot on! If there were a prize for postings with
simplicity, clarity, conciseness and usability, I would nominate
yours (especially in the Why didn't *I* think of that? category).

Cheers,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jun-09   Time: 18:33:40
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting in logistic model

2009-06-30 Thread Ted Harding

On 30-Jun-09 17:41:20, Marc Schwartz wrote:
 On Jun 30, 2009, at 10:44 AM, Ted Harding wrote:
 

 On 30-Jun-09 14:52:20, Marc Schwartz wrote:
 On Jun 30, 2009, at 4:54 AM, Renzi Fabrizio wrote:

 I would like to know how R computes the probability of an event
 in a logistic model (P(y=1)) from the score s, linear combination
 of x and beta.

 I noticed that there are differences (small, less than e-16) between
 the fitting values automatically computed in the glm procedure by R,
 and the values manually computed by me applying the reverse
 formula p=e^s/(1+e^s); moreover I noticed  that the minimum value
 of the fitting values in my estimation is 2.220446e-16, and there
 are many observation with this probability (instead the minimum
 value obtained by manually estimation is 2.872636e-152).

 It would be helpful to see at least a subset of the output from your
 model and your manual computations so that we can at least visually
 compare the results to see where the differences may be.

 The model object returned from using glm() will contain both the
 linear predictors on the link scale (model$linear.predictors) and
 the fitted values (model$fitted.values). The latter can be accessed
 using the fitted() extractor function.

 To use an example, let's create a simple LR model using the infert
 data set as referenced in ?glm.

 model1 <- glm(case ~ spontaneous + induced, data = infert,
  family = binomial())

 model1
 Call:  glm(formula = case ~ spontaneous + induced,
 family = binomial(), data = infert)

 Coefficients:
 (Intercept)  spontaneous  induced
 -1.7079   1.1972   0.4181

 Degrees of Freedom: 247 Total (i.e. Null);  245 Residual
 Null Deviance:316.2
 Residual Deviance: 279.6  AIC: 285.6

 # Get the coefficients
 coef(model1)
 (Intercept) spontaneous induced
  -1.7078601   1.1972050   0.4181294

 # get the linear predictor values
 # log odds scale for binomial glm
 head(model1$linear.predictors)
           1           2           3           4           5           6
  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375

 You can also get the above by using the coefficients and the model
 matrix for comparison:
 # the full set of 248
 # coef(model1) %*% t(model.matrix(model1))
 head(as.vector(coef(model1) %*% t(model.matrix(model1))))
 [1]  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375

 # get fitted response values (predicted probs)
 head(fitted(model1))
 1 2 3 4 5 6
 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

 We can also get the fitted values from the linear predictor values
 by using:

 LP <- model1$linear.predictors
 head(exp(LP) / (1 + exp(LP)))
1 2 3 4 5 6
 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

 You can also get the above by using the predict.glm() function with
 type = "response". The default, type = "link", will get you the
 linear predictor values as above. predict.glm() would typically be
 used to generate predictions using the model on new data.

 head(predict(model1, type = "response"))
 1 2 3 4 5 6
 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

 In glm.fit(), which is the workhorse function in glm(), the fitted
 values returned in the model object are actually computed by using
 the inverse link function for the family passed to glm():

 binomial()$linkinv
 function (eta)
 .Call("logit_linkinv", eta, PACKAGE = "stats")
 <environment: 0x171c8b10>

 Thus:
 head(binomial()$linkinv(model1$linear.predictors))
 1 2 3 4 5 6
 0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

 So those are the same values as we saw above using the other methods.
 So, all is consistent across the various methods.

 Perhaps the above provides some insights for you into how R does some
 of these things and also to point out, as is frequently the case with
 R, there is more than one way to get the same result.

 HTH,
 Marc Schwartz

 An interesting and informative reply, Marc; but I think it misses
 the point of Fabrizio's query. I think Fabrizio's point is the
 following:

  set.seed(54321)
  X <- sort(rnorm(100))
  a0 <- 1.0 ; b0 <- 0.5
  Y <- 1*(runif(100) < a0 + b0*X)
  GLM <- glm(Y~X, family=binomial)
  C <- GLM$coef
  C
  # (Intercept)           X
  #    2.726966    2.798429
  a1 <- C[1] ; b1 <- C[2]
  Fit0 <- GLM$fit
  S <- a1 + b1*X
  Fit1 <- exp(S)/(1+exp(S))
  max(abs(Fit1 - Fit0))
  # [1] 1.110223e-16

 This discrepancy is of course, in magnitude, a typical rounding
 error. But the fact that it occurred indicates that when glm()
 computed the fitted values it did not do so by using the fitted
 coefficients GLM$coef, then creating the fitted score (linear
 predictor) S (as above), then applying to this the inverse link
 exp(S)/(1+exp(S)), since doing that would replicate

Re: [R] read.csv, header=TRUE but skip the first line

2009-06-28 Thread Ted Harding
On 28-Jun-09 21:05:59, Mark Knecht wrote:
 Hi,
 Complete newbie to R here. Just getting started reading manuals and
 playing with data.
 
 I've managed to successfully get my *.csv files into R, however I
 have to use header=FALSE because the real header starts in line #2.
 The file format looks like:
 
 PORTFOLIO EQUITY TABLE
 
 TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY
 
 1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,10,124.00
 2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,10,038.00
 3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,9,812.00
 4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,9,946.00
 
 where the words PORTFOLIO EQUITY TABLE make up line 1, the rest of
 the text is on line 2, and then the lines starting with numbers are
 the real data. (Spaces added by me for email clarity only.)
 
 If I remove the first line by hand then I can use header=TRUE and
 things work correctly, but it's not practical for me to remove the
 first line by hand on all these files every day.
 
 I'd like to understand how I can do the read.csv but skip the first
 line. Possibly read the file, delete the first line and then send it
 to read.csv, or some other way?
 
 Thanks in advance,
 Mark

Simply use the option skip=1, as opposed to the default skip=0.
This then skips the first line of the file and only starts reading
at line 2. With header=TRUE (which is the default for read.csv()
anyway), the first line read in (i.e. line 2 of the file) will be
taken as the header, and the remainder as data.
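
Concretely (a sketch -- the file name here is invented):

  MyData <- read.csv("equity.csv", skip=1, header=TRUE)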

You should read what is output by

  ?read.csv

One thing that may be tricky for a beginner to get their head round
is that this is *really* the help page for read.table(), and
that read.csv() is in fact a front end for read.table() with
different defaults.

In particular, whereas read.table() has default header=FALSE,
read.csv() has default header=TRUE. Also, of course, where
read.table() has sep="" (i.e. white space), read.csv() has sep=",".

Other options for read.csv() which are not mentioned specifically
in the usage line for read.csv() (i.e. are subsumed in ...)
are the same as options mentioned in the usage line for read.table()
and have the same defaults.

So, implicitly, skip is an option for read.csv() just as it is
for read.table(), and it has the same default, namely skip=0.
So you can set it to skip=1 just as you can for read.table()
and it will work in the same way.

This is stated in ?read.csv to be:
skip: integer: the number of lines of the data file to skip before
  beginning to read data.

This is potentially misleading because of the final word data,
since a beginner might think this referred to the real data part
of the file (i.e. what follows the header), when header=TRUE as
in read.csv().

More explicitly, it could be written:

skip: integer: the number of lines of the data file to skip before
  beginning to read the lines in the file.

Hoping this helps!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jun-09   Time: 22:37:46
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA with means and SDs as input

2009-06-25 Thread Ted Harding
On 25-Jun-09 10:15:30, Sebastian Stegmann wrote:
 Dear R-community,
 I'm struggling with a paper that reports only fragmented results
 of a 2by2by3 experimental design. However, Means and SDs for all
 cells are given.
 
 Does anyone know a package/function that helps computing an ANOVA
 with only Means and SDs as input (resulting in some sort of effect
 size and significance test)?

No hope of that, unless you also have the numbers (N) of observations
in each cell (which of course may be equal across cells). You can get
a mean and an SD with anything from N=2 upwards.

If you do have the Ns, then one way is to construct an artificial
sample for each cell, such that the mean and the SD of each is equal
to the given mean and SD for the cell. Then you can submit the
resulting reconstructed data to a standard lm() in R. and carry
on from there.

The basic construct is:
Let a given cell have N data, mean=M, SD=S.

  X0 <- rnorm(N)
  X  <- M + S*(X0 - mean(X0))/sd(X0)
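
A quick check that the construct does what is claimed (with
invented values for N, M and S):

  N <- 10 ; M <- 5.2 ; S <- 1.3
  X0 <- rnorm(N)
  X  <- M + S*(X0 - mean(X0))/sd(X0)
  c(mean(X), sd(X))
  # [1] 5.2 1.3    (exactly, up to display rounding)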

Assuming you are going to do a standard normal ANOVA, the result
of operating with the above artificial data is algebraically the
same as if you had the true original data.

Hoping this helps!
Ted.

 The reason why I'm interested is simple: I'm conducting a
 meta-analysis. If I included only what the authors would like to
 show the world, the results would be somewhat biased...
 
 I've combed through the web and my various books on R, but it's not
 that easy to find. Or is it?
 
 Thanks for your help!
 Sebastian


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 25-Jun-09   Time: 11:51:38
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to draw a line in plot when I know the start point(x

2009-06-25 Thread Ted Harding
On 25-Jun-09 18:38:37, Marc Schwartz wrote:
 On Jun 25, 2009, at 1:30 PM, Lesandro wrote:
 Hello all,
 How to draw a line in plot when I know the start point(x1,y1)
 and end point(x2,y2)? I need make this as additional information
 in the graph:

 plot(wl2[[1]],wl2[[2]])

 I think that is possible make this with the function abline(), is  
 possible? I looked the function lines() too, but don't understand
 as make.

 Thanks!
 Lesandro
 
 See ?segments which does just what you are looking for.
 
 lines() is more designed for a series of connected lines (eg. a  
 polygon) rather than a single line segment.
 
 abline() can draw a straight line, at a given vertical or horizontal  
 position, or if given a linear model object, the fitted line.
 HTH,
 Marc Schwartz

Hmm ... for this particular purpose I don't see what is wrong with

  plot(wl2[[1]],wl2[[2]])
  lines(c(x1,x2),c(y1,y2))

along with any additional parameters to lines() for line-type,
colour, etc. -- I do this all the time ...
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 25-Jun-09   Time: 19:51:20
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to change ONLY the first character of each variable

2009-06-18 Thread Ted Harding
On 18-Jun-09 23:24:39, Mark Na wrote:
 Dear R-helpers,
 I would like to adapt the following code
 
  names(data) <- sub("M", "MOLE", names(data))
 
 which changes any occurrence of M (in my variable names) to MOLE
 such that it ONLY operates on the first character of each variable
 name, i.e. M will only be changed to MOLE if it's the first character
 of a variable.
 
 I would appreciate any help you might provide. Thanks!
 
 Mark Na

  M <- "MATMAN"
  sub("^M", "MOLE", M)
  # [1] "MOLEATMAN"
  AM <- "AMATMAN"
  sub("^M", "MOLE", AM)
  # [1] "AMATMAN"
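
So, for your case, the only change is the anchor ^:

  names(data) <- sub("^M", "MOLE", names(data))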

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 19-Jun-09   Time: 00:46:40
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a proposal regarding documentation

2009-06-14 Thread Ted Harding
On 14-Jun-09 22:23:24, Gabor Grothendieck wrote:
 In PHP and also in MySQL the manual has a wiki capability
 so that users can add notes at the end of each page, e.g.
 
 http://www.php.net/manual/en/functions.variable-functions.php
 
 http://dev.mysql.com/doc/refman/4.1/en/update.html
 
 That would combine documentation and wiki into one. Here it would
 involve copying the R help pages into the wiki in a readonly mode
 with the writeable wiki portion at the end of each such page. It
 would also be necessary to be able to update the help pages in the
 wiki when new versions became available.

That is an interesting and potentially very fruitful idea. I'd like
to propose a quasi-readonly mode in which wiki users could interpolate
the comments at chosen points within the original text of a help-page
(but without any alteration of the original), with the interpolation
clearly marked by some kind of highlighting (e.g. change of font or
colour).

In that way, a user reading through the original help-page on the
wiki would encounter, in context and in timely fashion, comments
by other users who had themselves met difficulties at that point
and who (hopefully) have provided more detailed guidance and/or
links to other parts of the documentation which illuminate or
fill-in what was not obvious in the first place. With, preferably,
the option to back up to where they were before, if they are
led to some other help page.

Ted.

 No explicit email group or coordination would be needed. It would
 also address the organization problem as they could be organized
 as they are now, i.e. into packages: base, stats, utils, ...
 
 It would require the development of a program to initially copy
 the help pages and to update them while keeping the notes in place
 whenever a new version of R came out.
 
 On Sun, Jun 14, 2009 at 5:35 PM, Peter Flom
 peterflomconsult...@mindspring.com wrote:
 I certainly don't have anything against the WIKI, but I think that
 the documentation is where the action is, especially for newbies.
 It's the natural first step when you want to learn about a function
 or when you get an error message you don't understand.

 Peter

 Peter L. Flom, PhD
 Statistical Consultant
 www DOT peterflomconsulting DOT com



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jun-09   Time: 23:41:39
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issues getting R to write image files

2009-06-11 Thread Ted Harding
On 11-Jun-09 09:24:42, Tobias Verbeke wrote:
 Hi Kenny,
 
 Have spent the last couple of days learning R and shell scripting to
 do
 batch plotting jobs. I have had success getting R to complete a filled
 contour plot and output to a file (.jpg or .tiff etc). However, when I
 try
 to do the same thing with the simple plot command the script seems to
 execute correctly yet there is no output. Below is my R code: 
 
 file <- Sys.getenv("input_file")
 tiff(paste(file, "tiff", sep="."))
 z <- read.table(file)
 plot(z, type="l", xlim=range(0.6,2), col="red",
      plot.title = title(main = file, xlab = "Wavelength (um)",
                         ylab = "Intensity (arb.)"))
 
 dev.off()
 
 q()
 
 You need to close the tiff graphics device you opened
 using dev.off() before quitting.
 
 HTH,
 Tobias

I thought of that too -- since the graphics device needs to be
closed before writing out to the file is completed and the file
is closed.

However, it occurred to me that possibly q() would also have that
effect, since it closes down R which should have the effect of
closing devices, flushing buffers, and closing files (though I do
not see this documented under ?q).

So I experimented.

1. New R session.
2. Assign values to some variables.
3. Open a tiff() device, plot them, and quit R (no dev.off):

  tiff(file="temp.tif")
  plot (X,P,  type="l")
  lines(X,I.b, col="blue")
  lines(X,I.m, col="green")
  lines(X,I.bm^2, col="red")
  q()

4. End of R session, and temp.tif (which did not exist at the start)
contains a good TIFF file with exactly what I expected to see. This
confirmed my suspicions.

So it would seem that dev.off() is not the answer.

Probably something is wrong along the line of reading in the data,
or in specifying what to plot. But I can't see anything obvious in
the code, so it may depend on what sort of structure z is, for instance.

Ted.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 11-Jun-09   Time: 10:43:51
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Postcript font size

2009-06-11 Thread Ted Harding
On 11-Jun-09 22:16:22, Andre Nathan wrote:
 On Thu, 2009-06-11 at 19:09 -0300, Andre Nathan wrote:
 Is there a way to specify a font size that is consistent among calls
 to plot()?
 
 To put it correctly: is there a way to specify a font size that is
  consistent among calls to postscript()?
 
 All the fonts in the same file have the same size, but comparing files,
  even when par(ps = ...) or postscript(pointsize = ...) are specified
 the same way, the fonts have different sizes.
 
 Best regards,
 Andre

How are you comparing?
A: Looking into the PostScript code in the files, and identifying
   the font-sizes where they are set in the code?
B: Comparing apparent font sizes when looking at two plots on screen?
C: Comparing apparent font sizes when looking at plots as printed
   on paper (and, in this case, have the plots been included in an
   encapsulated document as EPS graphics)?

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 11-Jun-09   Time: 23:47:59
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ridiculous behaviour printing to eps: labels all messed

2009-06-08 Thread Ted Harding
 to postscript() or to dev.copy2eps() (which accepts the
same arguments as postscript()).

4. Zeljko's first suggestion (to use postscript() in the first place,
rather than dev.copy2eps() after plotting) is probably not going to
be useful, since you would have to set useKerning=FALSE in postscript()
anyway, and you can equally well set it in dev.copy2eps().

5. Zeljko's second suggestion (of using one-letter labels) is also
unlikely to be useful, and could be dangerous (e.g. if you used a
as a label, and it also occurred in a text string in the diagram
and the latter was set by (in) show ... (a) show ... or the like).
Two-letter names might be safer, e.g. AA, BB, ... , since it
would be unusual for such pairs to be kerned (though not impossible,
since PostScript allows quite arbitrary things to be done with text,
if the user wants it to happen, or the application thinks it would
be a Good Thing). But, in any case, the desirable feedback from seeing
exp1 and exp2 (or whatever) on the plot would then be missing.

I would be strongly inclined to the useKerning=FALSE solution
in this case.

6. I'm not a LaTeX user, so can't comment much on psfrag. However,
I have just read its documentation The PSFrag System at
http://www.tug.org/teTeX/tetex-texmfdist/doc/latex/psfrag/pfgguide.ps

wherein Section 8 (Common mistakes, known problems, and bugs)
discusses some of the issues that can arise from the fact that
psfrag looks in the EPS file for an exact match for the tags.
Executive Summary: Beware!

7. This is an exercise for my hobby-horse. If the EPS graphic produced
by R is, as it stands, adequate for your purposes then leave it
alone and simply import it into your document.

If it is not, then my approach (almost invariably) is to extract
from R the numerical data which define the graphic, and use these
data in an independent document-creation package (which could be
LaTeX, perhaps, though not in my case) to re-create the graphic
from scratch, and then include any embellishments (such as
equations, etc.) which the independent software can do properly,
when R can not do it properly or at all.

An example of this can be seen at

  http://www.zen89632.zen.co.uk/R/EM_Aitken/em_aitken.pdf

which is a little expository document about Aitken Acceleration
of iterative processes, as applied to the EM algorithm.

The computations were done in R, and the resulting data were used
to draw the figures externally (in fact within the document itself).
All three figures therein have some mathematical symbolism, the
second one on Page 2 particularly. The full R code is also given.

LaTeX-ers: Can you make such figures with the same details in them,
without fiddling with an R-generated EPS (i.e. from scratch, using
data from R)? I hope so!

For what it's worth -- the software used was GNU groff (with the
'pic' preprocessor to create the figures). But that's just me.
Dinosaurs do not easily digest organisms more recently evolved ...

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jun-09   Time: 14:38:03
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reflect Back to Back Histograms in x-axis?

2009-06-08 Thread Ted Harding
On 08-Jun-09 13:11:03, sedm1000 wrote:
 I've looked long and hard for this, but maybe I am missing something...
 
 There is a nice module that displays histograms reflected in the
 y axis, i.e.
 http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=136
 
 but is it possible to reflect in the x-axis, so to have two datasets,
 one pointing up and one down rather than left and right? I haven't
 been able to find a way to plot this way using the guide, or elsewhere.
 
 Thanks...
 -- 

Try along the lines of:

  set.seed(54321)
  X1 <- rnorm(500)
  X2 <- rnorm(500)
  H1 <- hist(X1, breaks=0.5*((-8):8), ylim=c(-100,100))
  H1 <- hist(X1, breaks=0.5*((-8):8), ylim=c(-100,100), col="red")
  H2 <- hist(X2, breaks=0.5*((-8):8), plot=FALSE)
  H3 <- H2 ; H3$counts <- -H3$counts
  plot(H3, add=TRUE, col="blue")

NB: The choice of breaks and ylim was made after a preliminary
inspection of hist(X1) and hist(X2).

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jun-09   Time: 15:50:48
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Journal Articles that Have Used R

2009-06-07 Thread Ted Harding
On 07-Jun-09 10:56:25, Gabor Grothendieck wrote:
 Try this:
 site:journal.sjdm.org filetype:R

When I entered that into Google, I got only the following two hits:

  #
  #!/usr/bin/Rscript --vanilla # input is a pre-made list of files ...
  #!/usr/bin/Rscript --vanilla # input is a pre-made list of files ending
  in html called ../htmlist # (see below). This is easily modified. ...
  journal.sjdm.org/RePEc/rss/rss.R - Cached - Similar pages

  #
  #!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc ...
  #!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc-style
  rdf files (ReDIFF) to DOAJ-type xml files # usage: oai.R [file] # where
  [file] is a ...
  journal.sjdm.org/RePEc/rss/oai.R - Cached - Similar pages

none of which is what Jonathan is looking for (and the Similar pages
links are a waste of time).

In regexp language, what he is looking for is

  http://journal.sjdm.org/[0-9]+/*.R

of which there are several instances on the site, for example

  http://journal.sjdm.org/8210/

shows

   jdm8210.html    13-Dec-2008 11:18   31K
   jdm8210.pdf     13-Dec-2008 11:18   102K
   jdm8210.tex     13-Dec-2008 11:18   27K
   jdm8210001.gif  09-Dec-2008 14:38   11K
   probs.R         09-Dec-2008 14:37   1.0K
   test.R          23-May-2008 05:46   251
   ttest.csv       22-May-2008 21:31   2.6K

so there are two .R files there (8210 is the number of an article
in the Journal). Other similar directories may or may not have
.R files -- for example
  http://journal.sjdm.org/8816/
has none.

The problem is that utilities like wget won't work in this case,
since HTTP doesn't accept wild cards, unlike FTP; but the journal
site doesn't accept FTP ... !!

It's an intriguing problem, and I'm seeking advice amongst my Linux
acquaintances about it. I somehow doubt that there is a solution ...

Ted.
 On Sat, Jun 6, 2009 at 6:39 PM, Jonathan Baron ba...@psych.upenn.edu
 wrote:
 I also use R to redraw figures for the journal I edit (below), when
 the authors cannot produce usable graphics (about 50% of the author
 who try).

 Unfortunately, I cannot find a way to search for just the R files.
 They are all http://journal.sjdm.org/*/*.R
 where * is the number of the article. But Google, to my knowledge,
 will not deal with wildcards like this.

 Jon
 --
 Jonathan Baron, Professor of Psychology, University of Pennsylvania
 Home page: http://www.sas.upenn.edu/~baron
 Editor: Judgment and Decision Making (http://journal.sjdm.org)



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jun-09   Time: 12:37:34
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Journal Articles that Have Used R

2009-06-07 Thread Ted Harding


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jun-09   Time: 15:01:10
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Journal Articles that Have Used R

2009-06-07 Thread Ted Harding
On 07-Jun-09 14:01:14, Ted Harding wrote:
 I think the reason Google will not find it is that, in the Journal
 website, the R files (and the names of the article directories that
 might contain them, such as journal.sjdm.org/8210/ -- see below)
 are not directly pointed to by any index.html or any
 <a href="..."> ... </a> in the website, as far as I can see. This would
 be why 'wget' cannot find them in HTTP mode, and it would prevent
 Google being led to them.
 [... ... ...]
 This all seems to be overkill, however! Much easier if the site
 would accept FTP.
 Ted.

Or, as Jonathan has now done, the R files were pointed to by links within
the web-page! :)
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jun-09   Time: 15:41:49
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Journal Articles that Have Used R

2009-06-06 Thread Ted Harding
On 06-Jun-09 18:34:21, Jakson Alves de Aquino wrote:
 Jason Rupert wrote:
 Is there a way to get a reference list of journal articles that have
 used R?  
 
 I am just looking for some examples of R graphs and presentation of
 results where R was used to generate the results. 
 
 Did you try Google Scholar:
 
 http://scholar.google.com.br/scholar?q=R%3A+A+Language+and+Environment+for+Statistical+Computing
 
 It has cited by links.

A very good suggestion! An even better one [:)] is to take out the .br:

http://scholar.google.com/scholar?q=R%3A+A+Language+and+Environment+for+Statistical+Computing

(158,000 hits of which, on briefly scanning through the first 40,
about half seem to be journal articles; draw your own conclusions).

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jun-09   Time: 19:50:57
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contrasts

2009-06-03 Thread Ted Harding
On 03-Jun-09 17:50:38, Ricardo Arias Brito wrote:
 Hi all,
 I want to use contrasts in a regression (lm()) for a factor with
 four levels, comparing the coefficients: first with second, first
 with third, ..., third with fourth.
 Can someone help with a contrast matrix to be used? I have tried,
 but have not yet obtained it.
 
 Say:
 
 y <- rnorm(50)
 x <- cut(rnorm(50, mean=y, sd=0.25), c(-3,-1.5,0,1.5,3))
 reg <- lm(y ~ x, contrasts = list(x=contr.sum))
 
 Thank you,
 Ricardo

I am not sure if you have written what you meant!

If you mean to contrast first level with second, second with third,
and third with fourth (which is not what you wrote), then what you
want is contr.sdif in the MASS package.

However, what you wrote suggests that you want to obtain all possible
pairs of levels:
  1st with 2nd, 1st with 3rd, 1st with 4th,
  2nd with 3rd, 2nd with 4th, 3rd with 4th

This is not feasible at the level of model fitting (which is where
any matrix such as you ask for would be used), since you can
only have 3 linearly independent contrasts between 4 levels.
In this case you would be asking for 6 contrasts, so 3 of them
would be linear combinations of the 3 which were fitted in the
model.

If you use contr.sdif, you get 3 of them (1st with 2nd, 2nd with 3rd,
3rd with 4th) straight off. The other three can then be derived
subsequently from these 3 via differences. The model fit should
allow you to obtain the variance-covariance matrix of the three
which were fitted. Then you can derive the variance-covariance
matrix (and hence the SEs) of your derived contrasts.
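
For instance, a minimal sketch along these lines, re-using your own
example data (contr.sdif is in MASS):

  library(MASS)
  y <- rnorm(50)
  x <- cut(rnorm(50, mean=y, sd=0.25), c(-3,-1.5,0,1.5,3))
  reg <- lm(y ~ x, contrasts = list(x = contr.sdif))
  coef(reg)  # the successive differences: 2nd-1st, 3rd-2nd, 4th-3rd
  vcov(reg)  # variances/covariances from which the SEs of derived
             # contrasts (e.g. 3rd-1st = sum of the first two) follow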

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jun-09   Time: 19:34:59
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange behavior when reading csv - line wraps

2009-05-31 Thread Ted Harding
,
   the word kempten starts the next line:
 39;167757703;12;10.309295;47.724545;21903...@n00
 ;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavar
 ia;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss
 ;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kup
 pel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europ
 aeischeunion;germanio;

   What could be the reason?

   I was thinking about solving the issue by using a different
   separator for the first 7 fields, concatenating all of the
   remaining values into a single string value, but could not
   figure out how to do such a substitution in R. Unfortunately,
   on my system I cannot specify a range for sed...

   Thanks for any help/pointers
   Martin

   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.




--Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?





 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?



 --
 Martin Tomko
 Postdoctoral Research Assistant   Geographic Information Systems
 Division
 Department of Geography
 University of Zurich - Irchel
 Winterthurerstr. 190
 CH-8057 Zurich, Switzerland

 email:  martin.to...@geo.uzh.ch
 site:   http://www.geo.uzh.ch/~mtomko
 mob:+41-788 629 558
 tel:+41-44-6355256
 fax:+41-44-6356848


 
 
 -- 
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390
 
 What is the problem that you are trying to solve?
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 31-May-09   Time: 16:24:27
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strange behavior when reading csv - line wraps

2009-05-30 Thread Ted Harding
;400d;rabbitri
 otnet;

where the last values occurs on the next line in the data frame.

It does not have to be the last value, as in the follwong example,
the word kempten starts the next line:
39;167757703;12;10.309295;47.724545;21903...@n00
 ;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavar
 ia;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss
 ;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kup
 pel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europ
 aeischeunion;germanio;

What could be the reason?

I was thinking about solving the issue by using a different
separator for the first 7 fields, concatenating all of the
remaining values into a single string value, but could not figure
out how to do such a substitution in R. Unfortunately, on my
system I cannot specify a range for sed...

Thanks for any help/pointers
Martin
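
One way to do that split within R itself (a sketch; the file name is
hypothetical, and it assumes each record really does occupy one line):

  lns    <- readLines("tags.csv")
  parts  <- strsplit(lns, ";", fixed=TRUE)
  first7 <- do.call(rbind, lapply(parts, function(p) p[1:7]))
  tags   <- sapply(parts, function(p) paste(p[-(1:7)], collapse=";"))
  dat    <- data.frame(first7, tags, stringsAsFactors=FALSE)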

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?



 
 
 -- 
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390
 
 What is the problem that you are trying to solve?
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-May-09   Time: 21:15:13
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular point pattern on multi-segment line

2009-05-29 Thread Ted Harding
On 29-May-09 11:23:15, Pascal LAFFARGUE wrote:
 I need your help with the following problem:
 
 I would like to create regularly spaced points on a multi-segment
 line (a 'psp' class object).
 
 A function in the spatstat library, pointsOnLines(), is dedicated
 to that objective, but the problem is that the probability of falling
 on a particular segment is proportional to the length of the segment.
 
 What about a solution using the whole multi-segment line, and not
 the individual segments, for distributing the points?
 
 Many thanks in advance for your help!
 
 Plaff

If I understand you right, one fairly obvious approach would be the
following.

1. Let L be a vector of lengths of segments: L[i] is the length
   of segment i (say there are K segments).

2. Use X <- sort(sum(L)*runif(N)) to place N points uniformly
   on (0,sum(L))

3. Let Lcum <- c(0,cumsum(L))

4. whichsegs <- cut(X,breaks=Lcum,labels=FALSE)

Then whichsegs is a vector of N integers, from 1:K, which give the
segments into which the N points in X fall.

So, for(i in (1:K)), extract the values of X[whichsegs==i],
subtract Lcum[i] from each, and then place the points at the
resulting distances along segment i.
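
Putting steps 1-4 together as a minimal sketch (K, N and the segment
lengths L are made up for illustration):

  K <- 4; N <- 25
  L <- c(2.0, 1.5, 3.0, 0.5)                      # segment lengths
  X <- sort(sum(L)*runif(N))                      # step 2
  Lcum <- c(0, cumsum(L))                         # step 3
  whichsegs <- cut(X, breaks=Lcum, labels=FALSE)  # step 4
  for (i in 1:K) {
    d <- X[whichsegs==i] - Lcum[i]   # distances along segment i
    # ... place points at distances d along segment i ...
  }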

I think this should work (not tested in detail)!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-May-09   Time: 13:14:27
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about using a remote system

2009-05-28 Thread Ted Harding
On 28-May-09 00:58:17, Erin Hodgess wrote:
 Dear R People:
 I would like to set up a plug-in for Rcmdr to do the following:
 
 I would start on a Linux laptop.  Then I would log into another
 outside system and run some commands.
 
 Now, when I tried to do
 system("ssh e...@xxx.edu")
 password xx
 
 It goes to the remote system.  How do I continue to issue commands
 from the Linux laptop please?
 
 (hope this makes sense)
 thanks,
 Erin

Hi Erin,
If I understand what you want, I think you may be asking a bit
too much from R itself.

When you, Erin, are interacting with your Linux laptop you can
switch between two (or more) different X windows or consoles,
on one of which you are logged in to the remote machine and on
which you issue commands to the remote machine, and can read
its output and, maybe, copy this to a terminal running commands
on your Linux laptop.

When you ask R to log in as you describe, you are tying that
instance of R to the login. I don't know of any resources within
R itself which can emulate your personal behaviour. Though maybe
others do know ...

However, there is in principle a work-round. You will need to
get your toolbox out and get to grips with Unix plumbing;
and, to make it accessible to R, you will need to create a
Unix plumber.

The basic component is the FIFO, or named pipe. To create
one of these, use a Linux command like

  mkfifo myFIFO1

Then 'ls -l' in the directory you are working in will show this
pipe as present with the name myFIFO1. For example, if in one
terminal window [A] you start the command

  cat myFIFO1

then in another [B] you

  echo "Some text to test my FIFO" > myFIFO1

you will find that this text will be output in the [A] window
by the 'cat' command.

Similarly, you could 'mkfifo myFIFO2' and write to this while
in the [A] window, and read from it while in the [B] window.

So, if you can get R to write to myFIFO1 (which is being read
from by another program which will send the output to a remote
machine which that program is logged into), and read from myFIFO2
which is being written to by that other program (and that is
easy to do in R, using connections), then you have the plumbing
set up.

But, to set it up, R needs to call in the plumber. In other words,
R needs to execute a command like

  system(MakeMyFIFOs)

where MakeMyFIFOs is a script that creates myFIFO1 and myFIFO2,
logs in to the remote machine, watches myFIFO1 and sends anything
it reads to the remote machine, watches for any feedback from
the remote machine and then writes this into myFIFO2.

Meanwhile, back in R, any time you want R to send something to
the remote machine you write() it (or similar) to myFIFO1, and
whenever you want to receive something from the remote machine
you readLines() it (or similar) from myFIFO2.
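
In outline, the R side might look like this (a sketch only, assuming
the FIFOs exist and the plumber script is already servicing them;
note that opening a FIFO blocks until the other end is opened too):

  out <- fifo("myFIFO1", open="w", blocking=TRUE)
  writeLines("ls -l", out)   # send a command towards the remote machine
  close(out)
  inp <- fifo("myFIFO2", open="r", blocking=TRUE)
  res <- readLines(inp)      # collect whatever the plumber wrote back
  close(inp)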

That's an outline of a possible work-round. Maybe other have solved
your same problem in a better way (and maybe there's an R package
which does just that ... ). If so, I hope they'll chip in!

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-May-09   Time: 07:33:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how do I get to be a member?

2009-05-27 Thread Ted Harding
On 27-May-09 18:09:30, Michael Menke wrote:
 Information, please.

By subscribing your email address to the list.
Visit the R-help info page at:

  https://stat.ethz.ch/mailman/listinfo/r-help

and read what it has to say in general about R-help. Near the bottom
is a section Subscribing to R-help which tells you what to do.

Please note that it is *email addresses*, not persons, which are
subscribed, so (if you have more than one email address) you should
use the one which you have subscribed when you want to post to the list.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 27-May-09   Time: 20:35:02
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Animal Morphology: Deriving Classification Equation with

2009-05-24 Thread Ted Harding
 a reaction to
what your data say.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-May-09   Time: 20:07:43
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [correction] Animal Morphology: Deriving Classification Equation with

2009-05-24 Thread Ted Harding
[Apologies -- I made an error (see at [***] near the end)]

On 24-May-09 19:07:46, Ted Harding wrote:
 [Your data and output listings removed. For comments, see at end]
 
 On 24-May-09 13:01:26, cdm wrote:
 Fellow R Users:
 I'm not extremely familiar with lda or R programming, but a recent
 editorial review of a manuscript submission has prompted a crash
 course. I am on this forum hoping I could solicit some much needed
 advice for deriving a classification equation.
 
 I have used three basic measurements in lda to predict two groups:
 male and female. I have a working model, low Wilk's lambda, graphs,
 coefficients, eigenvalues, etc. (see below). I adjusted the sample
 analysis for Fisher's or Anderson's Iris data provided in the MASS
 library for my own data.
 
 My final and last step is simply form the classification equation.
 The classification equation is simply using standardized coefficients
 to classify each group- in this case male or female. A more thorough
 explanation is provided:
 
 For cases with an equal sample size for each group the classification
 function coefficient (Cj) is expressed by the following equation:
 
 Cj = cj0 + cj1*x1 + cj2*x2 + ... + cjp*xp
 
 where Cj is the score for the jth group, j = 1 … k, cj0 is the
 constant for the jth group, and x = raw scores of each predictor.
 If W = within-group variance-covariance matrix, and Mj = column matrix
 of means for group j, then the constant cj0 = (-1/2) Cj Mj (Julia
 Barfield, John Poulsen, and Aaron French,
 http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
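 
 In R terms, that last relation is (a sketch, with W and Mj as just
 defined, and x a vector of one observation's raw scores -- the names
 are hypothetical):
 
   Winv <- solve(W)
   cj   <- Winv %*% Mj                   # coefficients cj1,...,cjp
   cj0  <- -0.5 * t(Mj) %*% Winv %*% Mj  # the constant for group j
   Cj   <- cj0 + sum(cj * x)             # score: assign x to the group
                                         # with the largest score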
 
 I am unable to navigate this last step based on the R output I have.
 I only have the linear discriminant coefficients for each predictor
 that would be needed to complete this equation.
 
 Please, if anybody is familiar or able to to help please let me know.
 There is a spot in the acknowledgments for you.
 
 All the best,
 Chase Mendenhall
 
 The first thing I did was to plot your data. This indicates in the
 first place that a perfect discrimination can be obtained on the
 basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
 to WG, WT, ID, SEX):
 
   D0 <- read.csv("horsesLDA.csv")
   # names(D0) # WRMA_WG  WRMA_WT  WRMA_ID  WRMA_SEX
   WG <- D0$WRMA_WG; WT <- D0$WRMA_WT
   ID <- D0$WRMA_ID; SEX <- D0$WRMA_SEX
 
   ix.M <- (SEX=="M"); ix.F <- (SEX=="F")
 
   ## Plot WT vs ID (M & F)
   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
   points(ID[ix.F],WT[ix.F],pch="+",col="red")
   lines(ID,15.5-1.0*(ID))
 
 and that there is a lot of possible variation in the discriminating
 line WT = 15.5-1.0*(ID)
 
 Also, it is apparent that the covariance between WT and ID for Females
 is different from the covariance between WT and ID for Males. Hence
 the assumption (of common covariance matrix in the two groups) for
 standard LDA (which you have been applying) does not hold.
 
 Given that the sexes can be perfectly discriminated within the data
 on the basis of the linear discriminator (WT + ID) (and others),
 the variable WG is in effect a close approximation to noise.
 
 However, to the extent that there was a common covariance matrix
 to the two groups (in all three variables WG, WT, ID), and this
 was well estimated from the data, then inclusion of the third
 variable WG could yield a slightly improved discriminator in that
 the probability of misclassification (a rare event for such data)
 could be minimised. But it would not make much difference!
 
 However, since that assumption does not hold, this analysis would
 not be valid.
 
 If you plot WT vs WG, a common covariance is more plausible; but
 there is considerable overlap for these two variables:
 
   plot(WG,WT)
   points(WG[ix.M],WT[ix.M],pch="+",col="blue")
   points(WG[ix.F],WT[ix.F],pch="+",col="red")
 
 If you plot WG vs ID, there is perhaps not much overlap, but a
 considerable difference in covariance between the two groups:
 
   plot(ID,WG)
   points(ID[ix.M],WG[ix.M],pch="+",col="blue")
   points(ID[ix.F],WG[ix.F],pch="+",col="red")
 
 This looks better on a log scale, however:
 
   lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
   ## Plot log(WG) vs log(ID) (M & F)
   plot(lID,lWG)
   points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
   points(lID[ix.F],lWG[ix.F],pch="+",col="red")
 
 and common covariance still looks good for WG vs WT:
 
   ## Plot log(WT) vs log(WG) (M & F)
   plot(lWG,lWT)
   points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
   points(lWG[ix.F],lWT[ix.F],pch="+",col="red")
 
 but there is no improvement for WT vs ID:
 
   ## Plot log(WT) vs log(ID) (M & F)
   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
   points(ID[ix.F],WT[ix.F],pch="+",col="red")

[***]
The above is incorrect! Apologies. I plotted the raw WT and ID
instead of their logs. In fact, if you do plot the logs:

  ## Plot log(WT) vs log(ID) (M & F)
  plot(lID,lWT)
  points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
  points(lID[ix.F],lWT[ix.F],pch="+",col="red")

you now get what looks like much closer agreement between the
covariance patterns of the two groups.

Re: [R] Animal Morphology: Deriving Classification Equation with

2009-05-24 Thread Ted Harding
 on this forum hoping I could solicit some much needed
 advice for deriving a classification equation.
 
 I have used three basic measurements in lda to predict two groups:
 male and female. I have a working model, low Wilk's lambda, graphs,
 coefficients, eigenvalues, etc. (see below). I adjusted the sample
 analysis for Fisher's or Anderson's Iris data provided in the MASS
 library for my own data.
 
 My final and last step is simply form the classification equation.
 The classification equation is simply using standardized coefficients
 to classify each group- in this case male or female. A more thorough
 explanation is provided:
 
 For cases with an equal sample size for each group, the
 classification function coefficient (Cj) is expressed by the
 following equation:
 
 Cj = cj0 + cj1*x1 + cj2*x2 + ... + cjp*xp
 
 where Cj is the score for the jth group, j = 1 … k, cj0 is the
 constant for the jth group, and x = raw scores of each predictor.
 If W = within-group variance-covariance matrix, and Mj = column matrix
 of means for group j, then the constant cj0 = (-1/2) Cj Mj (Julia
 Barfield, John Poulsen, and Aaron French,
 http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
 
 I am unable to navigate this last step based on the R output I have.
 I only have the linear discriminant coefficients for each predictor
 that would be needed to complete this equation.
 
 Please, if anybody is familiar or able to to help please let me know.
 There is a spot in the acknowledgments for you.
 
 All the best,
 Chase Mendenhall
 
 The first thing I did was to plot your data. This indicates in the
 first place that a perfect discrimination can be obtained on the
 basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
 to WG, WT, ID, SEX):
 
   D0 <- read.csv("horsesLDA.csv")
   # names(D0) # WRMA_WG  WRMA_WT  WRMA_ID  WRMA_SEX
   WG <- D0$WRMA_WG; WT <- D0$WRMA_WT
   ID <- D0$WRMA_ID; SEX <- D0$WRMA_SEX
 
   ix.M <- (SEX=="M"); ix.F <- (SEX=="F")
 
   ## Plot WT vs ID (M & F)
   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
   points(ID[ix.F],WT[ix.F],pch="+",col="red")
   lines(ID,15.5-1.0*(ID))
 
 and that there is a lot of possible variation in the discriminating
 line WT = 15.5-1.0*(ID)
 
 Also, it is apparent that the covariance between WT and ID for Females
 is different from the covariance between WT and ID for Males. Hence
 the assumption (of common covariance matrix in the two groups) for
 standard LDA (which you have been applying) does not hold.
 
 Given that the sexes can be perfectly discriminated within the data
 on the basis of the linear discriminator (WT + ID) (and others),
 the variable WG is in effect a close approximation to noise.
 
 However, to the extent that there was a common covariance matrix
 to the two groups (in all three variables WG, WT, ID), and this
 was well estimated from the data, then inclusion of the third
 variable WG could yield a slightly improved discriminator in that
 the probability of misclassification (a rare event for such data)
 could be minimised. But it would not make much difference!
 
 However, since that assumption does not hold, this analysis would
 not be valid.
 
 If you plot WT vs WG, a common covariance is more plausible; but
 there is considerable overlap for these two variables:
 
   plot(WG,WT)
   points(WG[ix.M],WT[ix.M],pch="+",col="blue")
   points(WG[ix.F],WT[ix.F],pch="+",col="red")
 
 If you plot WG vs ID, there is perhaps not much overlap, but a
 considerable difference in covariance between the two groups:
 
   plot(ID,WG)
   points(ID[ix.M],WG[ix.M],pch="+",col="blue")
   points(ID[ix.F],WG[ix.F],pch="+",col="red")
 
 This looks better on a log scale, however:
 
   lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
   ## Plot log(WG) vs log(ID) (M & F)
   plot(lID,lWG)
   points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
   points(lID[ix.F],lWG[ix.F],pch="+",col="red")
 
 and common covariance still looks good for WG vs WT:
 
   ## Plot log(WT) vs log(WG) (M & F)
   plot(lWG,lWT)
   points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
   points(lWG[ix.F],lWT[ix.F],pch="+",col="red")
 
 but there is no improvement for WT vs ID:
 
   ## Plot log(WT) vs log(ID) (M & F)
   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
   points(ID[ix.F],WT[ix.F],pch="+",col="red")
 
 So there is no simple road to applying a routine LDA to your data.
 
 To take account of different covariances between the two groups,
 you would normally be looking at a quadratic discriminator. However,
 as indicated above, the fact that a linear discriminator using
 the variables ID  WT alone works so well would leave considerable
 imprecision in conclusions to be drawn from its results.
 
 Sorry this is not the straightforward answer you were hoping for
 (which I confess I have not sought); it is simply a reaction to
 what your data say.
 
 Ted.
 
 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax

Re: [R] Product of 1 - probabilities

2009-05-21 Thread Ted Harding
On 21-May-09 14:15:08, Mark Bilton wrote:
 I am having a slight problem with probabilities.
 
 To calculate the final probability of an event p(F), we can take the
 product of the chance that each independent event that makes p(F) will
 NOT occur.
 So...
 p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) )
 
 If the chance of an event within the product occurring remains the
 same, we can therefore raise this probability to a power of the number
 of times that event occurs.
 e.g. rolling a dice p(A) = 1/6 of getting a 1...
 p(F) = 1 - (1- (1/6))^z 
 p(F) = 1 - (1-p(A))^z tells us the probability of rolling a 1 'at
 least once' in z rolls.
 
 So then to R...
 
 if p(A) = 0.01; z = 4; p(F) = 0.039
 
 obviously p(F) > p(A)
 
 however the problem arises when we use very small numbers e.g. p(B) = 1
 * 10^-30
 R understands this value
 However when you have 1-p(B) you get something very close to 1 as you
 expect...but R seems to think it is 1.
 So when I calculate p(F) = 1 - (1-p(B))^z: 1 to the power of anything
 equals 1, so p(F) = 0 -- not just close to zero BUT zero.
 It doesn't matter therefore if z = 1*10^1000, the answer is still zero
 !!
 
 Obviously therefore now p(F) < p(B)
 
 Is there any solution to my problem, e.g.
 - is it a problem with the sum (-) ? ie could I change the number of
 bits the number understands (however it seems strange that it can hold
 it as a value close to 0 but not close to 1)
 -or should I maybe use a function to calculate the exact answer ?
 -or something else
 
 Any help greatly appreciated
 Mark

The problem is not with sum(), but with the fact that R adopts the
exact value 1 for (1-x) if (on my machine) x < 10^(-16). This is to
do with the number of binary digits used to store a number. Your
10^(-30) is much smaller than that!

  x <- 10^(-30)
  x
# [1] 1e-30
  y <- (1-x) ; 1-y
# [1] 0
  x <- 10^(-15)
  y <- (1-x) ; 1-y
# [1] 9.992007e-16
  x <- 10^(-16)
  y <- (1-x) ; 1-y
# [1] 1.110223e-16
  x <- 10^(-17)
  y <- (1-x) ; 1-y
# [1] 0

Read ?.Machine and, in particular, about:
  double.eps: the smallest positive floating-point number 'x'
such that '1 + x != 1'.  It equals 'base^ulp.digits' if
either 'base' is 2 or 'rounding' is 0;  otherwise, it is
'(base^ulp.digits) / 2'.  Normally '2.220446e-16'.

There is no cure for this in R, but you could try to work round it
(depending on the details of your problem) by using an approximation
to the value of the expression.

For instance, if x is very small (say 10^(-20)), and n is not very
large, then to first order

  (1 - x)^n = 1 - n*x

or, if X is a vector of very small numbers,

  prod((1 - X)) = 1 - sum(X)

Either of these may still result in 1 if what is being subtracted
is less than .Machine$double.eps.

However, in the particular type of application you describe,

  1 - (1 - x)^n = n*x to first order
  1 - prod((1 - X)) = sum(X) to first order

You will just have to work out what will give you a sensible answer.
Basically, you are trying to operate beyond the limits of precision
of R, and so will need to re-cast your question in alternative terms
which will yield an adequately precise result.
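
For instance, with your own numbers (a sketch; log1p() and expm1()
are base R, and side-step the troublesome (1-x) step entirely):

  x <- 1e-30 ; z <- 4
  1 - (1-x)^z           # 0 : lost below .Machine$double.eps
  z*x                   # 4e-30, the first-order answer
  -expm1(z*log1p(-x))   # 4e-30, and not restricted to first order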

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-May-09   Time: 15:59:47
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Product of 1 - probabilities

2009-05-21 Thread Ted Harding
On 21-May-09 14:45:20, Gabor Grothendieck wrote:
 There are several arbitrary precision packages available: gmp (an
 interface to the GNU multi-precision library on CRAN) and
 bc (an R interface to the bc arbitrary precision calculator):
 http://r-bc.googlecode.com
 
 There are also packages providing R interfaces to two computer
 algebra systems and they both support not only arbitrary
 precision but also exact calculation:
 
 http://rsympy.googlecode.com
 http://ryacas.googlecode.com
 
 library(bc)
 1 - (1-10^-75)^10
 [1] 0
 bc("1 - (1-10^-75)^10")
 [1]
 .0000000000000000000000000000000000000000000000000000000000000000000000000100

Thanks for pointing this out, Gabor! In particular, bc is something
I often use (but on its own). It is good to know that R can connect
with it.

However, while your call above will indeed return an adequate answer
for 1 - (1-10^-75)^10, and can be assigned numerically by, say,

  w <- as.numeric(bc("1 - (1-10^-75)^10"))

this still does not let Mark off the hook of verifying that he will,
in the end, get a sensible answer. If this calculation were part of
a more elaborate computation in which, say, some formula called for
(1-w), then that would still evaluate to 1, with possible dire results.

While plain algebraic expressions can easily be evaluated in bc, its
repertoire of mathematical functions is limited. No doubt the other
mathematical packages you mention have much richer resources; but for
anything complicated requiring their use I would probably go for
working externally with such a package. However, that may be a naive
view, since I have not tried the R interfaces!

Nevertheless, I would still strongly recommend doing some analysis
of the expressions in a computation when limits of precision are an
issue, since often a simple expression can be obtained which, in R,
would give adequate results.

Ted.


 On Thu, May 21, 2009 at 10:15 AM, Mark Bilton mcbil...@hotmail.com
 wrote:

 I am having a slight problem with probabilities.

 To calculate the final probability of an event p(F), we can take the
 product of the chance that each independent event that makes p(F) will
 NOT occur.
 So...
 p(F) = 1- ( (1-p(A)) * (1-p(B)) * (1-p(C))...(1-p(x)) )

 If the chance of an event within the product occurring remains the
 same, we can therefore raise this probability to a power of the number
 of times that event occurs.
 e.g. rolling a dice p(A) = 1/6 of getting a 1...
 p(F) = 1 - (1- (1/6))^z
 p(F) = 1 - (1-p(A))^z tells us the probability of rolling a 1 'at
 least once' in z rolls.

 So then to R...

 if p(A) = 0.01; z = 4; p(F) = 0.039

 obviously p(F) > p(A)

 however the problem arises when we use very small numbers e.g. p(B) =
 1 * 10^-30
 R understands this value
 However when you have 1-p(B) you get something very close to 1 as you
 expect...but R seems to think it is 1.
 So when I calculate p(F) = 1 - (1-p(B))^z: 1 to the power of anything
 equals 1, so p(F) = 0 -- not just close to zero BUT zero.
 It doesn't matter therefore if z = 1*10^1000, the answer is still zero
 !!

 Obviously therefore now p(F) < p(B)

 Is there any solution to my problem, e.g.
 - is it a problem with the sum (-) ? ie could I change the number of
 bits the number understands (however it seems strange that it can hold
 it as a value close to 0 but not close to 1)
 -or should I maybe use a function to calculate the exact answer ?
 -or something else

 Any help greatly appreciated
 Mark
 -

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-May-09   Time: 16:39:16
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loop avoidance and logical subscripts

2009-05-21 Thread Ted Harding
On 21-May-09 16:56:23, retama wrote:
 Patrick Burns kindly provided an article about this issue called
 'The R Inferno'. However, I will expand my question a little bit,
 because I think it is not clear; and if I could improve the code,
 it will be more understandable to other users reading these messages
 when I paste it :)
 
 In my example, I have a dataframe with several hundreds of DNA
 sequences in the column data$sequences (each value is a long string
 written in an alphabet of four characters, which are A, C, T and G).
 I'm trying to compute the parameter: number of Gs plus Cs over the
 total [(G+C)/(A+T+C+G)] in each sequence. For example, data$sequence[1]
 is something like AATTCCCGG but a little bit longer, and its G+C
 content is 0.69. I need to compute a vector with all G+C contents
 (in my example, in data$GCsequence, in which data$GCsequence[1] is
 0.69).
 
 So the question was if making a loop and a combination of values with
 c() or cbind() or with logical subscripts is ok or not. And which
 approach should produce better results in terms of efficiency (my
 script goes really slow).
 
 Thank you,
 Retama

Perhaps the following could be the basis of your code for the bigger
problem:

  S <- unlist(strsplit("AATTCCCGGGGGG",""))
  S
#  [1] "A" "A" "T" "T" "C" "C" "C" "G" "G" "G" "G" "G" "G"
  (sum((S=="C")|(S=="G")))
# [1] 9
  (sum((S=="C")|(S=="G")))/length(S)
# [1] 0.6923077

You could build a function on those lines, to evaluate what you
want for any given string; and then apply() it to the elements
(which are the separate character strings) of data$sequences
(which is presumably a vector of character strings).
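
For instance, a sketch (assuming data$sequences is as you describe;
as.character() guards against it being a factor):

  GCcontent <- function(s){
    S <- unlist(strsplit(s,""))
    sum((S=="C")|(S=="G"))/length(S)
  }
  data$GCsequence <- sapply(as.character(data$sequences), GCcontent,
                            USE.NAMES=FALSE)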

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 21-May-09   Time: 18:18:24
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] postscript problems (landscape orientation)

2009-05-21 Thread Ted Harding
On 21-May-09 23:02:28, David Scott wrote:
 Well most people deal with that problem by not using Acrobat to
 read .pdf files. On linux you can use evince or xpdf. On windows
 just use gsview32. Those readers don't lock the .pdf.
 
 I am with Peter and generally go straight to pdf these days. The only 
 reason for going through postscript is if you want to use psfrag.
 
 David Scott

Going off at a tangent to the original query, I would say that
this is too limited a view! For one thing, PostScript (in its
Encasulated manifestation) is readily imported and re-scalable
by software which does not import PFF. Also, PS is an editable
plain-text file (even though there may be chunks in hexadecimal
for some graphics -- but it's still ASCII). One thing which I
quite often do is edit the %%BoundingBox:  line to improve the
framing of the graphic -- and viewing it in ghostscript with
watch mode on as one edits, one can easily adjust things to
a satisfactory visual effect.

If you know what you are doing, you can if you wish move things
around, or add or delete things (especially bits of text) by
using any plain-text editor on the PostScript file.

Finally (though this may be a symptom of serious masochsim on
my part), if I download a PDF in which the author has plotted
the data, after I print to file in PostScript from Acrobat
Reader I can usually obtain a very close approximation to the
original data values by extracting the PS coordinates of the
plotted points (and axis tick-marks) from the PostScript file.

The only reason for going through postscript is if you want
to use psfrag -- or psnup and/or psbook or ... 

Ted.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 22-May-09   Time: 00:31:32
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] round function seems to produce maximum 2 decimals

2009-05-20 Thread Ted Harding
On 20-May-09 20:10:15, Glenn E Stauffer wrote:
 I am trying to use round() to force R to display a specific number
 of decimals, but it seems to display <= 2 decimals no matter what I
 specify in the digits argument. As an alternative I tried signif(),
 but it also produces unexpected results. See example code and results
 below.
 format() works, but then the result no longer is numeric. Am I missing
 something simple?
 I am using R 2.9.0 on Windows XP. 
 Thanks,
 Glenn
 
#code
 h=12345.16711
 h
 
 round(h,digits=1)
 round(h,digits=2)
 round(h,digits=3)
 round(h,digits=4)
 round(h,digits=5)
 
 signif(h,digits=9)
 
 format(h,nsmall=4)
 
#results
 h=12345.16711
 h
 [1] 12345.17
 round(h,digits=1)
 [1] 12345.2
 round(h,digits=2)
 [1] 12345.17
 round(h,digits=3)
 [1] 12345.17
 round(h,digits=4)
 [1] 12345.17
 round(h,digits=5)
 [1] 12345.17
 signif(h,digits=9)
 [1] 12345.17
 
 format(h,nsmall=4)
 [1] 12345.1671

What you're missing is that when you do (e.g.)

  h <- 12345.16711
  round(h,digits=4)
# [1] 12345.17

what is displayed ([1] 12345.17) is not the result of round(),
but what the result of round() is to be displayed as, given the
option digits=7 (default) for the number of *significant figures*
in the display of stored values.

To see the result as it is stored, you should use print() with
the appropriate number of disgits specified:

  print( round(h,digits=5),10)
# [1] 12345.16711
  print( round(h,digits=4),10)
# [1] 12345.1671
  print( round(h,digits=3),10)
# [1] 12345.167
  print( round(h,digits=2),10)
# [1] 12345.17

Internally, round(h) is correctly stored:

  h4 <- round(h,4)
  h - h4
# [1] 1e-05
  h3 <- round(h,3)
  h - h3
# [1] 0.00011
  h2 <- round(h,2)
  h - h2
# [1] -0.00289
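
The same effect can be had by raising the display option itself
(a sketch):

  options(digits=10)
  round(h,4)
# [1] 12345.1671
  options(digits=7)   # restore the default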

To illustrate the influence of the display option digits=7:

  h <- 45.16711
  h
# [1] 45.16711
  round(h,digits=4)
# [1] 45.1671
  round(h,digits=3)
# [1] 45.167
  round(h,digits=2)
# [1] 45.17

  h <- 345.16711
  h
# [1] 345.1671
  round(h,digits=4)
# [1] 345.1671
  round(h,digits=3)
# [1] 345.167
  round(h,digits=2)
# [1] 345.17

  h <- 2345.16711
  h
# [1] 2345.167
  round(h,digits=4)
# [1] 2345.167
  round(h,digits=3)
# [1] 2345.167
  round(h,digits=2)
# [1] 2345.17

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 20-May-09   Time: 22:54:41
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] what is wrong with this code?

2009-05-19 Thread Ted Harding
On 19-May-09 19:52:20, deanj2k wrote:
 dlogl <-
 -(n/theta)-sum((y/(theta)^2)*((1-exp(y/theta))/(1+exp(y/theta)))
 
 d2logl <- (n/theta^2) -
 sum((-2y/theta^3)*(1-exp(y/theta))/(1+exp(y/theta)))
 - sum(((2*y/theta^4)*exp(y/theta))/((1+exp(y/theta))^2))
 
 returns the error message:
 Error: unexpected symbol in:
 dlogl <-
 -(n/theta)-sum((y/(theta)^2)*((1-exp(y/theta))/(1+exp(y/theta)))
 d2logl
 
 do you know what i have done wrong

The error message strongly suggests that the line beginning d2logl <-
is being seen as a continuation of the preceding line.

Counting parentheses, I find that you are 1 short of what is required
to complete the expression in the line beginning -(n/theta).

In that case, R will continue on to the next line seeking the completion,
and will encounter d2logl non-syntactically.

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 19-May-09   Time: 22:12:40
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Concatenating two vectors into one

2009-05-18 Thread Ted Harding
On 18-May-09 11:09:45, Henning Wildhagen wrote:
 Dear users, 
 a very simple question: 
 
 Given two vectors x and y
 
 x <- as.character(c("A","B","C","D","E","F"))
 y <- as.factor(c(1,2,3,4,5,6))
 
 I want to combine them into a single vector z as A1, B2, C3 and so on.
 
 z <- x*y is not working; I tried several other functions, but did not
 get to the solution.
 
 Thanks for your help,
 Henning

And a very simple solution! Use paste():

  x <- as.character(c("A","B","C","D","E","F"))
  y <- as.factor(c(1,2,3,4,5,6))
  paste(x,y)
  # [1] "A 1" "B 2" "C 3" "D 4" "E 5" "F 6"
  paste(x,y,sep="")
  # [1] "A1" "B2" "C3" "D4" "E5" "F6"

Ted.
PS: 'x*y' will attempt to perform a numerical multiplication.
This cannot work for character vectors.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 18-May-09   Time: 12:23:56
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Output of binary representation

2009-05-17 Thread Ted Harding
I am interested in studying the binary representation of numerics
(doubles) in R, so am looking for possibilities of output of the
internal binary representations. sprintf() with format a or A
is halfway there:

  sprintf("%A",pi)
# [1] "0X1.921FB54442D18P+1"

but it is in hex. 

The following illustrate the sort of thing I want:

1.1001 0010 0001 1111 1011 0101 0100 0100 0100 0010 1101 0001 1000
times 2

11.0010 0100 0011 1111 0110 1010 1000 1000 1000 0101 1010 0011 000

0.1100 1001 0000 1111 1101 1010 1010 0010 0010 0001 0110 1000 1100 0
times 4

(without the spaces -- only put in above for clarity).

While I could take the original output 0X1.921FB54442D18P+1 from
sprintf() and parse it out into binary using gsub() or the like,
or submit it to, say, an 'awk' script via an external file, this would
be a tedious business!

Is there some function already in R which outputs the bits in the
binary representation directly?

I see that David Hinds asked a similar question on 17 Aug 2005:
Raw data type transformations

  http://finzi.psych.upenn.edu/R/Rhelp02/archive/59900.html

(without, apparently, getting any response -- at any rate within
the following 3 months).

With thanks for any suggestions,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-May-09   Time: 18:23:49
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Output of binary representation

2009-05-17 Thread Ted Harding
Many thankis, Gabor! That looks both interesting and powerful.
Indeed, it seems to implement with one stroke what I had been
thinking of implementing piecemeal.
Best wishes,
Ted.

On 17-May-09 17:48:00, Gabor Grothendieck wrote:
 gsubfn of the gsubfn package is like gsub but can take a function,
 list or proto object as the replacement instead of a character
 string and with a list it can be used to readily turn hex to binary:
 
 library(gsubfn)
 binary.digits <-
 + list("0"="0000", "1"="0001", "2"="0010", "3"="0011",
 +      "4"="0100", "5"="0101", "6"="0110", "7"="0111",
 +      "8"="1000", "9"="1001", "A"="1010", "B"="1011",
 +      "C"="1100", "D"="1101", "E"="1110", "F"="1111")

 gsubfn("[0-9A-F]", binary.digits, "0X1.921FB54442D18P+1")
 [1]
 "0000X0001.1001001000011111101101010100010001000010110100011000P+0001"
 
 
 On Sun, May 17, 2009 at 1:23 PM, Ted Harding
 ted.hard...@manchester.ac.uk wrote:
 I am interested in studying the binary representation of numerics
 (doubles) in R, so am looking for possibilities of output of the
 internal binary representations. sprintf() with format a or A
 is halfway there:

  sprintf("%A",pi)
 # [1] "0X1.921FB54442D18P+1"

 but it is in hex.

 The following illustrate the sort of thing I want:

 1.1001 0010 0001 1111 1011 0101 0100 0100 0100 0010 1101 0001 1000
 times 2

 11.0010 0100 0011 1111 0110 1010 1000 1000 1000 0101 1010 0011 000

 0.1100 1001 0000 1111 1101 1010 1010 0010 0010 0001 0110 1000 1100 0
 times 4

 (without the spaces -- only put in above for clarity).

 While I could take the original output 0X1.921FB54442D18P+1 from
 sprintf() and parse it out into binary using gsub() or the like,
 or submit it to, say, an 'awk' script via an external file, this would
 be a tedious business!

 Is there some function already in R which outputs the bits in the
 binary representation directly?

 I see that David Hinds asked a similar question on 17 Aug 2005:
 Raw data type transformations

 http://finzi.psych.upenn.edu/R/Rhelp02/archive/59900.html

 (without, apparently, getting any response -- at any rate within
 the following 3 months).

 With thanks for any suggestions,
 Ted.

 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 17-May-09   Time: 18:23:49
 -- XFMail --

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-May-09   Time: 21:09:12
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Output of binary representation

2009-05-17 Thread Ted Harding
Thanks, Jim. While that is still in hex, I find I can get the binary
representation using Gabor's gsubfn() function, provided the A-F is
changed to a-f in setting up his 'binary.digits', and the output is
explicitly cast to character:

gsubfn("[0-9a-f]", binary.digits,
       as.character(writeBin(pi,raw(),endian='big')))
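
A base-R-only variant (a sketch): rawToBits() gives each byte's bits
least-significant-first, so reverse within each byte:

  bytes <- writeBin(pi, raw(), endian='big')
  paste(sapply(bytes, function(b)
          paste(rev(as.integer(rawToBits(b))), collapse="")),
        collapse=" ")
# [1] "01000000 00001001 00100001 11111011 01010100 01000100 00101101 00011000"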

Ted.

On 17-May-09 20:04:58, jim holtman wrote:
 Are you looking for how the floating point is represented in the
 IEEE-754
 format?  If so, you can use writeBin:
 
 writeBin(pi,raw(),endian='big')
 [1] 40 09 21 fb 54 44 2d 18
 
 
 On Sun, May 17, 2009 at 1:23 PM, Ted Harding
 ted.hard...@manchester.ac.ukwrote:
 
 I am interested in studying the binary representation of numerics
 (doubles) in R, so am looking for possibilities of output of the
 internal binary representations. sprintf() with format a or A
 is halfway there:

  sprintf("%A",pi)
 # [1] "0X1.921FB54442D18P+1"

 but it is in hex.

 The following illustrate the sort of thing I want:

 1.1001 0010 0001 1111 1011 0101 0100 0100 0100 0010 1101 0001 1000
 times 2

 11.0010 0100 0011 1111 0110 1010 1000 1000 1000 0101 1010 0011 000

 0.1100 1001 0000 1111 1101 1010 1010 0010 0010 0001 0110 1000 1100 0
 times 4

 (without the spaces -- only put in above for clarity).

 While I could take the original output 0X1.921FB54442D18P+1 from
 sprintf() and parse it out into binary using gsub() or the like,
 or submit it to, say, an 'awk' script via an external file, this would
 be a tedious business!

 Is there some function already in R which outputs the bits in the
 binary representation directly?

 I see that David Hinds asked a similar question on 17 Aug 2005:
 Raw data type transformations

  http://finzi.psych.upenn.edu/R/Rhelp02/archive/59900.html

 (without, apparently, getting any response -- at any rate within
 the following 3 months).

 With thanks for any suggestions,
 Ted.

 
 E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 17-May-09   Time: 18:23:49
 -- XFMail --

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/po
 sting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 
 
 -- 
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390
 
 What is the problem that you are trying to solve?


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-May-09   Time: 22:06:59
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sprintf() question

2009-05-17 Thread Ted Harding
On 17-May-09 22:03:19, Daniel Nordlund wrote:
 When I type the following, I get results different from what I
 expected. 
 
 sprintf('%a',3)
 [1] 0x1.8
 
 Shouldn't the result be
 
 [1] 0x1.8p+2

Well, not p+2 but p+1
  (0x1.8 = 1.1000[2] ; *2 = 11.000[2] = 3[10]) ;
however, I get:

  sprintf('%a',3)
  # [1] 0x1.8p+1

which is indeed correct.

  R version 2.9.0 (2009-04-17) ## Same as yours
  platform  i486-pc-linux-gnu  ## Different from yours ...

which perhaps suggests that there may be a mis-compilation in the
Windows version.

Ted.

 I read through the help ?sprintf and didn't find anything that changed
 my expectation.  What am I misunderstanding?  I am using R-2.9.0 binary
 from CRAN on Windows XP Pro, and my session info is
 
 
 sessionInfo()
 R version 2.9.0 (2009-04-17) 
 i386-pc-mingw32 
 
 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
 States.1252;LC_MONETARY=English_United
 States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base   
 
 
 Thanks for any enlightenment.
 
 Dan
 
 Daniel Nordlund
 Bothell, WA  USA
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 17-May-09   Time: 23:32:19
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] replace % with \%

2009-05-15 Thread Ted Harding
On 15-May-09 14:46:27, Liviu Andronic wrote:
 Dear all,
 I'm trying to gsub() % with \% with no obvious success.
 temp1 <- c("mean", "sd",   "0%",   "25%",  "50%",  "75%",  "100%")
 temp1
 [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
 gsub("%", "\%", temp1, fixed=TRUE)
 [1] "mean" "sd"   "0%"   "25%"  "50%"  "75%"  "100%"
 Warning messages:
 1: '\%' is an unrecognized escape in a character string
 2: unrecognized escape removed from "\%"
 
 I am not quite sure on how to deal with this error message. I tried
 the following
 gsub("%", "\\%", temp1, fixed=TRUE)
 [1] "mean"   "sd"     "0\\%"   "25\\%"  "50\\%"  "75\\%"  "100\\%"
 
 Could anyone suggest how to obtain output similar to:
 [1] "mean"   "sd"     "0\%"   "25\%"  "50\%"  "75\%"  "100\%"
 
 Thank you,
 Liviu

1: The double escape \\ is the correct way to do it. If you
   give \% to gsub, it will try to interpret % as a special
   character (like \n for newline), and there is none such
   (as it tells you). On the other hand, \\ tells gsub to
   interpret \ (normally used as the Escape character) in a
   special way (namely as a literal \).

2: The output
   mean   sd 0\\%   25\\%  50\\%  70\\%  100\\%
   from gsub(%, \\%, temp1, fixed=TRUE) is one of those cases
   where R displays something different from what is really there!
   In other words, 0\\% for example is the character string you
   would have to enter in order for R to store \%. You can see what
   is really there using cat:

  cat(gsub("%", "\\%", temp1, fixed=TRUE))
  # mean sd 0\% 25\% 50\% 75\% 100\%

   which, of course, is what you wanted. You can see in other ways
   that what is stored is what you wanted -- for instance:

  temp2 <- gsub("%", "\\%", temp1, fixed=TRUE)
  write.csv(temp2, "gsub.csv")

   and then, if you look into gsub.csv outside of R, you will see:

  "","x"
  "1","mean"
  "2","sd"
  "3","0\%"
  "4","25\%"
  "5","50\%"
  "6","75\%"
  "7","100\%"

   which, again, is what you wanted.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-May-09   Time: 16:32:13
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation

2009-05-14 Thread Ted Harding
On 14-May-09 11:28:17, Wacek Kusnierczyk wrote:
 Barry Rowlingson wrote:
 As a beginner, I agree  the for loop is much clearer to me.

  [Warning: Contains mostly philosophy]
   
 maybe quasi ;)
 
 To me, the world and how I interact with it is procedural. When I want
 to break six eggs I do 'get six eggs, repeat break egg until all
 eggs broken'. 
 
 yeah, that's the implementation level.  a typical recipe would not say
 'for n from 1 to 6, break the nth egg'.  it'd rather say 'break the
 eggs', which is closer to 'apply break to the eggs'.  you do of course
 break the eggs sequentially (or?), but that's below the abstraction
 useful for the recipe purpose.

But it does influence how you organise the subsequent garbage collection.
:)
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09   Time: 12:40:16
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] specify the number of decimal numbers

2009-05-14 Thread Ted Harding
On 14-May-09 11:40:21, lehe wrote:
 Hi,
 I was wondering how to specify the number of decimal places in my
 computation using R? I have too many decimal places in my results;
 when I convert them to strings with as.character, the strings will
 be too long.
 Thanks and regards!

Since you say you want them to end up as character strings,
it would seem that sprintf() is the ideal tool for the job.

1. Read ?sprintf

2. Examples:

  pi
# [1] 3.141593

  sprintf("%.3f",pi)
# [1] "3.142"

  sprintf("%10.3f",pi)
# [1] "     3.142"

  sprintf("%010.3f",pi)
# [1] "000003.142"

  sprintf("%025.17f",pi)
# [1] "0000003.14159265358979312"

So you can do pretty much what you want, in terms of output format.
Ted.



E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09   Time: 13:04:30
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


<    2   3   4   5   6   7   8   9   10   >