Re: [R] specify the number of decimal numbers
On 14-May-09 12:03:43, lehe wrote:
> Thanks! In my case, I need to deal with a lot of such results,
> e.g. elements in a matrix. If using sprintf, does it mean I have
> to apply it to each result individually? Is it possible to do it
> in a single command?

Yes, in various ways depending on how you want the output to be:

  M <- pi*matrix(c(1,2,3,4), nrow=2)
  M
  #          [,1]      [,2]
  # [1,] 3.141593  9.424778
  # [2,] 6.283185 12.566371

  sprintf("%.3f", M)
  # [1] "3.142"  "6.283"  "9.425"  "12.566"

  sprintf(c("%.1f","%.2f","%.3f","%.4f"), M)
  # [1] "3.1"     "6.28"    "9.425"   "12.5664"

In this usage, the output is a vector of character strings. If you
wanted to keep the output character strings in matrix layout:

  matrix(sprintf("%.3f", M), nrow=2)
  #      [,1]    [,2]
  # [1,] "3.142" "9.425"
  # [2,] "6.283" "12.566"

While one may think of using print() with the 'digits' argument for
this kind of thing, digits=3 (for instance) prints to a minimum of 3
_significant_ digits, not decimal places. So:

  print(10*M, 3)
  #      [,1]  [,2]
  # [1,] 31.4  94.2
  # [2,] 62.8 125.7

See in ?print.default:

  "The same number of decimal places is used throughout a vector.
   This means that 'digits' specifies the minimum number of
   significant digits to be used, and that at least one entry will
   be encoded with that minimum number."

As far as I can tell, there is no print option such as 'decimals',
e.g. decimals=3, which would print all numbers with 3 digits after
the decimal point (as in sprintf("%.3f", ...)); if there were such an
option, one would expect to find it listed in ?options. If that is
really true, it is a pity! Short of using sprintf() (and then
converting back to numeric if you need the results as numbers), the
only way you could use 'digits' to get a desired number (e.g. 3) of
decimal places throughout would be to first study the range of
magnitudes of the numbers and then choose digits=... accordingly.
For example, to print 10*M to 3 d.p., you need to note that the
smallest number is 31.4..., so you need digits=5 to get 3 d.p.:

  print(10*M, 5)
  #        [,1]    [,2]
  # [1,] 31.416  94.248
  # [2,] 62.832 125.664

On the other hand, you could do it straight off with sprintf:

  matrix(as.numeric(sprintf("%.3f", 10*M)), nrow=2)   ## [***]
  #        [,1]    [,2]
  # [1,] 31.416  94.248
  # [2,] 62.832 125.664

regardless of the magnitudes of the numbers -- though you could still
fail if there are zeros after the decimal place:

  sprintf("%.3f", c(1.0,2.0,3.0,4.0))
  # [1] "1.000" "2.000" "3.000" "4.000"
  as.numeric(sprintf("%.3f", c(1.0,2.0,3.0,4.0)))
  # [1] 1 2 3 4

Maybe there really is a print method which will output a result like
[***] without the quotes; but I don't know how to find it!
Ted.

jholtman wrote:
> Depending on what you want to do, use 'sprintf':
>   x <- 1.23456789
>   x
>   # [1] 1.234568
>   as.character(x)
>   # [1] "1.23456789"
>   sprintf("%.1f %.3f %.5f", x, x, x)
>   # [1] "1.2 1.235 1.23457"
>
> On Thu, May 14, 2009 at 7:40 AM, lehe <timlee...@yahoo.com> wrote:
>> Hi,
>> I was wondering how to specify the number of decimal places in my
>> computation using R? I have too many decimal places in my results;
>> when I convert them to strings with as.character, the strings will
>> be too long.
>> Thanks and regards!
>> --
>> View this message in context:
>> http://www.nabble.com/specify-the-number-of-decimal-numbers-tp23538852p23538852.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> What is the problem that you are trying to solve?
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 14:01:14
-- XFMail --
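[Editor's aside on the point left open above -- printing a numeric matrix to a fixed number of decimal places, in matrix layout, without quotes: one possibility (a sketch, not claiming it is the built-in option Ted was looking for) is to combine round(), format() and noquote(); format() preserves the matrix's dim attribute, and noquote() prints the character matrix without quote marks.]

```r
# Sketch: fixed decimal places, matrix layout, printed without quotes.
M <- pi * matrix(c(1, 2, 3, 4), nrow = 2)
fm <- format(round(M, 3), nsmall = 3)  # character matrix, 3 d.p. each
noquote(fm)                            # matrix layout, no quotes
```

This still gives character data, so (as with sprintf) you would need as.numeric() to compute with the results afterwards.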
Re: [R] specify the number of decimal numbers
On 14-May-09 12:27:40, Wacek Kusnierczyk wrote:
> jim holtman wrote:
>> Depending on what you want to do, use 'sprintf':
>>   x <- 1.23456789
>>   x
>>   # [1] 1.234568
>>   as.character(x)
>>   # [1] "1.23456789"
>>   sprintf("%.1f %.3f %.5f", x, x, x)
>>   # [1] "1.2 1.235 1.23457"
>
> ... but remember that sprintf introduces excel bugs into r (i.e.,
> rounding is not done according to the IEC 60559 standard, see
> ?round):
>
>   ns = c(0.05, 0.15)
>   round(ns, 1)
>   # 0.0 0.2
>   as.numeric(sprintf('%.1f', ns))
>   # 0.1 0.1
>
> vQ

True! And thanks for importing that point into the discussion.
And, by the way, it goes some way to solving an issue I raised
earlier, in that

  M
  #          [,1]      [,2]
  # [1,] 3.141593  9.424778
  # [2,] 6.283185 12.566371
  round(10*M, 3)
  #        [,1]    [,2]
  # [1,] 31.416  94.248
  # [2,] 62.832 125.664

though, of course, still

  round(1.0, 3)
  # [1] 1    ### not 1.000

It would still be good to be able to simply have print(X, decimals=3)
(with proper rounding, of course).
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 14:47:30
-- XFMail --
Re: [R] specify the number of decimal numbers
On 14-May-09 15:15:16, James W. MacDonald wrote:
> Wacek Kusnierczyk wrote:
>> (Ted Harding) wrote:
>>> On 14-May-09 12:27:40, Wacek Kusnierczyk wrote:
>>>> ... but remember that sprintf introduces excel bugs into r
>>>> (i.e., rounding is not done according to the IEC 60559 standard,
>>>> see ?round):
>>>>   ns = c(0.05, 0.15)
>>>>   round(ns, 1)
>>>>   # 0.0 0.2
>>>>   as.numeric(sprintf('%.1f', ns))
>>>>   # 0.1 0.1
>>>> vQ
>>> True! And thanks for importing that point into the discussion.
>> sad but true, true but sad. i have already raised the issue on this
>> list earlier, but to no response. apparently, this sort of excel
>> bug in r is an intentional feature, so you may not get it improved
>> anytime soon. unless you submit a patch and get it accepted, that
>> is.
> Have you brought this 'issue' up with the Perl people as well?
>   perl -e 'print sprintf("%.1f", 0.05) . "\n";'
>   0.1
>   perl -e 'print sprintf("%.1f", 0.15) . "\n";'
>   0.1

This happens also when you use C's fprintf and sprintf (at any rate
in my gcc):

  #include <stdio.h>
  #include <math.h>
  int main(void)
  {
    fprintf(stdout, "%.1f\n", 0.15);
    fprintf(stdout, "%.1f\n", 0.05);
    fprintf(stdout, "%.2f\n", 0.15);
    fprintf(stdout, "%.2f\n", 0.05);
    return 0;
  }

  cc -o testprint3 testprint3.c
  ./testprint3
  0.1
  0.1
  0.15
  0.05

(with similar output when printing a string formatted by sprintf).
So, in so far as R relies on the compiler's implementation of the
*printf functions, this can hardly be put right without re-writing
[g]cc!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 17:16:40
-- XFMail --
Re: [R] where does the null come from?
On 13-May-09 14:43:17, Gabor Grothendieck wrote:
> out <- apply(m, 1, cat, '\n')
> 1 3
> 2 4
> out
> NULL

Or, more explicitly, from ?cat:

  Value: None (invisible 'NULL').

Ted.

> On Wed, May 13, 2009 at 5:23 AM, Wacek Kusnierczyk
> <waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:
>>   m = matrix(1:4, 2)
>>   apply(m, 1, cat, '\n')
>>   # 1 3
>>   # 2 4
>>   # NULL
>> why the null?
>> vQ

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-09  Time: 15:56:04
-- XFMail --
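[Editor's aside: if the printed NULL is the only annoyance, it can be suppressed by wrapping the call in invisible(), since the NULL is just apply()'s return value being auto-printed at top level. A minimal sketch:]

```r
# The cat() calls print the rows as a side effect; invisible() stops
# apply()'s NULL return value from being auto-printed at top level.
m <- matrix(1:4, 2)
invisible(apply(m, 1, cat, '\n'))
# 1 3
# 2 4
```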
Re: [R] Problems with randomly generating samples
On 13-May-09 15:18:05, Debbie Zhang wrote:
> Dear R users,
> Can anyone please tell me how to generate a large number of samples
> in R, given a certain distribution and size?
> For example, if I want to generate 1000 samples of size n=100 with a
> N(0,1) distribution, how should I proceed?
> (Since I don't want to run rnorm(100,0,1) in R 1000 times)
> Thanks for help
> Debbie

One possibility is:

  nsamples <- 1000
  sampsize <- 100
  Samples  <- matrix(rnorm(nsamples*sampsize, 0, 1), nrow=nsamples)

Then each row of the matrix Samples will be a sample of size
'sampsize'; the i-th can be accessed as Samples[i,], and there are
'nsamples' rows to choose from.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-09  Time: 16:46:05
-- XFMail --
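[Editor's aside: with the samples laid out in rows as above, per-sample statistics come from row-wise operations. A sketch, using the sampling distribution of the mean as the example:]

```r
set.seed(1)                     # for reproducibility of the sketch
nsamples <- 1000
sampsize <- 100
Samples <- matrix(rnorm(nsamples*sampsize, 0, 1), nrow = nsamples)
samp.means <- rowMeans(Samples) # one mean per sample of size 100
sd(samp.means)                  # should be near 1/sqrt(100) = 0.1
```

apply(Samples, 1, FUN) works the same way for any other per-sample statistic.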
Re: [R] Input to variables - help please
On 13-May-09 19:49:35, Steve Sidney wrote:
> Dear list
> I have managed to write a short program to evaluate data which is
> input from a csv file, using the following:
>   x = read.csv("wms_results.csv", header=TRUE)
> All works fine, but since I have a number of similar data files
> which have different names, I would like the program to allow me to
> input the file name, rather than having to edit the program.
> From the documentation I am unclear how to replace the
> "filename.csv" with a string variable so that I can input the name
> from the keyboard. Also, which command can I use to output some text
> to the monitor as a prompt?
> Any help would be great, and apologies for such a simple question.
> Thanks
> Regards
> Steve

Try:

  NextFile <- readline("Enter the CSV filename:\n")
  x <- read.csv(NextFile, header=TRUE)

You will then get a prompt "Enter the CSV filename:", so enter it.
Then it will read it. Example (from the terminal):

  > NextFile <- readline("Enter the CSV filename:\n")
  Enter the CSV filename:
  ACBDEFG.csv
  > NextFile
  [1] "ACBDEFG.csv"

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 13-May-09  Time: 21:12:19
-- XFMail --
Re: [R] matching period with perl regular expression
On 13-May-09 23:47:41, Henrique Dallazuanna wrote:
> Try this:
>   gsub("^(\\w*).*$", "\\1", x)

Even simpler:

  x
  # [1] "wa.w"
  gsub("\\..*", "", x, perl=TRUE)
  # [1] "wa"
  x <- "abcde.fghij.klmno"
  gsub("\\..*", "", x, perl=TRUE)
  # [1] "abcde"

(and it doesn't matter whether 'perl' is TRUE or FALSE)
Ted.

> On Wed, May 13, 2009 at 8:41 PM, Stephen J. Barr
> <stephenjb...@gmail.com> wrote:
>> Hello,
>> I have several strings where I am trying to eliminate the period
>> and everything after the period, using a regular expression.
>> However, I am having trouble getting this to work:
>>   x = "wa.w"
>>   gsub(x, "\..*", "", perl=TRUE)
>>   [1] ""
>>   Warning messages:
>>   1: '\.' is an unrecognized escape in a character string
>>   2: unrecognized escape removed from "\..*"
>> In perl, you can match a single period with \. Is this not so even
>> with perl=TRUE? I would like for x to be equal to
>>   x = "wa"
>> What am I missing here?
>> -stephen
>> ==
>> Stephen J. Barr
>> University of Washington
>> WEB: www.econsteve.com
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-09  Time: 00:58:07
-- XFMail --
Re: [R] Power function for ratio of lognormal means: two equally
On 12-May-09 12:00:50, Steve Candy wrote:
> Hi All
> This is a general stats problem that I am dealing with using R, so
> any help is greatly appreciated.
> I have two lognormal distributions with means M1 and M2. If we have:
>   H0: log(M1/M2) = 0
>   H1: log(M1/M2) != 0, equivalent to log(M1/M2) = log(1+P),
>       where P > 0 or P < 0.
> If we calculate the power for a value of P=0.1 or P=-0.1 (i.e. a 10%
> difference), and say assume SE{log(M1/M2)} = 0.05 and confidence
> level = 100(1-alpha), alpha = 0.05, then how is the power function
> calculated? As far as I can see we can calculate the power in the
> two ways given below, and if there is no assumed direction to the
> difference between M1 and M2, are not both calculations valid?
>
>   # P = -0.1
>   qn <- qnorm(p=0.05, mean=0, sd=0.05, lower.tail=TRUE)
>   Power.1 <- pnorm(q=qn, mean=log(0.9), sd=0.05, lower.tail=TRUE)
>   # P = 0.1
>   qn <- qnorm(p=0.95, mean=0, sd=0.05, lower.tail=TRUE)
>   Power.2 <- pnorm(q=qn, mean=log(1.1), sd=0.05, lower.tail=FALSE)
>   print(c(Power.1, Power.2))
>   [1] 0.6780872 0.6030887
>
> So which value should I use? Or would the average of the two values
> be appropriate? Or is there a fault in my logic? After a quick
> literature search I contacted a colleague who has written two stats
> text books and taught at University level for many years; he has not
> come across this problem and suggested averaging the values. This
> post is to ask if anyone has come across this pretty basic problem
> and has a suggested approach.
> thanks
> Steve

The power function is the probability of rejecting H0 for a
particular instance of H1 (defined by a specific non-null value of a
parameter), considered as a function of that parameter. So the power
for H1: +10% is not the same as the power for H1: -10%. You then face
the problem of what to do about that when you want to consider "What
is the power when H1 is +/-10%?". The answer, in general terms, is:
whatever best suits the context in which the problem arises.
One general solution which makes a certain kind of sense is: adopt
the smaller of the two results (0.6780872 or 0.6030887). Then you
know that the power is at least 0.6030887 if H1 (either +10% or -10%)
is true. If your objective is to know at least what power can be
achieved in the circumstances, then that answers the question. If
0.6030887 is adequate for your needs, then your investigation is in a
satisfactory state.

On the other hand, if in the context of the problem you are concerned
that there may not be enough power, then you may want to know at most
what power can be achieved. In that case, you can achieve at most
0.6780872. If that is inadequate for your needs, then you need to
re-consider the investigation as a whole, with a view to increasing
the power.

You might also be considering a situation in which you are prepared
to think as though +/-10% may arise at random with probabilities (P+)
and (P-), and are interested in the average power as being
representative of what you may expect to achieve overall (even though
this will never be true for any particular case). In that case, you
may reasonably adopt 0.6780872*(P-) + 0.6030887*(P+). Your
colleague's suggestion is equivalent to (P-) = (P+) = 0.5.

Additionally, in such a context, you may have different utilities for
successfully rejecting H0 when H1: -10% is true, versus successfully
rejecting H0 when H1: +10% is true. In that case, you would be
thinking of computing the expected utility (the preceding case is
tantamount to having equal utilities).

Summary: apply the results in the way that best fits what you know
about the situation you are investigating, and best meets the
objectives you have in that situation. There is not a universal
answer to your question!

Hoping this helps,
Ted.
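[Editor's aside: the possibilities discussed above fit in a few lines of R. A sketch; the weights p.minus and p.plus are whatever you are prepared to assume:]

```r
se <- 0.05; alpha <- 0.05
# power against P = -0.1 (one-sided rejection region at level alpha)
qn.lo <- qnorm(alpha, mean = 0, sd = se)
pow.minus <- pnorm(qn.lo, mean = log(0.9), sd = se)
# power against P = +0.1
qn.hi <- qnorm(1 - alpha, mean = 0, sd = se)
pow.plus <- pnorm(qn.hi, mean = log(1.1), sd = se, lower.tail = FALSE)
c(pow.minus, pow.plus)        # 0.6780872 0.6030887, as above
min(pow.minus, pow.plus)      # guaranteed ("at least") power
p.minus <- 0.5; p.plus <- 0.5 # assumed weights for the average
p.minus*pow.minus + p.plus*pow.plus   # weighted-average power
```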
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 12-May-09  Time: 14:19:41
-- XFMail --
Re: [R] anyone know how to calculate chi square value from P val
On 11-May-09 19:36:00, Anyuan Guo wrote:
> Dear all,
> I have the P values of a list of markers. Now I need the chi square
> value with 2 degrees of freedom. I noticed there are several
> chi-square functions (dchisq, pchisq, qchisq, and rchisq), but it
> seems none is for my purpose. In Microsoft Excel, there is a
> function CHINV to do this: CHINV(0.184, 2) is 3.386, meaning that
> the chi square value for P value 0.184 with 2 degrees of freedom is
> 3.386. Does R have some function to do this?
> Thanks
> Anyuan

Yes, and you already looked at it but apparently did not recognise
it! Either:

  qchisq(1-0.184, 2)  ## (note 1-0.184, since 0.184 is the upper tail)
  # [1] 3.385639

Or:

  qchisq(0.184, 2, lower.tail=FALSE)  ## Default for lower.tail is TRUE
  # [1] 3.385639

Enter ?qchisq for more information on this and related functions.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 11-May-09  Time: 18:15:34
-- XFMail --
Re: [R] Histogram frequencies with a normal pdf curve overlay
On 09-May-09 16:10:42, Jacques Wagnor wrote:
> Dear List,
> When I plot a histogram with 'freq=FALSE' and overlay the histogram
> with a normal pdf curve, everything looks as expected, as follows:
>
>   x <- rnorm(1000)
>   hist(x, freq=FALSE)
>   curve(dnorm(x), add=TRUE, col="blue")
>
> What do I need to do if I want to show the frequencies (freq=TRUE)
> with the same normal pdf overlay, so that the plot would still look
> the same?
> Regards, Jacques

Think first about how you would convert the histogram densities
(heights of the bars on the density scale) into histogram
frequencies:

  density * (bin width) * N = frequency

where N = total number in sample. Then all you need to do is multiply
the normal density by the same factor. To find out the bin width,
take the difference between successive values of the 'breaks'
component of the histogram. One way to do all this is:

  N <- 1000
  x <- rnorm(N)
  H <- hist(x, freq=TRUE)  ## This will plot the histogram as well
  dx <- min(diff(H$breaks))
  curve(N*dx*dnorm(x), add=TRUE, col="blue")

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-May-09  Time: 17:31:03
-- XFMail --
Re: [R] Division ?
On 02-May-09 17:34:15, bogdanno wrote:
> It seems division with numbers bigger than 10 000 000 doesn't work
>   20000000/21
>   [1] 952381
>   55555555/23
>   [1] 2415459
> Thank you

I think you are confusing what is displayed with what is computed:

  20000000/21
  # [1] 952381
  print(20000000/21, 17)
  # [1] 952380.9523809524
  55555555/23
  # [1] 2415459
  print(55555555/23, 17)
  # [1] 2415458.913043478
  (20000000/21)*21
  # [1] 2e+07
  print((20000000/21)*21, 17)
  # [1] 2e+07
  (55555555/23)*23
  # [1] 55555555

Your "numbers bigger than 10 000 000" observation corresponds to the
default display of results to 7 significant digits. If (as in the
above print() statements) you increase this, you get more
reasonable-looking results.
Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 03-May-09  Time: 10:11:02
-- XFMail --
Re: [R] adding zeros to dataframe
On 01-May-09 17:20:08, Collins, Cathy wrote:
> Greetings,
> I am new to R and am hoping to get some tips from experienced
> R-programmers. I have a dataset that I've read into R as a
> dataframe. There are 5 columns: plot location, species name, a
> species number code (unique to each species name), abundance, and
> treatment. There are 272 plots in each treatment, but only the plots
> in which the species was recorded have an abundance value. For all
> species in the dataset, I would like to add zeros to the abundance
> column for any plots in which the species was not recorded, so that
> each species has 272 rows. The data are sorted by species and then
> abundance, so all of the zeros can presumably just be tacked on to
> the last (272 - occupied plots) rows for each species.
> My programming skills are still somewhat rudimentary (and biased
> toward VBA-style looping... which seems to be leading me astray).
> Though I have searched, I have not yet seen this particular problem
> addressed in the help files.
> Many thanks for any suggestions,
> Cathy
> mailto:ccoll...@ku.edu

Suppose we call your dataframe abun.df. Then its columns will be
something like abun.df$location, abun.df$name, abun.df$abundance,
abun.df$trtmt (depending on what you called them in the first place).
From your description, I am presuming that abundance values where a
species was not recorded have no value entered. In that case,
presumably they have gone into abun.df$abundance as NA. You can check
this with a command like:

  sum(is.na(abun.df$abundance))

If you get a positive result, then that is likely to be the case.
As a cross-check:

  sum(abun.df$abundance > 0, na.rm=TRUE)

should give another number which, together with the first, should add
up to the total number of rows in the dataframe.
Assuming, then, that this is the case, the simplest method to set the
non-recorded values to 0 is on the lines of:

  ix <- is.na(abun.df$abundance)
  abun.df$abundance[ix] <- 0

Then you can run the check:

  sum(abun.df$abundance == 0)

and you should get the same number as you got from
sum(is.na(abun.df$abundance)).

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 01-May-09  Time: 18:39:58
-- XFMail --
Re: [R] postscript printer breaking up long strings
On 30-Apr-09 18:50:38, tommers wrote:
> For a long string in an axis title or main title, the postscript
> device breaks apart the long strings into smaller strings. For
> example:
>
>   postscript('linebreaktest.eps')
>   plot(1, xlab='aReallyLongStringToSeeHowItBreaks',
>           ylab='aReallyLongStringToSeeHowItBreaks')
>   for(i in c(.6,1,1.4)) text(i,i,'aReallyLongStringToSeeHowItBreaks')
>   dev.off()
>
> produces the attached output.
> http://www.nabble.com/file/p23322197/linebreaktest.eps

In the graph shown in your URL above, the xlab and the ylab appear in
their entirety, unbroken. So does the one plotted in the middle of
the graph. I get the same when I run your code. The texts you plotted
at the first and third of the positions (i,i) are also (I think)
unbroken -- they are simply not entirely visible. The one at bottom
left is missing its start "aReallyLongSt", and the one at top right
is missing its end "eHowItBreaks". These apparent truncations arise
because the beginning of the first, and the end of the second, are
outside the plotting area, and so when the postscript() driver puts a
BoundingBox at the border of the plotting area, this has the effect
of clipping these strings when they are displayed.

The apparently broken strings in the PostScript code you quote below
are perfectly normal in PostScript. When dealing with text in a
variable-width font, a PS driver will typically split continuous text
into chunks, to allow for effects like kerning (or other adjustments
of spacing) between the chunks. You will note that all five instances
of the string are split in the same places, despite the fact that at
least three of them come out perfectly (according to the graph on
your URL). So nothing is wrong. You will have to adjust the extent of
your plotting area, or adjust the 'pos' or 'adj' of your (i,i)
labels, to be able to see them in their entirety.

Hoping this helps,
Ted.
> The most important lines from the eps source are at the end of the
> file:
>
>   309.70 36.72 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   30.96 212.50 (aReallyLongStr) 90 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   77.04 91.44 743.76 534.96 cl
>   /Font1 findfont 12 s
>   0 setgray
>   1.03 104.76 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   309.70 310.10 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>   618.36 515.43 (aReallyLongStr) 0 ta 0.180 (ingT) tb -1.440 (oSeeHo) tb -0.180 (wItBreaks) tb gr
>
> Each string is broken into 4 chunks. Has anyone else had this
> problem? Thanks!

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 30-Apr-09  Time: 21:39:59
-- XFMail --
Re: [R] Newbie R question
On 28-Apr-09 20:42:45, Tena Sakai wrote:
> Hi,
> I am a newbie with R. My environment is linux, and I have a file I
> call barebone.R, which has one line:
>   cat ('Hello World!\n')
> I execute this file as:
>   R --no-save < barebone.R
> And it does what I expect. What I get is 20+/- lines of text, one of
> which is 'Hello World!'. How would I go about getting rid of all but
> the line I am after, 'Hello World!'?
> Regards,
> Tena Sakai
> tsa...@gallo.ucsf.edu

An unusual request! I had to browse 'man R' a bit before getting a
hint.

  R --slave --no-save < barebone.R

should do it!
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 28-Apr-09  Time: 22:11:23
-- XFMail --
Re: [R] Terminology: Multiple Regression versus multivariate Reg
On 24-Apr-09 08:14:34, str...@freenet.de wrote:
> Hello R-Members,
> can anyone explain the difference between multiple and multivariate
> regression to me, in terms of the terminology and eventually its
> relation to the mathematical foundation? Is multiple regression
> perhaps more related to GLM, and multivariate regression rather
> applied if there are no explicit numeric factor levels?
> Thanks for elucidations on that topic.
> Many thanks and best regards
> Bjoern

This is indeed a question of terminology and usage, and there is a
degree of variability in it. As far as I am concerned (and others,
though not all), "multivariate regression" refers to regression where
the dependent (outcome) variable is multivariate:

  Y ~ X

where each instance of Y is a multivariate observation. For example,
suppose G (growth) consists of two pieces of data, height and weight,
and A is Age. Then a multivariate regression model would look like

  G ~ A    or    (Ht,Wt) ~ Age

(two variables on the left, one variable on the right). This allows
for correlation between Ht and Wt to be part of the model.

What is generally meant by "multiple regression" is regression of a
single variable (on the left) on more than one variable (on the
right), for example:

  Wt ~ Ht + Age

If you must make a distinction, there is the term "simple regression"
(nowadays rarely used) for when there is only one variable on the
right:

  Wt ~ Age

Whether this is a linear model (use lm()) or a generalised linear
model (use glm()) has nothing to do with the terminology.

There is an unfortunate (in my view) tendency for people to use
"multivariate regression" when talking about what I call "multiple
regression" above (i.e. more than 1 independent variable). I think
the term should be reserved for regression where the left-hand side
is multivariate. But maybe I'm in a minority ...
Ted.
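[Editor's aside: in R's model formulas the distinction above looks like this. A sketch with simulated data; Ht, Wt and Age are just the example variables from the text:]

```r
set.seed(1)
Age <- runif(30, 5, 15)
Ht  <- 80 + 6*Age + rnorm(30, sd = 5)
Wt  <- 10 + 3*Age + rnorm(30, sd = 3)
lm(cbind(Ht, Wt) ~ Age)  # multivariate regression: bivariate response
lm(Wt ~ Ht + Age)        # multiple regression: one response, two predictors
lm(Wt ~ Age)             # simple regression
```

A matrix response on the left-hand side gives an object of class "mlm", with one column of coefficients per response variable.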
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 24-Apr-09  Time: 10:00:23
-- XFMail --
[R] Is latest R faster?
Hi Folks,
I just upgraded R to the latest R version 2.9.0 (2009-04-17) (from
the Debian Etch repository). It seems to me that it at least starts
up much faster than it used to and, though I don't have comparative
timing data for long jobs, also that it does its work faster. Is that
just an impression, or has something been done to speed things up?
With thanks,
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-09  Time: 20:29:44
-- XFMail --
Re: [R] geometric mean to handle large number and negative value
On 15-Apr-09 09:26:55, richard.cot...@hsl.gov.uk wrote:
>> I have created two functions to compute geometric means. Method 1
>> can handle an even number of negative values but not large numbers;
>> vice versa for method 2. How can I merge both functions so that
>> both large numbers and negative values can be handled?
>>
>>   geometric.mean1 <- function(x) prod(x)^(1/length(x))
>>   geometric.mean2 <- function(x) exp(mean(log(x)))
>>   geometric.mean1(1:10000000)
>>   [1] Inf
>>   geometric.mean2(1:10000000)
>>   [1] 3678798
>>   geometric.mean1(c(-5,-4,4,5))
>>   [1] 4.472136
>>   geometric.mean2(c(-5,-4,4,5))
>>   [1] NaN
>>   Warning message:
>>   In log(x) : NaNs produced
>
> The geometric mean is usually restricted to positive inputs, because
> otherwise the answer can have an imaginary component. If you really
> want the geometric mean of negative inputs, use the second method,
> but convert the input to a complex number first:
>
>   comp.x <- as.complex(c(-5,-4,4,5))
>   geometric.mean2(comp.x)
>   # [1] 0+4.472136i
>
> Regards,
> Richie.
> Mathematical Sciences Unit
> HSL

Since it appears that you were content with the result of your
product method when there was an even number of negative values --
and this is equivalent to the result you would get if all the
negative numbers were positive -- why not simply convert all numbers
to positive by using abs(), and then apply your second method (which
can cope with large numbers)? I.e.:

  geometric.mean3 <- function(x) exp(mean(log(abs(x))))

However, do think carefully about whether the results will make the
sort of sense that you intend. For instance, on that basis,

  geometric.mean3(c(-1,1)) = 1, not 0
  geometric.mean3(c(-4,-1)) = 2

so in the second case the resulting geometric mean is outside the
range of the original numbers (yet it is what your first method would
have given).
On the other hand, Richie's suggestion gives results which you may
consider to make more sense:

  comp.x <- as.complex(c(-1,1))
  geometric.mean2(comp.x)
  # [1] 0+1i
  comp.x <- as.complex(c(-4,-1))
  geometric.mean2(comp.x)
  # [1] -2+0i

But then, in the original example:

  comp.x <- as.complex(c(-5,-4,4,5))
  geometric.mean2(comp.x)
  # [1] 0+4.472136i

what do you want to do with the resulting 4.472136i? So you need to
think about what you intend to do with the result, in general, and
about why you want to compute a geometric mean.
Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-Apr-09  Time: 11:14:27
-- XFMail --
Re: [R] MLE for bimodal distribution
On 08-Apr-09 23:39:36, Ted Harding wrote: On 08-Apr-09 22:10:26, Ravi Varadhan wrote: The EM algorithm is a better approach for maximum likelihood estimation of finite-mixture models than direct maximization of the mixture log-likelihood. Due to its ascent properties, it is guaranteed to converge to a local maximum. By theoretical construction, it also automatically ensures that all the parameter constraints are satisfied. [snip] Be warned that EM convergence can be excruciatingly slow. Acceleration methods can be of help in large simulation studies or for bootstrapping. Best, Ravi. [snip] As to acceleration: agreed, EM can be slow. Aitken acceleration can be dramatically faster. Several outlines of the Aitken procedure can be found by googling on "aitken acceleration". I recently wrote a short note, describing its general principle and outlining its application to the EM algorithm, using the Probit model as illustration (with R code). For fitting the location parameter alone, raw EM took 59 iterations, Aitken-accelerated EM took 3. For fitting the location and scale parameters, raw EM took 108, and Aitken took 4. If anyone would like a copy (PS or PDF) of this, drop me a line. I have now placed a PDF copy of this, if anyone is interested (it was intended as a brief expository note), at: http://www.zen89632.zen.co.uk/R/EM_Aitken/em_aitken.pdf Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Apr-09 Time: 17:15:06 -- XFMail --
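For readers who don't want to fetch the PDF, the general idea of Aitken acceleration can be sketched generically. This is a minimal illustration on a scalar fixed-point iteration, not the EM/Probit code from Ted's note (the function name `aitken_step` is mine):

```r
# One Aitken delta-squared (Steffensen) step for a scalar fixed-point
# iteration x <- g(x): extrapolate from x, g(x), g(g(x)).
aitken_step <- function(g, x) {
  x1 <- g(x); x2 <- g(x1)
  d  <- (x2 - x1) - (x1 - x)
  if (abs(d) < .Machine$double.eps) return(x2)   # already converged
  x - (x1 - x)^2 / d
}

# Example: x <- cos(x) converges slowly on its own; a handful of
# Aitken steps reach the fixed point ~0.7390851 to high accuracy.
x <- 1
for (i in 1:5) x <- aitken_step(cos, x)
x
```

The same extrapolation applied to the slowly converging parameter sequence of an EM iteration is what produces the dramatic iteration counts quoted above.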
Re: [R] puzzling lm.fit errors
On 09-Apr-09 22:53:51, Brendan Morse wrote: Hi everyone, I am running a Monte Carlo simulation and am getting an error that I haven't the slightest clue how to begin figuring out. The error is as follows: Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases In addition: Warning message: In ltm.fit(X, betas, constraint, formula, con) : Hessian matrix at convergence is not positive definite; unstable solution. The puzzling aspect though is that it successfully runs a number of complete iterations before this occurs. I am not sure what is going on or where. Any suggestions? Thanks very much (in advance) - Brendan. This is the sort of thing that can happen if the covariates in your linear model lead to a model matrix which is singular (or nearly so). In other words, at least one of your covariate vectors is a linear combination of the others (or nearly so). The facts that (a) you are doing a Monte Carlo simulation and (b) you get a number of satisfactory runs and then it fails suggest that you are randomly creating covariate (independent-variable) values as well as dependent-variable values. If that is so, then the structure of your model may be such that it is relatively likely for such singularity to arise. While not particularly likely for continuous covariates, near-singularity can nevertheless happen. It can be much more likely if you are generating random levels of factors, since only a finite number of combinations are possible, and sooner or later you will definitely hit exact collinearity. And if you are using a fairly large number of covariates, and not a large number of cases, then it can become really likely! I'm only guessing, of course, and you can either agree with or deny the above suggestions. Either way, it would help in clarifying the problem to know the details of how your Monte Carlo simulation is supposed to work. Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Apr-09 Time: 00:23:04 -- XFMail --
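The specific "0 (non-NA) cases" error is easy to reproduce and to guard against in a simulation loop. A hypothetical minimal set-up (not Brendan's actual code): if one simulated response vector ends up entirely NA, `na.omit` drops every row and `lm()` aborts; `tryCatch()` lets the Monte Carlo loop record the failure and continue:

```r
set.seed(1)
x <- rnorm(10)
bad_y <- rep(NA_real_, 10)   # a pathological simulated response

# Catch the error instead of letting it kill the whole simulation.
res <- tryCatch(lm(bad_y ~ x), error = function(e) conditionMessage(e))
res   # the error message: "0 (non-NA) cases"
```

In a real study one would also log the random seed or iteration number of each failed draw, so the offending case can be regenerated and inspected.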
Re: [R] MLE for bimodal distribution
On 08-Apr-09 22:10:26, Ravi Varadhan wrote: The EM algorithm is a better approach for maximum likelihood estimation of finite-mixture models than direct maximization of the mixture log-likelihood. Due to its ascent properties, it is guaranteed to converge to a local maximum. By theoretical construction, it also automatically ensures that all the parameter constraints are satisfied. The problem with mixture estimation, however, is that the local maximum may not be a good solution. Even the global maximum may not be a good solution. For example, it is well known that in a Gaussian mixture you can get a log-likelihood of +infinity by letting mu = one of the data points, and sigma = 0 for one of the mixture components. You can detect trouble in MLE if you observe one of the following: (1) a degenerate solution, i.e. one of the mixing proportions is close to 0, (2) one of the sigmas is too small. EM convergence is sensitive to starting values. Although there are several starting strategies (see McLachlan and Basford's book on finite mixtures), there is no universally sound strategy for ensuring a good estimate (e.g. the global maximum, when it makes sense). It is always good practice to run EM with multiple starting values. Be warned that EM convergence can be excruciatingly slow. Acceleration methods can be of help in large simulation studies or for bootstrapping. Best, Ravi. Well put! I've done a fair bit of mixture-fitting in my time, and one approach which can be worth trying (and for which there is often a reasonably objective justification) is to do a series of fits (assuming a given number of components, e.g. 2 for the present purpose) with the sigmas in a constant ratio in each one: sigma2 = lambda*sigma1. Then you won't get those singularities where it tries to climb up a single data-point. If you do this for a range of values of lambda, and keep track of the log-likelihood, then you get in effect a profile likelihood for lambda.
Once you see what that looks like, you can then set about penalising ranges you don't want to see. The reason I say "there is often a reasonably objective justification" is that often you can be pretty sure, if there is a mixture present, that lambda does not go outside say 1/10 - 10 (or whatever), since you expect that the latent groups are fairly similar, and unlikely to have markedly different sigmas. As to acceleration: agreed, EM can be slow. Aitken acceleration can be dramatically faster. Several outlines of the Aitken procedure can be found by googling on "aitken acceleration". I recently wrote a short note, describing its general principle and outlining its application to the EM algorithm, using the Probit model as illustration (with R code). For fitting the location parameter alone, raw EM took 59 iterations, Aitken-accelerated EM took 3. For fitting the location and scale parameters, raw EM took 108, and Aitken took 4. If anyone would like a copy (PS or PDF) of this, drop me a line. Or is there some repository for R-help (like the Files area in Google Groups) where one can place files for others to download? [And, if not, do people think it might be a good idea?] Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 09-Apr-09 Time: 00:33:02 -- XFMail --
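The constrained-ratio idea above can be sketched directly: for each fixed lambda = sigma2/sigma1, maximise the two-component normal mixture likelihood, then profile over lambda. The data, parameterisation, and function names here are illustrative (direct `optim` maximisation rather than EM, for brevity); the point is that fixing the sigma ratio rules out the single-data-point singularities:

```r
# Negative log-likelihood of a 2-component normal mixture with the
# constraint sigma2 = lambda * sigma1 (p via logit, s1 via log, to keep
# the optimisation unconstrained).
mix_negll <- function(par, x, lambda) {
  p <- plogis(par[1]); mu1 <- par[2]; mu2 <- par[3]; s1 <- exp(par[4])
  -sum(log(p * dnorm(x, mu1, s1) + (1 - p) * dnorm(x, mu2, lambda * s1)))
}

set.seed(42)
x <- c(rnorm(100, 0, 1), rnorm(100, 4, 2))   # true lambda = 2

# Profile log-likelihood over a small grid of lambda values.
lambdas <- c(0.5, 1, 2, 4)
prof <- sapply(lambdas, function(lam)
  -optim(c(0, 0, 4, 0), mix_negll, x = x, lambda = lam)$value)
lambdas[which.max(prof)]   # lambda preferred by the profiled likelihood
```

In practice one would use a finer lambda grid and several starting values per fit, exactly as recommended above.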
Re: [R] extract the p value of F statistics from the lm class
On 05-Apr-09 08:18:27, tedzzx wrote: Dear R users, I have run a regression and want to extract the p value of the F statistic, but I can't find a way to do that.

x <- summary(lm(log(RV2)~log(IV.m),data=b))
Call: lm(formula = log(RV2) ~ log(IV.m), data = b[[11]])
Residuals:
     Min       1Q   Median       3Q      Max
-0.26511 -0.09718 -0.01326  0.11095  0.29777
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.3059     0.1917  -1.595    0.121
log(IV.m)     0.9038     0.1065   8.488 1.38e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1435 on 31 degrees of freedom
Multiple R-squared: 0.6991, Adjusted R-squared: 0.6894
F-statistic: 72.04 on 1 and 31 DF, p-value: 1.379e-09

names(x)
#  [1] "call"          "terms"         "residuals"
#  [4] "coefficients"  "aliased"       "sigma"
#  [7] "df"            "r.squared"     "adj.r.squared"
# [10] "fstatistic"    "cov.unscaled"

x$fstatistic
#    value numdf dendf
# 72.04064   1.0  31.0

But I can not find the p value of the F statistic. Thanks, Ted

Maybe you were looking in the wrong place. A few lines above the output from x$fstatistic you will find "F-statistic: 72.04 on 1 and 31 DF, p-value: 1.379e-09", and therefore will find the P-value. However, maybe that is not the question you really wanted to ask. If it is what I think it may be, you could:
1: Observe that x$fstatistic is a vector with 3 values: the value of F, the numerator df, and the denominator df.
2: Note (from ?pf): pf(q, df1, df2, ncp, lower.tail = TRUE, log.p = FALSE)
3: Therefore do
pf(x$fstatistic[1],x$fstatistic[2],x$fstatistic[3],lower.tail=FALSE)
# [1] 1.378626e-09
Note that the P-value is not in the list of values returned, although $fstatistic is one of them. The computation of the P-value in the displayed output from summary.lm() is done by the 'print' method for summary.lm() (just as in [3] above). Hoping this helps, Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 05-Apr-09 Time: 13:12:45 -- XFMail --
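The pf() recipe above packages naturally into a small helper, shown here on a built-in dataset so it is self-contained (the function name `lm_f_pvalue` is mine, not a base R function):

```r
# Extract the overall F-test p-value from a fitted lm, reproducing what
# print.summary.lm computes on the fly.
lm_f_pvalue <- function(fit) {
  fs <- summary(fit)$fstatistic          # c(value, numdf, dendf)
  unname(pf(fs[1], fs[2], fs[3], lower.tail = FALSE))
}

fit <- lm(dist ~ speed, data = cars)
lm_f_pvalue(fit)   # matches the p-value line printed by summary(fit)
```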
Re: [R] use R Group SFBA April meeting reminder; video of Feb k
On 01-Apr-09 09:37:49, Gad Abraham wrote: Rolf Turner wrote: I get to the video screen OK --- there's a large greenish sideways triangle waiting to be clicked on. I do so; there's a message that says it's downloading, with a little progress bar. That seems to complete quite rapidly. Then nothing for a while. Then an error message on the video screen saying ``Fatal error --- video source not ready.'' Then that error message goes away. Long wait. Then I get audio, but never any video. Give up. I'm using Firefox on an iMac; the ``About Mozilla Firefox'' button on the Firefox dropdown menu says I've got Mozilla 5.0, Firefox 2.0.0.2 --- whatever that means. Bottom line --- I can't watch the video. Firefox 3.0.8 on OS X doesn't work either. But you can go directly to http://www.lecturemaker.com/wp-content/uploads/2009/02/lmpremovie.swf and it will work. -- Gad Abraham. AND it works beautifully using Flash Player v.9 on Linux too! This was the version I already had installed when I first failed to locate the video -- instead seeing the little panel for installing Flash Player, which I ignored, since I already had it, not realising that it was telling me I should install Flash Player 10. It is now clear that this advice was erroneous, since version 9 in fact works fine! Furthermore, when (as I previously reported) I did install version 10, that didn't work either. I later found that version 10 was looking for versions of Linux libraries which I didn't have, and also that Flash Player no longer worked (e.g.) on the BBC website. So I re-installed version 9, and everything again worked on the BBC and for videos in press reports, etc. But of course still the same failure for the SFBA video. Now, using the direct URL for the same video as kindly revealed by Gad, it works and could have worked all along!
It is, therefore, clear that the HTML at the original URL is doing the wrong thing, by somehow detecting whether version 10 of Flash Player is installed and refusing to cooperate otherwise, when this was erroneous. I (and Jim Porzak and Mike Driscoll) received a private email from Ron Fredericks, Video director and Multimedia expressionist, http://www.lecturemaker.com, Ron @ Lecturemaker r...@lecturemaker.com (whom I'm adding to the address-list for this mail), which stated: The video may be hard to find, admittedly, if you do not have version 10 of flash player installed. If you do not have a recent version of Flash in your web browser, I present a smallish Adobe image with the words "you do not have version 10 of flash installed, click to install". Attached is a screenshot of what you should be seeing at this link http://www.lecturemaker.com/2009/02/r-kickoff-video/ I'll take your feedback from the R User Group list server as a suggestion to make the need to update your flash player a lot bigger on the screen. Ron Fredericks. So, message to Ron Fredericks: the video works with version 9. If (for some reason) it is essential for the original URL to detect when version 10 is not installed, then there should be a pointer on the following lines: "You appear not to have version 10 of Flash Player installed. If you are using an earlier version, please click on the following link ." which would take people to the URL provided by Gad, which does work on earlier versions (well, version 9 at least). Then it would no longer be the case that "The video may be hard to find, admittedly, if you do not have version 10 of flash player installed"! My thanks to everyone who has helped to delve into this tangle, and especially to Gad who found where the solution was lurking! Best wishes to all, Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 01-Apr-09 Time: 11:17:02 -- XFMail --
Re: [R] A query about na.omit
On 01-Apr-09 15:49:40, Jose Iparraguirre D'Elia wrote: Dear all, Say I have the following dataset:

DF
   x  y  z
1  1  1  1
2  2  2  2
3  3  3 NA
4  4 NA  4
5 NA  5  5

And I want to omit all the rows which have NA, but only in columns x and y, so that I get:

  x y  z
1 1 1  1
2 2 2  2
3 3 3 NA

Roll up your sleeves, and spell out in detail the condition you need:

DF <- data.frame(x=c(1,2,3,4,NA),y=c(1,2,3,NA,5),z=c(1,2,NA,4,5))
DF
#    x  y  z
# 1  1  1  1
# 2  2  2  2
# 3  3  3 NA
# 4  4 NA  4
# 5 NA  5  5
DF[!(is.na(rowSums(DF[,(1:2)]))),]
#   x y  z
# 1 1 1  1
# 2 2 2  2
# 3 3 3 NA

Hoping this helps, Ted.

If I use na.omit(DF), I would delete the row for which z=NA, obtaining thus

  x y z
1 1 1 1
2 2 2 2

But this is not what I want, of course. If I use na.omit(DF[,1:2]), then I obtain

  x y
1 1 1
2 2 2
3 3 3

which is OK for the x and y columns, but I wouldn't get the corresponding values for z (i.e. 1 2 NA). Any suggestions about how to obtain the desired results efficiently (the actual dataset has millions of records and almost 50 columns, and I would apply the procedure on 12 of these columns)? Sincerely, Jose Luis. Jose Luis Iparraguirre, Senior Research Economist, Economic Research Institute of Northern Ireland. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 01-Apr-09 Time: 18:00:53 -- XFMail --
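An equivalent, arguably clearer idiom for the same task is `complete.cases()` restricted to the columns that matter. A sketch using the thread's data (scale the column selection up to the 12 relevant columns in the real dataset; `complete.cases` is also happy with millions of rows):

```r
DF <- data.frame(x = c(1,2,3,4,NA), y = c(1,2,3,NA,5), z = c(1,2,NA,4,5))

# Keep rows where columns x and y (columns 1:2) are both non-NA,
# while retaining all columns -- including z's NA in row 3.
DF[complete.cases(DF[, 1:2]), ]
#   x y  z
# 1 1 1  1
# 2 2 2  2
# 3 3 3 NA
```

Unlike the rowSums() trick, this also works unchanged for non-numeric (factor/character) columns.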
[R] digits in round()
Hi Folks, Compare print(1234567890,digits=4) # [1] 1.235e+09 print(1234567890,digits=5) # [1] 1234567890 Granted that digits: a non-null value for 'digits' specifies the minimum number of significant digits to be printed in values, how does R decide to switch from 1.235e+09 (rounded to 4 digits, i.e. the minimum, in e notation) to 1234567890 (the complete raw notation, 10 digits) when 'digits' goes from 4 to 5? Thanks, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Mar-09 Time: 13:59:37 -- XFMail --
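A plausible answer to Ted's question, from R's formatting rule: R prints whichever of fixed and scientific notation is narrower (scientific carrying a penalty set by `options(scipen=)`, 0 by default). At digits=4 the fixed form still needs all 10 digits ("1234567890") while "1.235e+09" is only 9 characters, so scientific wins; at digits=5 the scientific form "1.2346e+09" is 10 characters, no narrower than fixed, so fixed wins. A quick demonstration (the width comparison is my reading of the documented behaviour):

```r
print(1234567890, digits = 4)   # 1.235e+09  : sci form is narrower
print(1234567890, digits = 5)   # 1234567890 : tie in width, fixed wins
options(scipen = 10)            # penalise scientific notation
print(1234567890, digits = 4)   # now fixed even at digits = 4
options(scipen = 0)
sprintf("%.3f", pi)             # "3.142": fixed decimal places, if that's the goal
```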
Re: [R] use R Group SFBA April meeting reminder; video of Feb k
On 30-Mar-09 22:13:04, Jim Porzak wrote: Next week Wednesday evening, April 8th, Mike Driscoll will be talking about Building Web Dashboards using R see: http://www.meetup.com/R-Users/calendar/9718968/ for details to RSVP. Also of interest, our member Ron Fredericks has just posted a well edited video of the February kickoff panel discussion at Predictive Analytics World The R and Science of Predictive Analytics: Four Case Studies in R with * Bo Cowgill, Google * Itamar Rosenn, Facebook * David Smith, Revolution Computing * Jim Porzak, The Generations Network and chaired by Michael Driscoll, Dataspora LLC see: http://www.lecturemaker.com/2009/02/r-kickoff-video/ Best, Jim Porzak It could be very interesting to watch that video! However, I have had a close look at the web page you cite: http://www.lecturemaker.com/2009/02/r-kickoff-video/ and cannot find a link to a video. Lots of links to non-video things, but none that I could see to a video. There is a link on that page at: How Google and Facebook are using R by Michael E. Driscoll | February 19, 2009 http://dataspora.com/blog/predictive-analytics-using-r/ Following that link leads to a page, on which the first link, in: (March 26th Update: Video now available) Last night, I moderated our Bay Area R Users Group kick-off event with a panel discussion entitled The R and Science of Predictive Analytics, co-located with the Predictive Analytics World conference here in SF. leads you back to where you came from, and likewise the link at the bottom of the page: A video of the event is now available courtesy of Ron Fredericks and LectureMaker. Could you help by describing where on that web page it can be found? With thanks, Ted. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Mar-09 Time: 23:55:07 -- XFMail --
Re: [R] use R Group SFBA April meeting reminder; video of Feb k
On 30-Mar-09 23:04:40, Jim Porzak wrote: Since Sundar beat me to it w/ the Firefox 3 test, I checked with IE 7.0 - works fine for me there also. -Jim. Interesting! I'm using Iceweasel 2.0.0.19 (Firefox under another name) on Linux. I'll have to check out what blocks it has activated! I put on a fragrantly scented facemask and started IE up on Windows. Going to the same URL, I now find a big video screen just below the line: The R and Science of Predictive Analytics: Four Case Studies in R -- The Video. And it duly plays. But at the same place in my Firefox, I only see a little button inviting me to Get Adobe Flash Player. But I already have that installed for Iceweasel! Well, maybe it needs updating. Let me try that ... It says Adobe Flash Player version 10.0.22.87, and I have flashplayer_9 already there, so ... (some time later) I now have flashplayer_10 installed, but I still get the same result. Hmmm. Well, thanks for helping to locate what the problem might be! Ted. On Mon, Mar 30, 2009 at 4:00 PM, Sundar Dorai-Raj sdorai...@gmail.com wrote: Could be that you have some sort of ad filter in your browser that's blocking the video? It appears just fine for me in Firefox 3. On Mon, Mar 30, 2009 at 3:55 PM, Ted Harding ted.hard...@manchester.ac.uk wrote: On 30-Mar-09 22:13:04, Jim Porzak wrote: Next week Wednesday evening, April 8th, Mike Driscoll will be talking about Building Web Dashboards using R; see: http://www.meetup.com/R-Users/calendar/9718968/ for details to RSVP.
Also of interest, our member Ron Fredericks has just posted a well edited video of the February kickoff panel discussion at Predictive Analytics World, The R and Science of Predictive Analytics: Four Case Studies in R, with * Bo Cowgill, Google * Itamar Rosenn, Facebook * David Smith, Revolution Computing * Jim Porzak, The Generations Network, chaired by Michael Driscoll, Dataspora LLC; see: http://www.lecturemaker.com/2009/02/r-kickoff-video/ Best, Jim Porzak. It could be very interesting to watch that video! However, I have had a close look at the web page you cite: http://www.lecturemaker.com/2009/02/r-kickoff-video/ and cannot find a link to a video. Lots of links to non-video things, but none that I could see to a video. There is a link on that page at: How Google and Facebook are using R, by Michael E. Driscoll | February 19, 2009, http://dataspora.com/blog/predictive-analytics-using-r/ Following that link leads to a page on which the first link, in: (March 26th Update: Video now available) Last night, I moderated our Bay Area R Users Group kick-off event with a panel discussion entitled The R and Science of Predictive Analytics, co-located with the Predictive Analytics World conference here in SF. leads you back to where you came from, and likewise the link at the bottom of the page: A video of the event is now available courtesy of Ron Fredericks and LectureMaker. Could you help by describing where on that web page it can be found? With thanks, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 30-Mar-09 Time: 23:55:07 -- XFMail --
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Mar-09 Time: 00:49:08 -- XFMail --
Re: [R] advice for alternative to barchart
are proportional to the total cars owned by owner i. The (A) answers (a), and (B) answers (b). As to how best to achieve this in R, I'm not sure. In any case, I don't draw sophisticated graphics like this directly in R, since I would want totally precise control over sizes, shapes and positions in the result. I in fact use the 'pic' component of the 'groff' package (though these days it may be possible to achieve the same sort of result in TeX). However (staying on topic) I would certainly get R to generate the data array from which the graphic would be drawn by 'pic'. Other graphics software may also allow this to be easily and precisely done (I would not like to recommend Excel ...). If you were to post the data, or a suitable set of similar artificial data (or mail it to me privately), I would be happy to try my hand at producing the sort of thing described above. Then you could judge whether you liked it! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 31-Mar-09 Time: 01:58:41 -- XFMail --
Re: [R] Subscription request
On 29-Mar-09 11:48:12, Michael Larsson wrote: Hello, I would like to subscribe to the mailing list. I already receive the daily digest, but for some reason I am not subscribed to the list, meaning any posts I make by replying to the e-mail digest have to be placed on the list by a moderator - incurring significant delay. Thanks, Michael. This almost certainly means that some other email address of yours is subscribed to the list (via which you receive the digests), whereas the email address from which you try to post to the list is not subscribed. You could check this by looking into the full headers of a digest you have received, to check what address it has been sent to. For example, in one message to me from R-help I find (about halfway down the list of headers): Received: from hypatia.math.ethz.ch ([129.132.145.15]) by deimos.mcc.ac.uk with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69 (FreeBSD)) (envelope-from r-help-boun...@r-project.org) id 1LndwT-000A37-TR for ted.hard...@manchester.ac.uk; Sat, 28 Mar 2009 19:11:42 +0000 showing (in the "for ..." clause) that the R-help list server (hypatia.math.ethz.ch) addressed it to ted.hard...@manchester.ac.uk. If you find that out, then you could work round the problem by using that address to post to R-help. Alternatively, you can simply visit the R-help web page https://stat.ethz.ch/mailman/listinfo/r-help and there subscribe your other email address (the one from which you wish to post). This will then mean that you will receive the messages from R-help at both addresses, unless you either use the above web-page to disable sending of mail to one of the addresses, or use it to unsubscribe the one you do not wish to post from. Though a moderator, I cannot check what addresses are subscribed (only the list owner can do that), so that is as far as I can go to help. And I hope it helps! Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Mar-09 Time: 15:52:39 -- XFMail --
Re: [R] Burt table from word frequency list
On 29-Mar-09 16:32:11, Joan-Josep Vallbé wrote: OK, thank you. And is there any function to get the table directly from the original corpus? Best, Joan-Josep Vallbé. You will have to think about what you are doing. As Duncan said, you need counts of pairs of words or, more precisely, of co-occurrence. But co-occurrence within what? Adjacent? Within the same sentence? Within the same paragraph? Within the same chapter? Within the same document (if your corpus incorporates several documents)? Within documents by the same author? If so, then is there an additional classification by individual document? Etc., etc. In short, what is the structure of your corpus, and how do you wish this to be represented in the Burt table? Hoping this helps to move you forward, Ted. On Mar 29, 2009, at 2:00 PM, Duncan Murdoch wrote: On 29/03/2009 7:02 AM, Joan-Josep Vallbé wrote: Dear all, I have a word frequency list from a corpus (say, in .csv), where the first column is a word and the second is the occurrence frequency of that word in the corpus. Is it possible to obtain a Burt table (a table crossing all words with each other, i.e., where rows and columns are the words) from that frequency list with R? I'm exploring the ca package but I'm not able to solve this detail. No, because you don't have any information on that. You only have marginal counts. You need counts of pairs of words (from the original corpus, or already summarized). Duncan Murdoch
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 29-Mar-09 Time: 18:46:40 -- XFMail --
[R] R's validity on MedStats
Hi Folks, Some of you (since you have contributed) are aware of a quite vigorous discussion currently in progress on the MedStats Google Group. Others, who possibly could contribute usefully, may not be. For the moment it is at the top of the Discussions list at http://groups.google.com/group/MedStats with the title {MEDSTATS} Re: The number of requests for R help. If you click on that title, you will be taken to the posting (by Martin Holt) which initiated the discussion. Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Mar-09 Time: 19:47:47 -- XFMail --
Re: [R] Confusion about ecdf
On 25-Mar-09 21:01:49, Mandro Litt wrote: Hi, I'm a bit confused about ecdf (I read the help files but am still not sure about this). I have an analytical expression for the pdf, but want to get the empirical cdf. How do I use this analytical expression with ecdf? If this helps make it concrete, the pdf is: f(u) = \sum_{t = 1}^T 1/n_t \sum_{i = 1}^{n_t} 1/w K((u - u_{it})/w) where K = kernel density estimator, w = weights, and u_{it} = data. Thank you! ML. Possibly you first need to be clear about why you need the ECDF, and at what values of t you need it. The ECDF is defined for a series of values of t. These might be simply the values in your data, in which case it would be ecdf(u) where u is the vector of data. It is intended as an estimate, from the data, of the CDF of the distribution that the data came from. It basically counts the number of data less than or equal to a given value, and divides by the total number of data. Using ecdf() on the values of your estimated density function would not make a lot of sense. However, you can base an estimate of the CDF on your kernel estimate of the density by evaluating it at a chosen (finely spaced) set of u-values, say u1 <- 0.01*(-500:500), and then ECDF <- cumsum(f(u1))/sum(f(u1)) where f(u) is an implementation, as an R function, of your expression above. Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Mar-09 Time: 21:20:26 -- XFMail --
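The cumsum recipe above can be made concrete with `density()` standing in for the analytical kernel expression (illustrative data and default bandwidth; in the original problem one would evaluate the given f(u) on the grid instead):

```r
set.seed(1)
u <- rnorm(200)               # stand-in for the u_{it} data

d   <- density(u)             # kernel density estimate: d$x grid, d$y = f(u)
cdf <- cumsum(d$y) / sum(d$y) # normalised cumulative sum ~ estimated CDF

# Read the estimated CDF at any point, e.g. P(U <= 0) -- near 0.5 here:
approx(d$x, cdf, xout = 0)$y
```

Note this trapezoid-free cumsum is only a rough quadrature; for a finer grid it converges to the integral of the density estimate.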
Re: [R] confidence interval or error of x intercept of a linear
On 24-Mar-09 03:31:32, Kevin J Emerson wrote: Hello all, This is something that I am sure has a really suave solution in R, but I can't quite figure out the best (or even a basic) way to do it. I have a simple linear regression fit with lm for which I would like to estimate the x intercept with some measure of error around it (confidence interval). In biology, there is the concept of a developmental zero - a temperature below which development will not happen. This is often estimated by extrapolation of a curve of developmental rate as a function of temperature. This developmental zero is generally reported without error. I intend to change this! There has to be some way to assign error to this term, I just have yet to figure it out. Now, it is simple enough to calculate the x-intercept itself ( - intercept / slope ), but it is a whole separate process to generate its confidence interval. I can't figure out how to propagate the error of the slope and intercept into the ratio of the two. The other option I have tried is to use the predict function to look for where the confidence intervals cross the axis, but this hasn't been too fruitful either. I would greatly appreciate any insight you may be able to share. Cheers, Kevin. Here is a small representative sample of some of my data where Dev.Rate ~ Temperature. The difficulty with the situation you describe is that you are up against what is known as the Calibration Problem in statistics. Essentially, the difficulty is that in the theory of the linear regression there is sufficient probability for the slope to be very close to zero -- and hence that the X-intercept may be very far out to the left or the right -- that the whole concept of "standard error" ceases to exist. Even the expected position of the intercept does not exist. And this is true regardless of the data (the one exception would be where it is absolutely certain that the data will lie exactly on a straight line).
This has been addressed a number of times, and controversially, in the statistical literature. One approach which I like (having thought of it myself :)) is the use of likelihood-based confidence intervals. Given a linear regression Y = a + b*X + E (where the error E has N(0, sig^2) distribution), suppose you want to estimate the value X0 of X for which Y = Y0, a given value (in your case Y0 = 0). For simplicity, measure X and Y from their means sum(X)/N and sum(Y)/N. The approach is based on comparing the likelihoods [A] for unrestricted ML estimation of a, b and sig -- a.hat, b.hat, sig.hat (where a.hat = 0 because of the origins of X and Y) and [B] for ML estimation assuming a particular value X1 for X0 -- a.tilde, b.tilde, sig.tilde, where

[A] a.hat = 0 (as above)
    b.hat = sum(X*Y)/sum(X^2)
    X0.hat = Y0/b.hat  [ = -mean(Y)/b.hat in your case ]
    sig.hat^2 = sum((Y - b.hat*X)^2)/(N+1)

[B] a.tilde = (Y0*sum(X^2) - X1*sum(X*Y))/D
    b.tilde = ((N+1)*sum(X*Y) + N*X1*Y0)/D
    sig.tilde^2 = (sum((Y - a.tilde - b.tilde*X)^2)
                   + (Y0 - a.tilde - b.tilde*X1)^2)/(N+1)
    where D = (N+1)*sum(X^2) + N*X1^2

Then let R(X1) = (sig.hat^2)/(sig.tilde^2). A confidence interval for X0 is the set of values of X1 such that R(X1) >= R0, say, where Prob(R(X0) >= R0) = P, the desired confidence level, when X0 is the true value. It can be established that (N-2)*(1 - R(X0))/R(X0) has the F distribution with 1 and (N-1) degrees of freedom. Since this is independent of X0, the resulting set of values of X1 constitutes a confidence interval. This was described in Harding, E.F. (1986). Modelling: the classical approach. The Statistician vol. 35, pp. 115-134 (see pages 123-126) and has later been referred to by P.J. Brown (though I don't have the references to hand just now). When I have time for it (not today) I'll see if I can implement this neatly in R. It's basically a question of solving (N-2)*(1 - R(X0))/R(X0) = qf(P,1,(N-1)) for X0 (two solutions, or maybe one, if any exist). 
However, it is quite possible that no solution exists (depending on P), so that the confidence interval would then be (-inf,+inf); or, when only one solution exists, it will be either (-inf,X0) or (X0,+inf). These possibilities of infinite intervals are directly related to the possibility that, at significance level alpha = (1-P), the estimated slope may not differ significantly from 0. Maybe more later! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 24-Mar-09 Time: 10:04:30 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
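[Editor's sketch] The test-inversion idea above can be tried numerically with a generic F comparison of constrained vs unconstrained fits (not Ted's exact formulas, and on simulated data purely for illustration): scan candidate values X1, refit the line forced through (X1, 0), and keep the values the F test does not reject.

```r
## Confidence set for the x-intercept by test inversion (sketch).
set.seed(1)
X <- 1:10
Y <- -2 + 0.5*X + rnorm(10, sd = 0.5)   # toy data; true x-intercept is 4
fit  <- lm(Y ~ X)
RSS0 <- sum(resid(fit)^2)
N    <- length(Y)
in.ci <- function(X1, P = 0.95) {
  ## constrained fit through (X1, 0): Y = b*(X - X1), no intercept
  RSS1 <- sum(resid(lm(Y ~ 0 + I(X - X1)))^2)
  (N - 2)*(RSS1 - RSS0)/RSS0 < qf(P, 1, N - 2)
}
grid <- seq(-50, 50, by = 0.01)
range(grid[sapply(grid, in.ci)])  # approximate interval, if it is finite
```

If the slope is poorly determined, the accepted set runs off the ends of any finite grid, which is exactly the infinite-interval phenomenon described in the post.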
Re: [R] variance/mean
On 22-Mar-09 08:17:29, rkevinbur...@charter.net wrote: At the risk of appearing ignorant, why is the following true?

o <- cbind(rep(1,3),rep(2,3),rep(3,3))
var(o)
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    0    0
# [3,]    0    0    0

and

mean(o)
# [1] 2

How do I get mean to return an array similar to var? I would expect in the above example a vector of length 3 {1,2,3}. Thank you for your help. Kevin

This is a consequence of (understandable) confusion about how var() and mean() operate! It is not explicit, in ?var, that if you apply var() to a matrix, as in your var(o), you get the covariance matrix between the columns of 'o' -- except where it says (almost as an aside) that 'var' is just another interface to 'cov'. Hence in your example var(o) is equivalent to cov(o). Looked at in this way, it is now straightforward to expect what you got. This is, of course, different from what you would expect if you apply var() to a vector, namely the variance of that series of numbers (a single value). On the other hand, mean() works differently. According to ?mean: Arguments: x: An R object. Currently there are methods for numeric data frames, numeric vectors and dates. [...] Value: For a data frame, a named vector with the appropriate method being applied column by column. which may have been what you expected. But a matrix is not a data frame. Instead, it is an array, which (in effect) is a vector with an attached dimensions attribute which tells R how to chop it up into columns etc. -- whereas a data frame has its by-column structure built in to it. Now: ?mean says nothing about matrices. Nothing whatever. So you have to find out the hard way that mean(o) treats the array 'o' as a vector, ignoring its dimensions attribute. Hence you get a single number, which is the mean of all the values in the matrix. 
In order to get what you are apparently looking for (the means of the columns of 'o'), you could: a) (the smooth way) use the apply() function, causing mean() to be applied to the second dimension (columns) of 'o': apply(o,2,mean) # [1] 1 2 3 b) (the heavy way) take a hint from ?mean and feed it a data frame: mean(as.data.frame(o)) # V1 V2 V3 # 1 2 3 Hoping this helps to clarify things! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 22-Mar-09 Time: 09:01:40 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
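[Editor's sketch] Base R also has colMeans(), which gives the column means directly, and diag() extracts the per-column variances from the covariance matrix returned by var():

```r
o <- cbind(rep(1,3), rep(2,3), rep(3,3))
colMeans(o)        # 1 2 3 -- same result as apply(o, 2, mean)
apply(o, 2, mean)  # 1 2 3
diag(var(o))       # 0 0 0 -- per-column variances from the cov matrix
```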
Re: [R] formula question
On 17-Mar-09 23:04:25, Erin Hodgess wrote: Dear R People: Here is a small data frame and two particular formulas:

test.df
            y  x
1  -0.9261650  1
2   1.5702700  2
3   0.1673920  3
4   0.7893085  4
5   0.3576875  5
6  -1.4620915  6
7  -0.5506215  7
8  -0.3480292  8
9  -1.2344036  9
10  0.8502660 10

summary(lm(exp(y)~x))

Call:
lm(formula = exp(y) ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6360 -0.6435 -0.4722  0.4215  2.9127

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1689     0.9782   2.217   0.0574 .
x            -0.1368     0.1577  -0.868   0.4108
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.432 on 8 degrees of freedom
Multiple R-squared: 0.08604, Adjusted R-squared: -0.0282
F-statistic: 0.7532 on 1 and 8 DF, p-value: 0.4108

summary(lm(I(y^2)~x))

Call:
lm(formula = I(y^2) ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9584 -0.6387 -0.2651  0.5754  1.4412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.10084    0.62428   1.763    0.116
x           -0.03813    0.10061  -0.379    0.715

Residual standard error: 0.9138 on 8 degrees of freedom
Multiple R-squared: 0.01764, Adjusted R-squared: -0.1052
F-statistic: 0.1436 on 1 and 8 DF, p-value: 0.7146

These both work just fine. My question is: when do you know to use I() and just the function of the variable, please? thanks in advance, Erin PS Happy St Pat's Day!

In the case of your formula you will find it works just as well without I():

summary(lm(y^2 ~ x))

Call:
lm(formula = y^2 ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9584 -0.6387 -0.2651  0.5754  1.4412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.10084    0.62428   1.763    0.116
x           -0.03813    0.10061  -0.379    0.715

The point of I() is that it forces numerical evaluation in an expression which could be interpreted as a symbolic model formula. 
Thus if X1 and X2 were numeric, and you want to regress Y on the numerical values of X1*X2, then you should use I(X1*X2), since in Y ~ X1*X2 this would be interpreted as (essentially) fitting both linear terms and their interaction (equivalent to product here), namely corresponding to Y = a + b1*X1 + b2*X2 + b12*X1*X2 In order to force the fitted equation to be Y = a + b*X1*X2 you would use Y ~ I(X1*X2). This issue does not arise when a product is on the left-hand side of the model formula, so you could simply use X1*X2 ~ Y Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Mar-09 Time: 23:31:21 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
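[Editor's sketch] A small simulated illustration of the distinction (data invented for the example):

```r
set.seed(42)
X1 <- runif(20); X2 <- runif(20)
Y  <- 1 + 2*X1*X2 + rnorm(20, sd = 0.1)
coef(lm(Y ~ X1*X2))     # four coefficients: a, b1, b2 and the interaction b12
coef(lm(Y ~ I(X1*X2)))  # two coefficients: a and b, regressor is the product
```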
Re: [R] R multiline expression grief
On 13-Mar-09 12:55:35, Paul Suckling wrote: Dear all. After much grief I have finally found the source of some weird discrepancies in results generated using R. It turns out that this is due to the way R handles multi-line expressions. Here is an example with R version 2.8.1:

# R-script...
r_parse_error <- function () {
  a <- 1;
  b <- 1;
  c <- 1;
  d <- a + b + c;
  e <- a +
       b +
       c;
  f <- a
     + b
     + c;
  cat('a',a,'\n');
  cat('b',b,'\n');
  cat('c',c,'\n');
  cat('d',d,'\n');
  cat('e',e,'\n');
  cat('f',f,'\n');
}
r_parse_error();
a 1
b 1
c 1
d 3
e 3
f 1

As far as I am concerned f should have the value 3. This is causing me endless problems since case f is our house style for breaking up expressions for readability. All our code will need to be rechecked as a result. Is this behaviour a bug? If not, is it possible to get R to generate a warning that several lines of an expression are potentially being ignored, perhaps by turning on a strict mode which requires the semi-colons? Thank you, Paul

The lines are not being ignored! In

  e <- a +
       b +
       c;

each line (until the last) is syntactically incomplete, so the R parser continues on to the next line until the expression is complete; and the ';' is irrelevant for this purpose. Unlike C, but like (say) 'awk', the ';' in R serves to terminate an expression when this is followed on the same line by another one, so it is basically a separator. In

  f <- a
     + b
     + c;

however, 'f <- a' is complete, so the value of 'a' is assigned to f. The line '+ b' would have sent the value of 'b' (the '+' being the unary operator '+', which does not change anything) to the console if it did not occur inside a function definition. As it is, although '+ b' is evaluated, because it is inside the function no output is produced. 
Similarly for '+ c;' (and, once again, the ';' is irrelevant, since a ';' at the end of a line does nothing -- unless the line was syntactically incomplete at that point, in which case the ';', as an expression terminator, would trigger a syntax error since an incomplete expression was being terminated). So

  f <- a
     + b
     + c;

is not a multiline expression. It is three expressions on three separate lines. The only suggestion I can make is that you have to change your house style -- it is at odds with the way the R parser works, and is bound to cause much grief. Best wishes, and good luck! Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Mar-09 Time: 14:16:08 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
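[Editor's sketch] The safe variant of the broken-across-lines style is to leave a trailing operator (or an open parenthesis), so that each intermediate line is syntactically incomplete and the parser keeps reading:

```r
a <- 1; b <- 1; c <- 1
f <- a +      # the trailing '+' keeps the expression open ...
     b +
     c        # ... so all three lines parse as one assignment
f             # 3, as intended
```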
Re: [R] Pseudo-random numbers between two numbers
Because of a mistake I made in copying code into email, there has been confusion (expressed in some private emails). Hence the corrected version is appended at the end. On 11-Mar-09 00:05:26, Ted Harding wrote: I have modified my example to make it more convincing! See at end. On 10-Mar-09 23:39:17, Ted Harding wrote: On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james

It would certainly work as intended! For instance:

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(1000,-1,2.5)
hist(Sample,breaks=100)

Hoping this helps, Ted.

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(10,-1,2.5)
hist(Sample,breaks=100)
u0 <- (-0.975)+0.05*(0:69)
yy <- dnorm(u0)
du <- min(diff(H$breaks))
Y  <- 10*yy*du/(pnorm(2.5)-pnorm(-1.0))
lines(u0,Y)

Ted.

Corrected version:
--
truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(10,-1,2.5)
H <- hist(Sample,breaks=100)
u0 <- (-0.975)+0.05*(0:69)
yy <- dnorm(u0)
du <- min(diff(H$breaks))
Y  <- 10*yy*du/(pnorm(2.5)-pnorm(-1.0))
lines(u0,Y)

Apologies for the error. Ted. 
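[Editor's sketch] A quick sanity check of the inverse-CDF construction above (sample size chosen arbitrarily): every draw must fall strictly inside the truncation bounds, since runif() never returns its endpoints.

```r
truncnorm <- function(HowMany, Min, Max, mean = 0, sd = 1) {
  qnorm(runif(HowMany, pnorm(Min, mean, sd), pnorm(Max, mean, sd)), mean, sd)
}
s <- truncnorm(10000, -1, 2.5)
all(s > -1 & s < 2.5)  # TRUE: the sample respects the truncation
```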
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Mar-09 Time: 19:48:12 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo-random numbers between two numbers
On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james

It would certainly work as intended! For instance:

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(1000,-1,2.5)
hist(Sample,breaks=100)

Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Mar-09 Time: 23:39:13 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo-random numbers between two numbers
I have modified my example to make it more convincing! See at end. On 10-Mar-09 23:39:17, Ted Harding wrote: On 10-Mar-09 23:01:45, g...@ucalgary.ca wrote: Please forget the last email I sent with the same subject. = I would like to generate pseudo-random numbers between two numbers using R, up to a given distribution, for instance, norm. That is something like rnorm(HowMany,Min,Max,mean,sd) over rnorm(HowMany,mean,sd). I am wondering if qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean, sd)), mean, sd) is good. Any idea? Thanks. -james

It would certainly work as intended! For instance:

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(1000,-1,2.5)
hist(Sample,breaks=100)

Hoping this helps, Ted.

truncnorm <- function(HowMany,Min,Max,mean=0,sd=1){
  qnorm(runif(HowMany, pnorm(Min,mean,sd), pnorm(Max,mean,sd)), mean, sd)}
Sample <- truncnorm(10,-1,2.5)
hist(Sample,breaks=100)
u0 <- (-0.975)+0.05*(0:69)
yy <- dnorm(u0)
du <- min(diff(H$breaks))
Y  <- 10*yy*du/(pnorm(2.5)-pnorm(-1.0))
lines(u0,Y)

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 11-Mar-09 Time: 00:05:23 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] popular R packages
On 10-Mar-09 01:07:54, David Duffy wrote: Given we are talking about statistical software, one bibliometric measure of relative package popularity is scientific citations. Web of Science is not too useful where the citation has been to a website or computer package, but Google Scholar for lme4: Linear mixed-effects models using S4 classes gives us 108 journal citations; mgcv: GAMs and generalized ridge regression for R 80, etc. Cheers, David Duffy. A good point. But such numbers must be considered in the context of the prevalence of the kind of study for which the respective methods would be used. A great number of epidemiological studies would be suitable for application of glm(). Fewer would involve GAMs. Popularity of a package by citation frequency would (other things being equal) be proportional to the frequency of the kind of study for which it could be used. So one should either evaluate the proportion of studies, among those in which an R package *could* be used, in which it *was* used; or compare the number of citations of an R package with the number of citations of an equivalent package/module/proc in other software. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 10-Mar-09 Time: 02:03:22 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] popular R packages
On 08-Mar-09 15:14:03, Duncan Murdoch wrote: On 08/03/2009 10:49 AM, hadley wickham wrote: More seriously: I don't think relative numbers of package downloads can be interpreted in any reasonable way, because reasons for package download have a very wide range from curiosity (what's this?), fun (think fortunes...), to vital need (think lme4, if/when a consensus on denominator DFs can be reached :-) ...). What can you infer in good faith from such a mess? So when we have messy data with measurement error, we should just give up? Doesn't sound very statistical! ;) I think the situation is worse than messy. If a client comes in with data that doesn't address the question they're interested in, I think they are better served to be told that, than to be given an answer that is not actually valid. They should also be told how to design a study that actually does address their question. You (and others) have mentioned Google Analytics as a possible way to address the quality of data; that's helpful. But analyzing bad data will just give bad conclusions. Duncan Murdoch The population of R users (which we would need to sample in order to obtain good data) is probably more elusive than a fish population in the ocean -- only partially visible at best, and with an unknown proportion invisible. At least in Fisheries research, there are long established capture techniques (from trawling to netting to electro-fishing to ...) which can be deployed, for research purposes, in such a way as to potentially reach all members of a target population, with at least a moderately good approximation to random sampling. What have we for R? Come to think of it, electro-fishing, ... Suppose R were released with 2 types of cookie embedded in base R. Each type is randomly configured, when R is first run, to be Active or Inactive (probability of activation to be decided at the design stage ...). 
Type 1, if active, on a certain date generates an event which brings it to the notice of R-Core (e.g. by clandestine email or by inducing a bug report). Type 2 acts similarly on a later date. If Type 2 acts, it carries with it information as to whether there was a Type 1 action along with whether, apparently, the Type 1 action succeeded. We then have, in effect, an analogue of the Mark-Recapture technique of population estimation (along with the usual questions about equal catchability and so forth). However, since this sort of thing (which I am not proposing seriously, only for the sake of argument) is undoubtedly unethical (and would do R's reputation no good if it came to light), I tentatively conclude that the population of R users is likely to remain as elusive as ever. Best wishes to all, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Mar-09 Time: 16:11:44 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
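[Editor's sketch] For readers unfamiliar with it, the Mark-Recapture (Lincoln-Petersen) estimator alluded to above is simple; the counts below are invented purely for illustration:

```r
## Lincoln-Petersen: N.hat = n1*n2/m2, where n1 users are "marked" in the
## first capture, n2 are seen in the second, and m2 of those are recaptures.
n1 <- 500; n2 <- 400; m2 <- 40
N.hat <- n1 * n2 / m2
N.hat   # 5000: estimated size of the (hypothetical) user population
```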
Re: [R] R on netbooks et al?
On 08-Mar-09 17:44:18, Douglas Bates wrote: On Sun, Mar 8, 2009 at 7:08 AM, Michael Dewey i...@aghmed.fsnet.co.uk wrote: At 08:47 05/03/2009, herrdittm...@yahoo.co.uk wrote: Dear useRs, With the rise of netbooks and 'lifestyle laptops' I am tempted to get one of these to mainly run R on it. Processor power and hard disk space seem to be ok. What I wonder is the handling and feel with respect to R. Has anyone here installed or is running R on one of these, and if so, what is your experience? Would it be more of a nice looking gadget than a feasible platform to do some stats on? One issue is whether you wish to use Linux or Windows. If you do use Linux I would advise picking a netbook with one of the standard distributions. The early EEE PC had Xandros and dire warnings about using the Debian repositories. In fact I had no problem despite a total lack of experience, although I am not sure what will happen with the recent move to lenny. Because I have used Debian Linux and Debian-based distributions like Ubuntu for many years, I installed an eee-specific version of Ubuntu within a day or two of getting an ASUS eee pc1000. There are currently at least two versions of Ubuntu, easy peasy and eeebuntu, that are specific to the eee pc models. I started with easy peasy at the time it was called something else (Ubuntu eee?) and later switched to eeebuntu. In both cases packages for the latest versions of R from the Ubuntu package repository on CRAN worked flawlessly. I find the netbook to be very convenient. Having a 5 hour battery life and a weight of less than 3 pounds is wonderful. I teach all of my classes with it and even use it at home (attached to a monitor, USB keyboard and mouse and an external hard drive) in lieu of a desktop computer. (I have been eyeing the eee box covetously but have not yet convinced myself that I really need yet another computer). I develop R packages on it and don't really notice that it is under-powered by today's standards. 
Of course, when I started computing and even when I started working with the S language the memory capacity of computers was measured in kilobytes so the thought of only 1Gb of memory doesn't cause me to shriek in horror. Thanks for sharing your experiences, Doug. Given that devices like the EeePC are marketed in terms of less demanding users, it's good to know what it is like for a hard user. Further related comments would be welcome! I have to agree about the RAM issue too. My once-trusty old Sharp MZ-80B CP/M machine (early 1980s), with its 64KB and occupying a good 0.25 m^3 of physical space, would have to be replicated 2^14 = 16384 times over to give the same RAM (and occupy some 400 m^3 of space, say 7.4m x 7.4m x 7.4m, or about the size of my house). Now I have things on my desk, about the size of my thumb, with 8MB in each. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 08-Mar-09 Time: 18:20:45 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] PCA and categorical data
On 06-Mar-09 09:25:26, Prof Brian Ripley wrote: You might want to look into correspondence analysis, which has several variants of PCA designed for categorical data. In particular, have a look at the results of RSiteSearch("correspondence") Ted. On Fri, 6 Mar 2009, Galanidis Alexandros wrote: Hi all, I'm trying to figure out if it is appropriate to do a PCA having only categorical data (not ordinal). I have only found the following quote: One method to find such relationships is to select appropriate variables and to view the data using a method like Principle Components Analysis (PCA) [4]. This approach gives us a clear picture of the data using KL-plot of the PCA. However, the method is not settled for the data including categorical data. [http://hp.vector.co.jp/authors/VA038807/personal/covEigGiniRep17.pdf] but I'm still not sure if it is WRONG to do so. Since normally categorical data is taken to be binomial or Poisson distributed, the variance varies with the mean and least-squares (the basis of PCA) is then sub-optimal. Correspondence analysis takes that into account (at least to some extent). Any opinion or reference would be very helpful There is a basic introduction in MASS4, with references to more comprehensive accounts. thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK  Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Mar-09 Time: 09:46:15 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
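[Editor's sketch] A minimal correspondence analysis in R can be run with MASS::corresp() on a two-way contingency table; the counts below are made up purely for illustration:

```r
library(MASS)   # Venables & Ripley's package accompanying MASS4
tab <- matrix(c(20, 10,  5,
                 8, 15, 12,
                 6,  9, 15),
              nrow = 3, byrow = TRUE,
              dimnames = list(row = c("r1","r2","r3"),
                              col = c("c1","c2","c3")))
corresp(tab, nf = 2)   # row/column scores in up to min(dim) - 1 dimensions
```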
Re: [R] fitting a gompertz model through the origin using nls
(-a*b*Day.max))/b) which again could be fitted using nls, with parameters a and b to be fitted (assuming Day.max given). [*] One could generalise this law to, e.g., a power law: (dn/dt) = a*((1-b*n)^c) whose solution (satisfying n=0 at t=0) is

  n = (1 - (1 + a*b*(c-1)*t)^(-1/(c-1)))/b

(which can possibly be usefully simplified! -- note that as c -> 1 it reduces to the exponential solution n = (1 - exp(-a*b*t))/b). Hoping this helps -- and that maybe people with more knowledge of these things can contribute more directly relevant suggestions. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 06-Mar-09 Time: 13:39:39 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
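[Editor's sketch] Fitting the simpler through-the-origin law n = (1 - exp(-a*b*t))/b with nls() on simulated data (the parameter values, noise level and starting values are illustrative, not from the original post):

```r
set.seed(1)
t <- seq(0, 10, by = 0.5)
n <- (1 - exp(-0.3*0.8*t))/0.8 + rnorm(length(t), sd = 0.01)  # true a=0.3, b=0.8
fit <- nls(n ~ (1 - exp(-a*b*t))/b, start = list(a = 0.2, b = 1))
coef(fit)   # should recover a and b approximately
```

Here b is identified by the asymptote 1/b and a*b by the initial rate, so both parameters are estimable from data that approach saturation.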
Re: [R] Number Regular Expressions
On 04-Mar-09 07:55:11, Bob Roberts wrote: Hi, I'm trying to write a regular expression that captures numbers in the form 9,007,653,372,262.48 but does not capture dates in the form 09/30/2005. I have tried numerous expressions, but they all seem to return the dates as well. Thanks.

Testing the regular expression '[ ][0123456789,.][^/]*[ ]' with grep (on Linux; R is not involved in this test):

$ grep '[ ][0123456789,.][^/]*[ ]' <<EOT
A line with no numbers
This is 9,007,653,372,262.48 one line
Another no-numbers
This is 09/30/2005 another line
Yet another no-numbers
EOT
This is 9,007,653,372,262.48 one line

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 09:30:17 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
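[Editor's sketch] The same idea works inside R with grep(); like the shell pattern above, this relies on the number being space-delimited and containing no slash:

```r
x <- c("A line with no numbers",
       "This is 9,007,653,372,262.48 one line",
       "This is 09/30/2005 another line")
grep(" [0-9][0-9,.]* ", x, value = TRUE)
# [1] "This is 9,007,653,372,262.48 one line"
```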
Re: [R] Diff btw percentile and quantile
On 04-Mar-09 16:10:29, megh wrote: Yes, I am aware of those definitions. However I wanted to know the difference btw the words Percentile and quantile, if any. Secondly your link navigates to some non-english site, which I could not understand. Percentile and quantile are in effect the same thing. The difference is in how they express what they refer to. For example, the Median of a distribution is the 0.5 quantile, and is the 50th percentile. So, for 0 <= p <= 1, refer to either the p-th quantile, or to the (100*p)-th percentile. Thus R has the function quantile(), whose ?quantile states: The generic function 'quantile' produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1. R (in its basic distribution) does not have a function percentile(), but, given a series P of percentages, e.g. P <- c(1,5,10,25,50,75,90,95,99), one could obtain the equivalent as quantile(X,probs=P/100). So, with reference to your original question Excel has percentile() function. R function quantile() does the same thing. Is there any significant difference btw percentile and quantile? the answer is that they in effect give the same results, though differ with respect to how they are to be fed (quantile eats probabilities, percentile eats percentages). [Though (since I am not familiar with Excel) I cannot rule out that Excel's percentile() function also eats probabilities; in which case its name would be an example of sloppy nomenclature on Excel's part; which I cannot rule out on general grounds either]. Dieter Menne wrote: megh megh74 at yahoo.com writes: To calculate Percentile for a set of observations Excel has percentile() function. R function quantile() does the same thing. Is there any significant difference btw percentile and quantile? 
If you check the documentation of quantile, you will note that there are 9 variants of quantile which may give different values for small sample sizes and many ties. I found a German page that explains the algorithm Excel uses: http://www.excel4managers.de/index.php?page=quantile01 but I did not check which of the R-variants this is equivalent to. Dieter E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 16:31:42 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
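[Editor's sketch] A percentile()-style call is emulated directly by rescaling the percentages into probabilities:

```r
X <- c(15, 20, 35, 40, 50)   # small illustrative sample
P <- c(25, 50, 75, 90)       # percentages, Excel-style
quantile(X, probs = P/100)   # quantile() eats probabilities, hence P/100
```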
Re: [R] Diff btw percentile and quantile
On 04-Mar-09 16:56:14, Wacek Kusnierczyk wrote: (Ted Harding) wrote: snip So, with reference to your original question Excel has percentile() function. R function quantile() does the same thing. Is there any significant difference btw percentile and quantile? the answer is that they in effect give the same results, though differ with respect to how they are to be fed (quantile eats probabilities, percentile eats percentages). [Though (since I am not familiar with Excel) I cannot rule out that Excel's percentile() function also eats probabilities; in which case its name would be an example of sloppy nomenclature on Excel's part; which I cannot rule out on general grounds either]. i am not familiar enough with excel to prove or disprove what you say above, but in general such claims should be grounded in the respective documentations. there are a number of ways to compute empirical quantiles (see, e.g., [1]), and it's possible that the one used by r's quantile by default (see ?quantile) is not the one used by excel (where you probably have no choice; help in oocalc does not specify the method, and i guess that excel's does not either). have you actually confirmed that excel's percentile() does the same as r's quantile() (modulo the scaling)? vQ I have now googled around a bit. All references to the Excel percentile() function say that you feed it the fractional value corresponding to the percentage. So, for example, to get the 80-th percentile you would give it 0.8. Hence Excel should call it quantile! As to the algorithm, Wikipedia states the following (translated into R syntax): Many software packages, such as Microsoft Excel, use the following method recommended by NIST[4] to estimate the value, vp, of the pth percentile of an ascending ordered dataset containing N elements with values v[1],v[2],...,v[N]: n = (p/100)*(N-1) + 1 n is then split into its integer component, k and decimal component, d, such that n = k + d. 
If k = 1, then the value for that percentile, vp, is the first member of the ordered dataset, v[1]. If k = N, then the value for that percentile, vp, is the Nth member of the ordered dataset, v[N]. Otherwise, 1 < k < N and vp = v[k] + d*(v[k + 1] - v[k]). Note that the Wikipedia article uses the % interpretation of p-th percentile, i.e. the point which is (p/100) of the way along the distribution. This is in fact the same linear-interpolation rule as R's default type 7, which also computes h = (p/100)*(N-1) + 1 and interpolates between v[floor(h)] and v[floor(h)+1]; so quantile() with its default settings should reproduce Excel's percentile() (type 4, linear interpolation of the empirical cdf with h = N*(p/100), is a different rule). Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Mar-09 Time: 17:29:50 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
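[Editor's sketch] The NIST/Excel rule quoted above is only a few lines of R (the function name excel.percentile is my own label, not an Excel or R identifier):

```r
excel.percentile <- function(v, p) {
  v <- sort(v)
  N <- length(v)
  n <- (p/100)*(N - 1) + 1        # the NIST/Excel rank
  k <- floor(n); d <- n - k       # integer and decimal components
  if (k >= N) return(v[N])        # p = 100 maps to the largest value
  v[k] + d*(v[k + 1] - v[k])      # linear interpolation
}
v <- c(15, 20, 35, 40, 50)
excel.percentile(v, 40)       # 29
quantile(v, 0.40, type = 7)   # 29 as well: R's default type 7 uses this rule
```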
Re: [R] While with 2 conditions
On 27-Feb-09 19:37:50, MarcioRibeiro wrote: Hi listers, I checked it out in the messages... But I didn't find how I use while with 2 conditions...

while (a>1 ? b>1) {
  a <- 2
  b <- 2
}

Should I use what... Thanks in advance, Marcio

while ((a>1) && (b>1)) {
  a <- 2
  b <- 2
}

is the way to combine the two conditions. However, I would not want to run that particular little program ... Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 28-Feb-09 Time: 00:16:10 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
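As an aside (my own terminating variant, since the loop in the thread never exits once a and b are both set to 2): `&&` combines two scalar conditions, and the loop stops as soon as either side becomes FALSE:

```r
a <- 5
b <- 3
while ((a > 1) && (b > 1)) {  # both conditions must hold to continue
  a <- a - 2
  b <- b - 2
}
c(a, b)
# [1] 3 1  -- stopped as soon as b dropped to 1
```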
Re: [R] Moving Average
On 26-Feb-09 13:54:51, David Winsemius wrote: I saw Gabor's reply but have a clarification to request. You say you want to remove low frequency components but then you request smoothing functions. The term smoothing implies removal of high-frequency components of a series. If you produce a smoothed series, your result of course contains the low-frequency components, with the high-frequency components removed. But if you then subtract that from the original series, your result contains the high-frequency components, with the low-frequency components removed. Moving-average is one way of smoothing (but can introduce periodic components which were not there to start with). Filtering a time-series is a very open-ended activity! In many cases a useful start is exploration of the spectral properties of the series, for which R has several functions. 'spectrum()' in the stats package (loaded by default) is one basic function. help.search("time series") will throw up a lot of functions. You might want to look at package 'ltsa' (linear time series analysis). Alternatively, if you already have good information about the frequency-structure of the series, or (for instance) know that it has a well-defined seasonal component, then you could embark on designing a transfer function specifically tuned to the job. Have a look at RSiteSearch("{transfer function}") Hoping this helps, Ted. If smoothing really is your goal then additional R resources would be smooth.spline, loess (or lowess), ksmooth, or using smoothing terms in regressions. Venables and Ripley have quite a few worked examples of such in MASS. -- David Winsemius On Feb 26, 2009, at 7:07 AM, mau...@alice.it wrote: I am looking for some help at removing low-frequency components from a signal, through Moving Average on a sliding window. I understand this is a smoothing procedure that I have never done in my life before .. sigh. I searched R archives and found rollmean, MovingAverages {TTR}, SymmetricMA.
None of the above mentioned functions seems to accept the smoothing polynomial order and the sliding window width as input parameters. Maybe I am missing something. I wonder whether there are some building blocks in R, if not a function which does it all (I do not expect that much, though). Even some literature references and/or tutorials are very welcome. Thank you so much, Maura [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 26-Feb-09 Time: 14:54:43 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
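To sketch the smooth-then-subtract idea described above: base R's filter() computes a centred moving average, and subtracting it from the original series leaves the high-frequency part. The window width and the toy two-frequency signal below are made up for illustration:

```r
t <- 1:200
slow <- sin(2 * pi * t / 100)      # low-frequency component
fast <- 0.3 * sin(2 * pi * t / 8)  # high-frequency component
x <- slow + fast

w <- 21                                               # odd window width (arbitrary)
smoothed <- stats::filter(x, rep(1/w, w), sides = 2)  # centred moving average = low-pass
highpass <- x - smoothed                              # low frequencies removed
```

Note that filter() returns NA for the (w-1)/2 points at each end of the series.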
Re: [R] Singularity in a regression?
On 26-Feb-09 12:58:49, Bob Gotwals wrote: R friends, In a matrix of 1s and 0s, I'm getting a singularity error. Any helpful ideas? From the degrees of freedom in your output, it seems you are fitting 10 binary variables to a total of 23 observations. In such circumstances, it is not unlikely that the matrix of 0s and 1s representing the binary variables would have at least 1 column which can be represented as a linear combination of the others (which is what the "1 not defined because of singularities" means). Get more data, or use fewer variables! Or, also worth considering, check whether there are relationships in the real world between your 10 variables which would tend to generate such linear dependence. Ted.

lm(formula = activity ~ metaF + metaCl + metaBr + metaI + metaMe + paraF + paraCl + paraBr + paraI + paraMe)

Residuals:
        Min         1Q     Median         3Q        Max
 -4.573e-01 -7.884e-02  3.469e-17  6.616e-02  2.427e-01

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.9173     0.1129  70.135  < 2e-16 ***
metaF        -0.3973     0.2339  -1.698 0.115172
metaCl            NA         NA      NA       NA
metaBr        0.3454     0.1149   3.007 0.010929 *
metaI         0.4827     0.2339   2.063 0.061404 .
metaMe        0.3654     0.1149   3.181 0.007909 **
paraF         0.7675     0.1449   5.298 0.000189 ***
paraCl        0.3400     0.1449   2.347 0.036925 *
paraBr        1.0200     0.1449   7.040 1.36e-05 ***
paraI         1.3327     0.2339   5.697 9.96e-05 ***
paraMe        1.2191     0.1573   7.751 5.19e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2049 on 12 degrees of freedom
Multiple R-squared: 0.9257, Adjusted R-squared: 0.8699
F-statistic: 16.61 on 9 and 12 DF, p-value: 1.811e-05

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
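To illustrate the point about linear dependence (my own made-up example, not the poster's data): alias() on the fitted model reports exactly which column is a linear combination of the others, and the aliased coefficient comes back NA just as in the output above:

```r
set.seed(1)
x1 <- rbinom(23, 1, 0.5)
x2 <- rbinom(23, 1, 0.5)
x3 <- 1 - x1                 # exactly collinear with the intercept and x1
y  <- rnorm(23)

fit <- lm(y ~ x1 + x2 + x3)
coef(fit)["x3"]              # NA: "not defined because of singularities"
alias(fit)                   # reports x3 = 1 - 1*x1
```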
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 26-Feb-09 Time: 15:07:40 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] windows vs. linux code
There is one MAJOR issue you will have to watch out for, far more likely to turn up than calls like system(). This is that, if you want to have two or more plotting windows in use at the same time, while the first one is automatically opened by the plot() command, you will have to open additional ones explicitly. In Linux, the command is X11() [possibly with parameters, though usually you don't need to bother]. In Windows, it is windows() [ditto]. I run R on Linux, so use the X11() command. However, if I write a script which would also be run on a Windows system, I write using windows() in the first instance, but with a conditional alias to X11():

if (length(grep("linux", R.Version()$os))) {
  windows <- function(...) X11(...)
}

and put this at the beginning of the code file. Then, if the code is run on a Windows machine, the function call windows() does the Windows thing; but if the code is run on Linux then the above test detects that, and defines a function windows() which does the same as X11(). Ted. On 26-Feb-09 01:25:36, Sherri Heck wrote: i am asking if, in general, r code can be written on a linux-based system and then run on a windows-based system. Rolf Turner wrote: On 26/02/2009, at 2:08 PM, Sherri Heck wrote: Dear All- I have been given some R code that was written using a Linux OS, but I use Windows-based R. The person that is giving it to me said that it needs to run on a Linux system. Does anyone have any insight and/or can verify this. I haven't yet obtained the code, so I haven't been able to try it yet. Despite the knowledge, wisdom, insight, skill, good looks, and other admirable characteristics of the members of the R-help list, few of us are skilled in telepathy or clairvoyance. cheers, Rolf Turner ## Attention: This e-mail message is privileged and confidential. If you are not the intended recipient please delete the message and notify the sender. Any views or opinions presented are solely those of the author.
This e-mail has been scanned and cleared by MailMarshal www.marshalsoftware.com ## __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 26-Feb-09 Time: 03:58:35 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
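An alternative test (my own suggestion, not from the thread) avoids string-matching the OS name by using .Platform$OS.type, which is always either "windows" or "unix":

```r
# Define windows() as an alias for X11() on non-Windows platforms,
# so scripts written with windows() run unchanged on Linux.
if (.Platform$OS.type != "windows") {
  windows <- function(...) X11(...)
}
```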
Re: [R] type III effect from glm()
On 19-Feb-09 10:38:50, Simon Pickett wrote: Hi all, This could be naivety/stupidity on my part rather than a problem with model output, but here goes. I have fitted a fairly simple model

m1 <- glm(count ~ siteall + yrs + yrs:district, family = quasipoisson,
          weights = weight, data = m[x[[i]],])

I want to know if yrs (a continuous variable) has a significant unique effect in the model, so I fit a simplified model with the main effect omitted...

m2 <- glm(count ~ siteall + yrs:district, family = quasipoisson,
          weights = weight, data = m[x[[i]],])

So, above, you have fitted two models: m1, m2. then compare models using anova()

anova(m1, m1b, test = "F")

And here you are comparing two models: m1, m1b. Could this be the reason for your result?

Analysis of Deviance Table
Model 1: count ~ siteall + yrs + yrs:district
Model 2: count ~ siteall + yrs:district
  Resid. Df Resid. Dev Df Deviance F Pr(>F)
1      1936      75913
2      1936      75913  0        0

The d.f.'s are exactly the same, is this right? Can I only test the significance of a main effect when it is not in an interaction? Thanks in advance, Simon. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 19-Feb-09 Time: 10:56:12 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The Origins of R
On 04-Feb-09 20:45:04, Nutter, Benjamin wrote: snip Those of us on this list (with the possible exception of one or two nutters) would take it that it goes without saying that R was developed on the basis of S --- we all ***know*** that. snip Just want to clarify that the nutters referred to here are not the same as the Nutters that bear my name :-) Surely the Nutters are a Movement or a Party[1] whose members are nutters? [1] In the UK we have long had the Monster Raving Loony Party, which (at least in a 1990 by-election) made a serious dent in the political scene. See: http://en.wikipedia.org/wiki/Monster_Raving_Looney_Party :-) Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 04-Feb-09 Time: 21:04:55 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparing the previous and next entry of a vector
On 25-Jan-09 22:29:25, Jörg Groß wrote: Hi, I have a quite abstract problem, hope someone can help me here. I have a vector like this:

x <- c(1,2,3,4,5,2,6)
x
# [1] 1 2 3 4 5 2 6

now I want to get the number where the previous number is 1 and the next number is 3 (that is the 2 at the second place). I tried something with tail(x, -1) ... with that, I can check the next number, but how can I check the previous number?

x <- c(1,2,3,4,5,2,6)
x0 <- x[1:(length(x)-2)]
x1 <- x[3:(length(x))]
x[which((x0==1) & (x1==3)) + 1]
# [1] 2

Or, changing the data:

x <- c(4,5,1,7,3,2,6,9,8)
x0 <- x[1:(length(x)-2)]
x1 <- x[3:(length(x))]
x[which((x0==1) & (x1==3)) + 1]
# [1] 7

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 25-Jan-09 Time: 22:47:15 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
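Since the original poster mentioned tail(), here is an equivalent formulation (my own variant) that builds the two shifted copies with head() and tail() instead of explicit index ranges:

```r
x <- c(1, 2, 3, 4, 5, 2, 6)
prev <- head(x, -2)  # x[1], ..., x[n-2]: the element before each candidate
nxt  <- tail(x, -2)  # x[3], ..., x[n]:   the element after each candidate
x[which(prev == 1 & nxt == 3) + 1]
# [1] 2
```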
[R] Special function? [non-R query]
Greetings all, Sorry for posting what is primarily not an R question (though it will have an R target in due course). Let F(x) be the CDF of the Normal -- i.e. pnorm(x) Let f(x) be the density function -- i.e. dnorm(x) Define G(psi) = Integral[-inf,inf] F(x)*f(x)*exp(x*psi) dx Is G(psi) a known special function? If so, can anyone point me to a reference for its properties? With thanks, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 22-Jan-09 Time: 00:58:27 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
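Though the question asks about a closed form, G(psi) is at least easy to evaluate numerically in R with integrate(). The sketch below also compares it against exp(psi^2/2) * pnorm(psi/sqrt(2)), which the numbers suggest is the closed form (an observation of mine, not asserted in the original post):

```r
# Numerical version of G(psi) = Integral F(x)*f(x)*exp(x*psi) dx over (-Inf, Inf)
G <- function(psi) {
  integrate(function(x) pnorm(x) * dnorm(x) * exp(x * psi),
            lower = -Inf, upper = Inf)$value
}

G(0)                                # 0.5: with psi = 0 the integrand is F*f, whose integral is 1/2
G(1) - exp(1/2) * pnorm(1/sqrt(2))  # ~0: matches the conjectured closed form
```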
Re: [R] data.frame: how to extract parts
On 17-Jan-09 02:57:13, Jörg Groß wrote: Hi, I have a problem with the R syntax. It's perhaps pretty simple, but I don't understand it ... I can extract a column from a data.frame with the following code for example ...

b$row1[b$row1 == "male"]

so I see all male-entries. But I cannot extract all lines of a data.frame depending on this criterion;

only.male <- b[b$row1 == "male"]

With that, I get an "undefined columns selected" message. So how can I extract lines of a data.frame depending on a level variable?

only.male <- b[b$row1 == "male", ]

should work. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 17-Jan-09 Time: 09:48:43 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
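Equivalently (a standard alternative, not from the original reply), subset() expresses the row condition without the trailing comma that tripped up the poster. The small data frame here is made up for illustration:

```r
b <- data.frame(row1 = c("male", "female", "male"), x = 1:3)

b[b$row1 == "male", ]      # index form: note the comma before the closing bracket
subset(b, row1 == "male")  # same rows, arguably harder to get wrong
```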
Re: [R] If...while...or what else??
On 13-Jan-09 13:20:54, Niccolò Bassani wrote: Dear R users, I come to you with a quite silly question I think, but I hope you can answer me... That is, I've got some problems in using the if and while conditions in a loop. Substantially, I want to select rows in a dataset depending on an index variable (suppose it ranges from 1 to 5), so as to make a specific analysis (homemade of course) on this new dataset. My first thought was to do a double loop like this:

for i in 1:5{
  for j in (i+1):5{
    data = dataset[(variable==i) | (variable==j),]
    ##analysis
    ##analysis
  }
}

This way I should select all the couples only once (gaining in efficiency I hope). The fact is that this arrangement has a problem: j ends up ranging from 2 to 6, not from 2 to 5. So when I do a subsetting on the dataset to obtain only the rows corresponding to the values of i and j I want, when the loop comes to j = 6 I have an error, of course. What I want to know is: how can I use the if or while condition in such a loop to avoid the routine doing the computations for this case? I.e., can I tell somehow R "Hey, if j=6, let it go and move on with the other computations"? Or maybe you can see a faster and better way of using the for conditions?? I hope I made myself clear, if not I'll clarify!! Thanks in advance Niccolò

Presumably the following summarises the situation you want to avoid:

for(i in (1:5)){for(j in ((i+1):5)){ print(c(i,j))}}
# [1] 1 2
# [1] 1 3
# [1] 1 4
# [1] 1 5
# [1] 2 3
# [1] 2 4
# [1] 2 5
# [1] 3 4
# [1] 3 5
# [1] 4 5
# [1] 5 6
# [1] 5 5

I.e. you get (5,6) when case 6 is not wanted. If (as seems likely) you don't want (5,5) either (i.e.
you only want all pairs (i,j) from 1:5 with i < j), then the following does it:

for(i in (1:4)){for(j in ((i+1):5)){ print(c(i,j))}}
# [1] 1 2
# [1] 1 3
# [1] 1 4
# [1] 1 5
# [1] 2 3
# [1] 2 4
# [1] 2 5
# [1] 3 4
# [1] 3 5
# [1] 4 5

However, if for some reason you do want (5,5) as well, then:

for(i in (1:5)){for(j in (min(5,(i+1)):5)){ print(c(i,j))}}
# [1] 1 2
# [1] 1 3
# [1] 1 4
# [1] 1 5
# [1] 2 3
# [1] 2 4
# [1] 2 5
# [1] 3 4
# [1] 3 5
# [1] 4 5
# [1] 5 5

Also (though I suspect it was simply due to hasty typing), the syntax of what you wrote above:

for i in 1:5{
  for j in (i+1):5{
    ...
  }
}

will not work, since you must write for( ... ) and not for ... :

for i in (1:5)
# Error: unexpected symbol in "for i"

Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Jan-09 Time: 15:09:58 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
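As a further alternative to the nested loops (my own suggestion, not from the thread): combn() enumerates all pairs i < j directly, so there is no stray case to skip:

```r
pairs <- combn(5, 2)  # each column is one pair (i, j) with i < j
t(pairs)              # the 10 pairs, one per row
```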
Re: [R] Joining lists
On 13-Jan-09 22:18:00, glenn wrote: Very simple one sorry: How do I join 2 lists please Thanks glenn

c(list1, list2)

Example:

A <- list(a=1, b=2, c=3)
B <- list(d=4, e=5, f=6)
c(A, B)
# $a
# [1] 1
# $b
# [1] 2
# $c
# [1] 3
# $d
# [1] 4
# $e
# [1] 5
# $f
# [1] 6

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 13-Jan-09 Time: 22:39:53 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
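One caveat worth noting (my addition): c() keeps duplicate names. If the two lists share names and you want the second list's entries to override the first's, modifyList() is closer to a merge:

```r
A <- list(a = 1, b = 2)
B <- list(b = 20, c = 3)

c(A, B)           # 4 components: both 'b' entries are kept
modifyList(A, B)  # 3 components: b is replaced -> list(a = 1, b = 20, c = 3)
```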
Re: [R] rbind for matrices - rep argument
On 07-Jan-09 15:22:57, Niccolò Bassani wrote: Dear R users, I'm facing a trivial problem, but I really can't solve it. I've tried a dozen of codes, but I can't get the result I want. The question is: I have a dataframe like this one

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    5    5    4    9
[3,]    1    6    8    1    2
[4,]    8    6    4    1    5

made up of decimal numbers, of course. I want to append this dataframe to itself a number x of times, i.e. 3. That is I want a dataframe like this

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    2    3    4    5
 [2,]    2    5    5    4    9
 [3,]    1    6    8    1    2
 [4,]    8    6    4    1    5
 [5,]    1    2    3    4    5
 [6,]    2    5    5    4    9
 [7,]    1    6    8    1    2
 [8,]    8    6    4    1    5
 [9,]    1    2    3    4    5
[10,]    2    5    5    4    9
[11,]    1    6    8    1    2
[12,]    8    6    4    1    5

I'm searching for an automatic way to do this (I've already used rbind, re-writing x times the name of the frame...), as it must enter a function where one argument is exactly the number x of times to repeat this frame. Any ideas?? Thanks in advance! Niccolò

I don't know whether there is anywhere a ready-made function which will implement a rep parameter for an rbind, but the following ad-hoc function will do it for you efficiently (i.e. with the minimum number of applications of the rbind() function). To produce a result which consists of k replicates of x, row-bound:

Krbind <- function(x, k){
  y <- x
  if(k == 1) return(x)
  p <- floor(log2(k))
  for(i in (1:p)){
    z <- rbind(y, y)
    y <- z
  }
  k <- (k - 2^p)
  if(k == 0) return(y) else return(rbind(y, Krbind(x, k)))
}

## Example:
Xdf <- data.frame(X1=c(1.1,1.2), X2=c(2.1,2.2),
                  X3=c(3.1,3.2), X4=c(4.1,4.2))
Krbind(Xdf, 6)
#     X1  X2  X3  X4
# 1  1.1 2.1 3.1 4.1
# 2  1.2 2.2 3.2 4.2
# 3  1.1 2.1 3.1 4.1
# 4  1.2 2.2 3.2 4.2
# 5  1.1 2.1 3.1 4.1
# 6  1.2 2.2 3.2 4.2
# 7  1.1 2.1 3.1 4.1
# 8  1.2 2.2 3.2 4.2
# 9  1.1 2.1 3.1 4.1
# 10 1.2 2.2 3.2 4.2
# 11 1.1 2.1 3.1 4.1
# 12 1.2 2.2 3.2 4.2

Of course, if you're not worried by efficiency, then the simple loop

y <- x
for(i in (1:(k-1))){ y <- rbind(y, x) }

will do it! Hoping this helps, Ted.
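For completeness (my own addition): plain row indexing with rep() also replicates a data frame's rows in one step, with no explicit rbind() at all:

```r
Xdf <- data.frame(X1 = c(1.1, 1.2), X2 = c(2.1, 2.2))
k <- 3

# Repeat the whole block of rows k times; row names get suffixes like "1.1", "2.1"
Xdf[rep(seq_len(nrow(Xdf)), times = k), ]
```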
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-09 Time: 18:08:14 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in the NY Times
On 07-Jan-09 18:03:19, Erik Iverson wrote: I pointed a friend of mine toward the article, to which he replied: "I hope that they run SAS on Solaris too, god only knows how tainted the syscalls are in that linux freeware." Of course, now Solaris is 'freeware', too, so I suppose that according to SAS, running SAS on Windows is the best way to be sure you're getting the right answers. I'm not so sure about that. Since the article described R as a supercharged version of Microsoft's Excel, surely people should run R on Windows and be *ab*so*lute*ly* sure of getting the right answers (and supercharged to boot) Ted. On Wed, 07 Jan 2009 10:56:53 -0600, Marc Schwartz marc_schwa...@comcast.net wrote: I would also point out that the use of the term freeware as opposed to FOSS by the SAS rep, comes off as being unprofessional and deliberately condescending... The author of the article, to his credit, was pretty consistent in using open source terminology. Regards, Marc on 01/07/2009 10:26 AM Bryan Hanson wrote: I believe the SAS person shot themselves in the foot in more ways than one. In my mind, the reason you would pay, as Frank said, for non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others would be so that you can sue them later when a software problem in the designing of the engine makes your plane fall out of the sky! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 07-Jan-09 Time: 18:30:39 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R badly lags matlab on performance?
On 03-Jan-09 18:28:03, Ben Bolker wrote: Ajay Shah ajayshah at mayin.org writes: On Sat, Jan 03, 2009 at 06:59:29PM +0100, Stefan Grosse wrote: On Sat, 3 Jan 2009 22:25:38 +0530 Ajay Shah ajayshah at mayin.org wrote: AS system.time(for (i in 1:1000) {a[i] <- a[i] + 1}) AS I wonder what we're doing wrong! it is no secret that R does badly with loops. Thats why it is recommended to use vectorized operations. But there's a big difference even on the vectorised version: a <- a + 1. Why should that be? Both systems should merely be handing down to the BLAS. The (stock) R install has a less carefully setup BLAS as compared with the (stock) matlab install? See my other message. I'm suspicious of the real size of the difference, I think the difference could well be noise. Also, this particular bit of arithmetic doesn't involve BLAS -- see arithmetic.c (dig down until you get to integer_binary) ... Ben Bolker I just ran Ajay's examples 3 times over: R 2.8.1 on Debian Etch using 1MB of RAM in a VirtualBox VM running on a 1.73GHz CPU. Results:

           user  system elapsed
Vector:   0.112   0.288   0.393
Loop:    65.276   0.300  65.572
Vector:   0.076   0.312   0.389
Loop:    65.744   0.332  66.076
Vector:   0.068   0.328   0.394
Loop:    65.292   0.308  65.597

Not dissimilar to Ajay's R times (though my loop was about 50% longer). However, all three runs were very similar -- a little noise, but not much! I don't have octave (on the same machine) to compare these with. And I don't have MatLab at all. So I can't provide a comparison on that front, I'm afraid. Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 03-Jan-09 Time: 19:10:51 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
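The loop-versus-vector comparison in this thread can be reproduced with a sketch like the following. Absolute timings will differ enormously by machine and R version (modern R byte-compiles loops, so the gap is far smaller than in these 2009 figures):

```r
n <- 1e6

a <- numeric(n)
t_loop <- system.time(for (i in seq_len(n)) a[i] <- a[i] + 1)["elapsed"]

a <- numeric(n)
t_vec <- system.time(a <- a + 1)["elapsed"]  # vectorised equivalent

c(loop = unname(t_loop), vectorised = unname(t_vec))
```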
Re: [R] Linux words file
On 03-Jan-09 21:39:55, Erin Hodgess wrote: Dear R People: I have a small function that solves the Jumble puzzle from the newspaper (I know...big deal). It uses the Linux words file. My question is: is there a similar words file for Windows, please? Thanks, Happy New (Gnu) Year. Sincerely, Erin And the same to you! I don't know whether you can easily find the same readymade for Windows, but you could always make one from the Linux file, since it is a plain text file. I have it in /etc/dictionaries-common/words You might need to DOSify it first (i.e. end lines with CRLF instead of just LF), though I don't know how Windows reacts to Unix text files these days. In that case (in Linux):

cp /etc/dictionaries-common/words words.txt
unix2dos words.txt

unix2dos is an old program which is not present on all Linux distributions. Maybe you don't need it anyway, in which case just do the 'cp' (to get the extension .txt). And are you going to share your R program? Hoping this helps, Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 03-Jan-09 Time: 22:30:06 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
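Since the Jumble solver itself was never posted, here is one minimal way such a function might look (entirely my own sketch, not Erin's code): two words match if their sorted letters agree. The small dict vector stands in for the system words file, which could be read with readLines("/etc/dictionaries-common/words") on a system where that path exists:

```r
# Canonical key: lower-case letters in sorted order
key <- function(w) paste(sort(strsplit(tolower(w), "")[[1]]), collapse = "")

# All dictionary words that are anagrams of the jumbled word
solve_jumble <- function(jumbled, dict) {
  dict[vapply(dict, key, character(1), USE.NAMES = FALSE) == key(jumbled)]
}

dict <- c("cat", "act", "dog", "god", "tack")  # stand-in for the words file
solve_jumble("tca", dict)
# [1] "cat" "act"
```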
Re: [R] understanding recursive functions
On 18-Dec-08 22:33:28, Jeffrey Horner wrote: joseph.g.bo...@gsk.com wrote on 12/18/2008 04:22 PM: I'm trying to understand the use of recursive functions described on page 45 of An Introduction to R by the R core development team. A function is a list of expressions, which all get executed with only the last being assigned to a global variable, right? So if a function refers recursively to itself, it should simply start with the first expression and go from there. At least that is my understanding of why the example given on page 45 works. In light of the above, I would appreciate it if someone would explain why the following example does not work:

q <- function(x,h) {if (x < 2) {x <<- x+1; return(q(x))} else return(x)}

If x < 2, this should add 1 to x and go back to the beginning of the if expression, and the final result should be 2. So q(0) should return 2. But it returns an error message. All references to x save one (the assignment with the <<- operator) are found within the current frame, not by lexical scoping, and hence the local x is never changed... producing infinite recursion. The following at least fixes your example:

q <- function(x,h) {if (x < 2) {x <<- x+1; x <- x+1; return(q(x))} else return(x)}
ls()    # no x in global env just yet
q(-10)
ls()

The following fixes it even more simply (using the same principles):

q <- function(x,h){ if (x < 2) {u <- x+1; return(q(u))} else return(x) }

Note that <<- is not necessary. Just to test the method more thoroughly:

q <- function(x,h){ if (x < 2) {u <- x+h; return(q(u,h))} else return(x) }
q(0,0.3)
# [1] 2.1

Ted. E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk Fax-to-email: +44 (0)870 094 0861 Date: 18-Dec-08 Time: 22:51:41 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
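A further option (my addition, same principle of keeping everything local): Recall() lets a function call itself without relying on its own name or on any global assignment:

```r
q <- function(x, h) {
  if (x < 2) Recall(x + h, h) else x  # recurse with a new local value of x
}
q(0, 0.3)
# [1] 2.1
```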
Re: [R] Logical inconsistency
On 07-Dec-08 18:41:16, Charles C. Berry wrote: On Mon, 8 Dec 2008, Berwin A Turlach wrote: G'day Wacek, On Sat, 06 Dec 2008 10:49:24 +0100 Wacek Kusnierczyk [EMAIL PROTECTED] wrote: [] there is, in principle, no problem in having a high-level language perform the computation in a logically consistent way. Is this now supposed to be a Radio Eriwan joke? [snip] Classical Radio Eriwan stuff, and I thought this class of jokes have died out. Evidently there are a few diehards still out there. From this http://www.theologyweb.com/campus/showthread.php?t=55535 thread: --- [Question:]Can the gamma function be used in combinatorics? [Answer:] As Radio Eriwan would have said In principle, yes, because it interpolates the factorial. But what does it mean to throw a dice with pi sides sqrt(15) times ? --- :-) Chuck [rest deleted] Charles C. Berry(858) 534-2098 Well, in principle you can. But achieving the fractional part has probability zero. Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Dec-08 Time: 19:06:03 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Function output difficulties
On 03-Dec-08 13:41:53, emj83 wrote: is there anyway for some parts of the function output to not be returned, even if the output has been used to calculate a further part of the function? i have tried invisible() to no avail. Thanks Emma It's not clear what you mean by "some parts of the function output". The function output is whatever you nominate it to be, and can include any selection of whatever was computed within the function. For example:

testfn <- function(x){
  X <- x^2
  Y <- cos(X)
  Z <- (X + Y)/(1 + x)
  c(X, Z)
}
testfn(1.1)
# [1] 1.21 0.744295

However, I suspect your case may be a bit more complicated than that! If you were to post the function (or something like it), explaining what you want it to return, and in what way it returns what you do not want, it would be easier to help. Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 03-Dec-08 Time: 19:03:29 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
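On the invisible() point: invisible() does not remove anything from the return value; it only suppresses auto-printing at the console, and the full value is still there when assigned or indexed. A small sketch (my own example, not Emma's function):

```r
testfn2 <- function(x) {
  X <- x^2
  Z <- X + 1
  invisible(list(X = X, Z = Z))  # returned in full, but not auto-printed
}

testfn2(2)       # prints nothing at the console
res <- testfn2(2)
res$Z
# [1] 5
```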
Re: [R] Examples of advanced data visualization
On 01-Dec-08 03:19:57, Duncan Temple Lang wrote: Hans W. Borchers wrote: [...] A few days before your mail, I started putting together some examples of using R and SVG/ECMAScript and R and Flash/Flex/MXML/ActionScript. There are some examples of R graphics that provide interactivity in various forms and ways at http://www.omegahat.org/SVGAnnotation/tests/examples.html (Most examples will work with Firefox, Opera is the most comprehensive browser however for these examples.) The Flex examples will take more time. D. I visited that URL (with the extra t!), and got a message from my browser (Iceweasel on Debian Etch, which is Firefox under another name) that additional plugins (unspecified) were needed to display the material. When I clicked on the Install Missing Plugins button, the result was No suitable plugins found Unknown Plugin (text/svg+xml) Any suggestions for further progress? With thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Dec-08 Time: 09:08:05 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Examples of advanced data visualization
On 01-Dec-08 09:22:34, Gábor Csárdi wrote: On Mon, Dec 1, 2008 at 10:21 AM, Gábor Csárdi [EMAIL PROTECTED] wrote: On Mon, Dec 1, 2008 at 10:08 AM, Ted Harding [EMAIL PROTECTED] wrote: [...] I visited that URL (with the extra t!), and got a message from my browser (Iceweasel on Debian Etch, which is Firefox under another name) that additional plugins (unspecified) were needed to display the material. When I clicked on the Install Missing Plugins button, the result was No suitable plugins found Unknown Plugin (text/svg+xml) Any suggestions for further progress? Ted, what is your browser version? If 2.x, then I'm afraid that you'll have to upgrade to 3.x, or install Opera. Oooops, maybe I am wrong, see http://perlitist.com/articles/on-firefox2-and-svg Gabor Gabor With thanks, Ted. Hmm ... Interesting. Many thanks for the researches, Gabor. Yes, it is a 2.x version. That URL states: [...] So what was the cause of all this woe and despair? Well it all stemmed from the fact that I upgraded from 1.5 to 2, and in 1.5 I was using the Adobe SVG plugin which disables the built-in SVG rendering. Firefox 2 helpfully copied this setting, but not the plugin. For future reference svg.enabled in about:config needs to be true for the built-in SVG viewer to be used. I checked the status of svg.enabled in about:config, and I see that it is TRUE. Further, exploring my ~/.mozilla tree, I do not see any reference to an Adobe SVG plugin -- the only Adobe plugins are for Acrobat Reader and Flash Player. So there is a bit more to this than meets the eye. Maybe I should try to find a different source of SVG material, maybe less demanding than Hans Borchers's examples, to see whether the alleged built-in SVG capability really works anyway. Thanks again! Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Dec-08 Time: 10:26:23 -- XFMail --
Re: [R] NA and logical indexes
On 28-Nov-08 21:25:36, Sebastian P. Luque wrote: Hi, I vaguely remember this issue being discussed at some length in the past, but am having trouble relocating the proper thread (defining an adequate search string to do so):

--cut here---start-
R> foo <- data.frame(A=gl(2, 5, labels=letters[1:2]), X=runif(10))
R> foo$A[1] <- NA
R> foo$A == "b"
 [1]    NA FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
R> foo$A[foo$A == "b"]
[1] <NA> b    b    b    b    b
Levels: a b
R> foo$X[foo$A == "b"]
[1]     NA 0.4425 0.7164 0.3171 0.1967 0.8300
R> foo[foo$A == "b", ]
      A      X
NA <NA>     NA
6     b 0.4425
7     b 0.7164
8     b 0.3171
9     b 0.1967
10    b 0.8300
--cut here---end---

Why is foo$X[1] set to NA in that last call? Cheers, Seb

It is not! In my repetition (which has different runifs):

foo[foo$A == "b", ]
#       A         X
# NA <NA>        NA
# 6     b 0.2300618
# 7     b 0.5109791
# 8     b 0.7947862
# 9     b 0.3400228
# 10    b 0.5464989
foo
#       A         X
# 1  <NA> 0.5013591
# 2     a 0.4475963
# 3     a 0.2600449
# 4     a 0.9240698
# 5     a 0.4205284
# 6     b 0.2300618
# 7     b 0.5109791
# 8     b 0.7947862
# 9     b 0.3400228
# 10    b 0.5464989

NA can seem to have a bewildering logic, but it all becomes clear if you interpret NA as "value unknown". You asked for foo[foo$A == "b", ]. What happens is that when the test foo$A == "b" encounters foo$A[1] it sees NA, so it does not know what the value is. Hence it does not know whether this row of foo satisfies the test. Hence the entire row is of unknown status. Hence a row is output all of whose elements (including the row label, i.e. the row number) are flagged unknown, i.e. NA. After all, if it gave the value of foo$X[1] = 0.5013591, and you subsequently accessed foo[foo$A == "b",][1,2] and got 0.5013591, you would presumably proceed as though this was a value corresponding to a case where foo$A == "b". But it is not -- since foo$A[1] = NA, you don't know whether that is the case. Hence you don't know the value of foo[foo$A == "b",][1,2]. Clear? ( :)) Hoping this helps, Ted.
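A footnote to the explanation above (a sketch, not part of the original thread): if the goal is to keep only the rows known to satisfy the test, wrapping the condition in which() drops the NA positions, because which() returns only the indices where the condition is TRUE.

```r
# Rows selected by a logical index containing NA include an all-NA row;
# which() drops the unknown position instead.
set.seed(1)
foo <- data.frame(A = gl(2, 5, labels = letters[1:2]), X = runif(10))
foo$A[1] <- NA
nrow(foo[foo$A == "b", ])         # 6: includes the all-NA row
nrow(foo[which(foo$A == "b"), ])  # 5: the NA row is dropped
```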
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 28-Nov-08 Time: 22:01:11 -- XFMail --
Re: [R] Chi-Square Test Disagreement
On 26-Nov-08 17:57:52, Andrew Choens wrote: [...] And, since I do work for government, if I ask for a roomful of calculators, I might just get them. And really, what am I going to do with a roomful of calculators? --andy Insert something humorous here. :-) Next time the launch of an incoming nuclear strike is detected, set them to work as follows (following Karl Pearson's historical precedent): Anti-aircraft guns all day long: Computing for the Ministry of Munitions JUNE BARROW GREEN (Open University) From January 1917 until March 1918 Pearson and his staff of mathematicians and human computers at the Drapers Biometric Laboratory worked tirelessly on the computing of ballistic charts, high-angle range tables and fuze-scales for AV Hill of the Anti-Aircraft Experimental Section. Things did not always go smoothly -- Pearson did not take kindly to the calculations of his staff being questioned -- and Hill sometimes had to work hard to keep the peace. If you have enough of them (and Pearson undoubtedly did, so you can quote that in your requisition request), then you might just get the answer in time! [ The above excerpted from http://tinyurl.com/6byoub ] Good luck! Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 26-Nov-08 Time: 18:35:25 -- XFMail --
Re: [R] Help With Permutations
On 24-Nov-08 13:36:31, Mulazzani Fabio (Student Com06) wrote: I have a problem with permutations functions in R. I just started using R so maybe it is an easy question. I need to obtain all the 9.something x 10^157 permutations that can be formed from the numbers 1 to 100. I wrote the following commands:

library(gregmisc)
options(expressions=1e5)
cmat <- combinations(300,2)
dim(cmat) # 44850 by 2
permutations(n=100, r=100)

Unfortunately at a certain point (after a few minutes) I get the following error: Error: cannot allocate vector of size 609.1 Mb What can I do? Thanks Fabio

The first thing to do is to decide how long you are willing to wait! To an adequate approximation there are 10^158 of them. Simply to obtain them all (at a rate of 10^10 per second, which is faster than the CPU frequency of most desktop computers) would take 10^148 seconds, or slightly longer than 3*(10^140) years. Current estimates of the age of the Universe are of the order of 1.5*(10^10) years, so the Universe will have to last about 2*(10^130) times as long as it has already existed, before the task could be finished. So: why do you want to do this? Best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 24-Nov-08 Time: 16:43:29 -- XFMail --
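The arithmetic in the reply above can be checked in R itself without generating anything, by working on the log scale with lfactorial() (a sketch; the figures are approximate):

```r
# log10 of 100! -- the number of permutations of 1:100.
log10.n <- lfactorial(100) / log(10)
log10.n      # about 157.97, i.e. roughly 9.3e157 permutations

# years needed at 10^10 permutations per second (log10 scale):
log10.years <- log10.n - 10 - log10(3600 * 24 * 365)
log10.years  # about 140.5, matching the "3*(10^140) years" estimate
```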
Re: [R] how to test for the empty set
On 24-Nov-08 17:41:25, G. Jay Kerns wrote: Dear R-help, I first thought that the empty set (for a vector) would be NULL.

x <- c()
x

However, the documentation seems to make clear that there are _many_ empty sets depending on the vector's mode, namely, numeric(0), character(0), logical(0), etc. This is borne out by

y <- letters[1:3]
z <- letters[4:6]
intersect(y,z)

which, of course, is non-NULL: is.null(character(0)) # FALSE

In the above (and below) cases, would not (length(x)==0) do? Ted.

So, how can we test if a vector is, say, character(0)? The following doesn't (seem to) work:

x <- character(0)
x == character(0) # logical(0)

More snooping led to the following: wiki.r-project.org/rwiki/doku.php?id=tips:surprises:emptysetfuncs and at the bottom of the page it says "logical(0) is an empty set, thus is TRUE". However, I get

isTRUE(logical(0)) # FALSE

but, on the other hand,

all.equal(x, character(0)) # TRUE

This would seem to be the solution, but am I missing something? And in particular, is there an elegant way to check in the case that the mode of the vector is not already known? Thanks in advance for any insight you may have. Best, Jay E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 24-Nov-08 Time: 18:15:31 -- XFMail --
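To pull Ted's suggestion into one place: length() gives a mode-agnostic emptiness test. A small sketch (the helper name is my own, not from the thread):

```r
# length(x) == 0 is TRUE for every zero-length vector, whatever its mode,
# and also for NULL -- no prior knowledge of the mode is needed.
is.empty <- function(x) length(x) == 0

is.empty(character(0))                           # TRUE
is.empty(logical(0))                             # TRUE
is.empty(intersect(letters[1:3], letters[4:6]))  # TRUE
is.empty(NULL)                                   # TRUE
is.empty("")                                     # FALSE: "" is a length-1 string
```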
Re: [R] grep for asterisks *'s
On 23-Nov-08 08:41:02, Robin Clark wrote: Hello, I'm trying to determine if a string contains asterisks using the grep function. I know that this is slightly difficult because * is a special character in regular expressions and needs to be escaped. However, escaping the * using \ doesn't work either: In a regular expression, a term like [*] will match * literally, even though * is normally a special (meta) character. Thus:

test <- c("1*1","222","*33","44*","555")
test
# [1] "1*1" "222" "*33" "44*" "555"
?grep
grep("[*]",test)
# [1] 1 3 4
test[grep("[*]",test)]
# [1] "1*1" "*33" "44*"

This also works for other metacharacters, even (surprise!) "]":

test2 <- c("1[1","222","[33","44]","55]")
grep("[]]",test2)
# [1] 4 5
grep("[[]",test2)
# [1] 1 3

However, your error was to use "\*" with only one "\". You should have used "\\*":

grep("\\*",test)
# [1] 1 3 4

and the "\\" also works for other special characters, though "\" itself can be tricky since to enter a "\" into a string you have to type it as "\\", and R will print it back as "\\" as well:

test3 <- c("1\\1","222","\\33","44\\","55\\")
test3
# [1] "1\\1" "222"  "\\33" "44\\" "55\\"
nchar(test3)
# [1] 3 3 3 3 3

so there really are only 3 characters in each element of test3. But to grep for "\" you have to enter "\\" twice over:

grep("\\",test3)
# Error in grep(pattern, x, ignore.case, extended, value, fixed) :
#   invalid regular expression
grep("\\\",test3)
# Error: syntax error
grep("\\\\",test3)
# [1] 1 3 4 5

And, just as [*] catches *, so [\\] is needed to catch "\":

grep("[\\]",test3)
# [1] 1 3 4 5

Hoping this helps! Ted.

if(grep("\*", model)>0) #does the model have an interaction { do something... } produces the following error message: Error in grep("*", model) : invalid regular expression '*' In addition: Warning messages: 1: '\*' is an unrecognized escape in a character string 2: unrecognized escape removed from "\*" 3: In grep("*", model) : regcomp error: 'Invalid preceding regular expression' Execution halted Any ideas anyone?
Robin E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 23-Nov-08 Time: 09:29:03 -- XFMail --
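One more option the thread does not mention (offered as a sketch, not as the original poster's method): fixed = TRUE turns off regular-expression interpretation entirely, so no escaping is needed at all.

```r
test <- c("1*1", "222", "*33", "44*", "555")
grep("*", test, fixed = TRUE)    # [1] 1 3 4

# fixed = TRUE also sidesteps the backslash-doubling problem:
# "\\" in R source is a single backslash character.
test3 <- c("1\\1", "222", "\\33", "44\\", "55\\")
grep("\\", test3, fixed = TRUE)  # [1] 1 3 4 5
```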
Re: [R] 2^2 factorial design question
On 07-Nov-08 00:53:56, Edna Bell wrote: Dear R Gurus: How do you put together a 2^2 (or even 2^k) factorial problem, please? I'm not clear about what you mean here, especially about "put together"! Can you describe what you want to end up with in more detail? Since you have 2 levels for A and B, do you put in A+ and A- as factors, please? In any case, I would not use the names A+ and A-, since this provides too many opportunities for expressions involving these to be interpreted as arithmetic operations. Better, and safer, to use names A1 and A0, say. Best wishes, Ted. Thanks, Edna Bell E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Nov-08 Time: 09:51:39 -- XFMail --
Re: [R] count data with some conditions
On 01-Nov-08 02:51:37, David Winsemius wrote: Do you want the count of remaining elements which are strictly greater than the first element?

length(which(a[1] < a[2:10]))
[1] 4

or perhaps a bit more deviously:

sum( (a[1] < a[2:10]) + 0 )  # adding 0 to TRUE or FALSE creates 1 or 0.
[1] 4

No need to be devious! Simply

sum(a[1] < a[2:10]) # [1] 4

will do it. The reason is that when TRUE or FALSE are involved in an arithmetic operation (which sum() is), they are cast into 1 or 0. Ted.

On Oct 31, 2008, at 7:56 PM, sandsky wrote: Hi there, I have a data set: a=cbind(5,2,4,7,8,3,4,11,1,20) I want to count # of data satisfying a[1] < a[2:10]. Anyone helps me solving this case? Thank you in advance, Jin E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Nov-08 Time: 07:30:17 -- XFMail --
Re: [R] How R calculate p-value ?
On 01-Nov-08 20:59:29, RON70 wrote: Still no reply. Is my question not understandable at all? RON70 wrote: I am wondering how R calculates the p-value for a test. Does R do some approximate integration on the p.d.f. of the null distribution? How can I see the code for this particular calculation? Your help will be highly appreciated. Regards, How R calculates a P-value depends on the analysis being done, but it will use a standard method. For example, a t-test (standard form, not Welch) will assume a normal distribution for the deviations from the mean (or means, for a two-sample test, with the same variance for each group); it will calculate the t-statistic, and will calculate the exact P-value (to within numerical accuracy) for this t-value from the distribution of t on the correct number of degrees of freedom. In the case of fitting a linear model with normally distributed errors (using lm()), the P-value for each coefficient will be based on the t value (value of coefficient divided by its standard error), referred to a t distribution with appropriate degrees of freedom. Again, an exact P-value. Similarly, for an analysis of variance, the F-statistic will be calculated in the standard way, and the P-value will be obtained by reference to the F distribution with appropriate degrees of freedom. Again exact, to within numerical accuracy. For some other kinds of analysis, standard methods (often large-sample chi-squared approximations, in some cases to the distribution of a likelihood ratio) will be used, and the resulting P-value will be approximate to within the accuracy of the approximating distribution. This kind of thing occurs for example in fitting generalised linear models using glm(). These P-values are approximate, but the approximations are standard. There is as much variety in all this as in the variety of standard methods for obtaining a test statistic and in using tractable approximations to the distribution of the test statistic. Hoping this helps, Ted.
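The t-test case described above can be reproduced "by hand" (a sketch): the two-sided p-value is just an exact tail probability of the t distribution, which is what t.test() computes internally.

```r
# One-sample, two-sided t-test p-value from first principles.
set.seed(1)
x <- rnorm(20)
tstat <- mean(x) / (sd(x) / sqrt(length(x)))
p.manual  <- 2 * pt(-abs(tstat), df = length(x) - 1)
p.builtin <- t.test(x)$p.value
all.equal(p.manual, p.builtin)  # TRUE
```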
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Nov-08 Time: 21:56:14 -- XFMail --
Re: [R] R random number generation
On 23-Oct-08 19:58:12, hiphop wrote: i have to generate random numbers from length 2 to 30 and length two should have the numbers 1,2 and length 3 should have the numbers 1,2,3 so on and so forth till size 30. i should get a sequence like 221 or 112 till both the values appear and for 3 it should be 123 or 3331112. i should get similar output for all sizes which means i should get 30 sequences. but am getting only one sequence. please help. this is my code

y=0; for (i in 1:3) { while(y!=i) { y=sample(1:3,replace=TRUE); } }

If I understand you correctly, the following should do what you are looking for (i.e. sample repeatedly from (1:k) until you first have all of 1:k in the sample, repetitions being allowed):

mksample <- function(k){
  All <- (1:k) ; Left <- rep(TRUE,k) ; Done <- FALSE
  Result <- NULL
  while(! Done){
    j <- sample(All,1)
    Result <- c(Result, j); Left[j] <- FALSE
    Done <- (sum(Left)==0)
  }
  Result
}

Example:

mksample(5)
# [1] 1 4 4 5 2 1 1 3

I'm not clear about why, in your illustrations (such as 3331112) you presented it as a sequence of runs (333, 111, 2) -- maybe this is not relevant to your query. Hoping this helps, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 23-Oct-08 Time: 21:51:04 -- XFMail --
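To produce all 29 sequences the poster asked for (one for each size from 2 to 30), mksample() can simply be applied over the sizes; a sketch, restating the function from the post so the example is self-contained:

```r
# Sample from 1:k, with repetition, until every value 1..k has appeared.
mksample <- function(k){
  All <- (1:k) ; Left <- rep(TRUE, k)
  Result <- NULL
  while (any(Left)) {
    j <- sample(All, 1)
    Result <- c(Result, j); Left[j] <- FALSE
  }
  Result
}

set.seed(1)
seqs <- lapply(2:30, mksample)  # one sequence per size 2..30

# every sequence of size k contains each of 1..k at least once:
all(sapply(seq_along(seqs), function(i) setequal(seqs[[i]], 1:(i + 1))))  # TRUE
```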
Re: [R] Calculating confidence limits for the difference between
On 22-Oct-08 17:09:27, Dennis Fisher wrote: Colleagues, I am working on a problem at the edge of my knowledge of statistics. Basically, I have values for two groups - I am trying to calculate the 90% confidence limits for the difference between means. I can implement this without difficulty based on standard equations that are available in stat textbooks: (difference between means) / (pooled standard error). My problem arises when I try to correct for the value of a covariate. I can do the ANOVA with the covariate as a factor. But, I don't know how to use the output to obtain the confidence limits of the difference. I suspect that several participants in this board have implemented code to do this. I hope that someone is willing to share the code. Mostly, we haven't, though a few of us once did (in coding lm() and t.test()). Assuming that you will adopt equal variances for the two groups (as in standard ANOVA), you can approach this in at least two different ways. Both illustrated below, for comparison.

## Make up some data, and a Group indicator
set.seed(213243)
X1 <- rnorm(15) ; X2 <- 0.1+rnorm(25) ; X <- c(X1,X2)
Group <- factor(c(rep(0,length(X1)), rep(1,length(X2))))
mean(X1)          # [1] 0.1108255
mean(X2)          # [1] 0.1002999
mean(X2)-mean(X1) # [1] -0.01052560
summary(lm(X ~ Group))$coef
#                Estimate Std. Error     t value  Pr(>|t|)
# (Intercept)  0.11082552  0.2543404  0.43573696 0.6654929
# Group1      -0.01052560  0.3217180 -0.03271686 0.9740716
## Note that the Group coefficient is the difference of means here
Diff <- summary(lm(X ~ Group))$coef[2,1] ## -0.01052560
SE   <- summary(lm(X ~ Group))$coef[2,2] ## 0.3217180
Diff + qt(0.975,38)*c(-1,1)*SE  ## d.f. = (15-1)+(25-1)
# [1] -0.6618097 0.6407585
## 95% confidence interval calculated from output of lm() above
t.test(X2,X1,var.equal=TRUE)
# Two Sample t-test
# data: X2 and X1
# t = -0.0327, df = 38, p-value = 0.974
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -0.6618097 0.6407585
# sample estimates:
# mean of x mean of y
# 0.1002999 0.1108255
## The 95% CI from t.test() is the same as previously calculated
## from the output of lm(). But t.test() does not directly output
## the difference of means.

Hoping this helps, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 22-Oct-08 Time: 19:05:47 -- XFMail --
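For the covariate-adjusted question that prompted the thread, the same lm() route extends directly; confint() then gives the interval without hand calculation. A sketch with a made-up covariate Z (Z is invented here, not from the original data):

```r
# Covariate-adjusted group difference with a 90% confidence interval.
set.seed(213243)
X1 <- rnorm(15); X2 <- 0.1 + rnorm(25); X <- c(X1, X2)
Group <- factor(c(rep(0, length(X1)), rep(1, length(X2))))
Z <- rnorm(40)                 # hypothetical covariate

fit <- lm(X ~ Group + Z)
# the Group1 coefficient is the adjusted difference between group means:
confint(fit, "Group1", level = 0.90)
```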
Re: [R] card games
On 22-Oct-08 23:04:34, Barry Rowlingson wrote: 2008/10/22 Erin Hodgess [EMAIL PROTECTED]: This is for programming purposes. I wanted my students to do some games, but I wanted to see if there were items already out there. A quick googling (oooh, if ONLY R was called something more googly!) found Duncan Murdoch's poker package: http://www.stats.uwo.ca/faculty/murdoch/repos/html/pokerv0.0.html v0.0 could give your students something to start on. It has some card handling and shuffling packages as well as hand evaluations. Here's my poker calculator:

action <- function(hand, bet, cash){
  if(rubbish(hand)){
    drink() ; drink()
    if(drunk()) {
      return("bluff")
    } else {
      return("fold")
    }
  } else {
    drink()
    bet <- runif(1,min,max)
    return(bet)
  }
}

It's a pretty accurate simulation. *hic* Barry *** Segmentation fault - virtuous memory exceeded E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 23-Oct-08 Time: 00:21:27 -- XFMail --
[R] ? extended rep()
Hi Folks, I'm wondering if there's a compact way to achieve the following. The dream is that, by analogy with

rep(c(0,1),times=c(3,4))
# [1] 0 0 0 1 1 1 1

one could write

rep(c(0,1),times=c(3,4,5,6))

which would produce

# [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1

in effect recycling x through 'times'. The objective is to produce a vector of alternating runs of 0s and 1s, with the lengths of the runs supplied as a vector. Indeed, more generally, something like

rep(c(0,1,2), times=c(1,2,3,2,3,4))
# [1] 0 1 1 2 2 2 0 0 1 1 1 2 2 2 2

Suggestions appreciated! With thanks, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-08 Time: 21:57:15 -- XFMail --
Re: [R] ? extended rep()
On 20-Oct-08 21:19:21, Stefan Evert wrote: On 20 Oct 2008, at 22:57, (Ted Harding) wrote: I'm wondering if there's a compact way to achieve the following. The dream is that one could write rep(c(0,1),times=c(3,4,5,6)) which would produce # [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 in effect recycling x through 'times'.

rep2 <- function (x, times) rep(rep(x, length.out=length(times)), times)
rep2(c(0,1),times=c(3,4,5,6))
# [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1

Any prizes for shortest solution? ;-) Best, Stefan

If ever we are both within reach of 'en øl', then yes. But Gabor came up with a shorter one. I tried to shorten Gabor's but failed. However, all competitors are entitled to a consolation prize! (And that includes me ... ) Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-08 Time: 22:59:16 -- XFMail --
Re: [R] ? extended rep()
On 20-Oct-08 21:17:22, Gabor Grothendieck wrote: Try this:

with(data.frame(x = 0:1, times = 3:6), rep(x, times))

or even shorter:

do.call(rep, data.frame(x = 0:1, times = 3:6))

That is sneaky!

data.frame(x = 0:1, times = 3:6)
#   x times
# 1 0     3
# 2 1     4
# 3 0     5
# 4 1     6

(Which is why it won't work with list(x=0:1,times=3:6).) Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 20-Oct-08 Time: 23:02:00 -- XFMail --
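A quick check of why the data.frame version works (a sketch): data.frame() recycles the shorter column, so rep() receives x already extended to the length of times.

```r
# data.frame recycles x = 0:1 to c(0, 1, 0, 1) alongside times = 3:6,
# and do.call passes both columns to rep() as named arguments.
out <- do.call(rep, data.frame(x = 0:1, times = 3:6))
out
# [1] 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1
identical(out, rep(c(0L, 1L, 0L, 1L), times = 3:6))  # TRUE
```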
Re: [R] R plot
On 17-Oct-08 09:01:08, Benoit Boulinguiez wrote: Hi, Personally I always use xlim and ylim with the plot or points function, like that:

plot( X,Y,pch=16,col=2,cex.axis=1.5,cex.lab=1.5,
      xlim=c(0,1.05*max(X)),ylim=c(0,1.05*max(Y)) )

Regards/Cordialement Benoit Boulinguiez

I think (from his original post) that Haoda already knows about the use of xlim and ylim. What he finds annoying is keeping track of what they should be! Here, I'm afraid, I am inclined to agree with him. For example, if you want to plot say 10 time series, with different time-ranges and different value-ranges, all on the one graph, and they have to be obtained separately (even by reading in from 10 different data files), then the only way I have found is to wait until all the objects are available, then compute the min and max of the x-range and the y-range of each, and finally base xlim on the min of the min x-ranges and the max of the max x-ranges (and similarly for ylim). Of course you could alternatively do this cumulatively as you go along, and even build the process into a function. But it is a lot of admin along the way, and it can get complicated. This sort of thing has caused me a lot of extra work on many occasions, which would have been unnecessary if plots could re-size themselves when asked to go outside existing ranges. I grin and bear it, because that's how things work; but I have to admit that I don't like it! Best wishes to all, Ted.

-Original message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Wacek Kusnierczyk Sent: Friday 17 October 2008 10:47 To: Haoda Fu Cc: R help Subject: Re: [R] R plot

Haoda Fu wrote: All - When I plot something like

a <- rnorm(5)
b <- rnorm(5)
plot(a,b,col = "red")
points(10,-10)

the last point is missing because it is out of range of the first plot. I am just trying to switch from Matlab to R. In Matlab, it can always automatically adjust the xlim and ylim in such cases. Is it possible to auto-adjust in R?
Otherwise, keeping track of xlim and ylim is really annoying. if you know the range in advance, you can specify it using the xlim and ylim parameters to plot. you can also use them in points (it doesn't cause an error), but it does not seem to have the desired effect of reshaping the plot. it's perhaps a pity it works this way, but you have to get used to it. or drop r if you find matlab better. vQ E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Oct-08 Time: 10:31:03 -- XFMail --
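Ted's workaround, written out as a sketch (the series and all names here are invented for illustration): compute global limits from all the series before the first plot() call, then add the rest with lines().

```r
set.seed(7)
# three series with different x- and y-ranges, standing in for data
# read from separate files
series <- lapply(1:3, function(i)
  list(x = sort(runif(10, 0, 5 * i)), y = rnorm(10, mean = i)))

# global limits: min of the mins and max of the maxes, as described above
xlim <- range(unlist(lapply(series, function(s) s$x)))
ylim <- range(unlist(lapply(series, function(s) s$y)))

plot(series[[1]]$x, series[[1]]$y, type = "l", xlim = xlim, ylim = ylim,
     xlab = "time", ylab = "value")
for (s in series[-1]) lines(s$x, s$y)
```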
Re: [R] Plotting 2D Logistic regression
On 17-Oct-08 17:59:55, Feldman, Ruben wrote: Hi, My data has time series for different variables and I want to predict ctw with the value of each other variable at that point in the series. I have run a logistic regression:

logreg <- glm(ctw ~ age + OFICO + ... + CCombLTV, data=mydata, family=binomial(logit))

And I am trying to get a plot of the logistic regression I have just performed, on the same plot as the actual time series. Apparently lin.reg can only deal with linear regressions, from the error I am getting. Does

plot(mydata$age, logreg$fit)

do something like what you are looking for? Ted. Do you know what function I should be using and where I can get it? Thanks so much! RF E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 17-Oct-08 Time: 22:20:31 -- XFMail --
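A sketch of what the suggested plot gives, with simulated data standing in for the poster's (mydata, ctw and age are made up here; the real model has more predictors):

```r
set.seed(3)
mydata <- data.frame(age = sort(runif(100, 20, 70)))
mydata$ctw <- rbinom(100, 1, plogis(-4 + 0.1 * mydata$age))

logreg <- glm(ctw ~ age, data = mydata, family = binomial("logit"))

# fitted probabilities against the predictor, with the observed 0/1 data
plot(mydata$age, logreg$fitted.values, type = "l",
     xlab = "age", ylab = "fitted P(ctw = 1)", ylim = c(0, 1))
points(mydata$age, mydata$ctw, pch = 16, cex = 0.5)
```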
Re: [R] Add notes to sink output
On 13-Oct-08 20:02:20, Michael Just wrote: Hello, How can I add notes (i.e. text) to a sink output?

sink("test.txt")
# This text will describe the test
summary(x)
sink()

How can I add that text above to the sink output? Thanks, Michael

Anything on the lines of:

sink("test.txt")
cat("This text will describe the test\n")
cat("\n")
summary(x)
sink()

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 13-Oct-08 Time: 21:12:54 -- XFMail --
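The same effect without sink(), as an alternative sketch: build the lines first with capture.output(), then write them in one call (the temporary file name here is illustrative).

```r
set.seed(1)
x <- rnorm(10)
f <- tempfile(fileext = ".txt")

# annotation, blank line, then the captured summary, written together
writeLines(c("This text will describe the test", "",
             capture.output(summary(x))), f)

readLines(f)[1]  # "This text will describe the test"
```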
Re: [R] Conditionally skip over for(..){ and }??
On 12-Oct-08 11:32:31, Barry Rowlingson wrote: 2008/10/12 Ted Harding [EMAIL PROTECTED]: Hi Folks, I'm wondering if there's a secret trick to achieve the following. I have some big code for analysis related to a named variable, which will be one of several in the columns of a dataframe, and I would like to be able to choose between a run for just one of these variables, or a run which loops over them all. So, for a single run, I could have the following kind of thing in a file code.R to be sourced by source("code.R"):

# VarName <- "Var1"
VarName <- "Var2"
# VarName <- "Var3"
# VarName <- "Var4"
### CUT OUT LOOP
# VarNames <- c("Var1","Var2","Var3","Var4")
# for( VarName in VarNames ) {

Lots of code related to analysis of variable VarName

### CUT OUT END OF LOOP
# }

which would do the single case for Var2. A run for a different VarName would be done by editing the first block to select the new VarName. Now, when I want to run the loop over all VarNames, I could of course likewise edit the file so as to uncomment the code which follows the CUT OUT ... lines. But I would prefer to avoid having to do the latter, and am therefore wondering if there is some way to conditionally skip over those bits of code.

What's the problem with doing a loop over a single VarName in VarNames? I'd put something like:

VarNames = c( "Var1", "Var2", "Var3" )

at the start of my code for easy editing, and then if I only wanted to do it for one VarName you can just do:

VarNames = c(
  # "Var1",
  "Var2"
  # "Var3"
)

- it's just a shame that R doesn't like trailing commas in c() calls, which would make it even easier. Of course the real way to do it is to rewrite it as a function and do foo(c("Var1","Var2")) as desired Barry

Now why didn't that occur to me? Must do something about the hypocaffeinaemia. Many thanks, Baz, and also to Berwin Turlach for a similar suggestion! Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 13:02:37 -- XFMail --
Re: [R] Conditionally skip over for(..){ and }??
Following up on Barry's suggestion led on to a more straightforward approach: VarNames - NULL # VarNames - c(VarNames, Var1) # VarNames - c(VarNames, Var2) # VarNames - c(VarNames, Var3) # [...] for(i in (1:length(VarNames))){ VarName - VarNames[i] [...] Then any one or some of those lines can be uncommented. With thanks for the suggestions. Ted. On 12-Oct-08 11:32:31, Barry Rowlingson wrote: 2008/10/12 Ted Harding [EMAIL PROTECTED]: Hi Folks, I'm wondering if there's a secret trick to achieve the following. I have some big code for analysis related to a named variable, which will be one of several in the columns of a dataframe, and I would like to be able to choose between a run for just one of these variables, or a run which loops over them all. So, for a single run, I could have the following kind of thing in a file code.R to be sourced by source(code.R): # VarName - Var1 VarName - Var2 # VarName - Var3 # VarName - Var4 ### CUT OUT LOOP # VarNames - c(Var1,Var2,Var3,Var4) # for( VarName in VarNames ) { Lots of code related to analysis of variable VarName ### CUT OUT END OF LOOP # } which would do the single case for Var2. A run for a different VarName would be done by editing the first block to select the new VarName. Now, when I want to run the loop over all VarNames, I could of course likewise edit the file so as to uncomment the code which follows the CUT OUT ... lines. But I would prefer to avoid having to do the latter, and am therefore wondering if there is some way to conditionally skip over those bits of code. What's the problem with doing a loop over a single VarName in VarNames? I'd put something like: VarNames = c( Var1, Var2, Var3 ) at the start of my code for easy editing, and then if I only wanted to do it for one VarName you can just do: VarNames = c( #Var1, Var2 #Var3 ) - it's just a shame that R doesn't like trailing commas in c() calls, which would make it even easier. 
Of course the real way to do it is to rewrite it as a function and do foo(c("Var1","Var2")) as desired.

Barry

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 16:00:14 -- XFMail --
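Barry's closing suggestion (wrap the analysis in a function that takes the variable names to process) might look roughly like this. The function name, the data frame, and the "analysis" body are placeholders of mine, not from the original code:

```r
# Hypothetical sketch of Barry's suggestion: pass the chosen variable
# names into a function instead of editing comments in a sourced file.
analyse_vars <- function(dat, var_names) {
  results <- list()
  for (v in var_names) {
    x <- dat[[v]]            # pull the named column from the data frame
    results[[v]] <- mean(x)  # placeholder for the real analysis
  }
  results
}

dat <- data.frame(Var1 = 1:4, Var2 = 5:8, Var3 = 9:12)
analyse_vars(dat, c("Var1", "Var3"))  # run for a chosen subset
analyse_vars(dat, names(dat))         # or loop over them all
```

Selecting which variables to run then needs no commenting or uncommenting at all.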
Re: [R] R vs SPSS contrasts
Very many thanks, Chuck and Gabor, for the comments and the references to on-line explanations. It is beginning to become clear! Most grateful. Ted.

On 12-Oct-08 18:03:53, Gabor Grothendieck wrote: I found this link: http://webs.edinboro.edu/EDocs/SPSS/SPSS%20Regression%20Models%2013.0.pdf which indicates that the contrast in SPSS that is used depends not only on the contrast selected but also on the reference category selected, and the two can be chosen independently. Thus one could have simple/first, simple/last, deviation/first, deviation/last, etc. An R contr.SPSS function would have to specify both the deviation type and the first/last in order to handle all SPSS variations.

On Sun, Oct 12, 2008 at 1:48 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: The formula should be (diag(n) - 1/n)[, -n]

On Sun, Oct 12, 2008 at 1:36 PM, Gabor Grothendieck [EMAIL PROTECTED] wrote: Looks like the contrast matrix for indicator is contr.SAS(n), for deviation is contr.sum(n) and for simple is:

  (diag(n) - 1/n)[, -1]

That works at least for the n = 3 example in the link. Perhaps the others could be checked against SPSS for a variety of values of n to be sure.

On Sun, Oct 12, 2008 at 12:32 PM, Chuck Cleland [EMAIL PROTECTED] wrote: On 10/11/2008 3:31 PM, Ted Harding wrote: Hi Folks, I'm comparing some output from R with output from SPSS. The coefficients of the independent variables (which are all factors, each at 2 levels) are identical. However, R's Intercept (using default contr.treatment) differs from SPSS's 'constant'. It seems that the contrasts were set in SPSS using /CONTRAST (varname)=Simple(1). I can get R's Intercept to match SPSS's 'constant' if I use contr.sum in R. Can someone please confirm that that is a correct match for the SPSS Simple(1), with identical effect? And is there a convenient on-line reference where I can look up what SPSS's /CONTRAST statements exactly mean? I've done a lot of googling, without coming up with anything satisfactory.
With thanks, Ted.

Hi Ted: Here are two links with the same content giving a brief description of SPSS simple contrasts: http://www.ats.ucla.edu/stat/spss/library/contrast.htm http://support.spss.com/productsext/spss/documentation/statistics/articles/contrast.htm These pages explain how simple contrasts differ from indicator (contr.treatment) and deviation (contr.sum) contrasts. For a factor with 3 levels, I believe you can reproduce SPSS simple contrasts (with the first category as reference) like this:

  C(warpbreaks$tension, contr=matrix(c(-1/3,2/3,-1/3,-1/3,-1/3,2/3), ncol=2))
  ...
  attr(,"contrasts")
      [,1]   [,2]
  L -0.333 -0.333
  M  0.667 -0.333
  H -0.333  0.667
  Levels: L M H

For a factor with 2 levels, like this:

  C(warpbreaks$wool, contr=matrix(c(-1/2,1/2), ncol=1))
  ...
  attr(,"contrasts")
    [,1]
  A -0.5
  B  0.5
  Levels: A B

Your description of the effect of SPSS simple contrasts - intercept coefficient of contr.sum and non-intercept coefficients of contr.treatment - sounds accurate to me. hope this helps, Chuck

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 11-Oct-08 Time: 20:31:53 -- XFMail --

-- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
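Gabor's formula for "simple" contrasts can be wrapped in a small helper. The function name contr.spss.simple is my own invention; the body is just his (diag(n) - 1/n)[, -1] and [, -n] formulas for the first/last reference category:

```r
# Hypothetical helper based on Gabor's formula for SPSS "simple" contrasts.
# first = TRUE uses the first level as reference, otherwise the last.
contr.spss.simple <- function(n, first = TRUE) {
  m <- diag(n) - 1/n
  if (first) m[, -1, drop = FALSE] else m[, -n, drop = FALSE]
}

contr.spss.simple(3)
# columns are (-1/3, 2/3, -1/3) and (-1/3, -1/3, 2/3),
# i.e. Chuck's 3-level matrix above
contr.spss.simple(2)
# single column (-1/2, 1/2), i.e. Chuck's 2-level matrix
```

As Gabor notes, this should still be checked against SPSS output for a variety of values of n.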
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 21:04:48 -- XFMail --
[R] png(): Linux vs Windows
Hi Folks, Quick question. I have the following line in an R code file which runs fine on Linux:

  if(PNG) png(GraphName, width=12, height=15, units="cm", res=200)

I learn that, when the same code was run on a Windows machine, there was the following error:

  Error in png(GraphName, width=12, height=15, units="cm", res=200):
    unused argument(s) (units = "cm")

Sorry to be a bother -- but could a Windows R user put me wise on any differences between png() on Windows and Linux? (And, sorry, I don't know what version of R, nor what version of Windows, this occurred on). Thanks, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 12-Oct-08 Time: 23:02:41 -- XFMail --
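The "unused argument(s) (units = ...)" error suggests the Windows machine was running a png() that took width and height in pixels only. Under that assumption, one portable workaround is to do the cm-to-pixel conversion by hand (the helper name cm_to_px is mine):

```r
# Sketch of a portable workaround, assuming the Windows png() at the time
# accepted width/height in pixels only: convert cm to pixels yourself.
cm_to_px <- function(cm, res) round(cm / 2.54 * res)  # res in dots per inch

res <- 200
w <- cm_to_px(12, res)  # 12 cm wide  -> 945 pixels at 200 dpi
h <- cm_to_px(15, res)  # 15 cm tall  -> 1181 pixels at 200 dpi
# png("graph.png", width = w, height = h)  # pixel sizes work everywhere
```

This sidesteps the units argument entirely, at the cost of fixing the resolution in the calculation.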
Re: [R] Correlation among correlation matrices cor() - Interpret
On 10-Oct-08 08:07:34, Michael Just wrote: Hello, If I have two correlation matrices (e.g. one for each of two treatments) and then perform cor() on those two correlation matrices, is this third correlation matrix interpreted as the correlation between the two treatments? In my sample below I would interpret that the treatments are 0.28 correlated. Is this correct?

  var1 <- c(.0008, .09, .1234, .5670008, .00110011002200,
            8, 9, 12.34, 56.7, 1015245698745859)
  var2 <- c(11, 222, 333, , 5, 1.1, 2.2, 3.3, 4.4, .55)
  var3 <- c(22.2, 66.7, 99.9, 108, 123123, .1, .2, .3, .4, .5)
  var4 <- c(.01, .1, .0001, .001, .1, .12345, .56789, .67890, .78901, .89012)
  dat <- cbind(var1,var2,var3,var4)
  dat.d <- data.frame(dat)
  treatment1 <- dat.d[1:5,]
  treatment2 <- dat.d[6:10,]
  t1.d.cor <- cor(treatment1)
  t2.d.cor <- cor(treatment2)
  I <- lower.tri(t1.d.cor)
  t1.t2 <- cor(cbind(T1 = t1.d.cor[I], T2 = t2.d.cor[I]))
  t1.t2
            T1        T2
  T1 1.0000000 0.2802750
  T2 0.2802750 1.0000000

My code may be unpolished, Thanks, Michael

You cannot conclude that, because corresponding elements of two correlation matrices are correlated, the two sets of variables that they were derived from are correlated (between the two sets). Here (in R) is an example of the contrary. One set of 4 variables (X1, X2, X3, X4) is such that there are correlations (by construction) between X1, X2, X3, X4. The other set of four variables Y1, Y2, Y3, Y4 is constructed in the same way. But the four variables (X1,X2,X3,X4) are independent of the four variables (Y1,Y2,Y3,Y4) and hence the two sets are not correlated. Nevertheless, the correlation matrix (analogous to your t1.t2) between the corresponding elements of the two correlation matrices is in general quite high.
  N <- 500 ; Cors <- numeric(N)
  for(i in (1:500)){
    X1 <- rnorm(10); X2 <- rnorm(10); X3 <- rnorm(10); X4 <- rnorm(10)
    V1 <- X1; V2 <- X1+X2; V3 <- X1+X2+X3; V4 <- X1+X2+X3+X4
    D <- cor(cbind(V1,V2,V3,V4))
    Y1 <- rnorm(10); Y2 <- rnorm(10); Y3 <- rnorm(10); Y4 <- rnorm(10)
    W1 <- Y1; W2 <- Y1+Y2; W3 <- Y1+Y2+Y3; W4 <- Y1+Y2+Y3+Y4
    E <- cor(cbind(W1,W2,W3,W4))
    I <- lower.tri(D)
    D.E <- cor(cbind(D=D[I], E=E[I]))
    Cors[i] <- D.E[2,1]
  }
  hist(Cors)

In any case, with the distribution of values exhibited by some of your variables, even the primary correlation matrices are of very questionable interpretability. Indeed, I wonder, if these are real data, what on Earth (literally) they are data on? The ratio of the largest to the smallest of, for instance, your var1 is (approx) 10^26. A small cup of espresso coffee is about 60 grams (2 ounces). 60 gm * 10^26 = 6*(10^27) gm = 6*(10^24) kg, which is the mass of the Earth. Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 10-Oct-08 Time: 11:40:50 -- XFMail --
Re: [R] Fw: MLE
On 08-Oct-08 11:14:39, Ron Michael wrote: I made one typo in my previous mail. May I ask one statistics-related question please? I have one query on MLE itself. Its property says that, for large sample size, it is normally distributed [i.e. asymptotically normal]. On the other hand it is Consistent as well. My doubt is, how do these two asymptotic properties exist simultaneously? If it is consistent then asymptotically it should collapse to truth, i.e. for large sample size, the variance of the MLE should be zero. However asymptotic normality says the MLE has some distribution and hence variance. Can anyone please clarify this for me? Your help will be highly appreciated.

The false step in your argument is in the following: "If it is consistent then asymptotically it should collapse to truth i.e. for large sample size, variance of MLE should be zero." The first part would be better expressed as: "If it is consistent then asymptotically it should collapse *towards* truth", and indeed that is pretty well the definition of consistent. More precisely:

1. Decide how close you want the MLE to be to the true value. (Say, for example, that this is 0.0001). You're not allowed to choose "spot on" (i.e. zero).

2. Decide how sure you want to be that it is that close. (Say, for example, that you want to be 99.999% sure). You're not allowed to choose 100%.

3. Then you can find a sample size N (which may be very large, but you are being asymptotic so you can take as much as you need) such that, if the sample size is at least N, then

  Probability(|MLE - Truth| < 0.0001) > 0.99999

N, of course, depends on the numbers you chose in (1) and (2). Note that this does NOT say, anywhere, that the distribution of the MLE has, for such an N, collapsed strictly to truth, i.e. that the variance is zero. All that is implied is that the variance is very small, sufficiently small for (3) to be true.
And that is all that consistency is saying: That, for large enough N, you can be as sure as you wish (via variance as small as you need) that the MLE is at least as close as you wish to the true value. Consistency is not saying more than that. Therefore the second part of your statement: "i.e. for large sample size, variance of MLE should be zero" is not true: you don't attain zero for any large sample size; you can only get very close. (Except in certain very special cases -- e.g. sampling 100 different items out of a population of 100 items, i.e. without replacement, will give you exactly the value of some quantity calculated on that population). Hoping that helps! Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-08 Time: 13:14:27 -- XFMail --
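Ted's point (the variance shrinks towards zero but never reaches it) is easy to see in a quick simulation. A sketch, assuming the simple case of a normal sample where the MLE of the mean is the sample mean with sampling variance 1/n:

```r
# For samples from N(mu, 1), the MLE of mu is the sample mean; its
# sampling variance is 1/n -- small for large n, but never exactly zero.
set.seed(1)
for (n in c(10, 100, 1000, 10000)) {
  mles <- replicate(2000, mean(rnorm(n, mean = 5)))
  cat(sprintf("n = %5d  var(MLE) = %.6f  (theory 1/n = %.6f)\n",
              n, var(mles), 1/n))
}
```

Each line shows the empirical variance of the MLE tracking 1/n downwards, which is consistency in action without the variance ever collapsing to zero at any finite n.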
Re: [R] Using grep
On 08-Oct-08 15:19:02, mentor_ wrote: Hi, I have a vector A with (200, 201, 202, 203, 204, ... 210) and a vector B with (201, 204, 209). Now I would like to get the positions in vector A that match the entries in vector B. So what I want to have is the following result: [1] 2 5 10

First of all:

  A <- (200:210)
  B <- c(201, 204, 209)
  A
  # [1] 200 201 202 203 204 205 206 207 208 209 210
  B
  # [1] 201 204 209
  which(A %in% B)
  # [1] 2 5 10

as desired.

I tried the following: grep(B, A); grep(c(B), A); A <- as.character(A); B <- as.character(B); grep(B, A); grep(c(B), A); and several other combinations. But nothing is giving me the right result?! Does anyone know why?

In grep(pattern, x, ...):

  pattern: character string containing a regular expression (or character
    string for 'fixed = TRUE') to be matched in the given character vector.
    Coerced by 'as.character' to a character string if possible.
  x, text: a character vector where matches are sought, or an object which
    can be coerced by 'as.character' to a character vector.

you can clearly have 'x' as a vector of character strings, so your as.character(A) is valid for 'x'. But as.character(B) is neither a regular expression nor a character string -- it is a vector of character strings. So it is not valid for 'pattern'. What seems to happen here is that grep() takes the first element ("201") of as.character(B), and uses this as the regular expression (or character string):

  grep("A", c("ABC","BCD","CDE","EAB"))
  # [1] 1 4   (as expected)
  grep(c("A","B"), c("ABC","BCD","CDE","EAB"))
  # [1] 1 4   (the same as the previous; "B" in c("A","B") is ignored)

Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-08 Time: 17:43:03 -- XFMail --
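As a footnote to which(A %in% B): the standard base-R function match() (not mentioned in the thread) returns the positions in the order of B rather than the order of A, which matters when B is not sorted the same way as A:

```r
A <- 200:210
B <- c(201, 204, 209)

which(A %in% B)   # positions in A, reported in A's order: 2 5 10
match(B, A)       # positions in A, reported in B's order: 2 5 10

# The difference shows up when B is reordered:
B2 <- c(209, 201, 204)
which(A %in% B2)  # still 2 5 10 (A's order)
match(B2, A)      # 10 2 5 (follows B2's order)
```

Both avoid grep() entirely, which is the right move since this is value matching, not pattern matching.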
Re: [R] R seven years ago
On 08-Oct-08 18:00:27, Liviu Andronic wrote: Hello everyone, As some may know, today Google unveiled its 2001 search index [1]. I was curious to see what R was like at that time, and was not disappointed. Compared to today's main page [2], seven years ago the page looked [3] a bit rudimentary, especially the graphic. (It is worth noting that structurally the pages are very similar.) What definitely changed is the `Contributed packages' section. Then R featured 29 contributed packages [4], while now it features 1500+ [5]. It was surprising to realize the growth of R during the past seven years. Regards, Liviu

[1] http://www.google.com/search2001.html
[2] http://www.r-project.org/
[3] http://web.archive.org/web/20010722202756/www.r-project.org/
[4] http://web.archive.org/web/20010525004023/cran.r-project.org/bin/macos/contrib/src/
[5] http://cran.at.r-project.org/web/packages/

Many thanks for this, Liviu! One might also compare the mailing list usage:

  [R-help 1997]:   484 messages
  [R-help 2001]:  4309 messages
  [R-help 2007]: 26250 messages
    (1721+1909+2196+2145+2210+2309+2142+2246+2028+2711+2602+2031)

So we now get more posts in a week than we did in the whole of 1997! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 08-Oct-08 Time: 19:34:28 -- XFMail --
Re: [R] Statistically significant in linear and non-linear model
On 07-Oct-08 17:46:52, Hsiao-nan Cheung wrote: Hi, I have a question to ask. If, in a linear regression model, the independent variables are not statistically significant, is it necessary to test these variables in a non-linear model? Since most non-linear forms of a variable can be represented as a linear combination using Taylor's theorem,

That depends on the coefficients in the Taylor series expansion. It is quite possible to have the linear coefficient zero, and the quadratic coefficient non-zero.

so I wonder whether the non-linear form is also not statistically significant in such a situation. Best Regards Hsiao-nan Cheung 2008/10/08

Example:

  X <- 0.2*((-10):10)
  Y <- 0.5*(X^2) + 0.2*rnorm(21)
  X2 <- X^2

[A] Linear regression, Y on X:

  summary(lm(Y ~ X))$coef
  #               Estimate Std. Error   t value     Pr(>|t|)
  # (Intercept) 0.72840442  0.1554215 4.6866382 0.0001606966
  # X           0.06570652  0.1283351 0.5119919 0.6145564688

So the coefficient of X is not significant.

[B] Quadratic regression, Y on X and X^2:

  summary(lm(Y ~ X + X2))$coef
  #                Estimate Std. Error     t value     Pr(>|t|)
  # (Intercept) 0.003425041 0.07203265  0.04754846 9.625997e-01
  # X           0.065706524 0.03957727  1.66020864 1.141924e-01
  # X2          0.494304121 0.03666239 13.48259513 7.570563e-11

So the coefficient of X is still not significant (P = 0.11), but the coefficient of X^2 is *highly* significant! So it all depends ... of course the original coefficients (Taylor) could be anything. Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 07-Oct-08 Time: 19:16:04 -- XFMail --
Re: [R] Sexy Little Number :-)-O
On 05-Oct-08 10:26:29, Dr Eberhard W Lisse wrote: Got me the Asus EEE PC 1000 (the one with the 8 and 32 GIG solid state drives, and added a 4GiG SD for the swap space. Will add another Gig of RAM for a total of 2). Threw the old (Xandros) Linux off and the EEE-specific Ubuntu 8.04.1 onto it. Got an Atom Intel processor which apparently has two cores. Quite impressive... Does anyone know, off hand, how I set up the CRAN repository so it updates from the latest packages (2.7.2)? What happened to the South African mirror, by the way? greetings, el

I'll be interested in responses too! (Though mine is only the EEE PC 900, still a nice little thing). And in experiences of replacing the Xandros with the Ubuntu. Meanwhile, I was tickled by your subject line, so could not resist:

  sexyNo <- function(){
    for(i in (1:20)){
      S <- paste(makeNstr(" ",i),"0",makeNstr(" ",(40-2*i)),"<--1\n",sep="")
      cat(S); Sys.sleep(1)
    }
    cat(paste(makeNstr(" ",21),"0~1\n",sep="")); Sys.sleep(2)
    for(i in (1:9)){
      cat(paste(makeNstr(" ",22),"|\n",sep="")); Sys.sleep(2)
    }
    cat("      ***!!! 0.1 !!!***\n")
  }
  sexyNo()

:-) Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 13:18:10 -- XFMail --
Re: [R] [OOPS!]Sexy Little Number :-)-O
[OOPS! By oversight I perpetrated a show-stopper! (Overlooked that I already had Hmisc loaded). Corrected below]

On 05-Oct-08 12:18:13, Ted Harding wrote: On 05-Oct-08 10:26:29, Dr Eberhard W Lisse wrote: Got me the Asus EEE PC 1000 (the one with the 8 and 32 GIG solid state drives, and added a 4GiG SD for the swap space. Will add another Gig of RAM for a total of 2). Threw the old (Xandros) Linux off and the EEE-specific Ubuntu 8.04.1 onto it. Got an Atom Intel processor which apparently has two cores. Quite impressive... Does anyone know, off hand, how I set up the CRAN repository so it updates from the latest packages (2.7.2)? What happened to the South African mirror, by the way? greetings, el

I'll be interested in responses too! (Though mine is only the EEE PC 900, still a nice little thing). And in experiences of replacing the Xandros with the Ubuntu. Meanwhile, I was tickled by your subject line, so could not resist:

  library(Hmisc)
  sexyNo <- function(){
    for(i in (1:20)){
      S <- paste(makeNstr(" ",i),"0",makeNstr(" ",(40-2*i)),"<--1\n",sep="")
      cat(S); Sys.sleep(1)
    }
    cat(paste(makeNstr(" ",21),"0~1\n",sep="")); Sys.sleep(2)
    for(i in (1:9)){
      cat(paste(makeNstr(" ",22),"|\n",sep="")); Sys.sleep(2)
    }
    cat("      ***!!! 0.1 !!!***\n")
  }
  sexyNo()

:-) Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 13:49:49 -- XFMail --
Re: [R] Sample mean in R
On 05-Oct-08 20:00:00, dilian wrote: I am having issues with the following: (muhat = (1/n^2)*(sum of all the xi's)); essentially if xbar = the sample mean, then muhat = the sample mean but with n squared in the denominator. Question: Use R to run a Monte Carlo simulation which compares the finite-sample performance of xbar and muhat. Specifically, generate 1000 samples of n=30 from a standard normal distribution. For each sample calculate xbar and muhat. I have no problem calculating the mean of the xbar's -- however I cannot figure out how to set up the muhat variable and find the means. My code is as follows:

  # R code starts here
  rm(list=ls())
  set.seed(100)
  n <- 30
  s <- 1000
  xbar <- rep(0,s)
  muhat <- rep(0,s)
  for (i in 1:s) {
    x <- rnorm(0, n=10)
    xbar[i] <- mean(x)
    muhat[i] <- mean(x^(-1/2))
  }

The line muhat[i] <- mean(x^(-1/2)) is anomalous -- in more than one way! [1] It does not match up with your stated definition of muhat (there is no x^(-1/2) there); [2] x^(-1/2) is going to give a bad result for negative values of x anyway (as will be the case with your rnorm(0, n=10)). To achieve what you defined as muhat, surely

  muhat[i] <- mean(x)/n

(where n <- length(x) somewhere, or simply n <- 10). But in any case I am wondering why you are interested in that muhat = (1/n^2)*(sum of all the xi's) definition of muhat. Part of your message seems to be going into one of my ears, and part into the other; when they meet in the middle, they compare notes and begin to wonder if you are getting Mean mixed up with Standard Error (SE^2 = var(x)/n). Hmmm. Hoping this helps, Ted.

  cat("Estimated mean of xbar:", mean(xbar), "\n")
  cat("Estimated mean of muhat:", mean(muhat), "\n")

Any help would be greatly appreciated.

-- View this message in context: http://www.nabble.com/Sample-mean-in-R-tp19828546p19828546.html Sent from the R help mailing list archive at Nabble.com.
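Ted's suggested fix can be assembled into a complete simulation along the lines the exercise asks for (n = 30, 1000 samples; muhat here is the exercise's sum(x)/n^2, i.e. mean(x)/n -- the assembly is mine, not from the thread):

```r
set.seed(100)
n <- 30; s <- 1000
xbar  <- numeric(s)
muhat <- numeric(s)
for (i in 1:s) {
  x <- rnorm(n)             # one sample of size n from N(0, 1)
  xbar[i]  <- mean(x)       # the usual sample mean
  muhat[i] <- sum(x)/n^2    # the exercise's estimator: mean(x)/n
}
cat("Estimated mean of xbar: ", mean(xbar),  "\n")
cat("Estimated mean of muhat:", mean(muhat), "\n")
# muhat shrinks every estimate by a factor of n, so its variance is 1/n^3
# against xbar's 1/n -- small variance, but a poor estimator of any
# nonzero mean, which echoes Ted's suspicion about the definition.
```

Comparing var(xbar) and var(muhat) alongside the means makes the trade-off the exercise is probing easy to see.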
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 21:37:21 -- XFMail --
Re: [R] What is the meaning of segfault 'memory not mapped' ?
One thing I have often done with code (including C and R) that falls through the ice unexpectedly is to plant (as a temporary measure) debug prints which emit information about where they are in the program, how many times they have gone round a particular loop, values of any potentially suspect variables, etc. Then you can look at the tail end of what they print and begin to localise the place where things go wrong. Having approximately located it, one can then replace the debug prints with more narrowly focussed ones. Often, just a few at a time will work fine. One can often wrap such a print into a function. Exactly how it is best done depends on how the code is written, and on what seems to be going wrong. Hoping this helps, Ted. On 05-Oct-08 20:22:53, Thomas Lumley wrote: On Fri, 3 Oct 2008, Ubuntu.Diego wrote: I'm trying to get some easy coding to reproduce the error. In the meantime I have R code that run for 20 or more hours and suddenly i got a segfault 'memory not mapped' error. I have to clarify that the error not alway occurs and sometimes the process end satisfactorily. The process is basically a search using an MCMC strategy, sometimes the algorithm converge and stops others I got the error message. I was wondering if it could be meaning that I run out of RAM. The 'memory not mapped' error means that your code is reading from or writing to memory that it doesn't own. This should have nothing to do with running out of RAM. If this is pure R code you have found a bug in R. If you are calling your own C code it is more likely to be a bug there. If your C code uses PROTECT/UNPROTECT it is even more likely to be there. Usually the best way to track down these bugs is to run the code under Valgrind, but this is slow, which will be a problem for code that already takes many hours. If you are extremely lucky, the crash will have happened on the first incorrect memory access. 
Running R under a debugger such as gdb you can use backtrace after the crash occurs to find where the bug happened. Unfortunately it is quite likely that the crash happened because some earlier piece of code overwrote a stored variable or something similar. The fact that the segfault occurs only some of the time reinforces this, especially if you don't always get the crash even with the same sequence of random numbers. If you have a Linux computer that you can use for a week or so it would be worth running your code under Valgrind to see if it finds the problem. -thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 05-Oct-08 Time: 21:46:27 -- XFMail --
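Ted's debug-print idea (emit where you are, loop counts, suspect values, then read the tail of the output after the crash) can be wrapped in a tiny helper. The function name and message format below are my own invention, a sketch rather than anything from the thread:

```r
# Hypothetical trace helper: tag each message with a location label and
# flush immediately, so the output survives up to the point of a crash.
trace_msg <- function(where, ...) {
  cat(sprintf("[%s] ", where), ..., "\n")
  flush(stdout())
}

for (i in 1:3) {
  trace_msg("main loop", "iteration", i, "x =", i^2)
}
```

Because each call flushes stdout, the last line printed before a segfault localises the failure; the calls can then be narrowed and re-run, exactly as Ted describes.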
Re: [R] How to get the day of the year
On 01-Oct-08 10:44:10, [EMAIL PROTECTED] wrote: I am new to R and I would like to get the day of the year from numerous data in the following form: %a %d/%b/%Y %H:%M:%S (for example from "Tu 10/Jul/2007 21:59:13" I would like to get 191). Whatever I try, I get NAs.

The problem is the 'Tu' bit. Abbreviated day-of-week names are three letters long. You'll have to replace/delete them.

  datestr <- "Tu 10/Jul/2007 21:59:13"
  newdatestr <- sub("Tu", "Tue", datestr)
  dateval <- strptime(newdatestr, "%a %d/%b/%Y %H:%M:%S")
  dateval$yday   # 190

In addition to the above: BEWARE that strptime(...)$yday counts upwards from 0 to 364 (normal year) or 365 (leap year). In other words, 1 Jan is day 0. Hence Richard got the result 190 where you expected 191. So add 1 to get the answer everyone expects! Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 01-Oct-08 Time: 12:24:12 -- XFMail --
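Putting Richard's substitution and the +1 correction together in one runnable piece (the locale caveat is mine: %a/%b parsing assumes English day and month names):

```r
# Assumes an English locale for %a/%b parsing;
# Sys.setlocale("LC_TIME", "C") can force this if needed.
datestr <- "Tu 10/Jul/2007 21:59:13"
newdatestr <- sub("^Tu ", "Tue ", datestr)  # expand the two-letter day name
dateval <- strptime(newdatestr, "%a %d/%b/%Y %H:%M:%S")
dateval$yday + 1   # 191: $yday counts from 0, so add 1
```

Anchoring the substitution to the start of the string (^Tu followed by a space) avoids accidentally rewriting a "Tu" that appears elsewhere in the data.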
Re: [R] Pattern match in R
On 30-Sep-08 14:36:04, bioinformatics_guy wrote: I want to make sure this piece of code I wrote is doing what I want it to do.

  ll <- function(string) {
    grep(string, dir(), value=TRUE)
  }
  subdir = ll("Coverage_[1-9][0-9]$")

I basically wrote a little function that would grab all the files of form Coverage_[0-99]. The way I wrote it, will it grab Coverage_5 or does it have to have 2 numbers (10-99)?

I think you want "Coverage_[1-9]*[0-9]$", since your form will only match Coverage_mn where m is one of 1-9 and n is one of 0-9. The "*" means "zero or any number of ...". Example (command-line grep in Linux):

  grep 'Coverage_[1-9]*[0-9]$' <<EOT
  Coverage
  Coverage_5
  Coverage_01
  Coverage_19
  Coverage_19q
  EOT
  Coverage_5
  Coverage_19

Note that this does not catch Coverage_01, because [1-9] does not include 0. Use [0-9] here as well if you need this. Hoping this helps, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 30-Sep-08 Time: 16:43:01 -- XFMail --
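The same comparison can be run inside R against a character vector standing in for dir() (the file names below are invented for illustration):

```r
# Invented file names standing in for dir() output.
files <- c("Coverage", "Coverage_5", "Coverage_01", "Coverage_19", "Coverage_19q")

grep("Coverage_[1-9][0-9]$", files, value = TRUE)
# "Coverage_19" only: the original pattern demands exactly two digits
grep("Coverage_[1-9]*[0-9]$", files, value = TRUE)
# "Coverage_5" "Coverage_19": Ted's pattern allows one digit too
grep("Coverage_[0-9]+$", files, value = TRUE)
# also catches "Coverage_01", per Ted's closing remark about leading zeros
```

The $ anchor is doing real work in all three: without it, "Coverage_19q" would also match.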
Re: [R] quantile
On 29-Sep-08 20:09:14, liujb wrote: Hello, I need to assign a number to each x[i], i=1:100, based on the value of x[i] and where it is in the distribution of x. For example, 1 for x[4] means x[4] is below the 25% quantile. I can obtain the quantiles using the quantile command, and just loop through 1:100 and assign the correct number. But I was just wondering whether there is a more efficient way. Thank you, sincerely, Julia

Well, you can certainly do it with a much shorter loop!

  set.seed(31425)
  x <- rnorm(13)
  x.v <- numeric(length(x))
  ix <- order(x)
  Q <- quantile(x)
  for(i in (5:2)){
    x.v[x <= Q[i]] <- (i-1)
  }
  cbind(x, x.v, x[ix], x.v[ix])
                x x.v
   [1,] -0.7565336   2 -1.7045077   1
   [2,] -0.3287683   2 -1.0693801   1
   [3,] -1.7045077   1 -1.0671752   1
   [4,]  0.7259883   4 -0.9718954   1
   [5,]  0.6174724   3 -0.7565336   2
   [6,] -1.0693801   1 -0.3668566   2
   [7,]  1.9826596   4 -0.3287683   2
   [8,] -0.9718954   1  0.2491123   3
   [9,] -1.0671752   1  0.4733287   3
  [10,] -0.3668566   2  0.6174724   3
  [11,]  0.2491123   3  0.7259883   4
  [12,]  0.4733287   3  1.9826596   4
  [13,]  2.2095536   4  2.2095536   4

Hoping this helps, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 29-Sep-08 Time: 22:33:33 -- XFMail --
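The loop can in fact be avoided entirely: cut() with quantile breaks reproduces Ted's x.v in a single call (cut() is standard base R; it does not appear in the thread):

```r
set.seed(31425)
x <- rnorm(13)

# cut() with quartile breaks assigns each value its quartile number 1..4;
# include.lowest = TRUE makes sure the minimum lands in group 1.
q <- cut(x, breaks = quantile(x), labels = FALSE, include.lowest = TRUE)

# Ted's loop, for comparison:
x.v <- numeric(length(x)); Q <- quantile(x)
for (i in (5:2)) x.v[x <= Q[i]] <- (i - 1)
all(q == x.v)   # TRUE
```

Both use right-closed intervals (value j means Q[j] < x <= Q[j+1], with the minimum included in group 1), which is why they agree exactly.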
Re: [R] Regression and data types
On 27-Sep-08 15:18:32, Gavin Simpson wrote: On Fri, 2008-09-26 at 13:17 +0100, GRANT Lewis wrote: Dear All, I have three data sets, X1, X2 and Y. X1 is data, X2 and Y were generated in (different) R programs. All three vectors have one column of 60 data points. I am using the code lm(Y~X1)$coef and lm(Y~X2)$coef.

Others have replied with an answer to your question. I just wanted to suggest you don't rummage around in R model objects taking what you like using '$'. 9 times out of 10 you'll get what you want, but that last remaining time will have you rubbing your head in confusion at best; at worst, answers may be just plain wrong. Use extractor functions, such as coef(), instead:

  coef(lm(Y ~ X1))

G

Surely, if you read (for example) in ?lm that:

  Value: [...] An object of class 'lm' is a list containing at least the
  following components:
  coefficients: a named vector of coefficients
  [...]

then (subject to using enough of the name to give a unique partial matching, as is the case here) you should find that lm(...)$coef returns what ?lm says! Even with more complex model fitting, such as lme (in 'nlme'), you should still get what ?lme -- ?lmeObject says:

  coefficients: a list with two components, 'fixed' and 'random', where
  the first is a vector containing the estimated fixed effects and the
  second is a list of matrices with the estimated random effects for each
  level of grouping. For each matrix in the 'random' list, the columns
  refer to the random effects and the rows to the groups.

Since coef() is a generic function, you might expect coef(lme(...)) to give the same; or even perhaps give a more easily interpreted presentation. But there's nothing wrong with lme(...)$coef provided you're fully aware of what it is (as explained in the 'help').
BUT: In the case of lme, if you run the example:

  fm1 <- lme(distance ~ age, data = Orthodont)  # random is ~ age

then fm1$coef really does return a list with components:

  $fixed
  (Intercept)         age
      16.761   0.6601852

  $random
  $random$Subject
      (Intercept)          age
  M16  -0.1877576 -0.068853674
  [...]
  F04   1.0691628 -0.029862143
  F11   1.2176440  0.083191188

whereas if you do coef(fm1) you will get only:

      (Intercept)       age
  M16    16.57335 0.5913315
  [...]
  F04    17.83027 0.6303230
  F11    17.97876 0.7433764

so (a) not only do you NOT get the fixed effects part, but (b) what you might interpret as the random effects part has very different values from the $random component of fm1$coef! Well, there must be some explanation for this, mustn't there? But, in ?lmeObject, I find ONLY:

  Objects of this class have methods for the generic functions 'anova',
  'coef', [...]

so I can't find out from there what coef(fm1) does; and if I look in ?lme again I find only:

  The functions 'resid', 'coef', 'fitted', 'fixed.effects', and
  'random.effects' can be used to extract some of its components.

so that doesn't tell me much (and certainly not why I get the different results). And ?coef doesn't say anything specific either. That being the case, I would, myself, stick with fm1$coef. At least, then, I know what I'm getting. With coef(fm1), I don't. Of course, summary(fm1) is informative in its own way; but that's a different matter; my point is that coef() doesn't necessarily give you what you might expect, while (...)$coef does -- since it is explained in full in ?lmeObject. This is, in fact, the opposite conclusion to that drawn by Gavin! Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 27-Sep-08 Time: 18:09:54 -- XFMail --
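For what it's worth, the numbers Ted quotes are consistent with coef(fm1) being the per-group sum of the fixed and random effects. A quick check using only the figures from his output (this interpretation is my own, not stated anywhere in the thread, so treat it as a conjecture to verify against nlme itself):

```r
# Conjecture: each row of coef(fm1) = $fixed + the matching $random$Subject
# row. Check against the M16 row, using Ted's quoted numbers.
fixed <- c(16.761, 0.6601852)           # $fixed: (Intercept), age
m16   <- c(-0.1877576, -0.068853674)    # $random$Subject row for M16

fixed + m16   # ~ (16.573, 0.5913), matching the M16 row of coef(fm1)
```

If the conjecture holds in general, it would explain both of Ted's observations at once: the fixed effects are not missing from coef(fm1), they are folded into every row.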