[R] bayesm - question about 'rscaleUsage function'

2007-08-07 Thread paulandpen
Hi all,

I have managed to get the r-scale usage algorithm to work, but I need to obtain 
the final results from this.  As I understand it, this code is designed to 
generate a matrix after processing and store it somewhere?  

Here is the code.

I get this part of the code, it all makes sense.

##
if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=1000} else {R=5} 
{
data(customerSat)
surveydat = list(k=10,x=as.matrix(customerSat))
Mcmc1 = list(R=R)
set.seed(66)
out=rscaleUsage(Data=surveydat,Mcmc=Mcmc1)
summary(out$mudraw)
}

My question is: how do I retrieve the results from this in matrix format?

I want to extract the matrix in its complete form and save this as a file for 
further processing.

What I want to get is:

     V1    V2    V3    V4
V1   1.47  0.99  1.14  1.56
V2   1.67  0.93  1.35  1.21

If I type 'out' at the GUI prompt I get the following (see below):

> out
     [,1288] [,1289] [,1290] [,1291] ...
[3,]    1.47    0.99    1.14    1.56
[4,]    1.17    0.96    0.63    1.32
[5,]    1.65    0.51    0.39    1.62  ...

> summary(out)
           Length Class      Mode
Sigmadraw     500 bayesm.var numeric
mudraw         50 bayesm.mat numeric
taudraw      9055 bayesm.mat numeric
sigmadraw    9055 bayesm.mat numeric
Lambdadraw     20 bayesm.mat numeric
edraw           5 bayesm.mat numeric

Thanks Paul

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to convert decimal date to its equivalent date format(YYYY.mm.dd.hr.min.sec)

2007-08-07 Thread Yogesh Tiwari
Hello R Users,

How to convert a decimal date to a date of the form YYYY.mm.dd.hr.min.sec?

For example, I have decimal dates in one column, and want to convert them and
write the equivalent date (YYYY.mm.dd.hr.min.sec) in the next six
columns.

1979.00

1979.020833

1979.041667

1979.062500



Is it possible in R ?
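
For reference, a minimal sketch of one possible approach (an assumption added
here, not part of the original question): split off the integer year and map
the fractional part onto the elapsed seconds of that year.

dec2date <- function(x) {
  year  <- floor(x)                                        # integer year
  start <- as.POSIXct(paste(year, "-01-01", sep = ""), tz = "UTC")
  end   <- as.POSIXct(paste(year + 1, "-01-01", sep = ""), tz = "UTC")
  start + (x - year) * as.numeric(difftime(end, start, units = "secs"))
}
format(dec2date(c(1979.00, 1979.020833, 1979.041667)), "%Y.%m.%d.%H.%M.%S")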

Kindly help,

Regards,

Yogesh





-- 
Dr. Yogesh K. Tiwari,
Scientist,
Indian Institute of Tropical Meteorology,
Homi Bhabha Road,
Pashan,
Pune-411008
INDIA

Phone: 0091-99 2273 9513 (Cell)
 : 0091-20-258 93 600 (O) (Ext.250)
Fax: 0091-20-258 93 825

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About grep

2007-08-07 Thread Vladimir Eremeev


Shao wrote:
 
 Hi,everyone.
 
 I have a problem when using the grep.
 for example:
 a <- c("aa","aba","abac")
 b <- c("ab","aba")
 
 I want to match the whole word,so
 grep("^aba$", a)
 it returns 2
 
 but when I used it a more useful way:
 grep("^b[2]$", a),
 it doesn't work at all, it can't find it, returning integer(0).
 
 How can I change the format in the second way?
 
 Thanks.
 
The regexp "^b[2]$" matches only two words: "b2" and "b" (in your present
grep call with the defaults
extended = TRUE, perl = FALSE, fixed = FALSE).

They are not present in a, which is why grep returns integer(0).
If you want to find the literal string b[2] (an expression similar to an array
indexing operator), either use fixed=TRUE and remove any regexp markup (^
and $)

> a <- c("aa","aba","abac","b[2]")
> grep("b[2]", a, fixed=TRUE)
[1] 4

, or use escapes

> grep("^b\\[2\\]$", a)
[1] 4

-- 
View this message in context: 
http://www.nabble.com/About-grep-tf4228021.html#a12029243
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About grep

2007-08-07 Thread Ted Harding
On 07-Aug-07 04:20:27, Shao wrote:
 Hi,everyone.
 
 I have a problem when using the grep.
 for example:
 a <- c("aa","aba","abac")
 b <- c("ab","aba")
 
 I want to match the whole word,so
 grep("^aba$", a)
 it returns 2
 
 but when I used it a more useful way:
 grep("^b[2]$", a),
 it doesn't work at all, it can't find it, returning integer(0).
 
 How can I change the format in the second way?
 
 Thanks.
 
 -- 
 Shao

The problem is that in the string "^b[2]$" the element b[2] of b
is not evaluated; rather, the successive characters ^ b [ 2 ] $
are passed to grep as a character string, which of course is not
found. You can construct a character string with the value of b[2]
in it by using paste():

> grep(paste("^", b[2], "$", sep=""), a)
[1] 2

Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 07:43:14
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bayesm - question about 'rscaleUsage function'

2007-08-07 Thread Dieter Menne
 paulandpen at optusnet.com.au writes:

 
 I get this part of the code, it all makes sense.
 
 ##
 if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=1000} else {R=5} 
 {
 data(customerSat)
 surveydat = list(k=10,x=as.matrix(customerSat))
 Mcmc1 = list(R=R)
 set.seed(66)
 out=rscaleUsage(Data=surveydat,Mcmc=Mcmc1)
 summary(out$mudraw)
 }
 
 My question is how do I retrieve the results from this in a matrix format
 

str() is your friend...

-
if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=1000} else {R=5} 
{
data(customerSat)
surveydat = list(k=10,x=as.matrix(customerSat))
Mcmc1 = list(R=R)
set.seed(66)
out=rscaleUsage(Data=surveydat,Mcmc=Mcmc1)
summary(out$mudraw)
}
> str(out)

List of 6
 $ Sigmadraw : bayesm.var [1:5, 1:100] 7.37 7.04 5.56 5.00 4.50 ...
  ..- attr(*, "class")= chr [1:3] "bayesm.var" "bayesm.mat" "mcmc"
  ..- attr(*, "mcpar")= num [1:3] 1 5 1
 $ mudraw    : bayesm.mat [1:5, 1:10] 6.21 6.54 6.49 6.56 6.59 ...
  ..- attr(*, "class")= chr [1:2] "bayesm.mat" "mcmc"
  ..- attr(*, "mcpar")= num [1:3] 1 5 1
 $ taudraw   : bayesm.mat [1:5, 1:1811]  0.1862 -1.4988 -1.0348  0.0754  1.0258 ...
  ..- attr(*, "class")= chr [1:2] "bayesm.mat" "mcmc"
  ..- attr(*, "mcpar")= num [1:3] 1 5 1
 $ sigmadraw : bayesm.mat [1:5, 1:1811] 1.11 1.29 1.47 1.2 1.95 0.72 0.63 0.69 1.2 1.14 ...
  ..- attr(*, "class")= chr [1:2] "bayesm.mat" "mcmc"
  ..- attr(*, "mcpar")= num [1:3] 1 5 1
 $ Lambdadraw: bayesm.mat [1:5, 1:4] 3.35 2.73 2.58 2.49 2.73 ...
  ..- attr(*, "class")= chr [1:2] "bayesm.mat" "mcmc"
  ..- attr(*, "mcpar")= num [1:3] 1 5 1
 $ edraw     : Classes 'bayesm.mat', 'mcmc'  atomic [1:5] 0.000 0.001 0.001 0.001 0.001
  .. ..- attr(*, "mcpar")= num [1:3] 1 5 1
 
Looks like you want sigmadraw (lowercase!)? So try

out$sigmadraw
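
As a follow-up sketch (added here, not part of the original reply), the draws
can then be written to a file for further processing once the extra classes
are dropped:

sigma <- unclass(out$sigmadraw)        # plain numeric matrix of draws
write.csv(sigma, file = "sigmadraw.csv", row.names = FALSE)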

Dieter

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cannot add lines to plot

2007-08-07 Thread Zeno Adams

Hello,

I want to plot a time series and add lines to the plot later on.
However, this seems to work only as long as I plot the series against
the default index. As soon as I plot against an object
of class chron or POSIXt (i.e. I want to add a date/time axis), the
lines do not appear anymore. The command to add the lines is executed
without an error message.

(THIS DOES NOT ADD THE LINES)
plot(datum2[(3653):(3653+i)], dlindus[(3653):(3653+i)], col
=hcl(h=60,c=35,l=60), ylim=c(-8,8), type = "l", xlab=(""),
ylab=("Return"), main = ("Industry"))
lines(gvarindus, type="l", lwd=2)
lines(quantindustlow, col ="black", type = "l", lty=3)
lines(quantindusthigh, col ="black", type = "l", lty=3)

(THIS ADDS THE LINES, but then I don't have a date axis)
 plot(dlindus[(3653):(3653+i)], col =hcl(h=60,c=35,l=60), ylim=c(-8,8),
type = "l", xlab=(""), ylab=("Return"), main = ("Industry"))
lines(gvarindus, type="l", lwd=2)
lines(quantindustlow, col ="black", type = "l", lty=3)
lines(quantindusthigh, col ="black", type = "l", lty=3)

This sounds like a fairly simple problem, but I cannot find any answer
in the R-help archives.

Thanks a lot.

Zeno

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help on glmmML

2007-08-07 Thread Fluss
Hello!
I am using glmmML for a logistic regression with a random effect.
I use the posterior.mode as an estimate for the random effects.
These can be very different from the estimates obtained using SAS PROC NLMIXED
with the out= option of the RANDOM statement (all the fixed effects and the
standard error of the random effect are almost identical).
Can someone explain to me why that is.

The code I use:
R:
glmm1 <- glmmML(mort30 ~ x, data=dat2, cluster=hospital, family=binomial)
print(sort(glmm1$posterior.mode))

SAS:

proc nlmixed data=dat;
  eta = b0 + b1*x + u;
  expeta = exp(eta);
  p = expeta/(1+expeta);
  model mort30 ~ binomial(1,p);
  random u ~ normal(0,s2) subject=hospital out=blue;
run;

proc sort data=blue; by estimate;

proc print data=blue; run;

THANKS FOR THE HELP

RON

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cannot add lines to plot

2007-08-07 Thread Achim Zeileis
On Tue, 7 Aug 2007, Zeno Adams wrote:


 Hello,

 I want to plot a time series and add lines to the plot later on.
 However, this seems to work only as long as I plot the series against
 the default index. As soon as I plot against an object
 of class chron or POSIXt (i.e. I want to add a date/time axis), the
 lines do not appear anymore. The command to add the lines is executed
 without an error message.

1. Read the posting guide!
2. Provide a reproducible example (as suggested in the posting guide).
3. Use a time series object for your time series, see e.g. package zoo.
   If you set up a multivariate time series, you can get what you want by
  plot(z, plot.type = "single")
   See the zoo vignettes for more information.
4. If you really want to modify the code below, you probably need to
   repeat (the right elements of) datum2 in lines(), e.g.,
 lines(datum2[...], gvarindus, lwd = 2)
   and you should really drop unnecessary brackets as in
  ylab = ("Return")

hth,
Z

 (THIS DOES NOT ADD THE LINES)
 plot(datum2[(3653):(3653+i)], dlindus[(3653):(3653+i)], col
 =hcl(h=60,c=35,l=60), ylim=c(-8,8), type = "l", xlab=(""),
 ylab=("Return"), main = ("Industry"))
 lines(gvarindus, type="l", lwd=2)
 lines(quantindustlow, col ="black", type = "l", lty=3)
 lines(quantindusthigh, col ="black", type = "l", lty=3)

 (THIS ADDS THE LINES, but then I don't have a date axis)
  plot(dlindus[(3653):(3653+i)], col =hcl(h=60,c=35,l=60), ylim=c(-8,8),
 type = "l", xlab=(""), ylab=("Return"), main = ("Industry"))
 lines(gvarindus, type="l", lwd=2)
 lines(quantindustlow, col ="black", type = "l", lty=3)
 lines(quantindusthigh, col ="black", type = "l", lty=3)

 This sounds like a fairly simple problem, but I cannot find any answer
 in the R-help archives.

 Thanks a lot.

 Zeno

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Opening a script with the R editor via file association (on Windows)

2007-08-07 Thread S Ellison
Does ?Rscript help?



 Christopher Green [EMAIL PROTECTED] 06/08/2007 23:13:47 
On Fri, 3 Aug 2007, Duncan Murdoch wrote:

 On 03/08/2007 7:16 PM, Christopher Green wrote:
 Is there an easy way to open an R script file in the R Editor found in
 more recent versions of R on Windows via a file association? I am looking
 for functionality akin to how the .ssc file extension works for S-Plus:
 upon double-clicking on a .R file, Rgui.exe starts up and loads the 
 script file in the R Editor.
 
 As far as I can tell, Rgui.exe does not have a command line option to load
 a file (so associating .R with Rgui.exe %1 won't work). I can get 
 Windows
 to start Rgui.exe when I double-click on a script file, but that's about 
 it.
 I also don't see any way to have Rgui.exe evaluate an expression provided 
 on
 the command line (which would allow file.edit(%1)). I also thought about
 creating a custom .First calling 'file.edit' in a batch file, but then I'd 
 have
 to start R, load the .First file, then quit and restart R, so that's out.
 
 Is there an easy way to do this without resorting to some other R GUI?

 Easy?  Not sure.  But the following works:

 Set up the association as

 F:\R\R-2.5.1\bin\Rgui.exe --args %1

 (with obvious modifications to the path) and put this line in your 
 RHOME/etc/Rprofile.site file:

 utils::file.edit(commandArgs(TRUE))

 You could make things more sophisticated if you don't want a blank edit 
 window to open in case you're not using this shortcut.

 Duncan Murdoch


Thanks to Duncan for pointing me towards commandArgs()...I was able to 
modify things so that the script editor only opens when there is a file to edit.
Essentially, there are two steps:

1. Add the line

if ( length(z <- commandArgs(TRUE)) ) utils::file.edit(z[1])

to the Rprofile.site file; and

2. Edit the registry to associate the .R extension with the command

"C:\Program Files\R\R-2.5.1\bin\Rgui.exe" --args "%1"

I have written up some instructions on how to do this for the faculty 
member who asked me about doing this; others may find them useful.

http://www.stat.washington.edu/software/statsoft/R/Rwinassoc.shtml 

Chris Green

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Naming Lists

2007-08-07 Thread Tom.O

Hi
I'm pretty new to R and I have run into a problem. How do I name all the
levels in this list?

Lev1 <- c("A1","A2")
Lev2 <- c("B1","B2")
Lev3 <- c("C1","C2")

MyList <- lapply(TEST1, function(x){
assign(x, x, where=1)
lapply(TEST2, function(y){
assign(y, y, where=1)
lapply(TEST3, function(z){
paste(unlist(x), unlist(y), unlist(z))
})})})

I would like to name the different levels in the list with the different
inputs, so that the first level is named by the inputs in Lev1 and the second
level by the inputs in Lev2, and so on.

I would then like to use this call

MyList$A1$B1$C1
 A1 B1 C1
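
A minimal sketch of one way to get there (an assumption added here, not from
the thread): sapply() with simplify = FALSE returns a list and uses the
character inputs as names at every level.

Lev1 <- c("A1","A2"); Lev2 <- c("B1","B2"); Lev3 <- c("C1","C2")
MyList <- sapply(Lev1, function(x)
            sapply(Lev2, function(y)
              sapply(Lev3, function(z) paste(x, y, z),
                     simplify = FALSE, USE.NAMES = TRUE),
              simplify = FALSE, USE.NAMES = TRUE),
            simplify = FALSE, USE.NAMES = TRUE)
MyList$A1$B1$C1    # "A1 B1 C1"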

Regards Tom
-- 
View this message in context: 
http://www.nabble.com/Naming-Lists-tf4228968.html#a12030717
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cannot add lines to plot

2007-08-07 Thread Prof Brian Ripley
On Tue, 7 Aug 2007, Zeno Adams wrote:


 Hello,

 I want to plot a time series and add lines to the plot later on.
 However, this seems to work only as long as I plot the series against
 the default index. As soon as I plot against an object
 of class chron or POSIXt (i.e. I want to add a date/time axis), the
 lines do not appear anymore. The command to add the lines is executed
 without an error message.

 (THIS DOES NOT ADD THE LINES)
 plot(datum2[(3653):(3653+i)], dlindus[(3653):(3653+i)], col
 =hcl(h=60,c=35,l=60), ylim=c(-8,8), type = "l", xlab=(""),
 ylab=("Return"), main = ("Industry"))
 lines(gvarindus, type="l", lwd=2)
 lines(quantindustlow, col ="black", type = "l", lty=3)
 lines(quantindusthigh, col ="black", type = "l", lty=3)

 (THIS ADDS THE LINES, but then I don't have a date axis)
 plot(dlindus[(3653):(3653+i)], col =hcl(h=60,c=35,l=60), ylim=c(-8,8),
 type = "l", xlab=(""), ylab=("Return"), main = ("Industry"))
 lines(gvarindus, type="l", lwd=2)
 lines(quantindustlow, col ="black", type = "l", lty=3)
 lines(quantindusthigh, col ="black", type = "l", lty=3)

 This sounds like a fairly simple problem, but I cannot find any answer
 in the R-help archives.

Look at the help for lines: the standard call is lines(x, y, col ="black",
type = "l", lty=3) and you have omitted x.  See ?xy.coords for what
happens then.

I think the reason you did not find this in the archives is that this is a 
rare misreading (or non-reading) of the help pages.
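
A minimal sketch of the repaired calls (hypothetical, reusing the object names
from the original post and assuming the added series line up with the plotted
window):

idx <- 3653:(3653 + i)
plot(datum2[idx], dlindus[idx], col = hcl(h=60, c=35, l=60), ylim = c(-8, 8),
     type = "l", xlab = "", ylab = "Return", main = "Industry")
lines(datum2[idx], gvarindus, lwd = 2)                      # supply x as well
lines(datum2[idx], quantindustlow,  col = "black", lty = 3)
lines(datum2[idx], quantindusthigh, col = "black", lty = 3)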

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting series of matrices on a single plot.

2007-08-07 Thread Urmi Trivedi
Dear experts,

I have in all 14 matrices which stand for gene expression divergence and 14
matrices which stand for gene sequence divergence. I have tried joining them
by using the concatenation function c(), giving

SequenceDivergence <- c(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12,X13,X14)
ExpressionDivergence <- c(Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8,Y9,Y10,Y11,Y12,Y13,Y14)

where X1,X2..X14 are the expression matrices containing r-values and 
Y1,Y2..Y14 are the ones with patristic distances

Now, I want to plot SequenceDivergence vs. ExpressionDivergence.

I tried doing that using plot(SequenceDivergence, ExpressionDivergence)

but then I get the error

 Error in as.double.default(x) : (list) object cannot be coerced to 'double'

Note: the diagonal values for X1 were neglected as they seem to introduce some
bias in my results. The code to derive X1, X2, X3, ... and Y1, Y2, Y3, ... is:

 EDant <- read.table("C:/ant.txt", sep="\t")
 ED1 <- as.matrix(EDant)
 X1 <- lapply(1:ncol(ED15), function(a) ED15[-a, a]) # to neglect the diagonal values

I am attaching a couple of matrices for your reference. ANT.txt is X1 after
deleting the diagonal values; similarly, ANTEXP.TXT is Y1, apetella.txt is X2 and
apetellaexp.txt is Y2.

I tried reading the R manual and other sources but haven't cracked it yet. Can
anyone please help me regarding that?
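
A minimal sketch of a possible fix (an assumption added here, not from the
thread): c() applied to the list objects produced by lapply() yields a list,
which plot() cannot coerce, so flatten both objects to numeric vectors first.

SequenceDivergence   <- unlist(SequenceDivergence)    # list -> numeric vector
ExpressionDivergence <- unlist(ExpressionDivergence)
plot(SequenceDivergence, ExpressionDivergence)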
 
 Thanks very much
 
 Yours sincerely,
 
 Urmi Trivedi. 








   
[mail-provider footer and the numeric contents of the attached matrices omitted]

Re: [R] Goodman-Kruskal tau

2007-08-07 Thread Antti Arppe
On Wed, 1 Aug 2007, Upasna Sharma [EMAIL PROTECTED] wrote:
 From: Upasna Sharma [EMAIL PROTECTED]
 Subject: [R] Goodman Kruskal's tau

 I need to know which package in R calculates the Goodman Kruskal's
 tau statistic for nominal data. Also is there any implementation for
 multiple classification analysis (Andrews at al 1973) in R? Any
 information on this would be greatly appreciated.

As far as I know, the Goodman-Kruskal tau has not yet been implemented
in any package, but I've coded it as a function for my own research
purposes following Liebetrau 1983; see the code attached below (save
it and load it into R with the source() command). You simply provide the
function with a matrix 'x' with the nominal data, and get the values
of the G-K tau as well as the significance values and asymptotic
standard errors, calculated in both directions, i.e. GK.tau(dat=x) or
GK.tau(x), e.g.

> source("GK.tau.R")
> x
     [,1] [,2] [,3]
[1,]   30   10    1
[2,]    2   20    5
[3,]    1   10   15
> GK.tau(x)
     tau.RC    tau.CR     p.tau.RC     p.tau.CR  var.tau.RC ASE.tau.RC
1 0.3522439 0.3219675 2.002024e-13 3.065436e-12 0.004584542 0.06770924
  ASE.tau.CR
1 0.07004566

GK.tau <- function(dat)
{ N <- sum(dat);
  dat.rows <- nrow(dat);
  dat.cols <- ncol(dat);
  max.col <- sum.col <- L.col <- matrix(,dat.cols);
  max.row <- sum.row <- L.row <- matrix(,dat.rows);
  for(i in 1:dat.cols)
     { sum.col[i] <- sum(dat[,i]); max.col[i] <- max(dat[,i]); }
  for(i in 1:dat.rows)
     { sum.row[i] <- sum(dat[i,]); max.row[i] <- max(dat[i,]); }
  max.row.margin <- max(apply(dat,1,sum));
  max.col.margin <- max(apply(dat,2,sum));

# Goodman-Kruskal tau
# Tau Column|Row
  n.err.unconditional <- N^2;
  for(i in 1:dat.rows)
     n.err.unconditional <- n.err.unconditional-N*sum(dat[i,]^2/sum.row[i]);
  n.err.conditional <- N^2-sum(sum.col^2);
  tau.CR <- 1-(n.err.unconditional/n.err.conditional);
  v <- n.err.unconditional/(N^2);
  d <- n.err.conditional/(N^2);
  f <- d*(v+1)-2*v;
  var.tau.CR <- 0;
  for(i in 1:dat.rows)
     for(j in 1:dat.cols)
        var.tau.CR <- var.tau.CR +
dat[i,j]*(-2*v*(sum.col[j]/N)+d*((2*dat[i,j]/sum.row[i])-sum((dat[i,]/sum.row[i])^2))-f)^2/(N^2*d^4);
  ASE.tau.CR <- sqrt(var.tau.CR);
  z.tau.CR <- tau.CR/ASE.tau.CR;
  U.tau.CR <- (N-1)*(dat.cols-1)*tau.CR; # Chi-squared approximation for H0
# according to Margolin & Light 1974, see also Liebetrau 1983
  p.tau.CR <- pchisq(U.tau.CR,df=(dat.rows-1)*(dat.cols-1),lower=FALSE);
# Tau Row|Column
  n.err.unconditional <- N^2;
  for(j in 1:dat.cols)
     n.err.unconditional <- n.err.unconditional-N*sum(dat[,j]^2/sum.col[j]);
  n.err.conditional <- N^2-sum(sum.row^2);
  tau.RC <- 1-(n.err.unconditional/n.err.conditional);
  v <- n.err.unconditional/(N^2);
  d <- n.err.conditional/(N^2);
  f <- d*(v+1)-2*v;
  var.tau.RC <- 0;
  for(i in 1:dat.rows)
     for(j in 1:dat.cols)
        var.tau.RC <- var.tau.RC +
dat[i,j]*(-2*v*(sum.row[i]/N)+d*((2*dat[i,j]/sum.col[j])-sum((dat[,j]/sum.col[j])^2))-f)^2/(N^2*d^4);
  ASE.tau.RC <- sqrt(var.tau.RC);
  z.tau.RC <- tau.RC/ASE.tau.RC;
  U.tau.RC <- (N-1)*(dat.rows-1)*tau.RC; # Chi-squared approximation for H0
# according to Margolin & Light 1974, see also Liebetrau 1983
  p.tau.RC <- pchisq(U.tau.RC,df=(dat.rows-1)*(dat.cols-1),lower=FALSE);
  results <- data.frame(tau.RC, tau.CR, p.tau.RC, p.tau.CR, var.tau.RC,
                        ASE.tau.RC, ASE.tau.CR);
  return(results);
}

Hope this helps.

-Antti
-- 
==
Antti Arppe - Master of Science (Engineering)
Researcher  doctoral student (Linguistics)
E-mail: [EMAIL PROTECTED]
WWW: http://www.ling.helsinki.fi/~aarppe

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R and Excel differences in calculating the F distribution function

2007-08-07 Thread Luis Ridao Cruz
R-help,

I'm trying to work out the density function by doing:

> df(5.22245, 3, 22)
[1] 0.005896862

> df(15.20675, 6, 4)
[1] 0.001223825


In Excel the result is:

0.0071060464  <--  FDIST(5.22245,3,22)

0.011406      <--  FDIST(15.20675,6,4)


From my point of view the differences in the second case
are substantial.

Can anyone give me a hint on this?

Thanks in advance
 







Luis Ridao Cruz
Faroese Fisheries Laboratory
Nóatún 1, P.O. Box 3051
FR-110 Tórshavn
Faroe Islands
Tel : (+298) 353900, Tel (direct) : (+298) 353912
Mob.:(+298) 580800, Fax: : (+298) 353901
E-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] warning in library(GRASS)

2007-08-07 Thread javier garcia-pintado
Hi,
With your help I've found out that library(GRASS) is responsible for
several warnings I'm obtaining.
I've got a loop (~100 runs) and for every loop, I'm carrying out several
operations over a sequence of raster maps (12) read from the GRASS GIS.

But for every map and loop I always use RGRASS.temp as the R-workspace
name for the imported raster files, no matter the name of the source
GRASS raster map, e.g;

RGRASS.temp <-
rast.get(G, rlist=paste(A2.grassname, it, ".Bmaxind.", ib, sep=""), catlabels=FALSE)


As I always overwrite the same R object, I would hope that the number
of calls to rast.get has no influence on a warning like "Too many
open raster files", which I'm getting after the fourth loop.

Am I misinterpreting something about the way R objects are stored, and does
every call to rast.get generate a new internal copy of RGRASS.temp?
If so, is there a way to solve this problem?

Thanks again and best wishes,

Javier

-- 
Javier García-Pintado
Institute of Earth Sciences Jaume Almera (CSIC)
Lluis Sole Sabaris s/n, 08028 Barcelona
Phone: +34 934095410
Fax:   +34 934110012
e-mail:[EMAIL PROTECTED] 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Classifly problems

2007-08-07 Thread Dani Valverde
Hello,
I am trying to explore a classification with GGobi. I am trying to
generate additional data according to the model so I can draw the
decision boundaries on the GGobi plot. The problem is that I always get
the same error: Error in predict.lda(model, data): wrong number of
variables, even though I know that I used the same number of variables for
the model generation (6) and for the additional data generation (6
also). I paste the code I am using:

library(MASS)
Tumor <- c(rep("MM",20), rep("GBM",18), rep("LGG",17))
data.lda <- lda(data, Tumor)
data.ld <- predict(data.lda)
data.ldd <- data.frame(data.ld$x, data.ld$class)

library(rggobi)
data.g <- ggobi(data.ldd)

library(classifly)
generate_classification_data(data.lda, data, method="grid", n=10)

Could you help me?
Best regards,

Dani




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and Excel differences in calculating the F distribution function

2007-08-07 Thread Frede Aakmann Tøgersen
I think you'll have to compare FDIST to pf() and not df()



Best regards

Frede Aakmann Tøgersen
Scientist


UNIVERSITY OF AARHUS
Faculty of Agricultural Sciences
Dept. of Genetics and Biotechnology
Blichers Allé 20, P.O. BOX 50
DK-8830 Tjele

Phone:   +45 8999 1900
Direct:  +45 8999 1878

E-mail:  [EMAIL PROTECTED]
Web:   http://www.agrsci.org

This email may contain information that is confidential.
Any use or publication of this email without written permission from Faculty of 
Agricultural Sciences is not allowed.
If you are not the intended recipient, please notify Faculty of Agricultural 
Sciences immediately and delete this email.


 

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On behalf of Luis Ridao Cruz
 Sent: 7 August 2007 12:27
 To: r-help@stat.math.ethz.ch
 Subject: [R] R and Excel differences in calculating the F 
 distribution function
 
 R-help,
 
 I'm trying to work out the density function by doing:
 
  df(5.22245, 3, 22)
 [1] 0.005896862
 
  df(15.20675, 6, 4)
 [1] 0.001223825
 
 
 In Excel the result is:
 
 0.0071060464  <--  FDIST(5.22245,3,22) 
 
 0.011406      <--  FDIST(15.20675,6,4)
 
 
 From my point of view the differences in the second case
 are substantial.
 
 Can anyone give me a hint on this?
 
 Thanks in advance
  
 
 
 
 
 
 
 
 Luis Ridao Cruz
 Faroese Fisheries Laboratory
 Nóatún 1, P.O. Box 3051
 FR-110 Tórshavn
 Faroe Islands
 Tel : (+298) 353900, Tel (direct) : (+298) 353912
 Mob.:(+298) 580800, Fax: : (+298) 353901
 E-mail: [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and Excel differences in calculating the F distribution function

2007-08-07 Thread Ted Harding
On 07-Aug-07 10:27:06, Luis Ridao Cruz wrote:
 R-help,
 
 I'm trying to work out the density function by doing:
 
 df(5.22245, 3, 22)
 [1] 0.005896862
 
 df(15.20675, 6, 4)
 [1] 0.001223825
 
 
 In Excel the result is:
 
 0.0071060464  <--  FDIST(5.22245,3,22) 
 
 0.011406      <--  FDIST(15.20675,6,4)
 
 
From my point of view the differences in the second case
 are substantial.
 
 Can anyone give me a hint on this?
 
 Thanks in advance

It would seem that Excel is giving you the upper tail  of the
F-distribution (cumulative distribution, not density), i.e.
what R would calculate as

pf(5.22245,3,22,lower.tail=FALSE)
[1] 0.007106046

pf(15.20675,6,4,lower.tail=FALSE)
[1] 0.0114

so in effect using pf rather than df. If you want the
density function (corresponding to df) from Excel, then
that's not something I know how to do (nor want to know ... ).

Best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 11:47:52
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Robust Standard Errors for lme object

2007-08-07 Thread Lucy Radford
Hi

I have fitted a linear mixed model using the lme function but now wish to
calculate robust standard errors for my model.  I have been able to find
several functions which calculate robust s.e. for lm objects but have not been
able to find a function which calculates robust s.e. for lme objects.  Would
anyone know of a function that will allow me to do this?

Many Thanks

Lucy

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Functions for autoregressive Regressionmodels (Mix between times series and Regression Models) ?

2007-08-07 Thread Maja Schröter
Hello everybody,

I've a question about autoregressive Regressionmodels.

Let Y[1],...,Y[n] be a time series.

Given the model: 

Y[t] = phi[1]*Y[t-1] + phi[2]*Y[t-2] + ... + phi[p]*Y[t-p] + x_t^T*beta + u_t,


where x_t = (x[1t], x[2t], ..., x[mt]), beta = (beta[1],...,beta[m]) and u_t ~ (0,1).

I want to estimate the coefficients phi and beta.

Are there any functions or packages in R for autoregressive regression models
with special summaries? I do not mean the function ar.


Example: I have the data

working.time  <- rnorm(100)   # Y
vacation      <- rnorm(100)   # x1
bank.holidays <- rnorm(100)   # x2
illnes        <- rnorm(100)   # x3
education     <- rnorm(100)   # x4



Now I want to analyse:

 Y[t] = phi[1]*Y[t-1] + phi[2]*Y[t-2] + ... + phi[p]*Y[t-p] + beta1*vacation_t
 + beta2*bank.holidays + beta3*illnes + beta4*education + u_t



Has anyone an idea? 

I would be more than glad if so.

Thank you VERY much in advance.

Kindly regards from the Eastern Part of Berlin,

Maja

--

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Classifly problems

2007-08-07 Thread hadley wickham
Have you tried using classifly directly?

library(classifly)
classifly(Tumor ~ ., my.data.set, lda)

generate_classification_data is an internal function, and you are
passing it the wrong arguments.

Hadley

On 8/7/07, Dani Valverde [EMAIL PROTECTED] wrote:
 Hello,
 I am trying to explore a classification with GGobi. I am trying to
 generate additional data according to the model so I can draw the
 decision boundaries on the GGobi plot. The problem is that I always get
 the same error: Error in predict.lda(model,data): wrong number of
 variables, even if I know that I used the same number of variables for
 the model generation (6) and for the additional data generation (6
 also). I paste the code I am using:

 library(MASS)
 Tumor <- c(rep("MM",20), rep("GBM",18), rep("LGG",17))
 data.lda <- lda(data, Tumor)
 data.ld <- predict(data.lda)
 data.ldd <- data.frame(data.ld$x, data.ld$class)

 library(rggobi)
 data.g <- ggobi(data.ldd)

 library(classifly)
 generate_classification_data(data.lda, data, method="grid", n=10)

 Could you help me?
 Best regards,

 Dani




 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Functions for autoregressive Regressionmodels (Mix between times series and Regression Models) ?

2007-08-07 Thread Gabor Grothendieck
arima (see ?arima) supports explanatory variables in the xreg= argument.
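
A minimal sketch (hypothetical, reusing the simulated variables from the
question) of a regression with AR(2) errors fitted that way:

xreg <- cbind(vacation, bank.holidays, illnes, education)
fit  <- arima(working.time, order = c(2, 0, 0), xreg = xreg)
fit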

On 8/7/07, Maja Schröter [EMAIL PROTECTED] wrote:
 Hello everybody,

 I've a question about autoregressive Regressionmodels.

 Let Y[1],.,Y[n], be a time series.

 Given the model:

 Y[t] = phi[1]*Y[t-1] + phi[2]*Y[t-1] + ... + phi[p]*Y[t-p] + x_t^T*beta + u_t,


 where x_t=(x[1t],x[2t],x[mt]) and beta=(beta[1],...,beta[m]) and u_t~(0,1)

 I want to estimate the coefficients phi and beta.

 Are in R any functions or packages for autoregressive Regressionmodel with 
 special summaries?. I'm not meaning the function ar.


 Example: I have the data

  working.time  <- rnorm(100)   # Y
  vacation      <- rnorm(100)   # x1
  bank.holidays <- rnorm(100)   # x2
  illnes        <- rnorm(100)   # x3
  education     <- rnorm(100)   # x4



 Now I want to analyse:

  Y[t] = phi[1]*Y[t-1] + phi[2]*Y[t-1] + ... + phi[p]Y[t-p] + beta1*vacation_t 
  +beta2*bank.holidays + beta3*illnes + beta4*eductation + u_t-



 Has anyone an idea?

 I would be more than glad if so.

 Thank you VERY much in advance.

 Kindly regards from the Eastern Part of Berlin,

 Maja

 --

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Number of days in each month

2007-08-07 Thread Lauri Nikkinen
Hi R-users,



What is the best way to achieve a table which contains all days and months
between years 2007-2020? I would like to calculate number of days in each
month within those years (to data frame).



Regards,

Lauri

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problems saving jpg files

2007-08-07 Thread Monica Pisica

Hi,
 
I have a batch routine that does PCA on a series of files and saves the results 
as a csv file, and the respective graphs as pdf and jpg. While the pdf's are fine,
the jpg files have a light grey background no matter what color I set the bg
parameter to. I am running this on a PC with 4 GB RAM - if this makes any difference.
My command is as follows:
 
dev.print(jpeg, file=filejpg, width=1024, height=768, quality = 100, bg =
"white")
 
I would appreciate any explanations - and yes, I did read the help
files this time ;-)))
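
One thing that might be worth trying (an assumption added here, not an answer
from the thread) is to open the jpeg device directly instead of copying the
screen device, so that its own bg setting is honoured:

jpeg(filename = filejpg, width = 1024, height = 768, quality = 100, bg = "white")
## ... repeat the plotting commands for the PCA graphs here ...
dev.off()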
 
Thanks,
 
Monica
_
[[trailing spam removed]]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Function for trim blanks from a string(s)?

2007-08-07 Thread S Ellison
Stripping either or both ends can be achieved by the somewhat tortuous but 
fairly general

stripspace <- function(x) sub("^\\s*([^ ]*.*[^ ])\\s*$", "\\1", x)

which uses a replacement buffer to keep the useful part of the string.
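
For example (a hypothetical call, not from the original message):

> stripspace(c("  leading", "trailing   ", "  both  "))
[1] "leading"  "trailing" "both"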

 Prof Brian Ripley [EMAIL PROTECTED] 06/08/2007 21:23:49 
I am sure Marc knows that ?sub has examples of trimming trailing space and 
whitespace in various styles.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Number of days in each month

2007-08-07 Thread Dirk Eddelbuettel
On Tue, Aug 07, 2007 at 03:53:28PM +0300, Lauri Nikkinen wrote:
 What is the best way to achieve a table which contains all days and months
 between years 2007-2020? I would like to calculate number of days in each
 month within those years (to data frame).

Here you go -- just replace length.out=3 by 14*12 for your fourteen years:

as.POSIXlt(seq(as.Date("2007-02-01"), by="month", length.out=3)-1)$mday
  [1] 31 28 31

A sequence of month at the first of the next month, shifted back a day
gives the last one of that month -- converted to POSIXlt from which we
extract the day-of-the-month.

Hth, Dirk

-- 
Three out of two people have difficulties with fractions.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Number of days in each month

2007-08-07 Thread Gabor Grothendieck
> diff(seq(as.Date("2007-01-01"), as.Date("2021-01-01"), by = "month"))
Time differences in days
  [1] 31 28 31 30 31 30 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31 31
 [26] 28 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31 31 28
 [51] 31 30 31 30 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31 31 28 31
 [76] 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31 31 28 31 30
[101] 31 30 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31
[126] 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30
[151] 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31


On 8/7/07, Lauri Nikkinen [EMAIL PROTECTED] wrote:
 Hi R-users,



 What is the best way to achieve a table which contains all days and months
 between years 2007-2020? I would like to calculate number of days in each
 month within those years (to data frame).



 Regards,

 Lauri

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Number of days in each month

2007-08-07 Thread Gabor Grothendieck
Just one refinement if you want it nicely formatted:

> ts(diff(seq(as.Date("2007-01-01"), as.Date("2021-01-01"), by = "month")),
     start = c(2007, 01), freq = 12)
 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2007  31  28  31  30  31  30  31  31  30  31  30  31
2008  31  29  31  30  31  30  31  31  30  31  30  31
2009  31  28  31  30  31  30  31  31  30  31  30  31
2010  31  28  31  30  31  30  31  31  30  31  30  31
2011  31  28  31  30  31  30  31  31  30  31  30  31
2012  31  29  31  30  31  30  31  31  30  31  30  31
2013  31  28  31  30  31  30  31  31  30  31  30  31
2014  31  28  31  30  31  30  31  31  30  31  30  31
2015  31  28  31  30  31  30  31  31  30  31  30  31
2016  31  29  31  30  31  30  31  31  30  31  30  31
2017  31  28  31  30  31  30  31  31  30  31  30  31
2018  31  28  31  30  31  30  31  31  30  31  30  31
2019  31  28  31  30  31  30  31  31  30  31  30  31
2020  31  29  31  30  31  30  31  31  30  31  30  31


On 8/7/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
  diff(seq(as.Date("2007-01-01"), as.Date("2021-01-01"), by = "month"))
  Time differences in days
   [1] 31 28 31 30 31 30 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31 31
  [26] 28 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31 31 28
  [51] 31 30 31 30 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31 31 28 31
  [76] 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31 31 28 31 30
  [101] 31 30 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31
  [126] 30 31 31 30 31 30 31 31 28 31 30 31 30 31 31 30 31 30 31 31 28 31 30 31 30
  [151] 31 31 30 31 30 31 31 29 31 30 31 30 31 31 30 31 30 31


 On 8/7/07, Lauri Nikkinen [EMAIL PROTECTED] wrote:
  Hi R-users,
 
 
 
  What is the best way to achieve a table which contains all days and months
  between years 2007-2020? I would like to calculate number of days in each
  month within those years (to data frame).
 
 
 
  Regards,
 
  Lauri
 
 [[alternative HTML version deleted]]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Number of days in each month

2007-08-07 Thread Chuck Cleland
Lauri Nikkinen wrote:
 Hi R-users,
 
 What is the best way to achieve a table which contains all days and months
 between years 2007-2020? I would like to calculate number of days in each
 month within those years (to data frame).
 
 Regards,
 
 Lauri

  How about this?

X <- seq(ISOdate(2007,1,1), ISOdate(2020,12,31), by=60*60*24)

mytab <- table(substring(X, 3, 4), substring(X, 6, 7))

mytab

 01 02 03 04 05 06 07 08 09 10 11 12
  07 31 28 31 30 31 30 31 31 30 31 30 31
  08 31 29 31 30 31 30 31 31 30 31 30 31
  09 31 28 31 30 31 30 31 31 30 31 30 31
  10 31 28 31 30 31 30 31 31 30 31 30 31
  11 31 28 31 30 31 30 31 31 30 31 30 31
  12 31 29 31 30 31 30 31 31 30 31 30 31
  13 31 28 31 30 31 30 31 31 30 31 30 31
  14 31 28 31 30 31 30 31 31 30 31 30 31
  15 31 28 31 30 31 30 31 31 30 31 30 31
  16 31 29 31 30 31 30 31 31 30 31 30 31
  17 31 28 31 30 31 30 31 31 30 31 30 31
  18 31 28 31 30 31 30 31 31 30 31 30 31
  19 31 28 31 30 31 30 31 31 30 31 30 31
  20 31 29 31 30 31 30 31 31 30 31 30 31

head(as.data.frame(mytab))

  Var1 Var2 Freq
1   07   01   31
2   08   01   31
3   09   01   31
4   10   01   31
5   11   01   31
6   12   01   31

   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code. 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in as.double.default(x) : (list) object cannot be coerced to 'double'

2007-08-07 Thread Urmi Trivedi
Dear experts,

I have in all 14 matrices which stand for gene expression divergence and 14
matrices which stand for gene sequence divergence. I have tried joining them
by using the concatenation function c(), giving

SequenceDivergence <- c(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12,X13,X14)
ExpressionDivergence <- c(Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8,Y9,Y10,Y11,Y12,Y13,Y14)

where X1,X2..X14 are the expression matrices containing r-values and 
Y1,Y2..Y14 are the ones with patristic distances

Now, I want to plot SequenceDivergence vs. ExpressionDivergence.

I tried doing that using plot(SequenceDivergence, ExpressionDivergence)

but then I get the error

 Error in as.double.default(x) : (list) object cannot be coerced to 'double'

Note: the diagonal values for X1 were neglected as they seem to introduce some
bias in my results. The code to derive X1, X2, X3, ... and Y1, Y2, Y3, ... is:

 EDant <- read.table("C:/ant.txt", sep="\t")
 ED1 <- as.matrix(EDant)
 X1 <- lapply(1:ncol(ED15), function(a) ED15[-a, a]) # to neglect the diagonal values

I am attaching a couple of matrices for your reference. ANT.txt is X1 after
deleting the diagonal values; similarly, ANTEXP.TXT is Y1, apetella.txt is X2 and
apetellaexp.txt is Y2.

I tried reading the R manual and other sources but haven't cracked it yet. Can
anyone please help me regarding that?
 
 Thanks very much
 
 Yours sincerely,
 
 Urmi Trivedi. 


   
[mail-provider footer and the numeric contents of the attached matrices omitted]

[R] logistic models validation for categorical data

2007-08-07 Thread Tom Willems
Dear fellow R users,
Thanks for your help on the AIC.
I have yet another question about logistic models. To build the model I
used several packages (MASS, nlme, binom), just to make sure I had all the
tools at my disposal. Now I would like to validate the model using a
cross-validation tool.
In a normal logistic case you get an error table giving the number of
cases classified under 0/1 by the model, and a comparison with the real
classification of the test sample.

table 1
glm      0   1
  FALSE  8  14
  TRUE   0   8

Apparently this is not the case when you have categorical data. Then you
get an output for every interval of your predictor variable, and this is
compared to the frequency of this variable, so it is very hard to
interpret.

table 2
glm     0 2 3 4 7
  FALSE 1 0 0 1 1
  TRUE  1 2 1 0 0
One of the problems is not only that I can't interpret table 2, but also
that it does not give a reference to the number of test samples, which is
clear in table 1 (30 test samples).
Kind regards,
Tom.








 
Tom Willems
CODA-CERVA-VAR
Department of Virology
Epizootic Diseases Section
Groeselenberg 99
B-1180 Ukkel

Tel.: +32 2 3790522
Fax: +32 2 3790666
E-mail: [EMAIL PROTECTED]

www.var.fgov.be 



[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Robust Standard Errors for lme object

2007-08-07 Thread Doran, Harold
Lucy:

Why are you interested in robust standard errors from lme? Typically,
robust standard errors are sought when there is model misspecification
due to ignoring some covariance among units within a group.

But, a mixed model is designed to directly account for covariances among
units within a group such that the standard errors more adequately
represent the true sampling variance of the parameters.

So, the lme standard errors are robust in the sense that they are
presumably correct if you have your model correctly specified.

It would do better service to explain your issue a little more to get
better help.

Harold


 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Lucy Radford
 Sent: Tuesday, August 07, 2007 8:17 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Robust Standard Errors for lme object
 
 Hi
 
 I have fitted a linear mixed model using the lme function but 
 now wish to calculate robust standard errors for my model.  I 
 have been able to find several functions which calculate 
 robust s.e for lm objects but have not been able to find a 
 function which calcualtes robust s.e for lme objects.  Would 
 anyone know of a function that will allow me to do this.
 
 Many Thanks
 
 Lucy
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Functions for autoregressive Regressionmodels (Mix between times series and Regression Models) ?

2007-08-07 Thread Achim Zeileis
On Tue, 7 Aug 2007, Gabor Grothendieck wrote:

 arima (see ?arima) supports explanatory variables in the xreg= argument.

In addition to arima(), the function dynlm() in package dynlm might be
useful if you want to fit the model by OLS. See also package dyn for a
similar approach.
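
A minimal sketch with dynlm (hypothetical, reusing the simulated variables from
the question and assuming everything is wrapped in a ts object so the lags
align):

library(dynlm)
dat <- ts(cbind(working.time, vacation, bank.holidays, illnes, education))
fit <- dynlm(working.time ~ L(working.time, 1:2) + vacation + bank.holidays +
             illnes + education, data = dat)
summary(fit)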
Z

 On 8/7/07, Maja Schröter [EMAIL PROTECTED] wrote:
  Hello everybody,
 
  I've a question about autoregressive Regressionmodels.
 
  Let Y[1],.,Y[n], be a time series.
 
  Given the model:
 
  Y[t] = phi[1]*Y[t-1] + phi[2]*Y[t-1] + ... + phi[p]*Y[t-p] + x_t^T*beta + 
  u_t,
 
 
  where x_t=(x[1t],x[2t],x[mt]) and beta=(beta[1],...,beta[m]) and 
  u_t~(0,1)
 
  I want to estimate the coefficients phi and beta.
 
  Are in R any functions or packages for autoregressive Regressionmodel 
  with special summaries?. I'm not meaning the function ar.
 
 
  Example: I have the data
 
  working.time  <- rnorm(100)   # Y
  vacation      <- rnorm(100)   # x1
  bank.holidays <- rnorm(100)   # x2
  illnes        <- rnorm(100)   # x3
  education     <- rnorm(100)   # x4
 
 
 
  Now I want to analyse:
 
   Y[t] = phi[1]*Y[t-1] + phi[2]*Y[t-1] + ... + phi[p]Y[t-p] + 
  beta1*vacation_t  +beta2*bank.holidays + beta3*illnes + beta4*eductation + 
  u_t-
 
 
 
  Has anyone an idea?
 
  I would be more than glad if so.
 
  Thank you VERY much in advance.
 
  Kindly regards from the Eastern Part of Berlin,
 
  Maja
 
  --
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Phi parameter in corAR(1) in NLMEs

2007-08-07 Thread Nelson, Kerrie P.
Hello,
 
I have been fitting nonlinear models with random effects using nlme with the
corCAR1 correlation structure, as I have unequally spaced observations per
subject, and serially correlated errors.  My question is regarding the Phi
parameter.  
 
Example of output:
Correlation Structure: Continuous AR(1)
 Formula: ~day | subject 
 Parameter estimate(s):
  Phi 
0.8475842 
 
After reading the Pinheiro and Bates (2000), and Jones (1993) texts I am
uncertain whether the estimate of Phi given in the output is:
 
- the correlation between two observations one time unit apart, i.e. phi(t2-t1)
= phi(1) = 0.8475842
 
- or is the parameter in the exponential term, exp(-phi(t2-t1)), i.e the
correlation between two observations one unit apart is exp(-0.8475842*(t2-t1)) =
exp(-0.8475842*1) = 0.4284
 
- or if there is another interpretation altogether
 
Thank-you very much for any clarification on this, I appreciate your help!
 
Kerrie Nelson,
Biostatistics Unit,
MGH
 


The information transmitted in this electronic communication...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Catch errors

2007-08-07 Thread Gang Chen
Seth and Stephen,

Thanks a lot for the explanation and help! I really appreciate it.

Gang


On Aug 6, 2007, at 6:41 PM, Stephen Tucker wrote:
 That's because tag <- 1 is evaluated in a local environment (of the  
 function)
 - once function(err) exits, the information is lost. Try R's <<-  
 operator:

 tag <- 0
 tryCatch(fit.lme <- lme(Beta ~ Trust*Sex*Freq, random = ~1|Subj,
  Model), error=function(err) tag <<- 1)


On Aug 6, 2007, at 6:55 PM, Seth Falcon wrote:

 Gang Chen [EMAIL PROTECTED] writes:

 I wanted something like this:

 tag <- 0;
 tryCatch(fit.lme <- lme(Beta ~ Trust*Sex*Freq, random = ~1|Subj,
 Model), error=function(err) tag <- 1);

 but it seems not working because 'tag' does not get value of 1 when
 error occurs. How can I make it work?

 You can use '<<-' to assign to the parent frame like this:

 tag <- 0
 tryCatch({
     print("inside tryCatch")
     print(paste("tag", tag))
     stop("forcing an error")
 }, error=function(e) {
     print("caught an error")
     tag <<- 1
 })

 But you can also use the return value of tryCatch as a return code if
 that is what tag is.  Like this:

 fail <- TRUE
 tag <- tryCatch({
     print("inside tryCatch")
     if (fail)
       stop("forcing an error")
     0
 }, error=function(e) {
     print("caught an error")
     1
 })
 tag

 + seth

 -- 
 Seth Falcon | Computational Biology | Fred Hutchinson Cancer  
 Research Center
 BioC: http://bioconductor.org/
 Blog: http://userprimary.net/user/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Contrast testing with lme

2007-08-07 Thread Gang Chen
I'm trying to run a contrast with lme, but could not get it work:

  anova(lme(Beta ~ Trust*Sex*Freq, random = ~1|Subj, Model), L=c 
("TrustT:Sex:Freq"=1, "TrustU:Sex:Freq"=-1))
Error in anova.lme(lme(Beta ~ Trust * Sex * Freq, random = ~1 | Subj,  :
 Effects TrustT:Sex:Freq, TrustU:Sex:Freq not matched

What is missing? Here is more information:

  summary(lme(Beta ~ Trust*Sex*Freq, random = ~1|Subj, Model))
Linear mixed-effects model fit by REML
Data: Model
 AIC   BIC   logLik
   -825.4663 -791.4348 426.7331

Random effects:
Formula: ~1 | Subj
        (Intercept)    Residual
StdDev: 0.001144573 0.001167392

Fixed effects: Beta ~ Trust * Sex * Freq
                         Value    Std.Error DF    t-value p-value
(Intercept) 0.0001090007 0.0005780194 77  0.1885762  0.8509
TrustU  0.0014426378 0.0005836959 77  2.4715572  0.0157
SexM0.0008230359 0.0005836959 77  1.4100423  0.1626
FreqLo  0.0001998191 0.0005836959 77  0.3423343  0.7330
FreqNo  0.0004900107 0.0005836959 77  0.8394965  0.4038
TrustU:SexM-0.0012598266 0.0008254707 77 -1.5261918  0.1311
TrustU:FreqLo  -0.0012383346 0.0008254707 77 -1.5001558  0.1377
TrustU:FreqNo  -0.0009141543 0.0008254707 77 -1.1074341  0.2716
SexM:FreqLo-0.0008469211 0.0008254707 77 -1.0259857  0.3081
SexM:FreqNo 0.0006361012 0.0008254707 77  0.7705922  0.4433
TrustU:SexM:FreqLo  0.0013272173 0.0011673918 77  1.1369082  0.2591
TrustU:SexM:FreqNo  0.0006241524 0.0011673918 77  0.5346555  0.5944
...

  anova(lme(Beta ~ Trust*Sex*Freq, random = ~1|Subj, Model))
               numDF denDF  F-value p-value
(Intercept)        1    77 4.813938  0.0312
Trust              1    77 3.113293  0.0816
Sex                1    77 3.535774  0.0638
Freq               2    77 6.083832  0.0035
Trust:Sex          1    77 1.634858  0.2049
Trust:Freq         2    77 0.678558  0.5104
Sex:Freq           2    77 2.165059  0.1217
Trust:Sex:Freq     2    77 0.647042  0.5264

Thanks,
Gang

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Interaction factor and numeric variable versus separate regressions

2007-08-07 Thread Sven Garbade
Dear list members,

I have problems to interpret the coefficients from a lm model involving
the interaction of a numeric and factor variable compared to separate lm
models for each level of the factor variable.

## data:
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20*1.7 + 1)
y3 <- rnorm(20) + (1:20*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- rep(1:20,3)
f <- gl(3,20, labels=paste("lev", 1:3, sep="")) 
d <- data.frame(x=x,y=y, f=f)

## plot
# xyplot(y~x|f)

## lm model with interaction
summary(lm(y~x:f, data=d))

Call:
lm(formula = y ~ x:f, data = d)

Residuals:
Min  1Q  Median  3Q Max 
-2.8109 -0.8302  0.2542  0.6737  3.5383 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 1.53 on 56 degrees of freedom
Multiple R-Squared: 0.9984, Adjusted R-squared: 0.9984 
F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16 

## separate lm fits
lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
$lev1
(Intercept)   x 
 6.77022860 -0.01667528 

$lev2
(Intercept)   x 
   1.0190781.691982 

$lev3
(Intercept)   x 
   3.2746566.738396 


Can anybody give me a hint why the coefficients for the slopes
(especially for lev1) are so different and how the coefficients from the
lm model with interaction are related to the separate fits?

Thanks, Sven

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Robust Standard Errors for lme object

2007-08-07 Thread Thomas Lumley
On Tue, 7 Aug 2007, Doran, Harold wrote:

 Lucy:

 Why are you interested in robust standard errors from lme? Typically,
 robust standard errors are sought when there is model misspecification
 due to ignoring some covariance among units with a group.

 But, a mixed model is designed to directly account for covariances among
 units within a group such that the standard errors more adequately
 represent the true sampling variance of the parameters.


I think it's a perfectly reasonable thing to want, but it is not easy to 
provide because of the generality of lme().  For example, models with crossed 
effects need special handling, and possibly so do time-series or spatial 
covariance models with large numbers of observations per group.

I imagine that misspecification of the variance, rather than the correlation, 
would be the main concern, just as it is with independent observations. Of 
course the model-robust variances would only be useful if the sample size of 
top-level units was large enough, and if the variance components were not of 
any direct interest.


 So, the lme standard errors are robust in a sense that they are
 presumably correct if you have your model correctly specified.

To paraphrase the Hitchhiker's Guide: This must be some definition of the word 
'robust' that I was not previously aware of. :)


  -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Positioning text in top left corner of plot

2007-08-07 Thread Daniel Brewer
Simple question: how can you position text in the top left-hand corner of
a plot?  I am plotting multiple plots using par(mfrow=c(2,3)) and all I
want to do is label these plots a), b), c), etc.  I have been fiddling
around with both text and mtext but without much luck.  text is fine, but
each plot has a different scale on its axes, and so this makes it
problematic.  What is the best way to do this?

Many thanks

Dan
-- 
**
Daniel Brewer, Ph.D.

Institute of Cancer Research
Email: [EMAIL PROTECTED]
**

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company 
Limited by Guarantee, Registered in England under Company No. 534147 with its 
Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the addre...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] saving output

2007-08-07 Thread Lynn Disney
I have a question about how to save the output of a logistic regression
model in some format (delimited, preferably) so that I can manipulate it
to make nice tables in excel. I have tried print, save, write, none seem
to do what I want them to. Does anyone have any ideas?

 

Lynn D. Disney, Ph.D., J.D., M.P.H.

Research Analyst

SEIU 1199

1771 E. 30th Street

Cleveland, Ohio 44114

Telephone: (216) 566-0117

Email: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Positioning text in top left corner of plot

2007-08-07 Thread Greg Snow


You can call par('usr') to find the current coordinates, then use the
text function (the pos and offset arguments may be helpful as well)

The cnvrt.coords function from the TeachingDemos package may also be of
help for finding coordinates for the text function.
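
A rough sketch along those lines (panel labels and data are made up):

par(mfrow = c(2, 3))
for (lab in c("a)", "b)", "c)", "d)", "e)", "f)")) {
    plot(rnorm(10), rnorm(10))
    usr <- par("usr")                  # c(x1, x2, y1, y2) of the plot region
    text(usr[1], usr[4], lab, adj = c(-0.5, 1.5))
}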

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Brewer
 Sent: Tuesday, August 07, 2007 9:09 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Positioning text in top left corner of plot
 
 Simple question how can you position text in the top left 
 hand corner of a plot?  I am plotting multiple plots using 
 par(mfrow=c(2,3)) and all I want to do is label these plots 
 a), b), c) etc.  I have been fiddling around with both text 
 and mtext but without much luck.  text is fine but  each plot 
 has a different scale on the axis and so this makes it 
 problematic.  What is the best way to do this?
 
 Many thanks
 
 Dan
 --
 **
 Daniel Brewer, Ph.D.
 
 Institute of Cancer Research
 Email: [EMAIL PROTECTED]
 **
 
 The Institute of Cancer Research: Royal Cancer Hospital, a 
 charitable Company Limited by Guarantee, Registered in 
 England under Company No. 534147 with its Registered Office 
 at 123 Old Brompton Road, London SW7 3RP.
 
 This e-mail message is confidential and for use by the\  ...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interaction factor and numeric variable versus separate regressions

2007-08-07 Thread Gabor Grothendieck
In the single model all three levels share the same intercept, which
means that the slope must change to accommodate it, whereas in the
three separate models each level has its own intercept.

Try looking at it graphically and note how the black dotted lines
are all forced to go through the same intercept, i.e. the same point
on the y axis, whereas the red dashed lines are each able to
fit their portion of the data using both the intercept and the slope.

y.lm <- lm(y~x:f, data=d)
plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
for(i in 1:3) {
abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty = "dashed")
}
grid()


On 8/7/07, Sven Garbade [EMAIL PROTECTED] wrote:
 Dear list members,

 I have problems to interpret the coefficients from a lm model involving
 the interaction of a numeric and factor variable compared to separate lm
 models for each level of the factor variable.

 ## data:
 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 x - rep(1:20,3)
 f - gl(3,20, labels=paste(lev, 1:3, sep=))
 d - data.frame(x=x,y=y, f=f)

 ## plot
 # xyplot(y~x|f)

 ## lm model with interaction
 summary(lm(y~x:f, data=d))

 Call:
 lm(formula = y ~ x:f, data = d)

 Residuals:
Min  1Q  Median  3Q Max
 -2.8109 -0.8302  0.2542  0.6737  3.5383

 Coefficients:
Estimate Std. Error t value Pr(|t|)
 (Intercept)  3.687990.41045   8.985 1.91e-12 ***
 x:flev1  0.208850.04145   5.039 5.21e-06 ***
 x:flev2  1.496700.04145  36.109   2e-16 ***
 x:flev3  6.708150.04145 161.838   2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 1.53 on 56 degrees of freedom
 Multiple R-Squared: 0.9984, Adjusted R-squared: 0.9984
 F-statistic: 1.191e+04 on 3 and 56 DF,  p-value:  2.2e-16

 ## separate lm fits
 lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
 $lev1
 (Intercept)   x
  6.77022860 -0.01667528

 $lev2
 (Intercept)   x
   1.0190781.691982

 $lev3
 (Intercept)   x
   3.2746566.738396


 Can anybody give me a hint why the coefficients for the slopes
 (especially for lev1) are so different and how the coefficients from the
 lm model with interaction are related to the separate fits?

 Thanks, Sven

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] saving output

2007-08-07 Thread Kyle.
It's not so straightforward with the manipulation, but if you're familiar
with LaTeX, I'd recommend the xtable package.  It makes very readable
regression summaries.
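
A rough sketch of that route, plus a comma-delimited alternative that opens
directly in Excel (the glm call is only a stand-in model):

library(xtable)                                  # assumed installed from CRAN
fit <- glm(am ~ wt, data = mtcars, family = binomial)

print(xtable(summary(fit)), file = "fit.tex")    # LaTeX table of coefficients
write.csv(coef(summary(fit)), file = "fit.csv")  # delimited table for Excel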



Kyle H. Ambert
Department of Behavioral Neuroscience
Oregon Health  Science University



On 8/7/07, Lynn Disney [EMAIL PROTECTED] wrote:

 I have a question about how to save the output of a logistic regression
 model in some format (delimited, preferably) so that I can manipulate it
 to make nice tables in excel. I have tried print, save, write, none seem
 to do what I want them to. Does anyone have any ideas?



 Lynn D. Disney, Ph.D., J.D., M.P.H.

 Research Analyst

 SEIU 1199

 1771 E. 30th Street

 Cleveland, Ohio 44114

 Telephone: (216) 566-0117

 Email: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]


 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] array

2007-08-07 Thread Tiandao Li
Hello,

I have some files generated from microarray experiments. I used scan() to
read the files, and assigned each file to a unique name with many rows and
columns. Now I want to create a array (ArrayA) with unique names, and I can
use ArrayA[1,2][[6]] to refer the data in each file. Is there any packages
available for array of array?

Thanks!

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] saving output

2007-08-07 Thread Henrique Dallazuanna
See ?sink
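
For example, with a placeholder model:

fit <- glm(am ~ wt, data = mtcars, family = binomial)
sink("fit-summary.txt")      # divert printed output to a file
print(summary(fit))
sink()                       # restore output to the console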

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

On 07/08/07, Lynn Disney [EMAIL PROTECTED] wrote:

 I have a question about how to save the output of a logistic regression
 model in some format (delimited, preferably) so that I can manipulate it
 to make nice tables in excel. I have tried print, save, write, none seem
 to do what I want them to. Does anyone have any ideas?



 Lynn D. Disney, Ph.D., J.D., M.P.H.

 Research Analyst

 SEIU 1199

 1771 E. 30th Street

 Cleveland, Ohio 44114

 Telephone: (216) 566-0117

 Email: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]


 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Positioning text in top left corner of plot

2007-08-07 Thread Benilton Carvalho
maybe this is what you want?

plot(rnorm(10))
legend("topleft", "A)", bty = "n")

?

b

On Aug 7, 2007, at 11:08 AM, Daniel Brewer wrote:

 Simple question how can you position text in the top left hand  
 corner of
 a plot?  I am plotting multiple plots using par(mfrow=c(2,3)) and  
 all I
 want to do is label these plots a), b), c) etc.  I have been fiddling
 around with both text and mtext but without much luck.  text is  
 fine but
  each plot has a different scale on the axis and so this makes it
 problematic.  What is the best way to do this?

 Many thanks

 Dan
 -- 
 **
 Daniel Brewer, Ph.D.

 Institute of Cancer Research
 Email: [EMAIL PROTECTED]
 **

 The Institute of Cancer Research: Royal Cancer Hospital, a  
 charitable Company Limited by Guarantee, Registered in England  
 under Company No. 534147 with its Registered Office at 123 Old  
 Brompton Road, London SW7 3RP.

 This e-mail message is confidential and for use by the add...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Robust Standard Errors for lme object

2007-08-07 Thread Doran, Harold
 -Original Message-
 From: Thomas Lumley [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, August 07, 2007 11:06 AM
 To: Doran, Harold
 Cc: Lucy Radford; r-help@stat.math.ethz.ch
 Subject: Re: [R] Robust Standard Errors for lme object
 
 On Tue, 7 Aug 2007, Doran, Harold wrote:
 
  Lucy:
 
  Why are you interested in robust standard errors from lme? 
 Typically, 
  robust standard errors are sought when there is model 
 misspecification 
  due to ignoring some covariance among units with a group.
 
  But, a mixed model is designed to directly account for covariances 
  among units within a group such that the standard errors more 
  adequately represent the true sampling variance of the parameters.
 
 
 I think it's a perfectly reasonable thing to want, but it is 
 not easy to provide because of the generality of lme().  For 
 example, models with crossed effects need special handling, 
 and possibly so do time-series or spatial covariance models 
 with large numbers of observations per group.

I still don't understand why we might want them. If there were some
correlation among units within a group, then we would not have indep.
observations. That is, we have less information from a cluster sample
than from a simple random sample because each individual does not
provide unique information. Hence, if one were to use an OLS regression
when there is clustering, then the likelihood function is misspecified.
Now the MLEs from this regression may still retain some pragmatic
interest (they *may* still be consistent), but the sampling variances of
the fixed effects would be incorrect (most likely too small). So, one
may obtain robust standard errors to account for model misspecification
in this scenario.

On the other hand, if the researcher knows that there may be a violation
of independence because of clustering, then one may be motivated to rely
on a mixed model in which case the likelihood is not misspecified since
the posterior views the observations as conditionally ind given the
population distribution. Therefore, the sampling variances of the fixed
effects are correct. Why get robust standard errors when the likelihood
function is correctly specified?

To make this concrete, assume it were an educational testing example
where each student is tested and students are grouped into classrooms.
Hence we have N total students clustered within K groups. Instruction
happens at the group level and students within a classroom are
presumably not providing independent information.

Now, if I were to do an OLS regression with test scores as the outcome
and some variable x as the ind. variable, this model would be
misspecified given the clustering and the standard errors would not be
correct. But, if I were to use lmer and accounted for this clustering,
then the standard errors of the variable x would be accurate. Why would
I go and get robust standard errors in this case?

Another way to put it is this. The OLS standard errors are

Se = (X'V^{-1}X)^{-1}

Where V is a diagonal matrix and all elements along the diagonal are
equal. This same equation is used for the standard errors in a mixed
model, but V is now a block-diagonal matrix where the off-diagonals are
the covariances among individuals within the same group. Doug Bates will
kill me for using GLS notation since it doesn't jive with lmer.


 
 I imagine that misspecification of the variance, rather than 
 the correlation, would be the main concern, just as it is 
 with independent observations. Of course the model-robust 
 variances would only be useful if the sample size of 
 top-level units was large enough, and if the variance 
 components were not of any direct interest.
 
 
  So, the lme standard errors are robust in a sense that they are 
  presumably correct if you have your model correctly specified.
 
 To paraphrase the Hitchikers' Guide: This must be some 
 definition of the word 'robust' that I was not previously aware of. :)
 
 
   -thomas
 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle
 
 


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interaction factor and numeric variable versus separate

2007-08-07 Thread Ted Harding
On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
 In the single model all three levels share the same intercept which
 means that the slope must change to accomodate it
 whereas in the three separate models they each have their own
 intercept.

I think this arose because of the formulation of the model with
interaction as:

  summary(lm(y~x:f, data=d))

If it had been formulated as

  summary(lm(y~x*f, data=d))

there would be three separate intercepts, and three different slopes
(and the differences would be the same as the differences for the
separate models).

Ted.

 Try looking at it graphically and note how the black dotted lines
 are all forced to go through the same intercept, i.e. the same point
 on the y axis, whereas the red dashed lines are each able to
 fit their portion of the data using both the intercept and the slope.
 
 y.lm - lm(y~x:f, data=d)
 plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
 for(i in 1:3) {
   abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = dotted)
   abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = red, lty =
 dashed)
 }
 grid()
 
 
 On 8/7/07, Sven Garbade [EMAIL PROTECTED] wrote:
 Dear list members,

 I have problems to interpret the coefficients from a lm model
 involving
 the interaction of a numeric and factor variable compared to separate
 lm
 models for each level of the factor variable.

 ## data:
 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 x - rep(1:20,3)
 f - gl(3,20, labels=paste(lev, 1:3, sep=))
 d - data.frame(x=x,y=y, f=f)

 ## plot
 # xyplot(y~x|f)

 ## lm model with interaction
 summary(lm(y~x:f, data=d))

 Call:
 lm(formula = y ~ x:f, data = d)

 Residuals:
Min  1Q  Median  3Q Max
 -2.8109 -0.8302  0.2542  0.6737  3.5383

 Coefficients:
Estimate Std. Error t value Pr(|t|)
 (Intercept)  3.687990.41045   8.985 1.91e-12 ***
 x:flev1  0.208850.04145   5.039 5.21e-06 ***
 x:flev2  1.496700.04145  36.109   2e-16 ***
 x:flev3  6.708150.04145 161.838   2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 1.53 on 56 degrees of freedom
 Multiple R-Squared: 0.9984, Adjusted R-squared: 0.9984
 F-statistic: 1.191e+04 on 3 and 56 DF,  p-value:  2.2e-16

 ## separate lm fits
 lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
 $lev1
 (Intercept)   x
  6.77022860 -0.01667528

 $lev2
 (Intercept)   x
   1.0190781.691982

 $lev3
 (Intercept)   x
   3.2746566.738396


 Can anybody give me a hint why the coefficients for the slopes
 (especially for lev1) are so different and how the coefficients from
 the
 lm model with interaction are related to the separate fits?

 Thanks, Sven

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 17:01:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interaction factor and numeric variable versus separate

2007-08-07 Thread Gabor Grothendieck
Also check this post

https://stat.ethz.ch/pipermail/r-help/2007-May/132866.html

for a number of formulations.

On 8/7/07, Ted Harding [EMAIL PROTECTED] wrote:
 On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
  In the single model all three levels share the same intercept which
  means that the slope must change to accomodate it
  whereas in the three separate models they each have their own
  intercept.

 I think this arose because of the formulation of the model with
 interaction as:

  summary(lm(y~x:f, data=d))

 If it has been formulated as

  summary(lm(y~x*f, data=d))

 there would be three separate intercepts, and three different slopes
 (and the differences would be the same as the differences for the
 separate models).

 Ted.

  Try looking at it graphically and note how the black dotted lines
  are all forced to go through the same intercept, i.e. the same point
  on the y axis, whereas the red dashed lines are each able to
  fit their portion of the data using both the intercept and the slope.
 
  y.lm - lm(y~x:f, data=d)
  plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
  for(i in 1:3) {
abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = dotted)
abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = red, lty =
  dashed)
  }
  grid()
 
 
  On 8/7/07, Sven Garbade [EMAIL PROTECTED] wrote:
  Dear list members,
 
  I have problems to interpret the coefficients from a lm model
  involving
  the interaction of a numeric and factor variable compared to separate
  lm
  models for each level of the factor variable.
 
  ## data:
  y1 - rnorm(20) + 6.8
  y2 - rnorm(20) + (1:20*1.7 + 1)
  y3 - rnorm(20) + (1:20*6.7 + 3.7)
  y - c(y1,y2,y3)
  x - rep(1:20,3)
  f - gl(3,20, labels=paste(lev, 1:3, sep=))
  d - data.frame(x=x,y=y, f=f)
 
  ## plot
  # xyplot(y~x|f)
 
  ## lm model with interaction
  summary(lm(y~x:f, data=d))
 
  Call:
  lm(formula = y ~ x:f, data = d)
 
  Residuals:
 Min  1Q  Median  3Q Max
  -2.8109 -0.8302  0.2542  0.6737  3.5383
 
  Coefficients:
 Estimate Std. Error t value Pr(|t|)
  (Intercept)  3.687990.41045   8.985 1.91e-12 ***
  x:flev1  0.208850.04145   5.039 5.21e-06 ***
  x:flev2  1.496700.04145  36.109   2e-16 ***
  x:flev3  6.708150.04145 161.838   2e-16 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
  Residual standard error: 1.53 on 56 degrees of freedom
  Multiple R-Squared: 0.9984, Adjusted R-squared: 0.9984
  F-statistic: 1.191e+04 on 3 and 56 DF,  p-value:  2.2e-16
 
  ## separate lm fits
  lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
  $lev1
  (Intercept)   x
   6.77022860 -0.01667528
 
  $lev2
  (Intercept)   x
1.0190781.691982
 
  $lev3
  (Intercept)   x
3.2746566.738396
 
 
  Can anybody give me a hint why the coefficients for the slopes
  (especially for lev1) are so different and how the coefficients from
  the
  lm model with interaction are related to the separate fits?
 
  Thanks, Sven
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 094 0861
 Date: 07-Aug-07   Time: 17:01:33
 -- XFMail --


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interaction factor and numeric variable versus separate regressions

2007-08-07 Thread Prof Brian Ripley
These are not the same model.  You want x*f, and then you will find
the differences in intercepts and slopes from group 1 as the coefficients.

Remember too that the combined model pools error variances and the 
separate model has separate error variance for each group.

To understand model formulae, study Bill Venables' exposition in chapter 6 
of MASS.
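
A short sketch of the difference, reusing the data frame d from the original
post (the f/x - 1 form gives one intercept and one slope per level, directly
comparable to the separate per-level fits):

coef(lm(y ~ x:f, data = d))      # common intercept, one slope per level
coef(lm(y ~ x*f, data = d))      # lev2/lev3 terms are differences from lev1
coef(lm(y ~ f/x - 1, data = d))  # cell-means form: per-level intercepts and slopes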

On Tue, 7 Aug 2007, Sven Garbade wrote:

 Dear list members,

 I have problems to interpret the coefficients from a lm model involving
 the interaction of a numeric and factor variable compared to separate lm
 models for each level of the factor variable.

 ## data:
 y1 - rnorm(20) + 6.8
 y2 - rnorm(20) + (1:20*1.7 + 1)
 y3 - rnorm(20) + (1:20*6.7 + 3.7)
 y - c(y1,y2,y3)
 x - rep(1:20,3)
 f - gl(3,20, labels=paste(lev, 1:3, sep=))
 d - data.frame(x=x,y=y, f=f)

 ## plot
 # xyplot(y~x|f)

 ## lm model with interaction
 summary(lm(y~x:f, data=d))

 Call:
 lm(formula = y ~ x:f, data = d)

 Residuals:
Min  1Q  Median  3Q Max
 -2.8109 -0.8302  0.2542  0.6737  3.5383

 Coefficients:
Estimate Std. Error t value Pr(|t|)
 (Intercept)  3.687990.41045   8.985 1.91e-12 ***
 x:flev1  0.208850.04145   5.039 5.21e-06 ***
 x:flev2  1.496700.04145  36.109   2e-16 ***
 x:flev3  6.708150.04145 161.838   2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Residual standard error: 1.53 on 56 degrees of freedom
 Multiple R-Squared: 0.9984,   Adjusted R-squared: 0.9984
 F-statistic: 1.191e+04 on 3 and 56 DF,  p-value:  2.2e-16

 ## separate lm fits
 lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
 $lev1
 (Intercept)   x
 6.77022860 -0.01667528

 $lev2
 (Intercept)   x
   1.0190781.691982

 $lev3
 (Intercept)   x
   3.2746566.738396


 Can anybody give me a hint why the coefficients for the slopes
 (especially for lev1) are so different and how the coefficients from the
 lm model with interaction are related to the separate fits?

 Thanks, Sven

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Naming Lists

2007-08-07 Thread Charles C. Berry
On Tue, 7 Aug 2007, Tom.O wrote:


 Hi
 Im pretty new to R and I have run in to a problem. How do I name all the
 levels in this list.

One way:

MyList <-
   mapply(function(x) mapply(
 function(y) mapply( function(z)
paste(unlist(x),unlist(y),unlist(z)),
Lev3, SIMPLIFY=FALSE  ),
 Lev2, SIMPLIFY=FALSE   ),
  Lev1, SIMPLIFY=FALSE )


However, for many purposes you might better use an array:

MyArray.dimnames <- list(Lev1,Lev2,Lev3)
combos <- rev( expand.grid(Lev3,Lev2,Lev1) )
elements <- do.call( paste, combos )
MyArray <- array( elements, dim=sapply(MyArray.dimnames,length), 
dimnames=MyArray.dimnames )
MyArray['A1','B1','C2']




 Lev1 <- c("A1","A2")
 Lev2 <- c("B1","B2")
 Lev3 <- c("C1","C2")

 MyList <- lapply(Lev1,function(x){
 lapply(Lev2,function(y){
 lapply(Lev3,function(z){
 paste(unlist(x),unlist(y),unlist(z))
 })})})

 I would like to name the different levels in the list with the differnt
 inputs.
 So that the first level in named by the inputs in Lev1 and the second Level
 by the inputs in Lev2, and so on

 I would then like to use this call

 MyList$A1$B1$C1
 "A1 B1 C1"

 Regards Tom
 -- 
 View this message in context: 
 http://www.nabble.com/Naming-Lists-tf4228968.html#a12030717
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spatial sampling problem

2007-08-07 Thread Julian Burgos
Hello SK,

I suggest you try the RandomFields package.  It has specific functions 
for simulating random fields with an array of different algorithms and 
distributions.  The GaussRF function in particular generates Gaussian 
(normal) random fields.
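
If only independent cells with a deterministic north-south trend are needed, a
crude sketch with mvrnorm from MASS (grid size and trend strength are made up)
could be:

library(MASS)
grd   <- expand.grid(x = 1:10, y = 1:10)  # 10 x 10 domain
mu    <- 0.5 * grd$y                      # mean increases from south to north
Sigma <- diag(nrow(grd))                  # independent cells; swap in a spatial covariance if needed
field <- mvrnorm(1, mu = mu, Sigma = Sigma)
image(1:10, 1:10, matrix(field, nrow = 10))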

Julian

Julian M. Burgos

Fisheries Acoustics Research Lab
School of Aquatic and Fishery Science
University of Washington

1122 NE Boat Street
Seattle, WA  98105 

Phone: 206-221-6864


sk wrote:
 Hi All,
  I am new to R and am trying to simulate a random normal 2D field with a mean trend, 
  say north-south. My domain is a 10x10 grid and I am trying to use mvnorm but do 
  not know how to specify the domain and the mean field. 
   I would appreciate any help.
   Cheers, SK  


 -


   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] XL files

2007-08-07 Thread Acharjee, Animesh
Dear All,
I am new to R. I need to read XLS files, parse the data,
and convert them into txt format for further calculation. If anyone
has code for that, please send me the code or suggestions for the
same. Waiting for a reply.
 
Thanking you 
 
Animesh 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] installing RMySql-0.6.0

2007-08-07 Thread yuewu
Hi,

I am trying to install RMySQL 0.6.0 to work with WinXP, MySQL 5 and R 2.5.1. 
There is no binary for Windows on CRAN, so I grabbed RMySql-0.6.0.tar.gz and 
tried to compile it.

I get the following error: 
* checking for file 'RMySQL/DESCRIPTION' ... OK
* preparing 'RMySQL':
* checking DESCRIPTION meta-information ... 'sh' is not recognized as an 
internal or external command. 
OK
* cleaning src
* removing junk files

Where is it specified to use sh commands?  I went through the installation 
instructions and could not find references to it.

Also, do I need to compile the src/*.c files separately? I assume the Rcmd build 
should take care of everything, as the individual src/*.c files don't compile without 
mysql.h and unistd.h.

thanks,

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] installing RMySql-0.6.0

2007-08-07 Thread Gabor Grothendieck
Actually there is an RMYSQL windows binary in the CRAN Extras area but
it crashes on my XP machine when I try to use it.  I have had more luck with
the Windows binary in the Bioconductor Extras area (which seems to work except
for some warnings that you can ignore):

http://www.bioconductor.org/packages/2.0/extra/bin/windows/contrib/2.5/RMySQL_0.6-0.zip

and also another private build by one user that works without the warnings.
Note that there is an r-sig-db list for this type of question:

   https://stat.ethz.ch/mailman/listinfo/r-sig-db


On 8/7/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Hi,

 I am trying to install RMySql-.0.6.0 to work with WinXP, MySQL 5 and R 2.51. 
 there is no binary for windows on cran-r, so i grabbed RMySql-0.6.0.tar.gz 
 and tried to compile it.

 I get the following error:
 * checking for file 'RMySQL/DESCRIPTION' ... OK
 * preparing 'RMySQL':
 * checking DESCRIPTION meta-information ... 'sh' is not recognized as an 
 internal or external command.
 OK
 * cleaning src
 * removing junk files

 where is it specified to use .sh commands?  i went through the installation 
 instructions and could not find references to it.

 also do i need to compile the src/*.c files separately? i assume the rcmd 
 build should take of everything as the individual src/.*c files don't compile 
 without mysql.h and unistd.h.

 thanks,

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lm( ) for log10-transformed data

2007-08-07 Thread Kim Milferstedt
Dear help-list,

I would like to perform a linear regression on the log10 of the two 
vectors ov.mag.min and res.600nm. The slope and intercept of the 
regression I use to plot a wider range of ov.mag.min in a double log plot.

For a linear regression on only two points, wouldn't I expect the 
results for two.points.min to match pretty exactly res.600nm? It does 
not seem to be the case here. I manually calculated the slope and 
intercept and got values of -0.71 and 1.5 that seem to work alright. 
What is it that I am missing here?

Thanks already for your help,

Kim

ov.mag.min <- c(50, 1000)
res.600nm <- c(2.0, 0.231)


lm.line.min <- lm(  log(ov.mag.min,10)
                    ~
                    log(res.600nm,10)
                 )

intercept.min <- lm.line.min$coefficients[1]
slope.min <- lm.line.min$coefficients[2]


two.points.min <- log(ov.mag.min,10)

round(res.600nm,2) == 
round(10^(two.points.min*slope.min+intercept.min),2)

round(res.600nm,2)
round(10^(two.points.min*-0.71+1.5),2)


__

Kim Milferstedt
University of Illinois at Urbana-Champaign
Department of Civil and Environmental Engineering
4125 Newmark Civil Engineering Laboratory
205 North Mathews Avenue MC-250
Urbana, IL 61801
USA
phone: (001) 217 333-9663
fax: (001) 217 333-6968
email: [EMAIL PROTECTED]
http://cee.uiuc.edu/research/morgenroth

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] input data file

2007-08-07 Thread Tiandao Li
Hello,

I am new to R. I used scan() to read data from tab-delimited files. I want 
to save all my data files (multiple scan() calls) in another file, and use it 
like the infile statement in SAS or \input{tex.file} in LaTeX.

Thanks!

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to include bar values in a barplot?

2007-08-07 Thread Donatas G.
How do I include bar values in a barplot (or other R graphics, where this 
could be applicable)? 

To make sure I am clear, I am attaching a barplot created with OpenOffice.org 
which has the bar values written on top of each bar. 
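
A small sketch of this kind of labelling, with made-up heights (barplot()
returns the bar midpoints, which text() can then annotate):

h  <- c(12, 17, 9, 14)                    # made-up bar heights
bp <- barplot(h, names.arg = LETTERS[1:4], ylim = c(0, max(h) * 1.15))
text(bp, h, labels = h, pos = 3)          # write each value just above its bar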

-- 
Donatas Glodenis
http://dg.lapas.info

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to include bar values in a barplot?

2007-08-07 Thread Frank E Harrell Jr
Donatas G. wrote:
 How do I include bar values in a barplot (or other R graphics, where this 
 could be applicable)? 
 
 To make sure I am clear I am attaching a barplot created with OpenOffice.org 
 which has barplot values written on top of each barplot. 
 

This causes an optical illusion in which the viewer adds some of the 
heights of the numbers to the heights of the bars.  And in general dot 
charts are greatly preferred to bar plots, not least because of 
their reduced ink:information ratio.

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmer() : crossed-random-effects specification

2007-08-07 Thread Daniel Caro
Dear all,

I want to estimate a crossed-random-effects model (i.e., measurements,
students, schools) where students migrate between schools over time.
I'm interested in the fixed effects of SES, age and their
interaction on read (reading achievement) while accounting for the
sample design. Based on a previous post, I'm specifying my model as:
fm1 <- lmer2(read ~ SES*age +(1| schlid:idl) +(1| schlid), hamburg,
control=list(gradient = FALSE, niterEM = 0))
where my data is hamburg and idl and schlid are the student and
school ids, respectively.

(1) Is this the specification I want to estimate to obtain those
effects while accounting for...? I'm not sure about the grouping
variables specification.

If not, how should I specify my model? I reviewed Bates (2007),
Linear mixed model implementation, and learned how to detect if my
design is nested or crossed, but not how to specify my model once I
know it is crossed as in my case.

If the previous specification is correct, (2) why do I get this error message?

Error in lmerFactorList(formula, fr$mf, !is.null(family)): number of
levels in grouping factor(s) 'idl:schlid' is too large
In addition: Warning messages:
1: numerical expression has 30113 elements: only the first used in: idl:schlid
2: numerical expression has 30113 elements: only the first used in: idl:schlid

My design consists of 14,047 students, 200 schools and 33,011 obs.

I could, however, run this model:

fm2 <- lmer2(read ~ SES*age + (1|idl) +(1|schlid), hamburg)

But I guess it does not account for the crossed design. Does it?

I'm not a statistician and am using lmer() and R for the first time
today. In other words, I sincerely apologize for the very naïve
question. But I would really like to estimate this model soundly, and I
can't with the software I am familiar with.

Any advice or references are very much appreciated.

All the best,
Daniel

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] small sample techniques

2007-08-07 Thread Nair, Murlidharan T
If my sample size is small is there a particular switch option that I need to 
use with t.test so that it calculates the t ratio correctly? 

Here is a dummy example:

alpha = 0.05

Mean pain reduction for A = 27 and B = 31, and the SDs are SDA = 9, SDB = 12

drgA.p <- rnorm(5,27,9) 

drgB.p <- rnorm(5,31,12)

t.test(drgA.p,drgB.p) # what do I need to give as additional parameter here?

 

I can do it manually but was looking for a switch option that I need to specify 
for  t.test. 

 

Thanks ../Murli

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lm( ) for log10-transformed data

2007-08-07 Thread Duncan Murdoch
On 8/7/2007 2:58 PM, Kim Milferstedt wrote:
 Dear help-list,
 
 I would like to perform a linear regression on the log10 of the two 
 vectors ov.mag.min and res.600nm. The slope and intercept of the 
 regression I use to plot a wider range of ov.mag.min in a double log plot.
 
 For a linear regression on only tow points, wouldn't I expect the 
 results for two.points.min to match pretty exactly res.600nm? It does 
 not seem to be the case here. I manually calculated the slope and 
 intercept and got values of -0.71 and 1.5 that seem to work alright. 
 What is it that I am missing here?
 
 Thanks already for your help,
 
 Kim
 
 ov.mag.min - c(50, 1000)
 res.600nm - c(2.0, 0.231)
 
 
 lm.line.min - lm(  log(ov.mag.min,10)
  ~
  log(res.600nm,10)
  )
 
 intercept.min  - lm.line.min$coefficients[1]
 slope.min - lm.line.min$coefficients[2]
 
 
 two.points.min - log(ov.mag.min,10)
 
 round(res.600nm,2) == 
 round(10^(two.points.min*slope.min+intercept.min),2)
 
 round(res.600nm,2)
 round(10^(two.points.min*-0.71+1.5),2)

The convention in the lm() function is to express a model as
response ~ predictor.  So you should expect

round(ov.mag.min,2) == round(10^(log(res.600nm, 
10)*slope.min+intercept.min),2)

not the other way around, and this does evaluate to two TRUE values.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] array

2007-08-07 Thread Roland Rau
Hi,

Tiandao Li wrote:
 Hello,
 
 I have some files generated from microarray experiments. I used scan() to
 read the files, and assigned each file to a unique name with many rows and
 columns. Now I want to create a array (ArrayA) with unique names, and I can
 use ArrayA[1,2][[6]] to refer the data in each file. Is there any packages
 available for array of array?
 
if all of your initial arrays only consist of rows and columns
(i.e. matrices) and have the same number of rows and columns, you can
store them conveniently in a three-dimensional array:

mat1a <- matrix(1:10, ncol=2)
mat2a <- matrix(11:20, ncol=2)
mat1a
mat2a

arr3D <- array(numeric(0), dim=c(nrow(mat1a), ncol(mat1a), 2))
arr3D[,,1] <- mat1a
arr3D[,,2] <- mat2a

arr3D
arr3D[,,2]

if your initial arrays (assuming again that you have matrices) have
varying numbers of rows and columns, I would suggest using lists:

mat1b <- matrix(1:10, ncol=2)
mat2b <- matrix(101:133, ncol=3)
mat1b
mat2b
list3D <- list(numberone=mat1b, numbertwo=mat2b)
list3D

list3D[[1]]
list3D[[2]]
list3D[["numberone"]]
list3D[["numbertwo"]]


I hope this helps.

Best,
Roland

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do I generate a grid of data in [0,1]?

2007-08-07 Thread esh
Hello, my objective is to produce a data.frame that represents a grid of data 
points evenly spaced as closely together as possible on a graph with domain and 
range both being [0,1].  I tried to get a smaller example working first and I 
tried the following code:
 sink(file="Rsampledata")
 a <- 0
 while(a < 1) {data <- data.frame(a,seq(0,1,0.1)); print(data); a <- a+0.1}

My output was: 
   a seq.0..1..0.1.
1  00.0
2  00.1
3  00.2
4  00.3
5  00.4
6  00.5
7  00.6
8  00.7
9  00.8
10 00.9
11 01.0
 a seq.0..1..0.1.
1  0.10.0
2  0.10.1
3  0.10.2
4  0.10.3
5  0.10.4
6  0.10.5
7  0.10.6
8  0.10.7
9  0.10.8
10 0.10.9
11 0.11.0
 a seq.0..1..0.1.
1  0.20.0
2  0.20.1
3  0.20.2
4  0.20.3
5  0.20.4
6  0.20.5
7  0.20.6
8  0.20.7
9  0.20.8
10 0.20.9
11 0.21.0
 a seq.0..1..0.1.
1  0.30.0
2  0.30.1
3  0.30.2
4  0.30.3
5  0.30.4
6  0.30.5
7  0.30.6
8  0.30.7
9  0.30.8
10 0.30.9
11 0.31.0
 a seq.0..1..0.1.
1  0.40.0
2  0.40.1
3  0.40.2
4  0.40.3
5  0.40.4
6  0.40.5
7  0.40.6
8  0.40.7
9  0.40.8
10 0.40.9
11 0.41.0
 a seq.0..1..0.1.
1  0.50.0
2  0.50.1
3  0.50.2
4  0.50.3
5  0.50.4
6  0.50.5
7  0.50.6
8  0.50.7
9  0.50.8
10 0.50.9
11 0.51.0
 a seq.0..1..0.1.
1  0.60.0
2  0.60.1
3  0.60.2
4  0.60.3
5  0.60.4
6  0.60.5
7  0.60.6
8  0.60.7
9  0.60.8
10 0.60.9
11 0.61.0
 a seq.0..1..0.1.
1  0.70.0
2  0.70.1
3  0.70.2
4  0.70.3
5  0.70.4
6  0.70.5
7  0.70.6
8  0.70.7
9  0.70.8
10 0.70.9
11 0.71.0
 a seq.0..1..0.1.
1  0.80.0
2  0.80.1
3  0.80.2
4  0.80.3
5  0.80.4
6  0.80.5
7  0.80.6
8  0.80.7
9  0.80.8
10 0.80.9
11 0.81.0
 a seq.0..1..0.1.
1  0.90.0
2  0.90.1
3  0.90.2
4  0.90.3
5  0.90.4
6  0.90.5
7  0.90.6
8  0.90.7
9  0.90.8
10 0.90.9
11 0.91.0
   a seq.0..1..0.1.
1  10.0
2  10.1
3  10.2
4  10.3
5  10.4
6  10.5
7  10.6
8  10.7
9  10.8
10 10.9
11 11.0

This does produce the kind of data points that I will need, however I need all 
these data points in one single data.frame instead of 11 because of further 
formatting I must do with the data.  Any suggestions?
Thank you,
Elysia

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I generate a grid of data in [0,1]?

2007-08-07 Thread Greg Snow
Try:

data <- expand.grid( a = seq(0,1,.1), b = seq(0,1,.1) )
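
A quick check of the result and, if needed, a tab-delimited export (the file
name is arbitrary):

dim(data)      # 121 rows, one per (a, b) combination
head(data)
write.table(data, "grid.txt", sep = "\t", row.names = FALSE)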

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of esh
 Sent: Tuesday, August 07, 2007 1:59 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] How do I generate a grid of data in [0,1]?
 
 Hello, my objective is to produce a data.frame that 
 represents a grid of data points evenly spaced as closely 
 together as possible on a graph with domain and range both 
 being [0,1].  I tried to get a smaller example working first 
 and I tried the following code:
  sink(file=Rsampledata)
  a-0
  while(a1) {data-data.frame(a,seq(0,1,0.1)); print(data); a-a+0.1}
 
 My output was: 
a seq.0..1..0.1.
 1  00.0
 2  00.1
 3  00.2
 4  00.3
 5  00.4
 6  00.5
 7  00.6
 8  00.7
 9  00.8
 10 00.9
 11 01.0
  a seq.0..1..0.1.
 1  0.10.0
 2  0.10.1
 3  0.10.2
 4  0.10.3
 5  0.10.4
 6  0.10.5
 7  0.10.6
 8  0.10.7
 9  0.10.8
 10 0.10.9
 11 0.11.0
  a seq.0..1..0.1.
 1  0.20.0
 2  0.20.1
 3  0.20.2
 4  0.20.3
 5  0.20.4
 6  0.20.5
 7  0.20.6
 8  0.20.7
 9  0.20.8
 10 0.20.9
 11 0.21.0
  a seq.0..1..0.1.
 1  0.30.0
 2  0.30.1
 3  0.30.2
 4  0.30.3
 5  0.30.4
 6  0.30.5
 7  0.30.6
 8  0.30.7
 9  0.30.8
 10 0.30.9
 11 0.31.0
  a seq.0..1..0.1.
 1  0.40.0
 2  0.40.1
 3  0.40.2
 4  0.40.3
 5  0.40.4
 6  0.40.5
 7  0.40.6
 8  0.40.7
 9  0.40.8
 10 0.40.9
 11 0.41.0
  a seq.0..1..0.1.
 1  0.50.0
 2  0.50.1
 3  0.50.2
 4  0.50.3
 5  0.50.4
 6  0.50.5
 7  0.50.6
 8  0.50.7
 9  0.50.8
 10 0.50.9
 11 0.51.0
  a seq.0..1..0.1.
 1  0.60.0
 2  0.60.1
 3  0.60.2
 4  0.60.3
 5  0.60.4
 6  0.60.5
 7  0.60.6
 8  0.60.7
 9  0.60.8
 10 0.60.9
 11 0.61.0
  a seq.0..1..0.1.
 1  0.70.0
 2  0.70.1
 3  0.70.2
 4  0.70.3
 5  0.70.4
 6  0.70.5
 7  0.70.6
 8  0.70.7
 9  0.70.8
 10 0.70.9
 11 0.71.0
  a seq.0..1..0.1.
 1  0.80.0
 2  0.80.1
 3  0.80.2
 4  0.80.3
 5  0.80.4
 6  0.80.5
 7  0.80.6
 8  0.80.7
 9  0.80.8
 10 0.80.9
 11 0.81.0
  a seq.0..1..0.1.
 1  0.90.0
 2  0.90.1
 3  0.90.2
 4  0.90.3
 5  0.90.4
 6  0.90.5
 7  0.90.6
 8  0.90.7
 9  0.90.8
 10 0.90.9
 11 0.91.0
a seq.0..1..0.1.
 1  10.0
 2  10.1
 3  10.2
 4  10.3
 5  10.4
 6  10.5
 7  10.6
 8  10.7
 9  10.8
 10 10.9
 11 11.0
 
 This does produce the kind of data points that I will need, 
 however I need all these data points in one single data.frame 
 instead of 11 because of further formatting I must do with 
 the data.  Any suggestions?
 Thank you,
 Elysia
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] backslash c

2007-08-07 Thread Mikkel Grum
How do I get output with \color{blue}, i.e. with
only one backslash? 

> "\color{blue}"
[1] "color{blue}"
Warning messages:
1: '\c' is an unrecognized escape in a character
string 
2: unrecognized escape removed from "\color{blue}" 
> "\\color{blue}"
[1] "\\color{blue}"
 

Any help greatly appreciated.

Best regards,
Mikkel

 sessionInfo()
R version 2.5.1 (2007-06-27) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_Ireland.1252;LC_CTYPE=English_Ireland.1252;LC_MONETARY=English_Ireland.1252;LC_NUMERIC=C;LC_TIME=English_Ireland.1252

attached base packages:
[1] stats graphics  grDevices utils
datasets  methods  
[7] base

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] backslash c

2007-08-07 Thread Henrique Dallazuanna
Perhaps:

cat("\\color{blue}\n")

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

On 07/08/07, Mikkel Grum [EMAIL PROTECTED] wrote:

 How do I get output with \color{blue}, i.e. with
 only one backslash???

  \color{blue}
 [1] color{blue}
 Warning messages:
 1: '\c' is an unrecognized escape in a character
 string
 2: unrecognized escape removed from \color{blue}
  \\color{blue}
 [1] \\color{blue}
 

 Any help greatly appreciated.

 Best regards,
 Mikkel

  sessionInfo()
 R version 2.5.1 (2007-06-27)
 i386-pc-mingw32

 locale:

 LC_COLLATE=English_Ireland.1252;LC_CTYPE=English_Ireland.1252;LC_MONETARY=English_Ireland.1252;LC_NUMERIC=C;LC_TIME=English_Ireland.1252

 attached base packages:
 [1] stats graphics  grDevices utils
 datasets  methods
 [7] base

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] backslash c

2007-08-07 Thread Charles C. Berry

Mikkel,

As

RSiteSearch("backslash")

would have shown you, this is a FAQ.

Your second attempt is correct.

To understand why, see the R FAQ

7.37 Why does backslash behave strangely inside strings?

Chuck
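
A small sketch, added editorially rather than taken from the reply: the doubled
backslash is only an escape in the typed source; the stored string holds a single
backslash, as cat() and nchar() show.

s <- "\\color{blue}"
cat(s, "\n")   # prints \color{blue}
nchar(s)       # 12: one backslash plus "color{blue}"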

On Tue, 7 Aug 2007, Mikkel Grum wrote:

 How do I get output with \color{blue}, i.e. with
 only one backslash???

 \color{blue}
 [1] color{blue}
 Warning messages:
 1: '\c' is an unrecognized escape in a character
 string
 2: unrecognized escape removed from \color{blue}
 \\color{blue}
 [1] \\color{blue}


 Any help greatly appreciated.

 Best regards,
 Mikkel

 sessionInfo()
 R version 2.5.1 (2007-06-27)
 i386-pc-mingw32

 locale:
 LC_COLLATE=English_Ireland.1252;LC_CTYPE=English_Ireland.1252;LC_MONETARY=English_Ireland.1252;LC_NUMERIC=C;LC_TIME=English_Ireland.1252

 attached base packages:
 [1] stats graphics  grDevices utils
 datasets  methods
 [7] base

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry(858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to include bar values in a barplot?

2007-08-07 Thread Donatas G.
On Tuesday 07 August 2007 22:09:52 Donatas G. wrote:
 How do I include bar values in a barplot (or other R graphics, where this
 could be applicable)?

 To make sure I am clear I am attaching a barplot created with
 OpenOffice.org which has barplot values written on top of each barplot.

Here is the barplot mentioned above:
http://dg.lapas.info/wp-content/barplot-with-values.jpg

it appears that this list does not allow attachments...

-- 
Donatas Glodenis
http://dg.lapas.info

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to include bar values in a barplot?

2007-08-07 Thread Gabor Grothendieck
See:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56852.html
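
A minimal sketch, not from the linked post, of the usual base-graphics approach:
barplot() returns the bar midpoints, and text() writes the values above the bars.
The heights below are made up for illustration.

heights <- c(12, 7, 19, 4)
mids <- barplot(heights, names.arg = c("A", "B", "C", "D"),
                ylim = c(0, max(heights) * 1.15))
text(mids, heights, labels = heights, pos = 3)   # pos = 3 places the text above each bar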



On 8/7/07, Donatas G. [EMAIL PROTECTED] wrote:
 On Tuesday 07 August 2007 22:09:52 Donatas G. wrote:
  How do I include bar values in a barplot (or other R graphics, where this
  could be applicable)?
 
  To make sure I am clear I am attaching a barplot created with
  OpenOffice.org which has barplot values written on top of each barplot.

 Here is the barplot mentioned above:
 http://dg.lapas.info/wp-content/barplot-with-values.jpg

 it appeaars that this list does not allow attachments...

 --
 Donatas Glodenis
 http://dg.lapas.info

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] array

2007-08-07 Thread Tiandao Li
Hello,

I have 30 GenePix tab-delimited files generated from 10 microarray 
experiments, each with 3 replicates. I used scan() to read the files, and 
assigned each file to a unique name (such as A1, A2, A3 for experiment A, 
and B1, B2, and B3 for experiment B) with 1000 rows and 56 columns.

Now I want to create an array (ArrayA) with these file names,

ArrayA
     [,1] [,2] ... [,10]
[1,] A1   B1        C1
[2,] A2   B2        C2
[3,] A3   B3        C3

so I can use ArrayA[1,2][[6]] to refer to the data in column 6 of experiment B, 
replicate 1. Are there any packages available for an array of arrays?

Thanks!
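
A sketch, added editorially (the matrices below are placeholders for the objects
read with scan()), of one way to get matrix-style indexing without an extra
package: a plain list can carry a dim attribute.

A1 <- matrix(rnorm(1000 * 56), 1000, 56)   # stand-in for replicate 1 of experiment A
B1 <- matrix(rnorm(1000 * 56), 1000, 56)   # stand-in for replicate 1 of experiment B

ArrayA <- vector("list", 3 * 10)
dim(ArrayA) <- c(3, 10)                    # 3 replicates x 10 experiments
ArrayA[[1, 1]] <- A1
ArrayA[[1, 2]] <- B1

ArrayA[[1, 2]][, 6]                        # column 6 of experiment B, replicate 1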

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to include bar values in a barplot?

2007-08-07 Thread Greg Snow
Generally adding the numbers to a graph accomplishes 2 things:

1) it acts as an admission that your graph is a failure

2) it converts the graph into a poorly laid out table (with a colorful
and distracting background)

In general it is better to find an appropriate graph that does convey
the intended information, or, if a table is more appropriate, to
replace the graph with a well laid out table (or both).

Remember that the role of tables is to look up specific values and the
role of graphs is to give a good overview.

The books by William Cleveland and Tufte have a lot of good advice on
these issues.

Before asking how to get R to produce a graph that looks like one from a
spreadsheet, you should study:
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html and
some of the links from there.  You may also want to run the following in
R:

 library(fortunes)
 fortune(120)

In general I like OpenOffice, my one main complaint is that when faced
with the decision between doing something right or the same way as
microsoft, they have not always made the right decision.

Hope this gives you something to think about,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Donatas G.
 Sent: Tuesday, August 07, 2007 1:10 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] how to include bar values in a barplot?
 
 How do I include bar values in a barplot (or other R 
 graphics, where this could be applicable)? 
 
 To make sure I am clear I am attaching a barplot created with 
 OpenOffice.org which has barplot values written on top of 
 each barplot. 
 
 --
 Donatas Glodenis
 http://dg.lapas.info
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to include bar values in a barplot?

2007-08-07 Thread Donatas G.
On Tuesday 07 August 2007 22:09:52 Donatas G. wrote:
 How do I include bar values in a barplot (or other R graphics, where this
 could be applicable)?

 To make sure I am clear I am attaching a barplot created with
 OpenOffice.org which has barplot values written on top of each barplot.

After more than two hours search I finally found a solution:
http://tolstoy.newcastle.edu.au/R/help/06/05/27286.html
and
http://tolstoy.newcastle.edu.au/R/help/05/09/12936.html

However, what about percentage barcharts, such as this:
http://dg.lapas.info/wp-content/barplot-lytys-G07_x_mean.png

And maybe somebody knows how to get the legend of the bars in the barchart 
(again, see the example above)?

-- 
Donatas Glodenis
http://dg.lapas.info

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmer() - crossed random effects specification

2007-08-07 Thread Daniel Caro
Dear all,

I want to estimate a crossed-random-effects model (i.e., measurements,
students, schools) where students migrate between schools over time.
I'm interested in the fixed effects of SES, age and their
interaction on read (reading achievement) while accounting for the
sample design. Based on a previous post, I'm specifying my model as:

fm1 <- lmer2(read ~ SES*age +(1| schlid:idl) +(1| schlid), hamburg,
control=list(gradient = FALSE, niterEM = 0))

where my data is hamburg and idl and schlid are the student and
school ids, respectively.

(1) Is this the specification I want to estimate to obtain those
effects while accounting for...? I'm not sure about the grouping
variables specification.

If not, how should I specify my model? I reviewed Bates (2007),
"Linear mixed model implementation", and learned how to detect if my
design is nested or crossed, but not how to specify my model once I
know it is crossed as in my case.

If the previous specification is correct, (2) why do I get this error message?

Error in lmerFactorList(formula, fr$mf, !is.null(family)): number of
levels in grouping factor(s) 'idl:schlid' is too large
In addition: Warning messages:
1: numerical expression has 30113 elements: only the first used in: idl:schlid
2: numerical expression has 30113 elements: only the first used in: idl:schlid

My design consists of 14,047 students, 200 schools and 33,011 obs.

I could, however, run this model:

fm2 <- lmer2(read ~ SES*age + (1|idl) +(1|schlid), hamburg)

But I guess it does not account for the crossed design. Does it?

I'm not a statistician and am using lmer() and R for the first time
today. In other words, I sincerely apologize for the very naïve
question. But I would really like to estimate this model soundly and I
can't with the software I am familiar with.

Any advice or references are very much appreciated.

All the best,

Daniel

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer() : crossed-random-effects specification

2007-08-07 Thread Douglas Bates
On 8/7/07, Daniel Caro [EMAIL PROTECTED] wrote:
 Dear all,

 I want to estimate a crossed-random-effects model (i.e., measurements,
 students, schools) where students migrate between schools over time.
 I'm interested in the fixed effects of SES, age and their
 interaction on read (reading achievement) while accounting for the
 sample design. Based on a previous post, I'm specifying my model as:
 fm1 - lmer2(read ~ SES*age +(1| schlid:idl) +(1| schlid), hamburg,
 control=list(gradient = FALSE, niterEM = 0))
 where my data is hamburg and idl and schlid are the student and
 school ids, respectively.

 (1) Is this the specification I want to estimate to obtain those
 effects while accounting for...? I'm not sure about the grouping
 variables specification.

 If not, how should I specify my model? I reviewed Baltes (2007)
 Linear mixed model implementation and learned how to detect if my
 design is nested or crossed, but not how to specify my model once I
 know it is crossed as in my case.

 If the previous specification is correct, (2) why do I get this error message?

 Error in lmerFactorList(formula, fr$mf, !is.null(family)): number of
 levels in grouping factor(s) 'idl:schlid' is too large
 In addition: Warning messages:
 1: numerical expression has 30113 elements: only the first used in: idl:schlid
 2: numerical expression has 30113 elements: only the first used in: idl:schlid

 My design consists of 14,047 students, 200 schools and 33,011 obs.

 I could, however, run this model:

 fm2 - lmer2(read ~ SES*age + (1|idl) +(1|schlid), hamburg)

 But I guess it does not account for the crossed design. Does it?

Yes, it does account for the crossed design.
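
A sketch, added editorially rather than taken from the thread, contrasting the two
specifications with the lme4 interface; 'hamburg', 'read', 'SES', 'age', 'idl' and
'schlid' are the poster's objects and are assumed here.

library(lme4)
## grouping variables should be factors; the "numerical expression has 30113
## elements" warning is what ':' produces when they are plain numeric ids
hamburg$idl    <- factor(hamburg$idl)
hamburg$schlid <- factor(hamburg$schlid)

## crossed random effects: one intercept per student and one per school,
## no matter which schools a student attended
fm2 <- lmer(read ~ SES * age + (1 | idl) + (1 | schlid), data = hamburg)

## nested coding: (1 | schlid/idl) expands to (1 | schlid) + (1 | schlid:idl),
## so "student within school" becomes a new unit whenever a student changes
## school - usually not what is wanted when students migrate between schools
fm1 <- lmer(read ~ SES * age + (1 | schlid/idl), data = hamburg)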
 I'm not an statistician and am using lmer() and R for the first time
 today. In other words, I sincerely apologize for the very naïve
 question. But I would really like to estimate this model soundly and I
 can't with the software I am familiar with

 Any advice or references are very much appreciated.

 All the best,
 Daniel

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixture of Normals with Large Data

2007-08-07 Thread Tim Victor
I wasn't aware of this literature, thanks for the references.

On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote:
 Another possibility is to use data squashing methods.  Relevant papers are: 
 (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen (1999).

 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: [EMAIL PROTECTED]


 - Original Message -
 From: Charles C. Berry [EMAIL PROTECTED]
 Date: Saturday, August 4, 2007 8:01 pm
 Subject: Re: [R] Mixture of Normals with Large Data
 To: [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch


  On Sat, 4 Aug 2007, Tim Victor wrote:
 
All:
   
I am trying to fit a mixture of 2 normals with  110 million
  observations. I
am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and
  I
continue to run out of memory. Does anyone have any suggestions.
 
 
   If the first few million observations can be regarded as a SRS of the
 
   rest, then just use them. Or read in blocks of a convenient size and
 
   sample some observations from each block. You can repeat this process
  a
   few times to see if the results are sufficiently accurate.
 
   Otherwise, read in blocks of a convenient size (perhaps 1 million
   observations at a time), quantize the data to a manageable number of
 
   intervals - maybe a few thousand - and tabulate it. Add the counts
  over
   all the blocks.
 
   Then use mle() to fit a multinomial likelihood whose probabilities
  are the
   masses associated with each bin under a mixture of normals law.
 
   Chuck
 
   
Thanks so much,
   
Tim
   
   [[alternative HTML version deleted]]
   
__
R-help@stat.math.ethz.ch mailing list
   
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
   
 
   Charles C. Berry(858) 534-2098
Dept of
  Family/Preventive Medicine
   E UC San Diego
 La Jolla, San Diego 92093-0901
 
   __
   R-help@stat.math.ethz.ch mailing list
 
   PLEASE do read the posting guide
   and provide commented, minimal, self-contained, reproducible code.
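
A rough sketch, added editorially and not code from the thread, of the
quantize-and-tabulate idea quoted above; the blocks are simulated stand-ins for
the real data, and the bin range, block sizes and starting values are all
assumptions.

library(stats4)

breaks <- seq(-20, 20, length.out = 2001)         # a few thousand bins (assumed range)
counts <- numeric(length(breaks) - 1)

for (b in 1:10) {                                 # accumulate counts block by block
    x <- c(rnorm(1e5, 0, 1), rnorm(1e5, 3, 2))    # stand-in for one block of data
    counts <- counts + hist(x, breaks = breaks, plot = FALSE)$counts
}

# negative log-likelihood of the binned counts under a two-normal mixture
nll <- function(p, mu1, s1, mu2, s2) {
    w    <- plogis(p)                             # mixing weight kept in (0,1)
    cell <- w       * diff(pnorm(breaks, mu1, exp(s1))) +
            (1 - w) * diff(pnorm(breaks, mu2, exp(s2)))
    -sum(counts * log(pmax(cell, 1e-300)))
}

fit <- mle(nll, start = list(p = 0, mu1 = -1, s1 = 0, mu2 = 1, s2 = 0))
summary(fit)

Once the counts are accumulated, the fit only ever touches a few thousand bins
rather than the 110 million raw values.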


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] S4 methods: unable to find an inherited method

2007-08-07 Thread H. Paul Benton
Hello all,

I consider myself pretty new to the whole OO based programming so
I'm sorry if I'm doing something stupid.

> xml <- read.metlin(url)
Error in function (classes, fdef, mtable)  :
unable to find an inherited method for function "read.metlin",
for signature "url"

> read.metlin
standardGeneric for "read.metlin" defined from package ".GlobalEnv"

function (xml, ...)
standardGeneric("read.metlin")
<environment: 0x83a8ae4>
Methods may be defined for arguments: xml

> url
description  "http://metlin.scripps.edu/download/MSMS_test.XML"
class        "url"
mode         "r"
text         "text"
opened       "closed"
can read     "yes"
can write    "no"

I defined my methods as :


if (!isGeneric("read.metlin"))
    setGeneric("read.metlin", function(xml) standardGeneric("read.metlin"))

setMethod("read.metlin", "xcmsRaw", function(xml) {
    # Parsing the METLIN XML file
    reading <- readLines(xml)
    # do rest of script

})

Any help as to why I'm getting the inherited method error would be great.

Cheers,

Paul
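
An editorial note with a sketch, not from the original post: the only method is
registered for class "xcmsRaw", but url() returns a connection of class
c("url", "connection"), so dispatch on signature "url" finds nothing. One
illustrative fix, assuming the generic defined above, is a method for a broader
signature:

setMethod("read.metlin", "ANY", function(xml) {
    reading <- readLines(xml)   # readLines() accepts file names, URLs and connections
    reading
})

A method for signature "character", or for the specific connection classes, would
be the narrower alternative.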

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixture of Normals with Large Data

2007-08-07 Thread Bert Gunter
Why would anyone want to fit a mixture of normals with 110 million
observations?? Any questions about the distribution that you would care to
ask can be answered directly from the data. Of course, any test of normality
(or anything else) would be rejected.

More to the point, the data are certainly not a random sample of anything.
There will be all kinds of systematic nonrandom structure in them. This is
clearly a situation where the researcher needs to think more carefully about
the substantive questions of interest and how the data may shed light on
them, instead of arbitrarily and perhaps reflexively throwing some silly
statistical methodology at them.  

Bert Gunter
Genentech Nonclinical Statistics

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tim Victor
Sent: Tuesday, August 07, 2007 3:02 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Mixture of Normals with Large Data

I wasn't aware of this literature, thanks for the references.

On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote:
 Another possibility is to use data squashing methods.  Relevant papers
are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen
(1999).

 Ravi.
 

 Ravi Varadhan, Ph.D.
 Assistant Professor,
 Division of Geriatric Medicine and Gerontology
 School of Medicine
 Johns Hopkins University

 Ph. (410) 502-2619
 email: [EMAIL PROTECTED]


 - Original Message -
 From: Charles C. Berry [EMAIL PROTECTED]
 Date: Saturday, August 4, 2007 8:01 pm
 Subject: Re: [R] Mixture of Normals with Large Data
 To: [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch


  On Sat, 4 Aug 2007, Tim Victor wrote:
 
All:
   
I am trying to fit a mixture of 2 normals with  110 million
  observations. I
am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and
  I
continue to run out of memory. Does anyone have any suggestions.
 
 
   If the first few million observations can be regarded as a SRS of the
 
   rest, then just use them. Or read in blocks of a convenient size and
 
   sample some observations from each block. You can repeat this process
  a
   few times to see if the results are sufficiently accurate.
 
   Otherwise, read in blocks of a convenient size (perhaps 1 million
   observations at a time), quantize the data to a manageable number of
 
   intervals - maybe a few thousand - and tabulate it. Add the counts
  over
   all the blocks.
 
   Then use mle() to fit a multinomial likelihood whose probabilities
  are the
   masses associated with each bin under a mixture of normals law.
 
   Chuck
 
   
Thanks so much,
   
Tim
   
   [[alternative HTML version deleted]]
   
__
R-help@stat.math.ethz.ch mailing list
   
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
   
 
   Charles C. Berry(858) 534-2098
Dept of
  Family/Preventive Medicine
   E UC San Diego
 La Jolla, San Diego 92093-0901
 
   __
   R-help@stat.math.ethz.ch mailing list
 
   PLEASE do read the posting guide
   and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixture of Normals with Large Data

2007-08-07 Thread Tim Victor
Have you considered the situation of wanting to characterize
probability densities of prevalence estimates based on a complex
random sample of some large population.

On 8/7/07, Bert Gunter [EMAIL PROTECTED] wrote:
 Why would anyone want to fit a mixture of normals with 110 million
 observations?? Any questions about the distribution that you would care to
 ask can be answered directly from the data. Of course, any test of normality
 (or anything else) would be rejected.

 More to the point, the data are certainly not a random sample of anything.
 There will be all kinds of systematic nonrandom structure in them. This is
 clearly a situation where the researcher needs to think more carefully about
 the substantive questions of interest and how the data may shed light on
 them, instead of arbitrarily and perhaps reflexively throwing some silly
 statistical methodology at them.

 Bert Gunter
 Genentech Nonclinical Statistics

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Tim Victor
 Sent: Tuesday, August 07, 2007 3:02 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Mixture of Normals with Large Data

 I wasn't aware of this literature, thanks for the references.

 On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote:
  Another possibility is to use data squashing methods.  Relevant papers
 are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen
 (1999).
 
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: [EMAIL PROTECTED]
 
 
  - Original Message -
  From: Charles C. Berry [EMAIL PROTECTED]
  Date: Saturday, August 4, 2007 8:01 pm
  Subject: Re: [R] Mixture of Normals with Large Data
  To: [EMAIL PROTECTED]
  Cc: r-help@stat.math.ethz.ch
 
 
   On Sat, 4 Aug 2007, Tim Victor wrote:
  
 All:

 I am trying to fit a mixture of 2 normals with  110 million
   observations. I
 am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and
   I
 continue to run out of memory. Does anyone have any suggestions.
  
  
If the first few million observations can be regarded as a SRS of the
  
rest, then just use them. Or read in blocks of a convenient size and
  
sample some observations from each block. You can repeat this process
   a
few times to see if the results are sufficiently accurate.
  
Otherwise, read in blocks of a convenient size (perhaps 1 million
observations at a time), quantize the data to a manageable number of
  
intervals - maybe a few thousand - and tabulate it. Add the counts
   over
all the blocks.
  
Then use mle() to fit a multinomial likelihood whose probabilities
   are the
masses associated with each bin under a mixture of normals law.
  
Chuck
  

 Thanks so much,

 Tim

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list

 PLEASE do read the posting guide
 and provide commented, minimal, self-contained, reproducible code.

  
Charles C. Berry(858) 534-2098
 Dept of
   Family/Preventive Medicine
E UC San Diego
  La Jolla, San Diego 92093-0901
  
__
R-help@stat.math.ethz.ch mailing list
  
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixture of Normals with Large Data

2007-08-07 Thread Bert Gunter
 

Have you considered the situation of wanting to characterize
probability densities of prevalence estimates based on a complex
random sample of some large population.

No -- and I stand by my statement. The empirical distribution of the data
themselves are the best characterization of the density. You and others
are free to disagree.

-- Bert



On 8/7/07, Bert Gunter [EMAIL PROTECTED] wrote:
 Why would anyone want to fit a mixture of normals with 110 million
 observations?? Any questions about the distribution that you would care to
 ask can be answered directly from the data. Of course, any test of
normality
 (or anything else) would be rejected.

 More to the point, the data are certainly not a random sample of anything.
 There will be all kinds of systematic nonrandom structure in them. This is
 clearly a situation where the researcher needs to think more carefully
about
 the substantive questions of interest and how the data may shed light on
 them, instead of arbitrarily and perhaps reflexively throwing some silly
 statistical methodology at them.

 Bert Gunter
 Genentech Nonclinical Statistics

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Tim Victor
 Sent: Tuesday, August 07, 2007 3:02 PM
 To: r-help@stat.math.ethz.ch
 Subject: Re: [R] Mixture of Normals with Large Data

 I wasn't aware of this literature, thanks for the references.

 On 8/5/07, RAVI VARADHAN [EMAIL PROTECTED] wrote:
  Another possibility is to use data squashing methods.  Relevant papers
 are: (1) DuMouchel et al. (1999), (2) Madigan et al. (2002), and (3) Owen
 (1999).
 
  Ravi.
  
 
  Ravi Varadhan, Ph.D.
  Assistant Professor,
  Division of Geriatric Medicine and Gerontology
  School of Medicine
  Johns Hopkins University
 
  Ph. (410) 502-2619
  email: [EMAIL PROTECTED]
 
 
  - Original Message -
  From: Charles C. Berry [EMAIL PROTECTED]
  Date: Saturday, August 4, 2007 8:01 pm
  Subject: Re: [R] Mixture of Normals with Large Data
  To: [EMAIL PROTECTED]
  Cc: r-help@stat.math.ethz.ch
 
 
   On Sat, 4 Aug 2007, Tim Victor wrote:
  
 All:

 I am trying to fit a mixture of 2 normals with  110 million
   observations. I
 am running R 2.5.1 on a box with 1gb RAM running 32-bit windows and
   I
 continue to run out of memory. Does anyone have any suggestions.
  
  
If the first few million observations can be regarded as a SRS of the
  
rest, then just use them. Or read in blocks of a convenient size and
  
sample some observations from each block. You can repeat this process
   a
few times to see if the results are sufficiently accurate.
  
Otherwise, read in blocks of a convenient size (perhaps 1 million
observations at a time), quantize the data to a manageable number of
  
intervals - maybe a few thousand - and tabulate it. Add the counts
   over
all the blocks.
  
Then use mle() to fit a multinomial likelihood whose probabilities
   are the
masses associated with each bin under a mixture of normals law.
  
Chuck
  

 Thanks so much,

 Tim

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list

 PLEASE do read the posting guide
 and provide commented, minimal, self-contained, reproducible code.

  
Charles C. Berry(858) 534-2098
 Dept of
   Family/Preventive Medicine
E UC San Diego
  La Jolla, San Diego 92093-0901
  
__
R-help@stat.math.ethz.ch mailing list
  
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.
 

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R2WinBUGS results not different with different runs

2007-08-07 Thread toby909
Hi All

I don't know if anyone else has noticed the same thing, but with 2 subsequent 
runs of the same syntax, I am getting exactly the same results. I was expecting 
that results would differ slightly, say in the 4th or 5th decimal place.
Is this a peculiarity of R2WinBUGS? Does it have something to do with the seed 
value? Isn't the seed value reset every time I restart WinBUGS?

Thanks Toby

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Conditional subsetting

2007-08-07 Thread Tim Sippel
Hello-

Upon searching the email archives and reading documentation I haven't found
what I'm looking for.  I have a dataset with a time/date stamp on a series
of geographic locations (lat/lon) associated with the movements of animals.
On some days there are numerous locations and some days there is only one
location.  Associated with each location is an indication that the quality
of the location is either good, poor, or unknown.  For each animal, I
want to extract only one location for each day.  And I want the location
extracted to be nearest to 12pm on any given day, and highest quality
possible.  

So the order of priority for my extraction will be: 1. only one location per
animal each day; 2. the location selected be as close to 12pm as possible;
3. the selected location comes from the best locations available (ie. take
from pool of good locations first, or select from poor locations if a
good location isn't available, or unknown if nothing else is available).


 

I think the aspect of this task that has me particularly stumped is how to
select only one location for each day, and for that location to be as close
to 12pm as possible.  

 

An example of my dataset follows:

 



DeployID  Date.Time         LocationQuality  Latitude  Longitude
STM05-1   28/02/2005 17:35  Good             -35.562   177.158
STM05-1   28/02/2005 19:44  Good             -35.487   177.129
STM05-1   28/02/2005 23:01  Unknown          -35.399   177.064
STM05-1   01/03/2005 07:28  Unknown          -34.978   177.268
STM05-1   01/03/2005 18:06  Poor             -34.799   177.027
STM05-1   01/03/2005 18:47  Poor             -34.85    177.059
STM05-2   28/02/2005 12:49  Good             -35.928   177.328
STM05-2   28/02/2005 21:23  Poor             -35.926   177.314

Many thanks for your input.  I'm using R 2.5.1 on Windows XP.  

 

Cheers,

 

Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] input data file

2007-08-07 Thread jim holtman
If you are going to read it back into R, then use 'save'; if it is
input to another application, consider 'write.csv'.  I assume that
when you say "save all my data files" you really mean "save all my R
objects".
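
A small sketch, added editorially with made-up file names, of the two routes:

# re-use within R: save the object as-is and restore it later
dat <- scan("exp1_rep1.txt", what = "")          # hypothetical input file
save(dat, file = "exp1_rep1.RData")
load("exp1_rep1.RData")                          # restores 'dat'

# hand-off to another application: write a plain csv
write.csv(data.frame(value = dat), "exp1_rep1.csv", row.names = FALSE)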

On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I am new to R. I used scan() to read data from tab-delimited files. I want
 to save all my data files (multiple scan()) in another file, and use it
 like infile statement in SAS or \input{tex.file} in latex.

 Thanks!

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to convert decimal date to its equivalent date format(YYYY.mm.dd.hr.min.sec)

2007-08-07 Thread jim holtman
Is this what you want?

> x <- scan(textConnection("1979.00
+
+ 1979.020833
+
+ 1979.041667
+
+ 1979.062500"), what=0)
Read 4 items
> # get the year and then determine the number of seconds in the year so you can
> # use the decimal part of the year
> x.year <- floor(x)
> # fraction of the year
> x.frac <- x - x.year
> # number of seconds in each year
> x.sec.yr <- unclass(ISOdate(x.year+1,1,1,0,0,0)) -
+             unclass(ISOdate(x.year,1,1,0,0,0))
> # now get the actual time
> x.actual <- ISOdate(x.year,1,1,0,0,0) + x.frac * x.sec.yr

> x.actual
[1] "1979-01-01 00:00:00 GMT" "1979-01-08 14:29:49 GMT" "1979-01-16 05:00:10 GMT"
[4] "1979-01-23 19:30:00 GMT"



On 8/7/07, Yogesh Tiwari [EMAIL PROTECTED] wrote:
 Hello R Users,

 How to convert decimal date to date as .mm.dd.hr.min.sec

 For example, I have decimal date in one column , and want to convert and
 write it in equivalent date(.mm.dd.hr.min.sec) in another next six
 columns.

 1979.00

 1979.020833

 1979.041667

 1979.062500



 Is it possible in R ?

 Kindly help,

 Regards,

 Yogesh





 --
 Dr. Yogesh K. Tiwari,
 Scientist,
 Indian Institute of Tropical Meteorology,
 Homi Bhabha Road,
 Pashan,
 Pune-411008
 INDIA

 Phone: 0091-99 2273 9513 (Cell)
 : 0091-20-258 93 600 (O) (Ext.250)
 Fax: 0091-20-258 93 825

[[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional subsetting

2007-08-07 Thread Ross Darnell

Assuming *an* animal can only be in one location at any one time, I don't 
understand how, once you have selected the location nearest 12pm, you can 
then select based on the type of location.
Do you mean to select on location quality before time, i.e. from the best 
locations visited that day, take the one closest to 12pm?

Perhaps for condition 2, for an animal, use which.min(time-1200) (replace 1200 with 
a properly defined timestamp representing 12:00pm).

Ross Darnell
  


-Original Message-
From: [EMAIL PROTECTED] on behalf of Tim Sippel
Sent: Wed 08-Aug-07 9:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Conditional subsetting
 
Hello-

Upon searching the email archives and reading documentation I haven't found
what I'm looking for.  I have a dataset with a time/date stamp on a series
of geographic locations (lat/lon) associated with the movements of animals.
On some days there are numerous locations and some days there is only one
location.  Associated with each location is an indication that the quality
of the location is either good, poor, or unknown.  For each animal, I
want to extract only one location for each day.  And I want the location
extracted to be nearest to 12pm on any given day, and highest quality
possible.  

So the order of priority for my extraction will be: 1. only one location per
animal each day; 2. the location selected be as close to 12pm as possible;
3. the selected location comes from the best locations available (ie. take
from pool of good locations first, or select from poor locations if a
good location isn't available, or unknown if nothing else is available).


 

I think aspect of this task that has me particularly stumped is how to
select only one location for each day, and for that location to be a close
to 12pm as possible.  

 

An example of my dataset follows:

 



DeployID  Date.Time         LocationQuality  Latitude  Longitude
STM05-1   28/02/2005 17:35  Good             -35.562   177.158
STM05-1   28/02/2005 19:44  Good             -35.487   177.129
STM05-1   28/02/2005 23:01  Unknown          -35.399   177.064
STM05-1   01/03/2005 07:28  Unknown          -34.978   177.268
STM05-1   01/03/2005 18:06  Poor             -34.799   177.027
STM05-1   01/03/2005 18:47  Poor             -34.85    177.059
STM05-2   28/02/2005 12:49  Good             -35.928   177.328
STM05-2   28/02/2005 21:23  Poor             -35.926   177.314

Many thanks for your input.  I'm using R 2.5.1 on Windows XP.  

 

Cheers,

 

Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] small sample techniques

2007-08-07 Thread Moshe Olshansky
Hi Nair,

If the two populations are normal the t-test gives you
the exact result for whatever the sample size is (the
sample size will affect the number of degrees of
freedom).
When the populations are not normal and the sample
size is large it is still OK to use t-test (because of
the Central Limit Theorem) but this is not necessarily
true for the small sample size.
You could use simulation to find the relevant
probabilities.
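
A small sketch, added editorially, of the simulation idea, using the figures from
the question quoted below: how often does a two-sample t-test at alpha = 0.05
reject when n = 5 per group and the populations are N(27, 9^2) and N(31, 12^2)?

set.seed(1)
reps <- 10000
pvals <- replicate(reps, t.test(rnorm(5, 27, 9), rnorm(5, 31, 12))$p.value)
mean(pvals < 0.05)   # simulated power under these assumed populations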

--- Nair, Murlidharan T [EMAIL PROTECTED] wrote:

 If my sample size is small is there a particular
 switch option that I need to use with t.test so that
 it calculates the t ratio correctly? 
 
 Here is a dummy example?
 
 α = 0.05
 
 Mean pain reduction for A = 27; B = 31, and the SDs are
 SDA = 9, SDB = 12
 
 drgA.p <- rnorm(5,27,9)
 
 drgB.p <- rnorm(5,31,12)
 
 t.test(drgA.p,drgB.p) # what do I need to give as
 additional parameter here?
 
  
 
 I can do it manually but was looking for a switch
 option that I need to specify for  t.test. 
 
  
 
 Thanks ../Murli
 
  
 
 
   [[alternative HTML version deleted]]
 
  __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional subsetting

2007-08-07 Thread Tim Sippel
Thanks Ross.  Yes, upon your remark it seems that maybe I need to be
selecting first for the best available location, and then by the time stamp
nearest to 12pm.  It seems like tapply(), or aggregate() or aggregate.ts()
might be applicable, but I'm working through trying different functions and
have yet to find what works.  Suggestions on what function to start with
would be appreciated.  

 

Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)

 

  _  

From: Ross Darnell [mailto:[EMAIL PROTECTED] 
Sent: 08 August 2007 12:43
To: Tim Sippel; r-help@stat.math.ethz.ch
Subject: RE: [R] Conditional subsetting

 

 

Assuming *A* animal can only be in one location at any one time, I don't
understand how once you have selected the location nearest 12pm how you can
select based on the type of location?
Do you mean to select on location before timei.e. from the best location
visited that day which was closest to 12pm?

Perhaps for condition 2  for a animal which.min(time-1200) (replace 1200
with properly defined timestamp representing 12:00pm)

Ross Darnell
 


-Original Message-
From: [EMAIL PROTECTED] on behalf of Tim Sippel
Sent: Wed 08-Aug-07 9:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Conditional subsetting

Hello-

Upon searching the email archives and reading documentation I haven't found
what I'm looking for.  I have a dataset with a time/date stamp on a series
of geographic locations (lat/lon) associated with the movements of animals.
On some days there are numerous locations and some days there is only one
location.  Associated with each location is an indication that the quality
of the location is either good, poor, or unknown.  For each animal, I
want to extract only one location for each day.  And I want the location
extracted to be nearest to 12pm on any given day, and highest quality
possible. 

So the order of priority for my extraction will be: 1. only one location per
animal each day; 2. the location selected be as close to 12pm as possible;
3. the selected location comes from the best locations available (ie. take
from pool of good locations first, or select from poor locations if a
good location isn't available, or unknown if nothing else is available).




I think aspect of this task that has me particularly stumped is how to
select only one location for each day, and for that location to be a close
to 12pm as possible. 



An example of my dataset follows:





DeployID



Date.Time



LocationQuality

Latitude

Longitude


STM05-1



28/02/2005 17:35



Good

-35.562

177.158


STM05-1



28/02/2005 19:44



Good

-35.487

177.129


STM05-1



28/02/2005 23:01



Unknown

-35.399

177.064


STM05-1



01/03/2005 07:28



Unknown

-34.978

177.268


STM05-1



01/03/2005 18:06



Poor

-34.799

177.027


STM05-1



01/03/2005 18:47



Poor

-34.85

177.059


STM05-2



28/02/2005 12:49



Good

-35.928

177.328


STM05-2



28/02/2005 21:23



Poor

-35.926

177.314




















































Many thanks for your input.  I'm using R 2.5.1 on Windows XP. 



Cheers,



Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional subsetting

2007-08-07 Thread Tim Sippel
I have tried the following to extract my subset and no luck yet.  Any ideas?

 

subset.df <- subset(df, subset=c("DeployID", "Date.Time",
LocationQuality==(ordered(c("Good", "Poor", "Unknown"))),

  "Latitude", "Longitude"), select= min(abs(Date.Time - "12:00")))

 

I'm not sure if my 'ordered' argument will accomplish my desired result of
first selecting for Good locations, etc.  

 

Thank you,

 

Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)

 

  _  

From: Tim Sippel [mailto:[EMAIL PROTECTED] 
Sent: 08 August 2007 13:16
To: 'Ross Darnell'; 'r-help@stat.math.ethz.ch'
Subject: RE: [R] Conditional subsetting

 

Thanks Ross.  Yes, upon your remark it seems that maybe I need to be
selecting first for the best available location, and then by the time stamp
nearest to 12pm.  It seems like tapply(), or aggregate() or aggregate.ts()
might be applicable, but I'm working through trying different functions and
have yet to find what works.  Suggestions on what function to start with
would be appreciated.  

 

Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)

 

  _  

From: Ross Darnell [mailto:[EMAIL PROTECTED] 
Sent: 08 August 2007 12:43
To: Tim Sippel; r-help@stat.math.ethz.ch
Subject: RE: [R] Conditional subsetting

 

 

Assuming *A* animal can only be in one location at any one time, I don't
understand how once you have selected the location nearest 12pm how you can
select based on the type of location?
Do you mean to select on location before timei.e. from the best location
visited that day which was closest to 12pm?

Perhaps for condition 2  for a animal which.min(time-1200) (replace 1200
with properly defined timestamp representing 12:00pm)

Ross Darnell
 


-Original Message-
From: [EMAIL PROTECTED] on behalf of Tim Sippel
Sent: Wed 08-Aug-07 9:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Conditional subsetting

Hello-

Upon searching the email archives and reading documentation I haven't found
what I'm looking for.  I have a dataset with a time/date stamp on a series
of geographic locations (lat/lon) associated with the movements of animals.
On some days there are numerous locations and some days there is only one
location.  Associated with each location is an indication that the quality
of the location is either good, poor, or unknown.  For each animal, I
want to extract only one location for each day.  And I want the location
extracted to be nearest to 12pm on any given day, and highest quality
possible. 

So the order of priority for my extraction will be: 1. only one location per
animal each day; 2. the location selected be as close to 12pm as possible;
3. the selected location comes from the best locations available (ie. take
from pool of good locations first, or select from poor locations if a
good location isn't available, or unknown if nothing else is available).




I think aspect of this task that has me particularly stumped is how to
select only one location for each day, and for that location to be a close
to 12pm as possible. 



An example of my dataset follows:





DeployID  Date.Time         LocationQuality  Latitude  Longitude
STM05-1   28/02/2005 17:35  Good             -35.562   177.158
STM05-1   28/02/2005 19:44  Good             -35.487   177.129
STM05-1   28/02/2005 23:01  Unknown          -35.399   177.064
STM05-1   01/03/2005 07:28  Unknown          -34.978   177.268
STM05-1   01/03/2005 18:06  Poor             -34.799   177.027
STM05-1   01/03/2005 18:47  Poor             -34.85    177.059
STM05-2   28/02/2005 12:49  Good             -35.928   177.328
STM05-2   28/02/2005 21:23  Poor             -35.926   177.314

Many thanks for your input.  I'm using R 2.5.1 on Windows XP. 



Cheers,



Tim Sippel (MSc)

School of Biological Sciences

Auckland University

Private Bag 92019

Auckland 1020

New Zealand

+64-9-373-7599 ext. 84589 (work)

+64-9-373-7668 (Fax)

+64-21-593-001 (mobile)




[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] input data file

2007-08-07 Thread Tiandao Li
In the first part of myfile.R, I used scan() 100 times to read data from 
100 different tab-delimited files. I want to save this part to another 
data file, so I won't accidentally make mistakes, and I want to re-use/input 
it like the infile statement in SAS or \input{file.tex} in LaTeX. I don't want 
to copy/paste 100 scan() calls every time I need to read the same data.

Thanks!

On Tue, 7 Aug 2007, jim holtman wrote:

If you are going to read it back into R, then use 'save'; if it is
input to another applicaiton, consider 'write.csv'.  I assume that
when you say save all my data files you really mean save all my R
objects.

On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
 Hello,

 I am new to R. I used scan() to read data from tab-delimited files. I want
 to save all my data files (multiple scan()) in another file, and use it
 like infile statement in SAS or \input{tex.file} in latex.

 Thanks!

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional subsetting

2007-08-07 Thread Gabor Grothendieck
Read the data in and transform the Date, Time and Quality columns
so that Date and Time are the respective chron classes (see RNews 4/1 Help
Desk article) and Quality is a factor such that lower levels are better.

Then sort the data frame by Date, Quality and time from noon.  That would
actually be the default ordering if the shown factor levels are the only ones
since they naturally fall into that order but we have explicitly ordered them
in case your extension to other levels no longer obeys that property.

Finally take the first row from each Date group.

Lines <- "DeployID Date.Time LocationQuality Latitude Longitude
STM05-1 28/02/2005 17:35 Good -35.562 177.158
STM05-1 28/02/2005 19:44 Good -35.487 177.129
STM05-1 28/02/2005 23:01 Unknown -35.399 177.064
STM05-1 01/03/2005 07:28 Unknown -34.978 177.268
STM05-1 01/03/2005 18:06 Poor -34.799 177.027
STM05-1 01/03/2005 18:47 Poor -34.85 177.059
STM05-2 28/02/2005 12:49 Good -35.928 177.328
STM05-2 28/02/2005 21:23 Poor -35.926 177.314"


library(chron)
# in next line replace textConnection(Lines) with "myfile.dat"
DF <- read.table(textConnection(Lines), skip = 1,  as.is = TRUE,
  col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))

DF <- transform(DF,
    Date = chron(Date, format = "d/m/y"),
    Time = times(paste(Time, "00", sep = ":")),
    Quality = factor(Quality, levels = c("Good", "Poor", "Unknown")))

o <- order(DF$Date, as.numeric(DF$Quality), abs(DF$Time - times("12:00:00")))
DF <- DF[o,]

DF[tapply(row.names(DF), DF$Date, head, 1), ]

# The last line above could alternately be written like this:
do.call(rbind, by(DF, DF$Date, head, 1))



On 8/7/07, Tim Sippel [EMAIL PROTECTED] wrote:
 Thanks Ross.  Yes, upon your remark it seems that maybe I need to be
 selecting first for the best available location, and then by the time stamp
 nearest to 12pm.  It seems like tapply(), or aggregate() or aggregate.ts()
 might be applicable, but I'm working through trying different functions and
 have yet to find what works.  Suggestions on what function to start with
 would be appreciated.



 Tim Sippel (MSc)

 School of Biological Sciences

 Auckland University

 Private Bag 92019

 Auckland 1020

 New Zealand

 +64-9-373-7599 ext. 84589 (work)

 +64-9-373-7668 (Fax)

 +64-21-593-001 (mobile)



  _

 From: Ross Darnell [mailto:[EMAIL PROTECTED]
 Sent: 08 August 2007 12:43
 To: Tim Sippel; r-help@stat.math.ethz.ch
 Subject: RE: [R] Conditional subsetting





 Assuming *A* animal can only be in one location at any one time, I don't
 understand how once you have selected the location nearest 12pm how you can
 select based on the type of location?
 Do you mean to select on location before timei.e. from the best location
 visited that day which was closest to 12pm?

 Perhaps for condition 2  for a animal which.min(time-1200) (replace 1200
 with properly defined timestamp representing 12:00pm)

 Ross Darnell



 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Tim Sippel
 Sent: Wed 08-Aug-07 9:37 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Conditional subsetting

 Hello-

 Upon searching the email archives and reading documentation I haven't found
 what I'm looking for.  I have a dataset with a time/date stamp on a series
 of geographic locations (lat/lon) associated with the movements of animals.
 On some days there are numerous locations and some days there is only one
 location.  Associated with each location is an indication that the quality
 of the location is either good, poor, or unknown.  For each animal, I
 want to extract only one location for each day.  And I want the location
 extracted to be nearest to 12pm on any given day, and highest quality
 possible.

 So the order of priority for my extraction will be: 1. only one location per
 animal each day; 2. the location selected be as close to 12pm as possible;
 3. the selected location comes from the best locations available (ie. take
 from pool of good locations first, or select from poor locations if a
 good location isn't available, or unknown if nothing else is available).




 I think aspect of this task that has me particularly stumped is how to
 select only one location for each day, and for that location to be a close
 to 12pm as possible.



 An example of my dataset follows:





 DeployID  Date.Time         LocationQuality  Latitude  Longitude
 STM05-1   28/02/2005 17:35  Good             -35.562   177.158
 STM05-1   28/02/2005 19:44  Good             -35.487   177.129
 STM05-1   28/02/2005 23:01  Unknown          -35.399   177.064
 STM05-1   01/03/2005 07:28  Unknown          -34.978   177.268
 STM05-1   01/03/2005 18:06  Poor             -34.799   177.027
 STM05-1   01/03/2005 18:47  Poor             -34.85    177.059
 STM05-2   28/02/2005 12:49  Good             -35.928   177.328
 STM05-2   28/02/2005 21:23  Poor             -35.926   177.314

 Many thanks for your input.  I'm using R 2.5.1 on Windows XP.



 Cheers,



 Tim Sippel (MSc)

 

Re: [R] input data file

2007-08-07 Thread jim holtman
I would hope that you don't have 100 'scan' statements; you should
just have a loop that is using a set of file names in a vector to read
the data.  Are you reading the data into separate objects?  If so,
have you considered reading the 100 files into a 'list' so that you
have a single object with all of your data?  This is then easy to save
with the 'save' function and then you can quickly retrieve it with the
'load' statement.

file.names <- c('file1', ..., 'file100')
input.list <- list()
for (i in file.names){
    input.list[[i]] <- scan(i, what="")
}

You can then 'save(input.list, file='save.Rdata')'.  You can access
the data from the individual files with:

input.list[['file33']]





On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
 In the first part of myfile.R, I used scan() 100 times to read data from
 100 different tab-delimited files. I want to save this part to another
 data file, so I won't accidentally make mistakes, and I want to re-use/input
 it like the infile statement in SAS or \input{file.tex} in latex. I don't want
 to copy/paste 100 scan() calls every time I need to read the same data.

 Thanks!

 On Tue, 7 Aug 2007, jim holtman wrote:

 If you are going to read it back into R, then use 'save'; if it is
 input to another application, consider 'write.csv'.  I assume that
 when you say save all my data files you really mean save all my R
 objects.

 On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Hello,
 
  I am new to R. I used scan() to read data from tab-delimited files. I want
  to save all my data files (multiple scan()) in another file, and use it
  like infile statement in SAS or \input{tex.file} in latex.
 
  Thanks!
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Binary Search

2007-08-07 Thread Matthew Walker
Hi!

R is an amazing piece of software, and with so many libraries it can do 
almost anything... but I was very surprised that a standard binary 
search function seems not to exist.  I can find far more complex search 
routines (optimize, uniroot, nlm) in the standard 
no-extra-packages-loaded version of R, but not this simple alternative.  
I searched and found an implementation inside the package genetics, 
but that requires loading an awful lot of other code to get to 
this simple function.
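
To be concrete, here is a sketch of the sort of function I mean (for a 
sorted numeric vector); as far as I can tell findInterval() in base R is 
the closest built-in, though it answers a slightly different question 
(interval location rather than exact match):

binsearch <- function(vec, target) {
  lo <- 1L
  hi <- length(vec)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (vec[mid] == target) return(mid)        # found: return its index
    if (vec[mid] < target) lo <- mid + 1L else hi <- mid - 1L
  }
  0L                                           # not found
}
## binsearch(c(1, 3, 7, 9, 12), 9)   # 4
## binsearch(c(1, 3, 7, 9, 12), 8)   # 0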

Have I missed something?  Perhaps I just didn't find what is already in 
there.  Can anyone point me to what I'm after?

If not, perhaps the developers might consider making it more readily 
available?

Cheers,

Matthew

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] input data file

2007-08-07 Thread Tiandao Li
I thought of a loop at first. My data were generated from 32 microarray 
experiments, each with 3 replicates, 96 files in total. I named the files 
based on different conditions or time series, and I really don't want to 
name them after numbers. It would confuse me later when I need to 
refer to/compare them.


On Tue, 7 Aug 2007, jim holtman wrote:

I would hope that you don't have 100 'scan' statements; you should
just have a loop that is using a set of file names in a vector to read
the data.  Are you reading the data into separate objects?  If so,
have you considered reading the 100 files into a 'list' so that you
have a single object with all of your data?  This is then easy to save
with the 'save' function and then you can quickly retrieve it with the
'load' statement.

file.names <- c('file1', ..., 'file100')
input.list <- list()
for (i in file.names) {
    input.list[[i]] <- scan(i, what = "")
}

You can then 'save(input.list, file='save.Rdata')'.  You can access
the data from the individual files with:

input.list[['file33']]





On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
 In the first part of myfile.R, I used scan() 100 times to read data from
 100 different tab-delimited files. I want to save this part to another
 data file, so I won't accidentally make mistakes, and I want to re-use/input
 it like the infile statement in SAS or \input{file.tex} in latex. I don't want
 to copy/paste 100 scan() calls every time I need to read the same data.

 Thanks!

 On Tue, 7 Aug 2007, jim holtman wrote:

 If you are going to read it back into R, then use 'save'; if it is
 input to another application, consider 'write.csv'.  I assume that
 when you say save all my data files you really mean save all my R
 objects.

 On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
  Hello,
 
  I am new to R. I used scan() to read data from tab-delimited files. I want
  to save all my data files (multiple scan()) in another file, and use it
  like infile statement in SAS or \input{tex.file} in latex.
 
  Thanks!
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] input data file

2007-08-07 Thread jim holtman
You don't have to name them after numbers.  What I sent was just an
example of a character vector with file names.  If you have all the
files in a directory, then you can set the loop to read in all the
files (or selected ones based on a pattern match).  If you are
copy/pasting the 'scan' command, then you must somehow be changing the
file name that is being read and the R object that you are storing the
values in.

You can use list.files(pattern=..) to select a list of file names.
This is much easier than copy/paste.
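
For example, a minimal sketch (the directory name and pattern below are
placeholders for whatever matches your files):

file.names <- list.files(path = 'microarray_data', pattern = '\\.txt$',
                         full.names = TRUE)
input.list <- lapply(file.names, scan, what = '')
names(input.list) <- basename(file.names)   # keep the descriptive file names
save(input.list, file = 'save.Rdata')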

On 8/8/07, Tiandao Li [EMAIL PROTECTED] wrote:
 I thought of a loop at first. My data were generated from 32 microarray
 experiments, each with 3 replicates, 96 files in total. I named the files
 based on different conditions or time series, and I really don't want to
 name them after numbers. It would confuse me later when I need to
 refer to/compare them.


 On Tue, 7 Aug 2007, jim holtman wrote:

 I would hope that you don't have 100 'scan' statements; you should
 just have a loop that is using a set of file names in a vector to read
 the data.  Are you reading the data into separate objects?  If so,
 have you considered reading the 100 files into a 'list' so that you
 have a single object with all of your data?  This is then easy to save
 with the 'save' function and then you can quickly retrieve it with the
 'load' statement.

 file.names <- c('file1', ..., 'file100')
 input.list <- list()
 for (i in file.names) {
     input.list[[i]] <- scan(i, what = "")
 }

 You can then 'save(input.list, file='save.Rdata')'.  You can access
 the data from the individual files with:

 input.list[['file33']]





 On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
  In the first part of myfile.R, I used scan() 100 times to read data from
  100 different tab-delimited files. I want to save this part to another
  data file, so I won't accidentally make mistakes, and I want to re-use/input
  it like the infile statement in SAS or \input{file.tex} in latex. I don't want
  to copy/paste 100 scan() calls every time I need to read the same data.
 
  Thanks!
 
  On Tue, 7 Aug 2007, jim holtman wrote:
 
  If you are going to read it back into R, then use 'save'; if it is
  input to another application, consider 'write.csv'.  I assume that
  when you say save all my data files you really mean save all my R
  objects.
 
  On 8/7/07, Tiandao Li [EMAIL PROTECTED] wrote:
   Hello,
  
   I am new to R. I used scan() to read data from tab-delimited files. I want
   to save all my data files (multiple scan()) in another file, and use it
   like infile statement in SAS or \input{tex.file} in latex.
  
   Thanks!
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide 
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
 
  --
  Jim Holtman
  Cincinnati, OH
  +1 513 646 9390
 
  What is the problem you are trying to solve?
 


 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem you are trying to solve?



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R2WinBUGS results not different with different runs

2007-08-07 Thread Gregor Gorjanc
 toby909 at gmail.com writes:
 Is this a specialty of R2WinBUGS? Does it have something to do with the seed
 value? Isn't the seed value reset every time I restart WinBUGS?

I always have the same seed if I start WinBUGS multiple times.

Gregor

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Define t-distribution using data, and ghyp package

2007-08-07 Thread Scott.Wilkinson
Dear R users,

I am fitting a t-distribution to some data, then selecting randomly from
the distribution. I am using the brand new ghyp package, which seems
designed to do this. Firstly, is this approach appropriate, or are there
alternatives I should also consider? I have a small dataset and the
hyperbolic distribution is also used when heavy tails are expected, but
I understand the t-distribution is designed for understanding variance
when data are limited.
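
For reference, a minimal sketch of one alternative (a plain location-scale
Student-t fitted via MASS::fitdistr, with the degrees of freedom fixed at a
guessed value; unlike the ghyp fit this ignores any asymmetry):

library(MASS)
Data1 <- c(37.07000, 46.94609, 38.19270, 41.98090, 36.45126, 45.25217,
           39.07771, 39.35987)
fit <- fitdistr(Data1, "t", df = 5)     # df = 5 is an arbitrary guess
m <- fit$estimate["m"]
s <- fit$estimate["s"]
draws <- m + s * rt(1000, df = 5)       # random draws from the fitted t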

I have three questions specific to ghyp; as this package is new, perhaps my
limited knowledge is compounded by some teething issues:

1. How should I set the argument nu in function fit.tuv()? It appears to be
a starting value. The preliminary draft help file says nu is defined as
-2*lambda, but lambda is not defined. Experimenting with Data1 below,
nu=2.5 gives no error messages and hist() looks vaguely reasonable,
although Dist1 lists Converge=FALSE. nu=2 gives Error in FUN(newX[,
i], ...) : If lambda < 0: chi must be > 0 and psi must be >= 0! lambda =
-1;   chi = 0;   psi = 0. nu=3 gives Warning message: NaNs produced
in: sqrt(diag(hess)), and again Dist1 gives Converge=FALSE, but also
the resulting hist() is scrambled.

Looking at Data2, again nu=2.5 gives no error messages, and Dist2 lists
Converge=TRUE. Interestingly, Dist2 lists nu=18.2927194, but if I set
nu=18 in fit.tuv() I get Warning message: NaNs produced in:
sqrt(diag(hess)), and Converge=FALSE (!). 

2. What is the significance/consequence of Converge=FALSE? How can I
achieve Converge=TRUE apart from collecting more data?

3. In fit.tuv() I have experimented with setting the argument symmetric=TRUE,
but why is the fitted distribution always reported as Asymmetric Student-t?

Code:

library(ghyp)
Data1 <- c(37.07000, 46.94609, 38.19270, 41.98090, 36.45126, 45.25217,
39.07771, 39.35987)
Dist1 <- fit.tuv(Data1, nu = 2.5, opt.pars = c(nu = T, mu = T, sigma =
T), symmetric = T)
hist(Dist1, data = Data1, gaussian = T, log.hist = F, ylim = c(0, 0.5),
ghyp.col = 1, ghyp.lwd = 1,
ghyp.lty = "solid", col = 1, nclass = 30)

Data2 <- c(0.07, 2.68, 2.00, 0.27, 0.39, 1.17, 1.34, 4.34, 3.36, 3.61,
1.28)
Dist2 <- fit.tuv(Data2, nu = 2.5, opt.pars = c(nu = T, mu = T, sigma =
T), symmetric = T)
hist(Dist2, data = Data2, gaussian = T, log.hist = F, ylim = c(0, 1),
ghyp.col = 1, ghyp.lwd = 1,
ghyp.lty = "solid", col = 1, nclass = 30)


Thanks in advance,
Scott.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Naming Lists

2007-08-07 Thread Tom.O

Thanks,

I have considered using an array, but I don't believe it's possible. My
function isn't really paste(); it's a custom function that takes the three
different parameters. The result, however, is a matrix whose size depends
on the inputs. I believe that's hard to fit into an array. But if you have any
suggestions for how I could improve the code, feel free to pass them over; I
would be glad for anything.
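
For what it's worth, a minimal sketch of one way to keep the named nesting when
the function returns matrices of varying size (myFun below is a hypothetical
stand-in for the custom function):

MyList <- sapply(Lev1, function(x)
            sapply(Lev2, function(y)
              sapply(Lev3, function(z) myFun(x, y, z),
                     simplify = FALSE, USE.NAMES = TRUE),
              simplify = FALSE, USE.NAMES = TRUE),
            simplify = FALSE, USE.NAMES = TRUE)
## MyList$A1$B1$C1 is then the matrix returned by myFun("A1", "B1", "C1")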

//Tom



Charles C. Berry wrote:
 
 On Tue, 7 Aug 2007, Tom.O wrote:
 

 Hi
 Im pretty new to R and I have run in to a problem. How do I name all the
 levels in this list.
 
 One way:
 
 MyList <-
    mapply(function(x) mapply(
      function(y) mapply(function(z)
        paste(unlist(x), unlist(y), unlist(z)),
        Lev3, SIMPLIFY = FALSE),
      Lev2, SIMPLIFY = FALSE),
    Lev1, SIMPLIFY = FALSE)
 
 
 However, for many purposes you might better use an array:
 
 MyArray.dimnames <- list(Lev1, Lev2, Lev3)
 combos <- rev(expand.grid(Lev3, Lev2, Lev1))
 elements <- do.call(paste, combos)
 MyArray <- array(elements, dim = sapply(MyArray.dimnames, length),
                  dimnames = MyArray.dimnames)
 MyArray['A1', 'B1', 'C2']
 
 
 

 Lev1 <- c("A1", "A2")
 Lev2 <- c("B1", "B2")
 Lev3 <- c("C1", "C2")

 MyList <- lapply(Lev1, function(x) {
     lapply(Lev2, function(y) {
         lapply(Lev3, function(z) {
             paste(unlist(x), unlist(y), unlist(z))
         })})})

 I would like to name the different levels in the list with the different
 inputs, so that the first level is named by the inputs in Lev1, the second
 level by the inputs in Lev2, and so on.

 I would then like to use this call

 MyList$A1$B1$C1
 A1 B1 C1

 Regards Tom
 -- 
 View this message in context:
 http://www.nabble.com/Naming-Lists-tf4228968.html#a12030717
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 Charles C. Berry                             (858) 534-2098
                                              Dept of Family/Preventive Medicine
 E mailto:[EMAIL PROTECTED]                   UC San Diego
 http://famprevmed.ucsd.edu/faculty/cberry/   La Jolla, San Diego 92093-0901
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Naming-Lists-tf4228968.html#a12047294
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional subsetting

2007-08-07 Thread Gabor Grothendieck
Another way to do this is by using the SQLite select statements making
use of the sqldf package.

First let us assume that the alphabetic order of the Quality levels
reflects their desirability as well.  In the example data this is true,
but if it's not true in the whole data set, where there may be additional
levels of Quality, then it would need to be recoded first.

The first select statement sorts the data and the second one just takes
the last row in each group.  (It seems
that when using the group by clause it takes the last row of each group,
but I could not actually find this documented, so beware.)  Note
that the Time and Date names are changed by the underlying software,
which is why we must refer to them as Time__1 and Date__1.

Lines <- "DeployID Date.Time LocationQuality Latitude Longitude
STM05-1 28/02/2005 17:35 Good -35.562 177.158
STM05-1 28/02/2005 19:44 Good -35.487 177.129
STM05-1 28/02/2005 23:01 Unknown -35.399 177.064
STM05-1 01/03/2005 07:28 Unknown -34.978 177.268
STM05-1 01/03/2005 18:06 Poor -34.799 177.027
STM05-1 01/03/2005 18:47 Poor -34.85 177.059
STM05-2 28/02/2005 12:49 Good -35.928 177.328
STM05-2 28/02/2005 21:23 Poor -35.926 177.314
"

# in next line replace textConnection(Lines) with "myfile.dat"
library(sqldf)
DF <- read.table(textConnection(Lines), skip = 1, as.is = TRUE,
  col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))

DFo <- sqldf("select * from DF order by
  substr(Date__1, 7, 4) || substr(Date__1, 4, 2) || substr(Date__1, 1, 2) DESC,
  Quality DESC,
  abs(substr(Time__1, 1, 2) + substr(Time__1, 4, 2) /60 - 12) DESC")
sqldf("select * from DFo group by Date__1")


# Here is a second different way to do it.
# Another way to do it also using sqldf is via nested selects like this using
# the same DF as above

sqldf("select * from DF u
 where abs(substr(Time__1, 1, 2) + substr(Time__1, 4, 2) /60 - 12) =
   (select min(abs(substr(Time__1, 1, 2) + substr(Time__1, 4, 2) /60 - 12))
 from DF x where Quality =
   (select min(Quality) from DF y
  where x.Date__1 = y.Date__1) and x.Date__1 = u.Date__1)")


On 8/7/07, Gabor Grothendieck [EMAIL PROTECTED] wrote:
 Read the data in and transform the Date, Time and Quality columns
 so that Date and Time are the respective chron classes (see RNews 4/1 Help
 Desk article) and Quality is a factor such that lower levels are better.

 Then sort the data frame by Date, Quality and time from noon.  That would
 actually be the default ordering if the shown factor levels are the only ones
 since they naturally fall into that order but we have explicitly ordered them
 in case your extension to other levels no longer obeys that property.

 Finally take the first row from each Date group.

 Lines <- "DeployID Date.Time LocationQuality Latitude Longitude
 STM05-1 28/02/2005 17:35 Good -35.562 177.158
 STM05-1 28/02/2005 19:44 Good -35.487 177.129
 STM05-1 28/02/2005 23:01 Unknown -35.399 177.064
 STM05-1 01/03/2005 07:28 Unknown -34.978 177.268
 STM05-1 01/03/2005 18:06 Poor -34.799 177.027
 STM05-1 01/03/2005 18:47 Poor -34.85 177.059
 STM05-2 28/02/2005 12:49 Good -35.928 177.328
 STM05-2 28/02/2005 21:23 Poor -35.926 177.314
 "

 library(chron)
 # in next line replace textConnection(Lines) with "myfile.dat"
 DF <- read.table(textConnection(Lines), skip = 1, as.is = TRUE,
   col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))

 DF <- transform(DF,
   Date = chron(Date, format = "d/m/y"),
   Time = times(paste(Time, "00", sep = ":")),
   Quality = factor(Quality, levels = c("Good", "Poor", "Unknown")))

 o <- order(DF$Date, as.numeric(DF$Quality), abs(DF$Time - times("12:00:00")))
 DF <- DF[o, ]

 DF[tapply(row.names(DF), DF$Date, head, 1), ]

 # The last line above could alternately be written like this:
 do.call(rbind, by(DF, DF$Date, head, 1))



 On 8/7/07, Tim Sippel [EMAIL PROTECTED] wrote:
  Thanks Ross.  Yes, upon your remark it seems that maybe I need to be
  selecting first for the best available location, and then by the time stamp
  nearest to 12pm.  It seems like tapply(), or aggregate() or aggregate.ts()
  might be applicable, but I'm working through trying different functions and
  have yet to find what works.  Suggestions on what function to start with
  would be appreciated.
 
 
 
  Tim Sippel (MSc)
 
  School of Biological Sciences
 
  Auckland University
 
  Private Bag 92019
 
  Auckland 1020
 
  New Zealand
 
  +64-9-373-7599 ext. 84589 (work)
 
  +64-9-373-7668 (Fax)
 
  +64-21-593-001 (mobile)
 
 
 
   _
 
  From: Ross Darnell [mailto:[EMAIL PROTECTED]
  Sent: 08 August 2007 12:43
  To: Tim Sippel; r-help@stat.math.ethz.ch
  Subject: RE: [R] Conditional subsetting
 
 
 
 
 
  Assuming *A* animal can only be in one location at any one time, I don't
  understand how once you have selected the location nearest 12pm how you can
  select based on the type of location?
   Do you mean to select on location before time, i.e. from the best location
   visited that day which was closest to 12pm?
 
  Perhaps for condition 2  for a