Re: [R] aligning axis labels in a colorkey from levelplot

2012-06-26 Thread Stephen Eglen

Deepayan Sarkar deepayan.sar...@gmail.com writes:


 You can specify a fixed-width fontfamily if that helps:

 levelplot(matrix(seq(4, 120, l = 9), 3, 3),
   colorkey = list(at = seq(0, 120, 20),
     labels = list(labels = c('  0', ' 20', ' 40', ' 60', ' 80', '100', '120'),
                   fontfamily = "courier",
                   font = 1)))

Thanks Deepayan; I think I finally found a solution, and it was much
easier than I expected:

## Thanks to "R Graphics", 2nd ed., by Paul Murrell; page 250 shows how to
## edit an existing plot.
library(lattice)
library(grid)
levelplot(matrix(-90:89,20,20))
grid.edit("[.]colorkey.labels$", grep=TRUE, just="right",
  global=TRUE, x=unit(0.95, "npc"))

I can live with adjusting the x position by hand.
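
For the fixed-width approach quoted earlier in this message, the padded labels do not have to be typed by hand. A minimal sketch, assuming base R's format() is acceptable here (the names `at` and `lab` are illustrative and not from the thread):

```r
# Tick positions used for the colorkey in the quoted example
at <- seq(0, 120, 20)

# format() pads numbers to a common width, right-justified
lab <- format(at)
print(lab)
```

The resulting `lab` could then be passed as `labels = list(labels = lab, fontfamily = "courier")` in the colorkey specification.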

Stephen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Drawing (lon,lat) coordinates onto the image of a world

2012-06-26 Thread Steven Winter
Given a set of latitude and longitude coordinate pairs (stored in variables 
latitudevals and longitudevals), I would like to plot them onto the image 
of an equirectangular world map. I would like to plot each coordinate pair as 
a red circle, if possible. Does anyone have any suggestions as to how I could go 
about doing this, whether using R or another program like Google Maps?
 
Thank you,
Steve
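
One possible approach in base R, offered as a sketch rather than a definitive answer: in an equirectangular projection x is simply longitude and y is latitude, so a map image only needs to be stretched over [-180, 180] x [-90, 90] before the points are drawn. The function name `plot_on_world` and the sample coordinates are hypothetical; the optional `img` argument is assumed to be a raster array such as the one returned by png::readPNG().

```r
# Hypothetical example coordinates; replace with your own vectors
longitudevals <- c(-74.0, 2.35, 139.7)
latitudevals  <- c(40.7, 48.86, 35.7)

plot_on_world <- function(lon, lat, img = NULL) {
  # Empty plot whose user coordinates are degrees of longitude/latitude
  plot(NA, xlim = c(-180, 180), ylim = c(-90, 90), xaxs = "i", yaxs = "i",
       xlab = "Longitude", ylab = "Latitude")
  # Stretch the map image (if supplied) over the whole coordinate range
  if (!is.null(img)) rasterImage(img, -180, -90, 180, 90)
  # Red open circles at each coordinate pair
  points(lon, lat, col = "red", pch = 1)
  invisible(cbind(lon, lat))
}

res <- plot_on_world(longitudevals, latitudevals)
```

Without an image the call still produces the red circles on an empty lon/lat frame, which is a quick way to check the coordinates before adding the map.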


[R] increase the usage of CPU and Memory

2012-06-26 Thread Xi
Dear All,

I have spent almost a whole day searching online for ways to make my R code
more efficient, but I have not found a solution for my case. If anyone could
give me a clue, I would appreciate your help very much. Thanks in advance.

Here is my issue:

My desktop has an i7-950 quad-core CPU, 24 GB of memory, and an NVIDIA GTX
480 graphics card, and I am using a 64-bit version of R under 64-bit Windows.

I am running a for loop to generate a 461*5 matrix that holds the
coefficients of 5 models. Each iteration produces 5 values, and the loop
runs 461 times in total. Running the code inside the loop just once takes
almost 10 seconds, so the whole loop should take about 4610 seconds,
equal to almost one and a half hours, which is indeed what the whole loop
takes. But I have to run this kind of loop for 30 data sets!

Although I thought my desktop was quite capable, I checked CPU and memory
usage while my R code was running and found that it used only 15% of the
CPU and 10% of the memory. Has anyone had the same issue? Does anyone know
how to shorten the running time and increase the usage of CPU and memory?

Many thanks,
Xi



[R] Packaging Error

2012-06-26 Thread Mayank Bansal
I was trying to byte-compile a package that I made. The package compiles 
successfully with ByteCompile set to FALSE.
When I set ByteCompile to TRUE, I receive the following error message while 
running R CMD INSTALL:

/usr/lib/R/bin/INSTALL: line 34: 9964 Done echo 'tools:::.install_packages()' 
9965 Segmentation fault | R_DEFAULT_PACKAGES= LC_COLLATE=C ${R_HOME}/bin/R 
$myArgs --slave --args ${args}

I have not been able to understand the problem. Can someone help me understand 
the problem so that it can be fixed?

Thanks,
Mayank





Re: [R] Save rgl plot3d Graph as Image

2012-06-26 Thread iris
In the rgl package, rgl.postscript can save 3D scatter plots that you have
generated using the plot3d command.

For example:
library(rgl)
open3d()
  x <- sort(rnorm(1000))
  y <- rnorm(1000)
  z <- rnorm(1000) + atan2(x,y)
  plot3d(x, y, z, col=rainbow(1000))

rgl.postscript("persp3dd.pdf","pdf")

--
View this message in context: 
http://r.789695.n4.nabble.com/Save-rgl-plot3d-Graph-as-Image-tp898351p4634478.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Loop for multiple plots in figure

2012-06-26 Thread Marcel Curlin
This solution works really nicely & I learned much by working through it.
However, I am having trouble with subplot formatting; setting
main=d$Subject results in the correct title over each plot, but repeated
multiple times. Also, I can't seem to format the axis labels and numbers to
reduce the space between them and the plot. Any more thoughts appreciated.

revised code:

tC <- textConnection("
Subject Xvar    Yvar    param1  param2
bob     9       100     1       100
bob     0       110     1       200
steve   2       250     1       50
bob     -5      175     0       35
dave    22      260     0       343
bob     3       180     0       74
steve   1       290     1       365
kevin   5       380     1       546
bob     8       185     0       76
dave    2       233     0       343
steve   -10     230     0       556
dave    -10     233     1       400
steve   -7      250     1       388
dave    3       568     0       555
kevin   10      380     0       57
kevin   4       390     0       50
bob     6       115     1       600
")
data <- read.table(header=TRUE, tC)
close.connection(tC)
rm(tC)

plot_one <- function(d){
 with(d, plot(Xvar, Yvar, type="n", tck=0.02, main=d$Subject, xlim=c(-14,14),
ylim=c(0,600))) # set limits
 with(d[d$param1 == 0,], points(Xvar, Yvar, col = 1)) # first line
 with(d[d$param1 == 1,], points(Xvar, Yvar, col = 2)) # second line

}

par(mfrow=c(2,2))
plyr::d_ply(data, "Subject", plot_one)
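
A possible explanation for the repeated title is that main=d$Subject passes one name per row of the subset instead of a single string. The sketch below, which is only one way to handle it, takes the first element as the title and uses par(mgp = ...) to pull the axis titles and tick labels closer to the plot. The small data frame is a hypothetical subset of the data above, and the particular mgp and mar values are illustrative.

```r
# Small hypothetical subset of the data above
data <- data.frame(Subject = c("bob", "bob", "steve", "steve"),
                   Xvar = c(9, 0, 2, 1), Yvar = c(100, 110, 250, 290),
                   param1 = c(1, 0, 1, 0))

plot_one <- function(d) {
  ttl <- as.character(d$Subject[1])   # one string, not one per row
  plot(d$Xvar, d$Yvar, type = "n", main = ttl,
       xlim = c(-14, 14), ylim = c(0, 600), xlab = "Xvar", ylab = "Yvar")
  points(d$Xvar, d$Yvar, col = d$param1 + 1)  # col 1 when param1 is 0, col 2 when 1
  invisible(ttl)
}

# mgp = c(axis title, tick labels, axis line), in margin lines
op <- par(mfrow = c(1, 2), mar = c(3, 3, 2, 1), mgp = c(1.8, 0.6, 0))
titles <- sapply(split(data, data$Subject), plot_one)
par(op)
```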

--
View this message in context: 
http://r.789695.n4.nabble.com/Loop-for-multiple-plots-in-figure-tp4634390p4634482.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Drawing (lon,lat) coordinates onto the image of a world

2012-06-26 Thread Pascal Oettli

Hello,

What do you mean by image? A file (jpeg, bmp,...)?

Best Regards



Re: [R] increase the usage of CPU and Memory

2012-06-26 Thread Prof Brian Ripley

See the vignette for package 'parallel' to make use of your 4 cores.
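
As a rough illustration of what the 'parallel' package offers, here is a minimal sketch that spreads independent loop iterations over worker processes. The function `fit_one` is a hypothetical stand-in for the body of the original loop (it fits a small model and returns 5 coefficients), and 20 iterations with 2 workers are used only to keep the example quick; detectCores() can be used to choose the number of workers.

```r
library(parallel)

# Hypothetical stand-in for one loop iteration: fit a model, return 5 coefficients
fit_one <- function(i) {
  set.seed(i)
  d <- data.frame(y = rnorm(50), matrix(rnorm(200), 50, 4))
  unname(coef(lm(y ~ ., data = d)))
}

cl <- makeCluster(2)                 # socket cluster, also available on Windows
res <- parSapply(cl, 1:20, fit_one)  # one column of 5 coefficients per iteration
stopCluster(cl)

out <- t(res)                        # one row per iteration, 5 columns
dim(out)
```

This only helps when the iterations do not depend on each other, which is the assumption made here.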





--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] combineLimits and Dates

2012-06-26 Thread Peter Ehlers

On 2012-06-25 15:30, Duncan Mackay wrote:

Hi Elliot

This works on Win 7 ver 2.15

   useOuterStrips(combineLimits(
   xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l',
          scales = list(y = list(relation = "free"),
                        x = list(at = seq(from = as.Date("2011-01-01"),
                                          to = as.Date("2011-10-01"), by = "3 month"),
                                 labels = format(seq(from = as.Date("2011-01-01"),
                                                     to = as.Date("2011-10-01"),
                                                     by = "3 month"), "%b"))),
          auto.key = TRUE)))



This works because the x-limits don't require combining in
this example; all panels have the same xlims.

See below for a solution when the xlims are not equal.



amend the seq as required and the format if required
see ?strptime for format


HTH

Duncan


Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au


At 02:28 26/06/2012, you wrote:

I'm having some trouble using the latticeExtra 'combineLimits' function
with a Date x-variable:

require(lattice)
require(latticeExtra)

set.seed(12345)

dates <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), "days")
dat <- data.frame(d = rep(dates, 4),
   g = factor(rep(rep(c(1,2), each = length(dates)), 2)),
   h = factor(rep(c("a", "b"), each = length(dates)*2)),
   x = rnorm(4 * length(dates)),
   y = rnorm(4 * length(dates)))

plt1 <- xyplot(x + y ~ d | g, groups = h, data = dat, type = 'l', scales =
list(relation = "free"),
auto.key = TRUE)
plt1 <- useOuterStrips(plt1)
plt1 <- combineLimits(plt1)

The x-axis labels are right after the call to 'useOuterStrips' but they get
converted to numeric after the call to 'combineLimits'. How do I keep them
as date labels?


After combineLimits(plt1), the plt1 object will have an
x.limits component that has the dates converted to numeric
form. You can just modify that component with:

  plt1$x.limits <- lapply(plt1$x.limits, as.Date, origin = "1970-01-01")

and then plot it.
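
The conversion relies on Dates being stored as the number of days since 1970-01-01. A small base R check of that round trip, with illustrative variable names:

```r
d <- as.Date(c("2011-01-01", "2011-12-31"))

num  <- as.numeric(d)                        # plain day counts, no Date class
back <- as.Date(num, origin = "1970-01-01")  # restore the Date class

all(back == d)
```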

Peter Ehlers



Thanks.

- Elliot

--
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: elliot.bernst...@fdopartners.com



Re: [R] increase the usage of CPU and Memory

2012-06-26 Thread Oliver Ruebenacker
 Hello Xi,

  If a program has input or output to disk or network, this may cause
it to wait and not use the available CPU.

  Output is usually buffered, but may cause delay if the buffer gets
full (I'm not sure though whether this is an issue with plenty of
memory available)

 Take care
 Oliver




-- 
Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant
President and Founder of Knowomics
(http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine
(http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)



Re: [R] graph displays

2012-06-26 Thread MSousa


Good morning,
Thanks for help.
I can explain better what I am trying to do.
I'm trying to read data from a file, separated by a tab, with the following
code.


Dataset <- read.table("C:/Users/Administrator/Desktop/R/graph.txt", sep="\t",
quote="\"", header = TRUE)
View(Dataset)
dput(Dataset)

> View(Dataset)
> dput(Dataset)
structure(list(Source = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), X1000s = c(47L, 37L, 17L), X600s = c(63L, 
64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L
), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L, 
61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L
), X1s = c(131L, 186L, 186L)), .Names = c("Source", "X1000s", 
"X600s", "X500s", "X250s", "X100s", "X50s", "X10s", "X5s", "X3s", 
"X1s"), class = "data.frame", row.names = c(NA, -3L))
> Dataset
  Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1      A     47    63    75   116   125  129  131 131 131 131
2      B     37    64    45    11    25   19   61 131 186 186
3      C     17    62    25    66    12   29   91 171 186 186


The idea is to get a graph like this Excel one, but in R.
As I'm still learning R, I have little idea how to do it.

http://imageshack.us/photo/my-images/51/testlt.png/
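
Without seeing the Excel chart, one plausible reading is a line chart with one line per Source across the columns. The sketch below uses base graphics under that assumption; treating the column names (X1000s, X600s, ...) as thresholds in seconds is a guess, and the names `secs` and `counts` are illustrative.

```r
Dataset <- data.frame(
  Source = c("A", "B", "C"),
  X1000s = c(47L, 37L, 17L), X600s = c(63L, 64L, 62L), X500s = c(75L, 45L, 25L),
  X250s = c(116L, 11L, 66L), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L),
  X10s = c(131L, 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L),
  X1s = c(131L, 186L, 186L)
)

# Recover the numbers embedded in column names such as "X1000s"
secs <- as.numeric(sub("^X([0-9]+)s$", "\\1", names(Dataset)[-1]))

# One column per Source, one row per threshold
counts <- t(as.matrix(Dataset[, -1]))
colnames(counts) <- as.character(Dataset$Source)

# One line per Source, columns kept in their original order on the x axis
matplot(seq_along(secs), counts, type = "b", pch = 1:3, lty = 1, col = 1:3,
        xaxt = "n", xlab = "Threshold (s)", ylab = "Count")
axis(1, at = seq_along(secs), labels = secs)
legend("topleft", legend = colnames(counts), col = 1:3, pch = 1:3, lty = 1)
```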

--
View this message in context: 
http://r.789695.n4.nabble.com/graph-displays-tp4634448p4634488.html
Sent from the R help mailing list archive at Nabble.com.



[R] Error in mice

2012-06-26 Thread Anera Salucci
Hi all,

I am imputing missingness of  90 columns  in a  data frame using mice.
But mice gives back :

 Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = 
TRUE,  :   too many (1100) weights

Any idea to solve this error is welcome,

Anera


[R] compare one field of dataframe with excel sheet using R

2012-06-26 Thread sathya7priya
I have a data frame consisting of three columns (name of compound, ppm and
frequency). Name contains string values; ppm and frequency contain numeric
values with up to four decimal places.
I have an Excel sheet which is like a library. The first column contains the
names of compounds and the remaining columns contain the ppm values of each
compound which satisfy certain rules. The number of ppm values varies for
each compound from 4 to 700.
I need to compare the ppm values from the data frame with
the ppm values in the Excel sheet and report the result if they are similar.
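
A minimal sketch of one way to do such a comparison, assuming the Excel library has already been imported into R (for example after saving it as CSV) and reshaped into a list with one numeric vector per compound. All names and values below (`peaks`, `lib`, `matches_library`, the tolerance of 0.005 ppm) are hypothetical and only illustrate the idea of matching within a tolerance.

```r
# Hypothetical stand-ins: 'peaks' plays the data frame, 'lib' the imported library
peaks <- data.frame(name = c("alanine", "glucose", "unknown"),
                    ppm = c(1.4712, 5.2301, 9.9000),
                    frequency = c(588.1, 2090.4, 3957.0))
lib <- list(alanine = c(1.4710, 3.7800),
            glucose = c(3.2400, 5.2290, 4.6400))

# TRUE when the measured ppm lies within 'tol' of any library value
# listed under the same compound name
matches_library <- function(name, ppm, lib, tol = 0.005) {
  mapply(function(n, p) {
    ref <- lib[[n]]
    !is.null(ref) && any(abs(ref - p) <= tol)
  }, as.character(name), ppm, USE.NAMES = FALSE)
}

peaks$in_library <- matches_library(peaks$name, peaks$ppm, lib)
peaks
```

What counts as "similar" is a choice to be made for the data at hand; here it is an absolute difference in ppm.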

--
View this message in context: 
http://r.789695.n4.nabble.com/compare-one-field-of-dataframe-with-excel-sheet-using-R-tp4634489.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to estimate variance components with lmer for models with random effects and compare them with lme results

2012-06-26 Thread KL
Hi,

I performed an experiment where I raised different families coming from two
different source populations, where each family was split up into
different treatments. After the experiment I measured several traits on each
individual. 
To test for an effect of either treatment or source as well as their
interaction, I used a linear mixed effect model with family as random
factor, i.e.
lme(fixed=Trait~Treatment*Source,random=~1|Family,method="ML")

so far so good,
Now I have to calculate the relative variance components, i.e. the
percentage of variation that is explained by either treatment or source as
well as the interaction.

Without a random effect, I could easily use the sums of squares (SS) to
calculate the variance explained by each factor. But for a mixed model (with
ML estimation), there are no SS, hence I thought I could use Treatment and
Source as random effects too to estimate the variance, i.e.

lme(fixed=Trait~1,random=~(Treatment*Source)|Family, method="REML")

However, in some cases, lme does not converge, hence I used lmer from the
lme4 package:

lmer(Trait~1+(Treatment*Source|Family),data=DATA)

Where I extract the variances from the model using the summary function:

model <- lmer(Trait~1+(Treatment*Source|Family),data=regrexpdat)
results <- model@REmat
variances <- results[,3]

I get the same values as with the VarCorr function. I use then these values
to calculate the actual percentage of variation taking the sum as the total
variation.

Where I am struggling is with the interpretation of the results from the
initial lme model (with treatment and source as fixed effects) and the
random model to estimate the variance components (with treatment and source
as random effect). I find in most cases that the percentage of variance
explained by each factor does not correspond to the significance of the
fixed effect.

For example for the trait HD,
The initial lme suggests a tendency for the interaction as well as a
significance for Treatment. Using a backward procedure, I find that
Treatment has a close to significant tendency. However, estimating variance
components, I find that Source has the highest variance, making up to 26.7%
of the total variance.

   
anova(lme(fixed=HD~as.factor(Treatment)*as.factor(Source),random=~1|as.factor(Family),method="ML",data=test),type="m")
                                       numDF denDF  F-value p-value
(Intercept)                                1   426 0.044523  0.8330
as.factor(Treatment)                       1   426 5.935189  0.0153
as.factor(Source)                          1    11 0.042662  0.8401
as.factor(Treatment):as.factor(Source)     1   426 3.754112  0.0533





   
summary(lmer(HD~1+(as.factor(Treatment)*as.factor(Source)|Family),data=regrexpdat))
Linear mixed model fit by REML 
Formula: HD ~ 1 + (as.factor(Treatment) * as.factor(Source) | Family) 
   Data: regrexpdat 
    AIC    BIC logLik deviance REMLdev
 -103.5 -54.43  63.75   -132.5  -127.5
Random effects:
 Groups   Name                                    Variance  Std.Dev. Corr
 Family   (Intercept)                             0.0113276 0.106431
          as.factor(Treatment)                    0.0063710 0.079819  0.405
          as.factor(Source)                       0.0235294 0.153393 -0.134 -0.157
          as.factor(Treatment)L:as.factor(Source) 0.0076353 0.087380 -0.578 -0.589 -0.585
 Residual                                         0.0394610 0.198648
Number of obs: 441, groups: Family, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.02740    0.03237  -0.846
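
For reference, the percentage calculation described earlier (each variance divided by the sum of all variances) can be written out with the numbers from the random-effects table. This only reproduces the arithmetic; it does not address whether the approach is statistically appropriate. The vector names are illustrative.

```r
# Variances copied from the "Random effects" table above
v <- c(Intercept = 0.0113276, Treatment = 0.0063710, Source = 0.0235294,
       Interaction = 0.0076353, Residual = 0.0394610)

# Share of each component in the total, in percent
pct <- 100 * v / sum(v)
round(pct, 1)
```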



Hence my question is, is it correct what I am doing? Or should I use another
way to estimate the amount of variance explained by each factor (i.e.
Treatment, Source and their interaction). For example, would the effect
sizes be a more appropriate way to go?


Thanks!

Kay Lucek


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-estimate-variance-components-with-lmer-for-models-with-random-effects-and-compare-them-with-ls-tp4634492.html
Sent from the R help mailing list archive at Nabble.com.



[R] MuMIn - assessing variable importance following model averaging, z-stats/p-values or CI?

2012-06-26 Thread Robertson, Andrew
Dear R users,

Recent changes to the MuMIn package mean that the model averaging command 
(model.avg) no longer returns confidence intervals, but instead returns z-values 
and corresponding p-values for fixed effects included in models.

Previously I have used this package for model selection/averaging following 
Grueber et al. (2011), which suggests that one should use confidence intervals 
from model averaging to assess whether fixed effects have an effect or not 
(if the confidence interval does not span zero, the variable has an effect).

Can anyone tell me why MuMIn now gives z-statistics and p-values, and whether these 
should be used to assess the 'significance'/importance of variables when model 
averaging?
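
As background to the question, a z-value and a Wald-type confidence interval built from the same estimate and standard error carry the same information under a normal approximation. The sketch below shows that relationship with hypothetical numbers; it is not a description of how MuMIn computes its output.

```r
# Hypothetical model-averaged estimates and standard errors
est <- c(Sex = 0.42, Maize = -0.05)
se  <- c(Sex = 0.15, Maize = 0.08)

z  <- est / se
p  <- 2 * pnorm(-abs(z))                      # two-sided p-value
ci <- cbind(lower = est - qnorm(0.975) * se,  # 95% Wald interval
            upper = est + qnorm(0.975) * se)

# The 95% interval excludes zero exactly when the two-sided p-value is below 0.05
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
cbind(z, p, ci, excludes_zero)
```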

Here's the example code of what I'm doing:

#-#
library(lme4)
library(arm)    # standardize()
library(MuMIn)  # dredge(), get.models(), model.avg()

ps <- lmer(tranPS~(
Sex+
Age.Cat2+
TOTAL+
Propfarm+
Maize+
TOTAL:Propfarm+
Maize:TOTAL+
Maize:Propfarm+
(1|Socialgroup)+(1|Year)+(1|Tattoo)),REML=FALSE, data=propspec)

pss <- standardize(ps,standardize.y = FALSE)

psdrg <- dredge(pss)

summary(model.avg(get.models(psdrg,subset=delta<2)))
#-#

Ref: Grueber, C.E., Nakagawa, S., Laws, R.J. & Jamieson, I.G. (2011) Multimodel 
inference in ecology and evolution: challenges and solutions. Journal of 
Evolutionary Biology, 24, 699-711.

Any help would be much appreciated

Regards

Andrew Robertson
PhD student
Centre for Ecology and Conservation
University of Exeter, Cornwall Campus
Tremough, Cornwall. TR10 9EZ
UK
Tel: 01326 371852
Email: ar...@exeter.ac.uk
Web page: 
http://biosciences.exeter.ac.uk/staff/postgradresearch/andrewrobertson/


Re: [R] Loop for multiple plots in figure

2012-06-26 Thread baptiste auguie
Try this alternative solution using only base functions:

# split the data into 4 data.frames
l <- split(data, data$Subject)
names(l)

# set up the graph parameters
par(mfrow=n2mfrow(length(l)), mar=c(4,4,1,1), mgp = c(2, 1, 0))
# good old for loop over the subject names
for( n in names(l)){
  d <- l[[n]] # temporary data.frame for convenience
  with(d, plot(Xvar, Yvar, type="n")) # empty plot, sets the axes
  with(d[d$param1 == 0,], lines(Xvar, Yvar, lty=1)) # first line
  with(d[d$param1 == 1,], lines(Xvar, Yvar, lty=2)) # second line
  title(n) # here n is just a string
}


HTH,

b.






[R] clean Email format data

2012-06-26 Thread climmi
Dear all 

I am now going to do some text analysis using R. 
However, the data is very noisy, so I need to clean it first.
I don't have much experience with text cleaning. Would anyone be able to
help with this?
If you could share some similar code written before, it would be
greatly appreciated.

My content is mainly feedback data from:
*Phone call records*: (usually the structure looks like the one below)
*Email*: ordinary email correspondence, usually with a lot of history
and also footnotes such as "if you are not the intended recipient..."
etc.

I know it's quite a complex problem that cannot be solved by a single
answer, so some tips would also be very welcome.


One example of the data: 



#
Fyna.   

g-cc...@adfae.com
 24/06/2012 09:15 AM  
Tog-cc...@adfae.com  
cc g-cc...@adfae.com  
Subject ase Mewrr asdffID:dde_20120624_15988015_11653024 *  (keep
this part)*


CUSTOMER DETAILS Name  : Mr dffa  
Company :  da
Address :  ff
Home No. :  
Office No. : 
Payphone Ext :  
Mobile No. :  
Fax No. :  
Email :  
CASE DETAILS Division : * dsaf (RIM) (keep this part)*
Category 1 : * dsaf (RIM) (keep this part)*
Category 2 : * dsaf (RIM) (keep this part)*
Category 3 :   
Veh Reg Num :   
COMMENTS  24/06/2012 09:15:23 AM (Name) -  Location @Ddaferdsdaf Rd   



Caller feedback Content.. (*This part I need to keep*)


NFORMANT STATES 
Date  Time : 24/06/2012 09:15:31 AM  
CSO ID : dasf  


https://MSCCasdfEB/LsdfA/Madsf.htm?pardsnDc?0pAsdoE9.=cS0eiIcp9m
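
One common starting point for records with a fixed "Label : value" layout is to pull out the wanted fields with regular expressions and drop the rest. The sketch below works on a hypothetical, shortened record; the helper `get_field` and the field list are illustrative, and real records will likely need extra rules (for example for the free-text feedback section).

```r
# Hypothetical, shortened record in the layout shown above
rec <- c("Subject Case ID:abc_20120624_1",
         "CUSTOMER DETAILS Name  : Mr X",
         "CASE DETAILS Division : Service",
         "Category 1 : Billing",
         "Category 2 : Refund",
         "Category 3 :   ",
         "Email :  ")

# Return the trimmed text after "<label> :" on the first line containing it,
# or NA when the label is absent or its value is empty
get_field <- function(lines, label) {
  pat <- paste0(label, "[[:space:]]*:")
  hit <- grep(pat, lines, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  val <- trimws(sub(paste0("^.*", pat), "", hit[1]))
  if (nzchar(val)) val else NA_character_
}

fields <- c("Division", "Category 1", "Category 2", "Category 3")
kept <- sapply(fields, function(f) get_field(rec, f))
kept
```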


--
View this message in context: 
http://r.789695.n4.nabble.com/clean-Email-format-data-tp4634491.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Error in mice

2012-06-26 Thread Prof Brian Ripley

On 26/06/2012 08:59, Anera Salucci wrote:

Hi all,

I am imputing missingness of  90 columns  in a  data frame using mice.
But mice gives back :

  Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = 
TRUE,  :   too many (1100) weights

Any idea to solve this error is welcome,


See ?nnet (in package nnet).
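
The relevant argument documented there is MaxNWts, the maximum number of weights nnet allows (1000 by default). The sketch below reproduces the error with nnet::multinom on hypothetical data and shows that raising MaxNWts lets the fit proceed. How this argument is forwarded from mice depends on the mice version and is not shown here.

```r
library(nnet)

# Hypothetical data: a 30-level response and a 40-level predictor, which
# needs more weights than the default limit of 1000
set.seed(1)
n <- 3000
d <- data.frame(y = factor(sample(30, n, replace = TRUE)),
                x = factor(sample(40, n, replace = TRUE)))

# With the default MaxNWts the fit stops with a "too many (...) weights" error
default_fit <- tryCatch(multinom(y ~ x, data = d, trace = FALSE),
                        error = function(e) conditionMessage(e))

# Raising MaxNWts lets the same model go through (few iterations, for speed)
fit <- multinom(y ~ x, data = d, MaxNWts = 2000, maxit = 5, trace = FALSE)
```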



Anera




--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] How to estimate variance components with lmer for models with random effects and compare them with lme results

2012-06-26 Thread Bert Gunter
1. This is not an R question; it is a statistical issue.

2. R-sig-mixed-models is the appropriate list, not r-help.

-- Bert

On Tue, Jun 26, 2012 at 3:28 AM, KL sticklena...@gmail.com wrote:

 Hi,

 I performed an experiment where I raised different families coming from two
 different source populations, where each family was split up into a
 different treatments. After the experiment I measured several traits on
 each
 individual.
 To test for an effect of either treatment or source as well as their
 interaction, I used a linear mixed effect model with family as random
 factor, i.e.
 lme(fixed=Trait~Treatment*Source,random=~1|Family,method="ML")

 so far so good,
 Now I have to calculate the relative variance components, i.e. the
 percentage of variation that is explained by either treatment or source as
 well as the interaction.

 Without a random effect, I could easily use the sums of squares (SS) to
 calculate the variance explained by each factor. But for a mixed model
 (with
 ML estimation), there are no SS, hence I thought I could use Treatment and
 Source as random effects too to estimate the variance, i.e.

 lme(fixed=Trait~1,random=~(Treatment*Source)|Family, method="REML")

 However, in some cases, lme does not converge, hence I used lmer from the
 lme4 package:

 lmer(Trait~1+(Treatment*Source|Family),data=DATA)

 Where I extract the variances from the model using the summary function:

 model <- lmer(Trait~1+(Treatment*Source|Family),data=regrexpdat)
 results <- model@REmat
 variances <- results[,3]

 I get the same values as with the VarCorr function. I use then these values
 to calculate the actual percentage of variation taking the sum as the total
 variation.

 Where I am struggling is with the interpretation of the results from the
 initial lme model (with treatment and source as fixed effects) and the
 random model to estimate the variance components (with treatment and source
 as random effect). I find in most cases that the percentage of variance
 explained by each factor does not correspond to the significance of the
 fixed effect.

 For example for the trait HD,
 The initial lme suggests a tendency for the interaction as well as a
 significance for Treatment. Using a backward procedure, I find that
 Treatment has a close to significant tendency. However, estimating variance
 components, I find that Source has the highest variance, making up to 26.7%
 of the total variance.



 anova(lme(fixed=HD~as.factor(Treatment)*as.factor(Source),random=~1|as.factor(Family),method="ML",data=test),type="m")
                                       numDF denDF  F-value p-value
(Intercept)                                1   426 0.044523  0.8330
as.factor(Treatment)                       1   426 5.935189  0.0153
as.factor(Source)                          1    11 0.042662  0.8401
as.factor(Treatment):as.factor(Source)     1   426 3.754112  0.0533







 summary(lmer(HD~1+(as.factor(Treatment)*as.factor(Source)|Family),data=regrexpdat))
Linear mixed model fit by REML
Formula: HD ~ 1 + (as.factor(Treatment) * as.factor(Source) | Family)
   Data: regrexpdat
    AIC    BIC logLik deviance REMLdev
 -103.5 -54.43  63.75   -132.5  -127.5
Random effects:
 Groups   Name                                     Variance  Std.Dev. Corr
 Family   (Intercept)                              0.0113276 0.106431
          as.factor(Treatment)                     0.0063710 0.079819  0.405
          as.factor(Source)                        0.0235294 0.153393 -0.134 -0.157
          as.factor(Treatment)L:as.factor(Source)  0.0076353 0.087380 -0.578 -0.589 -0.585
 Residual                                          0.0394610 0.198648
Number of obs: 441, groups: Family, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept) -0.02740    0.03237  -0.846



 Hence my question is, is it correct what I am doing? Or should I use
 another
 way to estimate the amount of variance explained by each factor (i.e.
 Treatment, Source and their interaction). For example, would the effect
 sizes be a more appropriate way to go?
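
The percentage calculation described in the question can be sketched from the variances printed in the lmer summary above. This only reproduces the arithmetic (each variance divided by the sum of all of them). It does not settle whether such percentages are a meaningful measure, which is the statistical issue raised in the reply. The short names in v are labels chosen here, not output from lmer.

```r
# Variances copied from the "Random effects" block of the lmer summary above
v <- c(Intercept   = 0.0113276,
       Treatment   = 0.0063710,
       Source      = 0.0235294,
       Interaction = 0.0076353,
       Residual    = 0.0394610)

# Share of each component in the summed variance, in percent
pct <- 100 * v / sum(v)
round(pct, 1)
```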


 Thanks!

 Kay Lucek


 --
 View this message in context:
 http://r.789695.n4.nabble.com/How-to-estimate-variance-components-with-lmer-for-models-with-random-effects-and-compare-them-with-ls-tp4634492.html
 Sent from the R help mailing list archive at Nabble.com.





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] graph displays

2012-06-26 Thread Jim Lemon

On 06/26/2012 06:24 PM, MSousa wrote:



Good morning,
Thanks for help.
I can explain better what I am trying to do.
I'm trying to read data from a file, separated by a tab, with the following
code.


Dataset <- read.table("C:/Users/Administrator/Desktop/R/graph.txt", sep="\t",
quote="\"", header = TRUE)
View(Dataset)
dput(Dataset)


View(Dataset)
dput(Dataset)

structure(list(Source = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), X1000s = c(47L, 37L, 17L), X600s = c(63L,
64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L
), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L,
61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L
), X1s = c(131L, 186L, 186L)), .Names = c("Source", "X1000s",
"X600s", "X500s", "X250s", "X100s", "X50s", "X10s", "X5s", "X3s",
"X1s"), class = "data.frame", row.names = c(NA, -3L))

Dataset

  Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
1      A     47    63    75   116   125  129  131 131 131 131
2      B     37    64    45    11    25   19   61 131 186 186
3      C     17    62    25    66    12   29   91 171 186 186


The idea is to get a graph like this Excel one, but in R.
As I am still learning R, I have little idea how to do it.

http://imageshack.us/photo/my-images/51/testlt.png/


Hi MSousa,
Try this:

library(plotrix)
barp(Dataset[,-1],names.arg=rep("",10),col=2:4)
staxlab(1,at=1:10,labels=names(Dataset)[-1])
legend(2,170,Dataset$Source,fill=2:4)

Jim
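
For comparison, a rough base-graphics sketch of the same grouped bar chart that needs no extra package. The data are rebuilt from the dput() output in the question. The object names m and mids, and the colours and legend position, are choices made here for illustration.

```r
# Data as given by dput() in the question
Dataset <- data.frame(
  Source = c("A", "B", "C"),
  X1000s = c(47L, 37L, 17L), X600s = c(63L, 64L, 62L),
  X500s  = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L),
  X100s  = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L),
  X10s   = c(131L, 61L, 91L), X5s  = c(131L, 131L, 171L),
  X3s    = c(131L, 186L, 186L), X1s = c(131L, 186L, 186L))

# barplot() wants a matrix: one row per Source, one column per threshold
m <- as.matrix(Dataset[, -1])
rownames(m) <- Dataset$Source

# beside = TRUE draws the three sources next to each other in each group
mids <- barplot(m, beside = TRUE, col = 2:4, las = 2,
                legend.text = TRUE, args.legend = list(x = "topleft"))
```

barplot() returns the bar midpoints (here in mids), which can be used to add labels or points later.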



Re: [R] Drawing (lon,lat) coordinates onto the image of a world

2012-06-26 Thread Sarah Goslee
Hi Steve,

On Mon, Jun 25, 2012 at 9:47 PM, Steven Winter stevenwinte...@yahoo.com wrote:
 Given a set of latitude and longitude coordinates pairs (stored in variables 
 latitudevals and longitudevals), I would like to plot them onto the image 
 of a equirectangular world map. I would like to plot each coordinate pair 
 with a red circle, if possible. Does anyone have any suggestions as to how I 
 go about doing this, whether using R or using another program like Google 
 maps?

This might help:

library(maps)
map("world")
lon <- c(-75, -70, 10)
lat <- c(42, -45, 50)
points(lon, lat, col="red", pch=19)

Sarah


 Thank you,
 Steve
        [[alternative HTML version deleted]]



-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] compare one field of dataframe with excel sheet using R

2012-06-26 Thread Jean V Adams
It would help if you provided an example for your data frame, an example 
for your spreadsheet, and more information on how to judge if the ppm 
values are similar.  Maybe this code will help you get started ...

# Here's an example data frame
mydf <- data.frame(
    compound=letters[1:10],
    ppm=abs(round(rnorm(10), 4)),
    frequency=abs(round(rnorm(10), 4)))

# Here's an example data frame representing data from your spreadsheet
# You can read the data from the spreadsheet into R using the package XLConnect
# library(XLConnect)
# mysheet <- readWorksheet(loadWorkbook("C:\\Temp\\Compounds.xlsx"),
#     sheet="Sheet1", startRow=1)
mysheet <- data.frame(
    compound=letters[sample(1:10, 100, replace=TRUE)],
    libppm=abs(round(rnorm(100), 4)))

# combine the two example data frames
both <- merge(mydf, mysheet)

# list the compounds in mydf that had ppm values within 0.1 of those in
# the spreadsheet
both$diff <- abs(both$ppm-both$libppm)
both[both$diff < 0.1, ]

Jean


sathya7priya sathya7pr...@gmail.com wrote on 06/26/2012 03:34:22 AM:

 I have a data frame consisting of three columns (name of compound, ppm and
 frequency). Name contains string values; ppm and frequency contain numeric
 values with up to four decimal places.
 I have an Excel sheet which is like a library. The first column contains the
 name of each compound and the remaining columns contain the ppm values of the
 compound which satisfy certain rules. The number of ppm values varies for
 each compound from 4 to 700.
 I need to compare the ppm values from the data frame with the ppm values in
 the Excel sheet and report the result if they are similar.



Re: [R] significance level (p) for t-value in package zelig

2012-06-26 Thread Rune Haubo
My point was just that the situation in a cumulative link model is not
much different from a binomial glm - the binomial glm is even a
special case of the clm with only two response categories. And just
like summary(glm(, family=binomial)) reports z-values and computes
p-values by using the normal distribution as reference, one can do the
same in a cumulative link model by applying the same asymptotic
arguments.

In both models the variance is determined implicitly by the mean, so a
t-distribution is never involved.
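
As a small illustration of the point about the binomial glm, the sketch below fits a logistic regression to simulated data and recomputes the reported p-values from the standard normal distribution. The data and object names are made up. It shows only the glm case mentioned above, not a cumulative link model.

```r
# Simulated binary data; names are illustrative only
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))

fit <- glm(y ~ x, family = binomial)
tab <- coef(summary(fit))
colnames(tab)   # the statistic is labelled "z value", not "t value"

# Two-sided p-value from the standard normal reference distribution
z <- tab[, "z value"]
p <- 2 * pnorm(-abs(z))
```

The values in p agree with the "Pr(>|z|)" column of the summary table, so no degrees of freedom enter the calculation.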

Cheers,
Rune

On 25 June 2012 11:05, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 On 25/06/2012 09:32, Rune Haubo wrote:

 According to standard likelihood theory these are actually not
 t-values, but z-values, i.e., they asymptotically follow a standard
 normal distribution under the null hypothesis. This means that you


 Whose 'standard'?

 It is conventional to call a value of t-like statistic (i.e. a ratio of the
 form value/standard error) a 't-value'.  And that is nothing to do with
 'likelihood theory' (t statistics predate the term 'likelihood'!).

 The separate issue is whether a t statistic is even approximately
 t-distributed (and if so, on what df?), and another is if it is
 asymptotically normal.  For the latter you have to say what you mean by
 'asymptotic': we have lost a lot of the context, but as this does not appear
 to be IID univariate observations:

 - 'standard likelihood theory' is unlikely to apply.

 - standard asymptotics may well not be a good approximation (in regression
 modelling, people tend to fit more complex models to large datasets, which
 is often why a large dataset was collected).

 - even for IID observations the derivation of the t distribution assumes
 normality.

 The difference between a t distribution and a normal distribution is
 practically insignificant unless the df is small.   And if the df is small,
 one can rarely rely on the CLT for approximate normality 


 could use pnorm instead of pt to get the p-values, but an easier
 solution is probably to use the clm-function (for Cumulative Link
 Models) from the ordinal package - here you get the p-values
 automatically.

 Cheers,
 Rune

 On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote:

 This advice is almost certainly false!

 A t-statistic can be calculated, but the distribution will not
 necessarily be student's t nor will the df be those of the rse.  See,
 for
 example, rlm() in MASS, where values of the t-statistic are given without
 p
 values. If Brian Ripley says that p values cannot be straightforwardly
 calculated by pt(), then believe it!

 -- Bert

 On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote:

 Michael,

 Try

 ?pt

 Best
 Ozgur

 --
 View this message in context:

 http://r.789695.n4.nabble.com/significance-level-p-for-t-value-in-package-zelig-tp4634252p4634271.html
 Sent from the R help mailing list archive at Nabble.com.





 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

        [[alternative HTML version deleted]]








 --
 Brian D. Ripley,                  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595





-- 
Rune Haubo Bojesen Christensen

Ph.D. Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54

DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark



Re: [R] graph displays

2012-06-26 Thread John Kane
Sorry I misunderstood what you wanted.   Using ggplot2 and reshape2 which I 
imagine you will have to install, this should give you what you want

library(ggplot2)
library(reshape2)

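xx1  <-  melt(Dataset, id = c("Source"))

# stat = "identity" plots the values themselves rather than counts
p  <-  ggplot( xx1 , aes(variable, value, fill = Source)) +
   geom_bar(stat = "identity", position = "dodge") +
   scale_y_continuous("Scale Values") +
   scale_x_discrete("X values") +
   opts( title = "Graphing Exercise")  # newer ggplot2 versions: ggtitle("Graphing Exercise")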
 
p




John Kane
Kingston ON Canada


 -Original Message-
 From: ricardosousa2...@clix.pt
 Sent: Tue, 26 Jun 2012 01:24:17 -0700 (PDT)
 To: r-help@r-project.org
 Subject: Re: [R] graph displays
 
 
 
 Good morning,
 Thanks for help.
 I can explain better what I am trying to do.
 I'm trying to read data from a file, separated by a tab, with the
 following
 code.
 
 
 Dataset <- read.table("C:/Users/Administrator/Desktop/R/graph.txt", sep="\t",
 quote="\"", header = TRUE)
 View(Dataset)
 dput(Dataset)
 
 View(Dataset)
 dput(Dataset)
 structure(list(Source = structure(1:3, .Label = c("A", "B", "C"
 ), class = "factor"), X1000s = c(47L, 37L, 17L), X600s = c(63L,
 64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L
 ), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L,
 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L
 ), X1s = c(131L, 186L, 186L)), .Names = c("Source", "X1000s",
 "X600s", "X500s", "X250s", "X100s", "X50s", "X10s", "X5s", "X3s",
 "X1s"), class = "data.frame", row.names = c(NA, -3L))
 Dataset
   Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
 1      A     47    63    75   116   125  129  131 131 131 131
 2      B     37    64    45    11    25   19   61 131 186 186
 3      C     17    62    25    66    12   29   91 171 186 186
 
 
 the idea is to get a graph like this excel, but in R,
 as I'm still in the learning phase of the R, I have little knowledge how
 to
 do
 
 http://imageshack.us/photo/my-images/51/testlt.png/
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/graph-displays-tp4634448p4634488.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] rrdf package for mac not working

2012-06-26 Thread Uwe Ligges

Please contact the package maintainer.

Best,
Uwe Ligges

On 26.06.2012 00:41, Ricardo Pietrobon wrote:

rrdf is incredibly helpful, but I've noticed that the rrdf package for mac
hasn't been working for some time: http://goo.gl/5Ukpn . I am wondering if
there is still a plan to maintain it in the long run, or if there is some
other alternative for reading RDF files.






Re: [R] MuMIn - assessing variable importance following model averaging, z-stats/p-values or CI?

2012-06-26 Thread Uwe Ligges

Please contact the package maintainer.

Best,
Uwe Ligges

On 26.06.2012 12:46, Robertson, Andrew wrote:

Dear R users,

Recent changes to the MuMIn package mean that the model averaging command 
(model.avg) no longer returns confidence intervals, but instead returns z-values 
and corresponding p-values for fixed effects included in models.

Previously I have used this package for model selection/averaging following 
Grueber et al. (2011), where it is suggested that one should use confidence 
intervals from model averaging to assess whether the fixed effects have an 
effect or not (if the confidence interval does not span zero, the variable has 
an effect).

Can anyone tell me why MuMIn now gives z-stats and p-values and whether these 
should be used to assess the 'significance'/importance of variables when model 
averaging?

Here's the example code of what I'm doing

#-#
ps <- lmer(tranPS~(
 Sex+
 Age.Cat2+
 TOTAL+
 Propfarm+
 Maize+
 TOTAL:Propfarm+
 Maize:TOTAL+
 Maize:Propfarm+
 (1|Socialgroup)+(1|Year)+(1|Tattoo)),REML=FALSE, data=propspec)

pss <- standardize(ps,standardize.y = FALSE)

psdrg <- dredge(pss)

summary(model.avg(get.models(psdrg,subset=delta<2)))
#-#

Ref: Grueber, C.E., Nakagawa, S., Laws, R.J. & Jamieson, I.G. (2011) Multimodel 
inference in ecology and evolution: challenges and solutions. Journal of 
Evolutionary Biology, 24, 699-711.

Any help would be much appreciated

Regards

Andrew Robertson
PhD student
Centre for Ecology and Conservation
University of Exeter, Cornwall Campus
Tremough, Cornwall. TR10 9EZ
UK
Tel: 01326 371852
Email: ar...@exeter.ac.uk
Web page: 
http://biosciences.exeter.ac.uk/staff/postgradresearch/andrewrobertson/





Re: [R] Packaging Error

2012-06-26 Thread Uwe Ligges



On 26.06.2012 08:54, Mayank Bansal wrote:

I was trying to ByteCompile a package that I made. The package compiles 
successfully with byte compile set to FALSE.
When I set ByteCompile to TRUE, I receive the following error message while 
doing R CMD INSTALL

/usr/lib/R/bin/INSTALL: line 34: 9964 Done echo 'tools:::.install_packages()' 9965 
Segmentation fault | R_DEFAULT_PACKAGES= LC_COLLATE=C ${R_HOME}/bin/R $myArgs 
--slave --args ${args}

I have not been able to understand the problem. Can someone help me understand 
the problem so that it can be fixed?



Not without your package to try it out.

Best,
Uwe Ligges



Thanks,
Mayank








[R] plotting two histograms on one plot with hist function

2012-06-26 Thread Blignaut, M, Mej mb...@sun.ac.za
I would like to plot two data sets (frequency (y-axis) of mean values for 
0-1 (x-axis)) on a single histogram for comparison. hist() only allows the 
overlay of two histograms, and although barplot() allows beside=TRUE, it does 
not show frequency values (like hist) but rather all of the values. Is there 
any way that I can use hist() to plot two data sets side by side, as with 
barplot()? Any help or advice will be appreciated!

Kind regards,
Marguerite
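
One possible approach, sketched under assumptions: let hist() do the binning without plotting, then hand the two sets of counts to barplot() with beside=TRUE. The vectors a and b are simulated stand-ins for the two sets of mean values, and the bin width of 0.1 is an arbitrary choice.

```r
set.seed(1)
a <- runif(100)        # stand-ins for the two sets of mean values in [0, 1]
b <- rbeta(100, 2, 5)

# Use the same breaks for both so the bins line up
brks <- seq(0, 1, by = 0.1)
ha <- hist(a, breaks = brks, plot = FALSE)
hb <- hist(b, breaks = brks, plot = FALSE)

# One row per data set, one column per bin; beside = TRUE pairs the bars
counts <- rbind(A = ha$counts, B = hb$counts)
colnames(counts) <- paste(head(brks, -1), brks[-1], sep = "-")
barplot(counts, beside = TRUE, legend.text = TRUE, las = 2,
        xlab = "mean value", ylab = "frequency")
```

Sharing the breaks is the important part. With separate default breaks the two sets of bars would not be comparable.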





  


[R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread achaumont
Hello, 

I have a question about the “plot.predict” function in Frank Harrell's rms
package.
Do you know how to superpose in the same graph the prediction curve of ols
and raw data points?
Put most simply, I would like to combine these two graphs:

  fit_linear <- ols (y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)
 p <- Predict(fit_linear,x2,conf.int=FALSE)
 plot (p, ylim =c(-2,0.5), xlim=c(0,100))  # graph n°1

 z <- plot (x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type="p",lwd=6,col="blue")  # graph n°2

Thanks all, 

Agnès



--
View this message in context: 
http://r.789695.n4.nabble.com/rms-package-superposition-prediction-curve-of-ols-and-data-points-tp4634503.html
Sent from the R help mailing list archive at Nabble.com.



[R] shapiro.test()

2012-06-26 Thread reso

Hey,
today I wanted to use the shapiro.test() on data containing 3  
numerical values per group.

This is the first time that NA was returned for some of the groups.
An example of code and output is shown below:



shapiro.test(c(0.000637806, 0.00175561, 0.001196708))


Shapiro-Wilk normality test

data:  c(0.000637806, 0.00175561, 0.001196708)
W = 1, p-value = NA

I am not able to find the bug in our data, so I think there might be a  
problem with the shapiro.test().


I use the following technical background:

platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          14.1
year           2011
month          12
day            22
svn rev        57956
language       R
version.string R version 2.14.1 (2011-12-22)


Thanks,
Judith



Re: [R] Remove empty levels in subset

2012-06-26 Thread svo
Hi,

I have exactly the same question (how to remove empty levels in my subset),
but in my case the factor command does not work, because my dataframe is not
atomic

 Try this:

 test2$a <- factor(test2$a)


R gives me the error message:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Do you have advice?

Thank you

--
View this message in context: 
http://r.789695.n4.nabble.com/Remove-empty-levels-in-subset-tp873967p4634499.html
Sent from the R help mailing list archive at Nabble.com.



[R] Intersection

2012-06-26 Thread Васильченко Александр
Hello.
I have a problem with two data frames. Each has two columns, value and
dates, but the data frames have different dimensions. Some dates coincide.
I need to intersect them by dates and get two data frames as output,
with identical dates columns and the new dimension; the value entries
have to stay matched to their dates.
Regards, Aleksander.



Re: [R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread Sarah Goslee
You could use points() instead of plot() for the second command.

Sarah

On Tue, Jun 26, 2012 at 8:37 AM, achaumont agnes.chaum...@live.be wrote:
 Hello,

 I have a question about the “plot.predict” function in Frank Harrell's rms
 package.
 Do you know how to superpose in the same graph the prediction curve of ols
 and raw data points?
 Put most simply, I would like to combine these two graphs:

   fit_linear <- ols (y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)
  p <- Predict(fit_linear,x2,conf.int=FALSE)
  plot (p, ylim =c(-2,0.5), xlim=c(0,100))              # graph n°1

  z <- plot (x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type="p",lwd=6,col="blue")  # graph n°2

 Thanks all,

 Agnès





-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Intersection

2012-06-26 Thread Sarah Goslee
That sounds like a job for merge(), but it's hard to be sure because
you didn't provide the information requested in the posting guide.
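
A minimal sketch of the merge() idea, assuming each data frame has a Date column called dates and a numeric column called value as described in the question. The two data frames below are invented for illustration.

```r
# Two made-up data frames with partly overlapping dates
d1 <- data.frame(dates = as.Date("2012-06-01") + 0:4, value = 1:5)
d2 <- data.frame(dates = as.Date("2012-06-03") + 0:5, value = 11:16)

# merge() keeps only the dates present in both (an inner join by default)
both <- merge(d1, d2, by = "dates", suffixes = c(".1", ".2"))

# If two separate data frames are wanted, subset each by the common dates
d1c <- d1[d1$dates %in% d2$dates, ]
d2c <- d2[d2$dates %in% d1$dates, ]
```

merge() gives one combined data frame with both value columns side by side, while the %in% subsets give two data frames with identical dates columns.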

Sarah

On Tue, Jun 26, 2012 at 11:03 AM, Васильченко Александр
vasilchenko@gmail.com wrote:
 Hello.
 I have a problem with two data frames. Each has two columns, value and
 dates, but the data frames have different dimensions. Some dates coincide.
 I need to intersect them by dates and get two data frames as output,
 with identical dates columns and the new dimension; the value entries
 have to stay matched to their dates.
 Regards, Aleksander.


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Remove empty levels in subset

2012-06-26 Thread Sarah Goslee
Hi,

On Tue, Jun 26, 2012 at 8:06 AM, svo s.vanom...@uu.nl wrote:
 Hi,

 I have exactly the same question (how to remove empty levels in my subset),
 but in my case the factor command does not work, because my dataframe is not
 atomic

 Try this:

  test2$a <- factor(test2$a)


 R gives me the error message:

 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

 Do you have advice?

I have two pieces of advice.

1. Don't try to use factor() on your entire data frame, but only on a
single column at a time, as shown in the example you included.

2. Provide an example of your data using something like
dput(head(mydata, 10)) so we can offer actual working code.
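
To make the first point concrete, here is a small sketch with a made-up data frame. The names test, test2 and test3 are only illustrative, and droplevels() is shown as an alternative that handles every factor column of a data frame at once.

```r
# Made-up data frame with a three-level factor
test <- data.frame(a = factor(c("x", "y", "z", "x")), b = 1:4)
test2 <- subset(test, a != "z")
levels(test2$a)            # "z" is still listed although no row uses it

# Re-create the factor for one column at a time ...
test2$a <- factor(test2$a)

# ... or drop unused levels in every factor column at once
test3 <- droplevels(subset(test, a != "z"))
```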

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] data.table vs plyr reg output

2012-06-26 Thread Geoffrey Smith
Hello.  The data.table package is very helpful in terms of speed.  But I am
having trouble actually using the output from linear regression.  Is there
any way to get the data.table output to be as pretty/useful as that from
the plyr package?  Below is an example.

library('data.table');
library('plyr');

REG <- data.table(ID=c(rep('Frank',5),rep('Tony',5),rep('Ed',5)),
                  y=rnorm(15), x=rnorm(15), z=rnorm(15));
REG;

#The ddply function from the plyr package produces very neat and useful output;
ddply(REG, .(ID), function(x) coef(lm(y ~ x + z, data=x)));

#The data.table output is fast, but not very neat (in terms of the order of
#the coefficient estimates).  Is there any way to get the data.table output
#to look more like the plyr/ddply output (without making a list for each
#coef and running the regression two times)?
REG[, coef(lm(y ~ x + z)), by=ID];

Thank you!  Geoff
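
One way this might be done, offered as a sketch and not as the package authors' recommendation: when the j expression returns a list, data.table makes one column per list element, so wrapping the coefficient vector in as.list() gives one row per ID. The name wide is made up here.

```r
library(data.table)

set.seed(1)
REG <- data.table(ID = rep(c("Frank", "Tony", "Ed"), each = 5),
                  y = rnorm(15), x = rnorm(15), z = rnorm(15))

# coef() returns a named vector, which data.table stacks into one column;
# wrapping it in as.list() makes each coefficient its own column
wide <- REG[, as.list(coef(lm(y ~ x + z))), by = ID]
wide
```

The regression is still run only once per group, and the column names come from the names of the coefficient vector.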



Re: [R] Intersection

2012-06-26 Thread andrija djurovic
Hi. Try with following functions:

?intersect
?"%in%"
?"["

Perhaps someone will provide you more help if you read and follow the posting
guide: http://www.R-project.org/posting-guide.html

Andrija

On Tue, Jun 26, 2012 at 5:03 PM, Васильченко Александр 
vasilchenko@gmail.com wrote:

 Hello.
 I have a problem with two data frames. Each has two columns, value and
 dates, but the data frames have different dimensions. Some dates coincide.
 I need to intersect them by date and get as output two data frames
 with identical dates columns and new dimensions; the value entries should
 follow the matching dates.
 Regards, Aleksander.



Re: [R] shapiro.test()

2012-06-26 Thread Özgür Asar
See

?shapiro.test

...the number of non-missing values must be between 3 and 5000.

By the way, how reasonable is it to test the normality of only 3 values?

Best
ozgur

--
View this message in context: 
http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634520.html


Re: [R] plotting two histograms on one plot with hist function

2012-06-26 Thread John Kane
Why not just plot the two histograms on the same scale in a 2 panel plot?

John Kane
Kingston ON Canada


 -Original Message-
 From: mb...@sun.ac.za
 Sent: Tue, 26 Jun 2012 15:24:55 +0200
 To: r-help@r-project.org
 Subject: [R] plotting two histograms on one plot with hist function
 
 I would like to plot two data sets (frequency (y-axis) of mean values from
 0 to 1 (x-axis)) on a single histogram for comparison. hist() only allows
 the overlay of two histograms, and although barplot() allows beside=TRUE,
 it does not show frequency values (like hist) but rather all of the
 values. Is there any way that I can use hist() to plot two data sets
 side by side, similar to barplot()? Any help or advice will be appreciated!
 
 Kind regards,
 Marguerite
 
 
 
 
 
   


Re: [R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread David Winsemius


On Jun 26, 2012, at 11:29 AM, Sarah Goslee wrote:


You could use points() instead of plot() for the second command.



Ummm. Maybe not. I think that plot.Predict uses lattice  
graphics. You may need to use trellis.focus() followed by lpoints().  
Or use the + operation with suitable objects.


--
David.
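
The trellis.focus() route can be sketched with lattice alone. In this hedged example an ordinary xyplot() line stands in for the plot.Predict output (assuming, as suggested above, that plot.Predict draws with lattice), and the data, seed and object names are made up for illustration.

```r
library(lattice)

set.seed(42)
x2 <- runif(50, 0, 100)
y4 <- -1 + 0.01 * x2 + rnorm(50, sd = 0.2)

# Stand-in for the fitted curve: a lattice plot of a straight line.
grid_x  <- seq(0, 100, length.out = 101)
curve_y <- -1 + 0.01 * grid_x
curve_plot <- xyplot(curve_y ~ grid_x, type = "l",
                     xlim = c(0, 100), ylim = c(-2, 0.5),
                     xlab = "x2", ylab = "y4")

pdf(NULL)                       # off-screen device; drop this line and dev.off() to draw on screen
print(curve_plot)               # the lattice plot must be drawn first
trellis.focus("panel", 1, 1, highlight = FALSE)
lpoints(x2, y4, col = "blue", pch = 16)   # add the raw data to the existing panel
trellis.unfocus()
dev.off()
```

Base graphics functions such as points() do not share the lattice coordinate system, which is why the panel has to be brought into focus before adding to it.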




Sarah

On Tue, Jun 26, 2012 at 8:37 AM, achaumont agnes.chaum...@live.be  
wrote:

Hello,

I have a question about the “plot.predict” function in Frank Harrell's rms
package.
Do you know how to superpose in the same graph the prediction curve of ols
and raw data points?
Put most simply, I would like to combine these two graphs:

fit_linear <- ols(y4 ~ rcs(x2, c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)

p <- Predict(fit_linear, x2, conf.int=FALSE)
plot(p, ylim=c(-2,0.5), xlim=c(0,100))  # graph n°1

z <- plot(x2, y4, ylim=c(-2,0.5), xlim=c(0,100), type="p", lwd=6, col="blue")  # graph n°2


Thanks all,

Agnès






--
Sarah Goslee
http://www.functionaldiversity.org



David Winsemius, MD
West Hartford, CT



Re: [R] plotting two histograms on one plot with hist function

2012-06-26 Thread ilai
On Tue, Jun 26, 2012 at 10:02 AM, John Kane jrkrid...@inbox.com wrote:

 Why not just plot the two histograms on the same scale in a 2 panel plot?


I think OP request was for comparison. Two panels may do, but why not a
barplot of the histograms in the same panel ?

barplot( rbind(
hist(rbeta(30,2,4),breaks=seq(0,1,.1),plot=F)$counts,
hist(rbeta(30,6,8),breaks=seq(0,1,.1),plot=F)$counts),
beside=T)

see str(hist(yourdata)) or ?hist

Cheers
Ilai





Re: [R] shapiro.test()

2012-06-26 Thread Özgür Asar
Actually, your sample size is 3. Sorry for that.

Ozgur

--
View this message in context: 
http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634525.html


Re: [R] increase the usage of CPU and Memory

2012-06-26 Thread Christofer Bogaso

On 26-06-2012 16:33, Oliver Ruebenacker wrote:

  Hello Xi,

   If a program has input or output to disk or network, this may cause
it to wait and not use the available CPU.

   Output is usually buffered, but may cause delay if the buffer gets
full (I'm not sure though whether this is an issue with plenty of
memory available)

  Take care
  Oliver

On Mon, Jun 25, 2012 at 8:07 PM, Xi amzhan...@gmail.com wrote:

Dear All,

I have been searching online for almost a whole day for ways to make my R
code more efficient, but I have found no solution for my case. If anyone
could give any clue to solve my problem, I would very much appreciate your
help. Thanks in advance.

Here is my issue:

My desktop has an i7-950 quad-core CPU, 24 GB of memory and an NVIDIA GTX
480 graphics card, and I am using a 64-bit version of R under 64-bit Windows.

I am running a for loop to generate a 461*5 matrix, which comes from the
coefficients of 5 models. The loop produces 5 values per iteration and runs
461 times in total. Running the code inside the loop just once takes almost
10 seconds, so the whole loop should take about 4610 seconds, almost one and
a half hours, which is indeed how long the whole loop takes. But I have to
run this kind of loop for 30 data sets!

Although I thought my desktop was not bad at all, I checked the CPU and
memory usage while my R code was running and found that it used just 15% of
the CPU and 10% of the memory. Does anyone have the same issue? Or does
anyone know some methods to shorten the running time and increase the usage
of CPU and memory?

Many thanks,
Xi




Hi Oliver, can you please give some details on what you mean by 
'Output is usually buffered'?


Thanks and regards,



Re: [R] Intersection

2012-06-26 Thread arun
Hi,

Try this:
dat1 <- data.frame(value=c(15,20,25,30,45,50),
                   dates=c("2005-05-25","2005-06-25","2005-07-25","2005-08-25","2005-09-25","2005-10-25"))
dat2 <- data.frame(value=c(15,20,25,50),
                   dates=c("2005-05-25","2005-06-25","2005-07-25","2005-10-25"))
merge(dat1, dat2, by="dates")
       dates value.x value.y
1 2005-05-25      15      15
2 2005-06-25      20      20
3 2005-07-25      25      25
4 2005-10-25      50      50
or
subset(dat1, dates %in% dat2$dates)
  value      dates
1    15 2005-05-25
2    20 2005-06-25
3    25 2005-07-25
6    50 2005-10-25

I hope this is what you meant.  You mentioned the datasets have different 
dimensions.  Not sure what you meant.

A.K.
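
If the goal is two separate data frames restricted to the shared dates, a base R sketch along these lines may be closer to the request. The values in dat2 and the names `common`, `dat1_common` and `dat2_common` are illustrative assumptions.

```r
dat1 <- data.frame(value = c(15, 20, 25, 30, 45, 50),
                   dates = c("2005-05-25", "2005-06-25", "2005-07-25",
                             "2005-08-25", "2005-09-25", "2005-10-25"),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(value = c(1, 2, 3, 4),
                   dates = c("2005-05-25", "2005-06-25", "2005-07-25",
                             "2005-10-25"),
                   stringsAsFactors = FALSE)

# Dates present in both data frames.
common <- intersect(dat1$dates, dat2$dates)

# Keep only the rows with a shared date; each data frame keeps its own values.
dat1_common <- dat1[dat1$dates %in% common, ]
dat2_common <- dat2[dat2$dates %in% common, ]
print(dat1_common)
print(dat2_common)
```

Both results have the same number of rows and identical dates columns, provided each date occurs at most once per data frame.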





Re: [R] How To Setup hunspell in R

2012-06-26 Thread Ribbis
Did you make any progress in solving this?  I'm having the same struggle. 
Thanks.

--
View this message in context: 
http://r.789695.n4.nabble.com/How-To-Setup-hunspell-in-R-tp4541801p4634523.html


[R] Ljung-Box test (Box.test)

2012-06-26 Thread Steven Winter
I fit a simple linear model y = bX to a data set today, and that produced 24 
residuals (I have 24 data points, one for each year from 1984-2007). I would 
like to test the time-independence of the residuals of my model, and my 
supervisor recommended the Ljung-Box test. The Box.test function 
in R takes 4 arguments: 

x      a numeric vector or univariate time series.
lag    the statistic will be based on lag autocorrelation coefficients.
type   test to be performed: partial matching is used.
fitdf  number of degrees of freedom to be subtracted if x is a series of
       residuals.

Unfortunately, I never took a statistics class that covered the Ljung-Box 
test, and information about it online is hard to find. What does "lag" mean, 
and what value would you recommend I use for the test? Also, what does 
"fitdf" represent, and what would the value for that parameter be in my case? 
Finally, the value of "x" is a vector of my 24 residuals, correct?

Thank you all so much. I apologize for the basic nature of the question.

Steven


Re: [R] shapiro.test()

2012-06-26 Thread peter dalgaard

On Jun 26, 2012, at 16:43 , r...@uni-potsdam.de wrote:

 Hey,
 today I wanted to use shapiro.test() on data containing 3 numerical 
 values per group.
 It is the first time that an NA was returned for some of the groups.
 An example of code and output is shown below:
 
 
 shapiro.test(c(0.000637806, 0.00175561, 0.001196708))
 
   Shapiro-Wilk normality test
 
 data:  c(0.000637806, 0.00175561, 0.001196708)
 W = 1, p-value = NA
 
 I am not able to find the bug in our data, so I think there might be a 
 problem with the shapiro.test().

The clue is that

 diff(sort(c(0.000637806, 0.00175561, 0.001196708)))
[1] 0.000558902 0.000558902

which is either an extreme coincidence or a sign that your data are not 
independent samples from a continuous distribution. Since the normal quantiles 
are also equidistant, you get a correlation of W=1 in the QQ-plot, and 
apparently this triggers the NA p-value. 

I suppose returning p=1.0 would arguably be a better choice for this case, but 
it _is_ pretty extreme. 

-pd
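
The equal-spacing argument can be checked directly. This is a small sketch using the three values from the question; the variable names are illustrative, and the p-value is deliberately not shown because it may differ between R versions.

```r
x <- c(0.000637806, 0.00175561, 0.001196708)

# The two gaps between the sorted values are equal.
gaps <- diff(sort(x))
print(gaps)

# Normal scores for n = 3 are symmetric about zero, hence also equally spaced,
# so the points of the normal QQ-plot lie exactly on a straight line.
scores <- qnorm(ppoints(3))
r <- cor(sort(x), scores)

W <- unname(shapiro.test(x)$statistic)
print(c(r = r, W = W))
```

Any three equally spaced values, for example c(1, 2, 3), give the same W of 1.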

 
 I use the following technical background:
 
 platform       x86_64-pc-linux-gnu
 arch           x86_64
 os             linux-gnu
 system         x86_64, linux-gnu
 status
 major          2
 minor          14.1
 year           2011
 month          12
 day            22
 svn rev        57956
 language       R
 version.string R version 2.14.1 (2011-12-22)
 
 
 Thanks,
 Judith
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] increase the usage of CPU and Memory

2012-06-26 Thread Oliver Ruebenacker
 Hello Christofer,

  If a process has data to write to hard disk, the data is usually
written to a buffer in memory, and from there it is written to the
hard disk independently of the CPU. Since writing to memory is much
faster than writing to hard disk, this allows the process to run
faster. To the process, it appears as if the data is already on disk.
If, however, the buffer runs full, an attempt by a process to write
more data will cause the process to wait until space is available in
the buffer. If a process spends time waiting, it means it does not use
all the CPU it could otherwise.

  I don't know how much input is buffered, but since only the process
knows where it will request input from next, this limits ways to
buffer input. I'm assuming though, that if you open a file and read
the first few bytes, some more bytes may be read into a buffer since
the process is likely to request them next. But in any case, input
from disk or network is almost certain to cause waiting times and
therefore decreases used CPU time.

 Take care
 Oliver
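
One way to see whether a step is waiting rather than computing is to compare CPU time with elapsed time using system.time(). The sketch below is illustrative only: `cpu_bound`, `wait_bound` and `busy` are hypothetical names, and Sys.sleep() merely mimics a process waiting on disk or network.

```r
# Hypothetical stand-ins for one loop iteration.
cpu_bound  <- function() { for (i in 1:20) sum(sqrt(seq_len(1e5))); invisible(NULL) }
wait_bound <- function() { Sys.sleep(0.5); invisible(NULL) }  # mimics waiting on I/O

t_cpu  <- system.time(cpu_bound())
t_wait <- system.time(wait_bound())

# CPU time actually used by the R process (user + system).
busy <- function(t) unname(t["user.self"] + t["sys.self"])

print(c(cpu_busy = busy(t_cpu),  cpu_elapsed  = unname(t_cpu["elapsed"])))
print(c(wait_busy = busy(t_wait), wait_elapsed = unname(t_wait["elapsed"])))
```

When elapsed time is much larger than user plus system time, the process spent most of its time waiting. Separately, on a CPU that exposes 8 hardware threads, one fully busy single-threaded process shows up as roughly 12.5% overall usage, so a reading near 15% may also simply reflect that R runs such a loop on a single thread.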

On Tue, Jun 26, 2012 at 1:53 PM, Christofer Bogaso
bogaso.christo...@gmail.com wrote:



 Hi Oliver, can you please give some details on what you mean by
 'Output is usually buffered'?

 Thanks and regards,




-- 
Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant
President and Founder of Knowomics
(http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine
(http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)



[R] Compile C files

2012-06-26 Thread Frederico Mestre
Hello:

Sorry, this might look like a beginner question, but I'm just starting to
work on the C and R interface.

I'm trying to compile a C file (with a function) to load it into an R function
but, on the command line, I keep getting a lot of errors, like:

C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected
declaration specifiers before 'SEXP'

I've been able to compile this file before, so I 

I'm using Windows 7 on a 64-bit computer.

Best regards,

Frederico 

 




Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?

2012-06-26 Thread Terry Therneau
I'm finally back from vacation and looking at your email.

1. The primary mistake is in your call, where you say
 fit <- survfit(mod.allison.5, newdata.1, id="Id")

This will use the character string "Id" as the value of the identifier, 
not the data.  The effect is exactly the same as the difference between 
print(x) and print('x').
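
The same effect can be reproduced with model.frame() alone, which is where the error in the original post was raised. This is a minimal sketch with a made-up three-row data frame; `newdata`, `mf_ok` and `mf_bad` are illustrative names, and no survival code is involved.

```r
newdata <- data.frame(start = 0:2, stop = 1:3, Id = 1)

# Unquoted: Id is looked up in the data, giving one value per row.
mf_ok <- model.frame(~ start + stop, data = newdata, id = Id)

# Quoted: the literal string "Id" has length 1, not nrow(newdata),
# so the variable lengths no longer match and model.frame() stops.
mf_bad <- tryCatch(model.frame(~ start + stop, data = newdata, id = "Id"),
                   error = function(e) e)

print(names(mf_ok))
print(conditionMessage(mf_bad))
```

The extra column appears in the model frame under the name "(id)", which matches the name reported in the error message.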

2. In reply to John's comment that all the id values are the same.  It 
is correct.  Normally the survfit routine is used to produce multiple 
curves, one curve per line of the input data, for time-independent 
variables.   The presence of an id argument is used to tell it that 
there are multiple lines per subject in the data, e.g. time-dependent 
covariates.  So even though there is only one curve being produced we 
need an id statement to trigger the behavior.
   If you only want one curve for one individual, then individual=TRUE 
is an alternate, as John pointed out.

3. "It's very important to specify the Surv object and the formula 
directly in the coxph function ..."
Yes, I agree.  I always use your suggested form because it gives better 
documentation -- variable names are directly visible in the coxph call.  
I don't understand the attraction of the other form, but lots of people 
use it.
Why did it go wrong?  Because the survfit function was evaluating
 Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin + 
age + age:stop + prio, data=newdata.1

The length of the variables will be different.  The error message comes 
from the R internals, not my program.

Terry Therneau


On 06/16/2012 08:04 AM, Jürgen Biedermann wrote:

 Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...

 I don't find a solution to use the survfit function (package:
 survival)  for a defined pattern of covariates with a Cox-Model
 including a time dependent interaction term. Somehow the definition of
 my newdata argument seems to be erroneous.
 I already googled the problem, found many persons having the same or a
 similar problem, but still no solution.
 I want to stress that my time-dependent covariate does not depend on the
 failure of an individual (in this case it doesn't seem sensible to
 predict a survivor function for an individual). Rather one of my effects
 declines with time (time-dependent coefficient).

 For illustration, I use the example of John Fox's paper Cox
 Proportional - Hazards Regression for Survival Data.
 http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf

 Do you know any help? See code below

 Thanks very much in advance
 Jürgen Biedermann

 #
 #Code

 Rossi <-
 read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt",
 header=T)

 Rossi.2 <- fold(Rossi, time='week',
  event='arrest', cov=11:62, cov.names='employed')

 # see below for the fold function from John Fox

 # modeling an interaction with time (Page 14)

 mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~
  fin + age + age:stop + prio,
  data=Rossi.2)
 mod.allison.5

 # Attempt to get the survivor function of a person with age=30, fin=0
 # and prio=5

 newdata.1 <-
 data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0)
 fit <- survfit(mod.allison.5,newdata.1,id="Id")

 Error message (translated from German):

 Error in model.frame.default(data = newdata.1, id = "Id", formula =
 Surv(start,  :
   variable lengths differ (found for '(id)')

 #-
 fold <- function(data, time, event, cov,
  cov.names=paste('covariate', '.', 1:ncovs, sep=""),
  suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
  vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
  xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
  all.cov <- unlist(cov)
  if (!is.list(cov)) cov <- list(cov)
  ncovs <- length(cov)
  nrow <- nrow(data)
  ncol <- ncol(data)
  ncov <- length(cov[[1]])
  nobs <- nrow*ncov
  if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1)
  stop(paste(
  "all elements of cov must be of the same length and \n",
  "cov.times must have one more entry than each element of cov."))
  var.names <- names(data)
  subjects <- rownames(data)
  omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
  keep.cols <- (1:ncol)[-omit.cols]
  nkeep <- length(keep.cols)
  if (is.numeric(event)) event <- var.names[event]
  times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
  else data[, cov.times]
  new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
  rownames <- rep("", nobs)
  colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=""),
  var.names[-omit.cols], cov.names)
  end.row <- 0
  for (i in 1:nrow){
  start.row <- end.row + 1
  end.row <- end.row + ncov
  start <- 

Re: [R] Ljung-Box test (Box.test)

2012-06-26 Thread Rui Barradas

Hello,

That's a statistics question, but it's also about using an R function.

The Ljung-Box test isn't supposed to be used in such a context, to test 
the residuals of an ols y = bX + e. It is used to test time independence 
of the original series or of the residuals of an ARMA(p, q) fit.


In both cases you are right, 'x' is a series.
'lag' can be explained as follows: you have a time series and want to 
know if the value observed today depends on what was observed in the 
past. Then, a linear regression of today on yesterday could be


X[t] = b[1]*X[t-1] + e[t], e ~ Normal(0, sigma^2)

A linear regression on two time units in the past would be

X[t] = b[1]*X[t-1] + b[2]*X[t-2] + e[t], e ~ Normal(0, sigma^2)

etc. This is a regression of the series on itself lagged by a certain 
number of time units, the present is regressed on the past. Function 
ar() fits this kind of model to a time series. In the first case, the 
order is p=1, in the second, p=2.


Now, in the first case, is there second order serial correlation? Test 
the residuals with lag=2, fitdf=1, the value of p. Third order? lag=3, 
fitdf=p=1, etc.


You are NOT fitting this type of model, so the Ljung-Box test is 
misused. Test the original series with default parameters, lag=1. If 
there is serial correlation, fit an AR (Auto-Regressive) model with 
ar(). See the help page ?ar. And see a statistician with experience in 
time series. It's a world on its own, I haven't even mentioned 
seasonality. And almost everything else about time series.
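
The workflow described above might look roughly like this on simulated data. It is a sketch under stated assumptions: the AR(1) series, the seed and the object names are invented for illustration and say nothing about the original 24 residuals.

```r
set.seed(123)
x <- arima.sim(model = list(ar = 0.5), n = 200)   # simulated AR(1) series

# Test the original series at lag 1.
# Note that Box.test() defaults to type = "Box-Pierce".
lb_raw <- Box.test(x, lag = 1, type = "Ljung-Box")

# Fit an AR(1) model, then test its residuals at lag 2 with fitdf = p = 1.
fit <- ar(x, order.max = 1, aic = FALSE)
res <- as.numeric(na.omit(fit$resid))             # the first residual is NA
lb_res <- Box.test(res, lag = 2, type = "Ljung-Box", fitdf = 1)

print(lb_raw)
print(lb_res)
```

The degrees of freedom reported for the second test are lag - fitdf, which is why lag must exceed the order p of the fitted model.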


Do ask someone near you.

Hope this helps,

Rui Barradas


[R] Storing whole regression results

2012-06-26 Thread Kevin Chang
Hello seasoned R users,

Is it possible to store a complete regression result in an array? I've
already been able to save individual regression coefficients, but would like
to store the whole regression results in different arrays through a loop,
so that under different quantile regressions I would be able to create
a loop and store the full regression result each time in a different array
for printing.

The only way I can think of is to pre-generate a whole set of arrays and
matrices to individually store each regression coefficient one at a time.

Thank you,

Kevin

 

Master of Science Student |  University of Guelph Department of Food,
Resource and Agricultural Economics 
J.D. MacLachlan Building - Room 002 Guelph, ON N1G 2W1 
Webpage: http://fare.uoguelph.ca/users/kchang01
Email: kchan...@uoguelph.ca

Mobile:226-979-2813

 




Re: [R] Storing whole regression results

2012-06-26 Thread Sarah Goslee
You can store entire regression results in a list, then use lapply()
to retrieve individual coefficients as desired.

Lists are very powerful for managing odd data formats, and no loops needed.

Sarah
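
A minimal sketch of the list approach, using lm() on the built-in mtcars data as a stand-in for the quantile regressions in the question (the grouping by cyl and all object names are illustrative assumptions):

```r
groups <- split(mtcars, mtcars$cyl)

# One complete lm object per group, kept in a named list.
fits <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

coefs <- t(sapply(fits, coef))        # coefficient matrix, one row per fit
summaries <- lapply(fits, summary)    # full summaries, ready for printing
print(coefs)
print(summaries[["4"]])
```

Because each list element is the whole fitted object, anything that works on a single fit (coef, summary, residuals, predict) can be applied to all of them with lapply() or sapply().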





-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] chisq.test

2012-06-26 Thread Omphalodes Verna
Dear list!

I would like to run chisq.test on a simple data set with 70 observations,
but the output includes a warning:

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

        tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
        dimnames(tabele) <- list(
        SEX = c("M", "F"),
        HAIR = c("Brown", "Black", "Red", "Blonde"))
        addmargins(tabele)
        prop.table(tabele)
        chisq.test(tabele)

Please give me some advice or a recommendation.

Thanks a lot to all, OV



[R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread SSimek
Hello, 

I have count data that illustrate the presence or absence of individuals in
my study population. I created a grid cell across the study area and
calculated a count value for each individual per season per year for each
grid cell. The count value is the number of times an individual was present
in each grid cell.  For illustration my data columns look something like
this and are repeated for each individual:

Cell_ID  Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
1        160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
1        160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
1        160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
1        160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
2        169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
2        169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
2        169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
…
563      86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
563      86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
563      86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water

1        160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
2        169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
5        160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
6        170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
…
563      86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water

1        160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
1        160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
2        169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
2        169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
…
563      86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
563      86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
563      86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
563      86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
563      86.777099   612.69  977      4474.6   52     C     2006  SUM     Water



I have 563 grid cells across my study area and each individual has 1-563
cells associated for each year and each season the individual was monitored.
Therefore my grid cells are repeated. I end up with 71,000 records and 925
records have a Count value > 0, which means 70,075 records have a Count value
= 0. 

I wanted to run a zero inflated poisson model to determine mixed effects (of
parameters) with individual as the random effect. But I have been advised
two things:

1. I cannot run a zero inflated poisson model because my data are too
extremely inflated (i.e. 70,075 vs 925) and 

2. I cannot run the model with each cell repeated for each individual. I am
told the model doesn't recognize that Cell_ID #1 for individual A is the
same Cell_ID #1 for individual B.

Does anyone know if either or both of these points are true? I would
appreciate any thoughts, advice, or suggestions. 
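
For readers who want to experiment with point 1, here is a small simulation
sketch. It is not from the thread and says nothing about whether a ZIP model
suits the real data. It assumes an intercept-only zero-inflated Poisson with
made-up values (pi0 = 0.95, lambda = 20), ignores the covariates and the
random effect for individual, and fits by maximum likelihood with base R's
optim(). Only the zero share is computed from the counts stated above.

```r
# Share of zero counts in the real data, from the numbers in the post.
n_total <- 71000
n_positive <- 925
zero_share <- (n_total - n_positive) / n_total

# Simulated zero-inflated Poisson data (all parameter values are assumptions).
set.seed(42)
n <- 5000
pi0 <- 0.95        # probability of a structural zero
lambda <- 20       # Poisson mean of the count process
y <- ifelse(runif(n) < pi0, 0L, rpois(n, lambda))

# Negative log-likelihood of an intercept-only ZIP model.
# par[1] is logit(pi0), par[2] is log(lambda).
zip_nll <- function(par, y) {
  p <- plogis(par[1])
  lam <- exp(par[2])
  ll_zero <- log(p + (1 - p) * exp(-lam))           # log P(Y = 0)
  ll_pos <- log1p(-p) + dpois(y, lam, log = TRUE)   # log P(Y = y) for y > 0
  -sum(ifelse(y == 0, ll_zero, ll_pos))
}

start <- c(qlogis(mean(y == 0)), log(mean(y[y > 0])))
fit <- optim(start, zip_nll, y = y)
est <- c(pi0 = plogis(fit$par[1]), lambda = exp(fit$par[2]))
print(zero_share)
print(est)
```

In this toy setting the likelihood can be maximised even with about 95% zeros.
Whether the same holds once covariates and random effects are added is a
separate question.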

Thanks!

-Stephanie


[R] Figuring out encodings of PDFs in R

2012-06-26 Thread Jonas Michaelis
Dear list,

I am currently scraping some text data from several PDFs using the
readPDF() function in the tm package. This all works very well and in most
cases the encoding seems to be latin1 - in some, however, it is not. Is
there a good way in R to check character encodings? I found the functions
is.utf8() and is.local() in the tau package but that obviously only gets me
so far.
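
A hedged sketch of what base R offers for this, not from the thread: it
assumes a reasonably current R, since validUTF8() was added to base R well
after this post. The strings are built from raw bytes and are made up for
illustration. Note that this cannot prove a string is latin1; it only shows
whether the bytes are valid UTF-8, and treating the rest as latin1 is a guess.

```r
# Build two strings from raw bytes so the example does not depend on the
# encoding of this file: plain ASCII "cafe", and "caf" + 0xE9 (latin1 e-acute).
ascii_txt  <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0x65)))
latin1_txt <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
x <- c(ascii_txt, latin1_txt)

is_valid <- validUTF8(x)   # is each element a valid UTF-8 byte sequence?
marks <- Encoding(x)       # declared encoding marks ("unknown" = none declared)

# For strings that are not valid UTF-8, guess latin1, convert, and re-check.
x_fixed <- x
x_fixed[!is_valid] <- iconv(x[!is_valid], from = "latin1", to = "UTF-8")
print(is_valid)
print(validUTF8(x_fixed))
```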

Thanks.



Re: [R] chisq.test

2012-06-26 Thread Peter Ehlers

On 2012-06-26 11:27, Omphalodes Verna wrote:

Dear list!

I would like to calculate chisq.test on simple data set with 70 observations, 
but the output is ''Warning message:''

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

 tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
 dimnames(tabele) <- list(
 SEX = c("M", "F"),
 HAIR = c("Brown", "Black", "Red", "Blonde"))
 addmargins(tabele)
 prop.table(tabele)
 chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.


Do this:

  ct <- chisq.test(tabele)
  ct$expected

If that does not give you a sufficient hint, then you need
to review the assumptions underlying the chisquare test.

Peter Ehlers



Thanks a lot to all, OV



Re: [R] chisq.test

2012-06-26 Thread David L Carlson
The warning means that you have many cells with expected values less than 5
(4 of 8 cells in this case) so that the chi square estimate may be inflated.
The good news is that the probability of the inflated chi square is .0978
which you probably would not consider to be significant anyway. If you want
to get a simulated p value using Monte Carlo simulation (see the references
in the manual page for chisq.test), just change the call to

chisq.test(tabele, simulate.p.value=TRUE, B=2000)

When I run this five times, I get probability estimates ranging from .09795
to .1089.
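
The checks described above can be sketched as follows. The table is the one
from the original post; set.seed() and the variable names ct, n_small and sim
are additions made here so the run is repeatable, and the simulated p-value
will differ slightly with another seed.

```r
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE,
                 dimnames = list(SEX = c("M", "F"),
                                 HAIR = c("Brown", "Black", "Red", "Blonde")))

# Asymptotic test; the warning is suppressed only to keep the output short.
ct <- suppressWarnings(chisq.test(tabele))
print(ct$expected)                 # expected counts under independence
n_small <- sum(ct$expected < 5)    # cells below the usual rule of thumb
print(n_small)
print(ct$p.value)

# Monte Carlo p-value, as suggested above.
set.seed(123)
sim <- chisq.test(tabele, simulate.p.value = TRUE, B = 2000)
print(sim$p.value)
```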

Alternatively, get more data.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Omphalodes Verna
 Sent: Tuesday, June 26, 2012 1:28 PM
 To: r-help@r-project.org
 Subject: [R] chisq.test
 
 Dear list!
 
 I would like to calculate chisq.test on simple data set with 70
 observations, but the output is ''Warning message:''
 
 Warning message:
 In chisq.test(tabele) : Chi-squared approximation may be incorrect
 
 
 Here is an example:
 
 tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow
 = TRUE)
 dimnames(tabele) <- list(
 SEX = c("M", "F"),
 HAIR = c("Brown", "Black", "Red", "Blonde"))
 addmargins(tabele)
 prop.table(tabele)
 chisq.test(tabele)
 Please, give me an advice / suggestion / recommendation.
 
 Thanks a lot to all, OV
 


[R] flatten lists

2012-06-26 Thread Jeroen Ooms
I am looking for a function to flatten a list into a list that is only one
level deep. It is very similar to unlist; however, I don't want to turn it
into a vector, because then everything will be coerced to character
vectors:

x <- list(name="Jeroen", age=27, married=FALSE,
home=list(country="Netherlands", city="Utrecht"))
unlist(x)

This function sort of does it:

flatlist <- function(mylist){
  lapply(rapply(mylist, enquote, how="unlist"), eval)
}

flatlist(x)

However it is a bit slow. Is there a more native way?



Re: [R] plotting two histograms on one plot with hist function

2012-06-26 Thread John Kane

Oh, I had not thought of it in those terms.  It does make sense now.

   John Kane
   Kingston ON Canada

   -Original Message-
   From: ke...@math.montana.edu
   Sent: Tue, 26 Jun 2012 10:57:31 -0600
   To: jrkrid...@inbox.com
   Subject: Re: [R] plotting two histograms on one plot with hist function

   On Tue, Jun 26, 2012 at 10:02 AM, John Kane [1]jrkrid...@inbox.com wrote:

 Why not just plot the two histograms on the same scale in a 2 panel plot?

   I think OP request was for comparison. Two panels may do, but why not a
   barplot of the histograms in the same panel ?
   barplot( rbind(
   hist(rbeta(30,2,4),breaks=seq(0,1,.1),plot=F)$counts,
   hist(rbeta(30,6,8),breaks=seq(0,1,.1),plot=F)$counts),
   beside=T)
   see str(hist(yourdata)) or ?hist
   Cheers
   Ilai

 John Kane
 Kingston ON Canada
  -Original Message-
  From: [2]mb...@sun.ac.za
  Sent: Tue, 26 Jun 2012 15:24:55 +0200
  To: [3]r-help@r-project.org
  Subject: [R] plotting two histograms on one plot with hist function
 
  I would like to plot two data sets (frequency (y-axis) of mean values for
  0-1 (x-axis)) on a single histogram for comparison. hist() only allows
  the overlay of two histograms, and although barplot() allows beside=TRUE,
  it does not show frequency values (like hist) but rather all of the
  values. Is there any way that I can use hist() to plot two data sets
  similar to barplot()? Any help or advice will be appreciated!
 
  Kind regards,
  Marguerite
 
 
 
 
 


References

   1. mailto:jrkrid...@inbox.com
   2. mailto:mb...@sun.ac.za
   3. mailto:r-help@r-project.org
   4. mailto:R-help@r-project.org
   5. https://stat.ethz.ch/mailman/listinfo/r-help
   6. http://www.R-project.org/posting-guide.html
   7. http://www.inbox.com/smileys
   8. mailto:R-help@r-project.org
   9. https://stat.ethz.ch/mailman/listinfo/r-help
  10. http://www.R-project.org/posting-guide.html
  11. http://www.inbox.com/photosharing


Re: [R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread Marc Schwartz
On Jun 26, 2012, at 2:10 PM, SSimek wrote:

 Hello, 
 
 I have count data that illustrate the presence or absence of individuals in
 my study population. I created a grid cell across the study area and
 calculated a count value for each individual per season per year for each
 grid cell. The count value is the number of times an individual was present
 in each grid cell.  For illustration my data columns look something like
 this and are repeated for each individual:
 
 Cell_ID  Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
 1        160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
 1        160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
 1        160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
 1        160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
 2        169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
 2        169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
 2        169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
 2        169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
 …
 563      86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
 563      86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
 563      86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water

 1        160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
 2        169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
 5        160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
 6        170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
 …
 563      86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water

 1        160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
 1        160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
 1        160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
 1        160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
 1        160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
 2        169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
 2        169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
 2        169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
 2        169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
 …
 563      86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
 563      86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
 563      86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
 563      86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
 563      86.777099   612.69  977      4474.6   52     C     2006  SUM     Water
 
 
 
 I have 563 grid cells across my study area and each individual has 1-563
 cells associated for each year and each season the individual was monitored.
 Therefore my grid cells are repeated. I end up with 71,000 records and 925
 records have a Count value > 0, which means 70,075 records have a Count value
 = 0. 
 
 I wanted to run a zero inflated poisson model to determine mixed effects (of
 parameters) with individual as the random effect. But I have been advised
 two things:
 
 1. I cannot run a zero inflated poisson model because my data are too
 extremely inflated (i.e. 70,075 vs 925) and 
 
 2. I cannot run the model with each cell repeated for each individual. I am
 told the model doesn't recognize that Cell_ID #1 for individual A is the
 same Cell_ID #1 for individual B.
 
 Does anyone know if either or both of these points are true? I would
 appreciate any thoughts, advice, or suggestions. 
 
 Thanks!
 
 -Stephanie


Hi Stephanie,

Some comments:

1. You should think about or at least be open to a zero inflated negative 
binomial distribution rather than zero inflated poisson. 

2. You should at least review the vignette for the pscl CRAN package, which 
provides standard fixed effects models and related functions for count based 
data and importantly, 

Re: [R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread Achim Zeileis

On Tue, 26 Jun 2012, Marc Schwartz wrote:


On Jun 26, 2012, at 2:10 PM, SSimek wrote:


Hello,

I have count data that illustrate the presence or absence of individuals in
my study population. I created a grid cell across the study area and
calculated a count value for each individual per season per year for each
grid cell. The count value is the number of times an individual was present
in each grid cell.  For illustration my data columns look something like
this and are repeated for each individual:

Cell_ID  Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
1        160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
1        160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
1        160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
1        160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
2        169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
2        169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
2        169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
…
563      86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
563      86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
563      86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water

1        160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
2        169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
5        160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
6        170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
…
563      86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water

1        160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
1        160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
2        169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
2        169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
…
563      86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
563      86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
563      86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
563      86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
563      86.777099   612.69  977      4474.6   52     C     2006  SUM     Water



I have 563 grid cells across my study area and each individual has 1-563
cells associated for each year and each season the individual was monitored.
Therefore my grid cells are repeated. I end up with 71,000 records and 925
records have a Count value > 0, which means 70,075 records have a Count value
= 0.

I wanted to run a zero inflated poisson model to determine mixed effects (of
parameters) with individual as the random effect. But I have been advised
two things:

1. I cannot run a zero inflated poisson model because my data are too
extremely inflated (i.e. 70,075 vs 925) and

2. I cannot run the model with each cell repeated for each individual. I am
told the model doesn't recognize that Cell_ID #1 for individual A is the
same Cell_ID #1 for individual B.

Does anyone know if either or both of these points are true? I would
appreciate any thoughts, advice, or suggestions.

Thanks!

-Stephanie



Hi Stephanie,

Some comments:

1. You should think about or at least be open to a zero inflated negative 
binomial distribution rather than zero inflated poisson.

2. You should at least review the vignette for the pscl CRAN package, which 
provides standard fixed effects models and related functions for count based 
data and importantly, some good conceptual content:

 http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

3. Given the repeated measures framework and correlation issues you likely 
have, you should subscribe to and re-post your query to the R-sig-mixed-models 
list:

 https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

which will avail you of experts in the field.

4. There is also a draft FAQ for mixed models here:

 http://glmm.wikidot.com/faq

which I believe is maintained by Ben Bolker, 

Re: [R] flatten lists

2012-06-26 Thread Neal Fultz
do.call(c, x) 

maybe?

On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list into a list that is only one
 level deep. It is very similar to unlist; however, I don't want to turn it
 into a vector, because then everything will be coerced to character
 vectors:
 
 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country="Netherlands", city="Utrecht"))
 unlist(x)
 
 This function sort of does it:
 
 flatlist <- function(mylist){
   lapply(rapply(mylist, enquote, how="unlist"), eval)
 }
 
 flatlist(x)
 
 However it is a bit slow. Is there a more native way?
 


Re: [R] flatten lists

2012-06-26 Thread Jeroen Ooms
Hmm that doesn't seem to work if the original list is nested more than
2 levels deep. I should have probably given a better example:

x <- list(name="Jeroen", age=27, married=FALSE,
home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))




On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list into a list that is only one
 level deep. It is very similar to unlist; however, I don't want to turn it
 into a vector, because then everything will be coerced to character
 vectors:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country="Netherlands", city="Utrecht"))
 unlist(x)

 This function sort of does it:

 flatlist <- function(mylist){
   lapply(rapply(mylist, enquote, how="unlist"), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?



Re: [R] flatten lists

2012-06-26 Thread Jeroen Ooms
Alright, but I need something recursive for lists of arbitrary depth.
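
One possible recursive version, offered as a sketch rather than a tested
drop-in replacement. The name flatten is made up here and nothing in the
thread proposes exactly this. It relies only on base R: lapply() recurses
into sub-lists and do.call(c, ...) concatenates the results, so non-list
values keep their types.

```r
# Recursively flatten a nested list into a list that is one level deep.
flatten <- function(x) {
  if (!is.list(x)) return(list(x))      # leaf: wrap it so c() can concatenate
  if (length(x) == 0) return(list())    # empty list: contributes nothing
  do.call(c, lapply(x, flatten))        # flatten the children, then concatenate
}

x <- list(name = "Jeroen", age = 27, married = FALSE,
          home = list(country = list(name = "Netherlands", short = "NL"),
                      city = "Utrecht"))
flat <- flatten(x)
str(flat)
```

Nested names are joined with dots by c(), for example home.country.name. Data
frames are lists too, so a data frame element would be split into its columns;
whether that is wanted depends on the use case.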



On Tue, Jun 26, 2012 at 3:37 PM, arun smartpink...@yahoo.com wrote:
 Hi,

 Try:

 do.call(c,do.call(c,x))

 x1 <- do.call(c,do.call(c,x))
 x2 <- flatlist(x)
  identical(x1,x2)
 [1] TRUE



 A.K.



 - Original Message -
 From: Jeroen Ooms jeroen.o...@stat.ucla.edu
 To: Neal Fultz nfu...@gmail.com
 Cc: r-help@r-project.org
 Sent: Tuesday, June 26, 2012 6:23 PM
 Subject: Re: [R] flatten lists

 Hmm that doesn't seem to work if the original list is nested more than
 2 levels deep. I should have probably given a better example:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))




 On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list into a list that is only one
 level deep. It is very similar to unlist; however, I don't want to turn it
 into a vector, because then everything will be coerced to character
 vectors:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country="Netherlands", city="Utrecht"))
 unlist(x)

 This function sort of does it:

 flatlist <- function(mylist){
   lapply(rapply(mylist, enquote, how="unlist"), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?



Re: [R] chisq.test

2012-06-26 Thread David Winsemius


On Jun 26, 2012, at 2:27 PM, Omphalodes Verna wrote:


Dear list!

I would like to calculate chisq.test on simple data set with 70  
observations, but the output is ''Warning message:''


Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4,
byrow = TRUE)

dimnames(tabele) <- list(
SEX = c("M", "F"),
HAIR = c("Brown", "Black", "Red", "Blonde"))
addmargins(tabele)
prop.table(tabele)
chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.


Read any introductory stats book regarding  small cell sizes:

     [,1] [,2] [,3] [,4]
[1,]   11    3    3    5
[2,]    3   18    6   21





--

David Winsemius, MD
West Hartford, CT



Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?

2012-06-26 Thread Søren Højsgaard
Dear Duncan

Thanks for your suggestion, but I really need sparse matrices: I have
implemented various graph algorithms based on adjacency matrices. For large
graphs, storing all the 0's in an adjacency matrix becomes uneconomical, and
therefore I thought I would use sparse matrices, but the speed of [i,j] will
slow down the algorithms. However, using RcppEigen it is possible to mimic
[i,j] with a slowdown of only a factor of about 16.8, which is much better
than the factor of about 470 obtained when using [i,j]:

> benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
+ columns=c("test", "replications", "elapsed", "relative"), replications=5)
               test replications elapsed relative
1   lookup(mm, `[`)            5    0.05      1.0
2   lookup(MM, `[`)            5   23.54    470.8
3 lookup(MM, Xiijj)            5    0.84     16.8

The code for producing the result is given below.

Best regards,
Søren

-

library(inline)
library(RcppEigen)
library(rbenchmark)
library(Matrix)

src <- '
using namespace Rcpp;
typedef Eigen::SparseMatrix<double> MSpMat;
const MSpMat X(as<MSpMat>(XX_));
int i = as<int>(ii_)-1;
int j = as<int>(jj_)-1;
double ans = X.coeff(i,j);
return(wrap(ans));
'

Xiijj <- cxxfunction(signature(XX_="matrix", ii_="integer", jj_="integer"),
body=src, plugin="RcppEigen")

mm <- matrix(c(1,0,0,0,0,0,0,0), nr=100, nc=100)
MM <- as(mm, "Matrix")
object.size(mm)
object.size(MM)

lookup <- function(mat, func){
  for (i in 1:nrow(mat)){
    for (j in 1:ncol(mat)){
      v <- func(mat,i,j)
    }
  }
}

benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
columns=c("test", "replications", "elapsed", "relative"),
replications=5)











-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: 25. juni 2012 11:27
To: Søren Højsgaard
Cc: r-help@r-project.org
Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to 
be very slow. Are there faster alternatives?

On 12-06-24 4:50 PM, Søren Højsgaard wrote:
 Dear all,

 Indexing matrices from the Matrix package with [i,j] seems to be very slow. 
 For example:

   library(rbenchmark)
   library(Matrix)
   mm <- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
   MM <- as(mm, "Matrix")
   lookup <- function(mat){
     for (i in 1:nrow(mat)){
       for (j in 1:ncol(mat)){
         mat[i,j]
       }
     }
   }

   benchmark(lookup(mm), lookup(MM), columns=c("test", "replications",
   "elapsed", "relative"), replications=50)
           test replications elapsed relative
   1 lookup(mm)           50    0.01        1
   2 lookup(MM)           50    8.77      877

 I would have expected a small overhead when indexing a matrix from the Matrix 
 package, but this result is really surprising...
 Does anybody know if there are faster alternatives to [i,j] ?

There's also a large overhead when indexing a dataframe, though Matrix appears 
to be slower.  It's designed to work on whole matrices at a time, not single 
entries.  So I'd suggest that if you need to use [i,j] indexing, then try to 
arrange your code to localize the access, and extract a submatrix as a regular 
fast matrix first. (Or if it will fit in memory, convert the whole thing to a 
matrix just for the access.  If I just add the line

mat <- as.matrix(mat)

at the start of your lookup function, it becomes several hundred times
faster.)
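
The suggestion above can be sketched as follows, using the Matrix package
that the thread already relies on. The function names sum_entries and
sum_entries_dense_first are made up for this illustration, and summing the
entries is only there so the loop produces a checkable result. The
summary() call at the end is an extra idea, not from the thread: for a sparse
matrix it lists the non-zero entries as (i, j, x) triplets, which may suit
code that only needs the non-zeros.

```r
library(Matrix)

# Same kind of test matrix as in the thread: dense 100 x 100, plus a sparse copy.
mm <- matrix(c(1, 0, 0, 0, 0, 0, 0, 0), nrow = 100, ncol = 100)
MM <- Matrix(mm, sparse = TRUE)

# Element-by-element access with [i, j], summing the entries.
sum_entries <- function(mat) {
  total <- 0
  for (i in seq_len(nrow(mat))) {
    for (j in seq_len(ncol(mat))) {
      total <- total + mat[i, j]
    }
  }
  total
}

# Convert once to an ordinary matrix, then index that (only if it fits in memory).
sum_entries_dense_first <- function(mat) {
  sum_entries(as.matrix(mat))
}

dense_result <- sum_entries(mm)
sparse_result <- sum_entries_dense_first(MM)

# Non-zero entries of the sparse matrix as row index, column index, value.
nz <- summary(MM)
print(head(nz))
```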



[R] mixture distribution with positive and negative probabilities

2012-06-26 Thread Yakir Gagnon
Hi!
Any ideas on which package (e.g. mixdist, flexmix, etc.) I could use to fit a
mixture of, say, 3 Gaussian functions, where 2 have their own proportions,
means, and sigmas, and the third has a mean and sigma but a negative
proportion? Basically I'm trying to fit a mixture model to a distribution that
I know is the sum of 3 distributions, where one inhibits the other two. Is
there such a thing?
Thanks in advance!

Yakir Gagnon
cell+1 919 886 3877
office +1 919 684 7188
Johnsen Lab
Biology Department
Box 90338
Duke University
Durham, NC 27708
BioSci Building
Room 307
http://fds.duke.edu/db/aas/Biology/postdoc/yg32
http://www.biology.duke.edu/johnsenlab/people/yakir.html



Re: [R] flatten lists

2012-06-26 Thread Bert Gunter
Frankly, I'm not sure what you mean, but presumably

unlist(yourlist, recurs=FALSE)

is not it, right?

-- Bert

On Tue, Jun 26, 2012 at 2:25 PM, Jeroen Ooms jeroen.o...@stat.ucla.edu wrote:

 I am looking for a function to flatten a list into a list that is only one
 level deep. It is very similar to unlist; however, I don't want to turn it
 into a vector, because then everything will be coerced to character
 vectors:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country="Netherlands", city="Utrecht"))
 unlist(x)

 This function sort of does it:

 flatlist <- function(mylist){
   lapply(rapply(mylist, enquote, how="unlist"), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?

2012-06-26 Thread Søren Højsgaard
Duncan,
I should probably add that I am aware that my code is not the solution and also
that the relative gain of my code probably decreases with the problem size
until eventually it will perform worse than [i,j] (because of copying, I
suppose). So my point is just: it would be nice if [i,j] were faster...
Regards
Søren

PS: For a 2000 x 2000 matrix I get:
               test replications elapsed relative
1   lookup(mm, `[`)            5   14.85 1.000000
2 lookup(MM, Xiijj)            5  133.66 9.000673

Using the modified code

src <- '
using namespace Rcpp;
typedef Eigen::MappedSparseMatrix<double> MSpMat;
const MSpMat X(as<MSpMat>(XX_));
int i = as<int>(ii_)-1;
int j = as<int>(jj_)-1;
double ans = X.coeff(i,j);
return(wrap(ans));
'





-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Søren Højsgaard
Sent: 27. juni 2012 01:20
To: Duncan Murdoch
Cc: r-help@r-project.org
Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to 
be very slow. Are there faster alternatives?

Dear Duncan

Thanks for your suggestion, but I really need sparse matrices: I have
implemented various graph algorithms based on adjacency matrices. For large
graphs, storing all the 0's in an adjacency matrix becomes uneconomical, and
therefore I thought I would use sparse matrices, but the speed of [i,j] will
slow down the algorithms. However, using RcppEigen it is possible to mimic
[i,j] with a slowdown of only a factor of 16, which is much better than what is
obtained when using [i,j]:

> benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
+ columns=c("test", "replications", "elapsed", "relative"),
+ replications=5)
               test replications elapsed relative
1   lookup(mm, `[`)            5    0.05      1.0
2   lookup(MM, `[`)            5   23.54    470.8
3 lookup(MM, Xiijj)            5    0.84     16.8

The code for producing the result is given below.

Best regards,
Søren

-

library(inline)
library(RcppEigen)
library(rbenchmark)
library(Matrix)

src <- '
using namespace Rcpp;
typedef Eigen::SparseMatrix<double> MSpMat;
const MSpMat X(as<MSpMat>(XX_));
int i = as<int>(ii_)-1;
int j = as<int>(jj_)-1;
double ans = X.coeff(i,j);
return(wrap(ans));
'

Xiijj <- cxxfunction(signature(XX_="matrix", ii_="integer", jj_="integer"),
body=src, plugin="RcppEigen")

mm <- matrix(c(1,0,0,0,0,0,0,0), nr=100, nc=100)
MM <- as(mm, "Matrix")
object.size(mm)
object.size(MM)

lookup <- function(mat, func){
  for (i in 1:nrow(mat)){
    for (j in 1:ncol(mat)){
      v <- func(mat,i,j)
    }
  }
}

benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
columns=c("test", "replications", "elapsed", "relative"),
replications=5)











-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: 25. juni 2012 11:27
To: Søren Højsgaard
Cc: r-help@r-project.org
Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to 
be very slow. Are there faster alternatives?

On 12-06-24 4:50 PM, Søren Højsgaard wrote:
 Dear all,

 Indexing matrices from the Matrix package with [i,j] seems to be very slow. 
 For example:

   library(rbenchmark)
   library(Matrix)
   mm <- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
   MM <- as(mm, "Matrix")
   lookup <- function(mat){
     for (i in 1:nrow(mat)){
       for (j in 1:ncol(mat)){
          mat[i,j]
       }
     }
   }

   benchmark(lookup(mm), lookup(MM), columns=c("test", "replications",
   "elapsed", "relative"), replications=50)
         test replications elapsed relative
 1 lookup(mm)           50    0.01        1
 2 lookup(MM)           50    8.77      877

 I would have expected a small overhead when indexing a matrix from the Matrix 
 package, but this result is really surprising...
 Does anybody know if there are faster alternatives to [i,j] ?

There's also a large overhead when indexing a dataframe, though Matrix appears 
to be slower.  It's designed to work on whole matrices at a time, not single 
entries.  So I'd suggest that if you need to use [i,j] indexing, then try to 
arrange your code to localize the access, and extract a submatrix as a regular 
fast matrix first. (Or if it will fit in memory, convert the whole thing to a 
matrix just for the access.  If I just add the line

mat <- as.matrix(mat)

at the start of your lookup function, it becomes several hundred times
faster.)
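
The two workarounds above could look roughly like the sketch below. It assumes the Matrix package is installed; the names `dense`, `idx`, `vals` and `total` and the sample indices are made up for illustration, and `Matrix(mm, sparse = TRUE)` stands in for the `as(mm, "Matrix")` call used elsewhere in this thread.

```r
library(Matrix)

# Same toy matrix as in the thread: a repeating 1,0,0,0,0,0,0,0 pattern.
mm <- matrix(c(1, 0, 0, 0, 0, 0, 0, 0), nrow = 20, ncol = 20)
MM <- Matrix(mm, sparse = TRUE)

# Workaround 1: convert once to a base matrix, then do scalar lookups on it.
dense <- as.matrix(MM)
total <- 0
for (i in seq_len(nrow(dense))) {
  for (j in seq_len(ncol(dense))) {
    total <- total + dense[i, j]
  }
}

# Workaround 2: fetch many entries in a single call by passing a
# two-column (row, column) index matrix instead of looping over [i, j].
idx <- cbind(c(1, 2, 9), c(1, 1, 1))
vals <- MM[idx]
```

The first form only makes sense when the dense copy fits in memory; the second keeps the matrix sparse but pays the method-dispatch cost once rather than once per element.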


Re: [R] Compile C files

2012-06-26 Thread Duncan Murdoch

On 12-06-26 2:48 PM, Frederico Mestre wrote:

Hello:



Sorry, this might look like a beginner question, but I'm just starting to
work on the C and R interface.



I'm trying to compile a C file (with a function) to load it to an R function
but, in the command line I keep getting a lot of errors, like:


You'll need to tell us what you did before  you can expect us to 
interpret the error messages.


Duncan Murdoch





C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected
declaration specifiers before 'SEXP'



I've been able to compile this file before, so I



I'm using Windows 7 in a 64 bits computer.



Best regards,



Frederico






Re: [R] Figuring out encodings of PDFs in R

2012-06-26 Thread Duncan Murdoch

On 12-06-26 3:28 PM, Jonas Michaelis wrote:

Dear list,

I am currently scraping some text data from several PDFs using the
readPDF() function in the tm package. This all works very well and in most
cases the encoding seems to be latin1 - in some, however, it is not. Is
there a good way in R to check character encodings? I found the functions
is.utf8() and is.local() in the tau package but that obviously only gets me
so far.



There are heuristics for guessing encodings, but I don't think they are 
built into R.  I think the way to do what you want is to read the PDF 
spec to find out how the strings are encoded in the source file, and 
believe that.
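
For a rough first pass on strings that have already been extracted, base R does have a few simple checks. The sketch below is only that: the sample string is made up, and `validUTF8()` needs R 3.3.0 or later, so it postdates this thread.

```r
# A latin1-encoded byte string: 0xE7 is "c with cedilla" in latin1.
x <- "fa\xe7on"

Encoding(x)      # the declared encoding mark, which may just be "unknown"
validUTF8(x)     # FALSE: a lone 0xE7 byte is not valid UTF-8

# If the bytes are known (or guessed) to be latin1, convert explicitly.
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)     # TRUE after the conversion
```

A string that fails `validUTF8()` is certainly not UTF-8, but a string that passes could still be plain ASCII or something else, so this is a filter rather than a detector.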


Duncan Murdoch



[R] RES: Compile C files

2012-06-26 Thread Frederico Mestre
Hello:

I just reinstalled R and Rtools. 

It works perfectly now.

Thanks,

Frederico 



-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Wednesday, 27 June 2012 01:06
To: Frederico Mestre
Cc: r-help@r-project.org
Subject: Re: [R] Compile C files

On 12-06-26 2:48 PM, Frederico Mestre wrote:
 Hello:



 Sorry, this might look like a beginner question, but I'm just starting 
 to work on the C and R interface.



 I'm trying to compile a C file (with a function) to load it to an R 
 function but, in the command line I keep getting a lot of errors, like:

You'll need to tell us what you did before  you can expect us to interpret
the error messages.

Duncan Murdoch




 C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected 
 declaration specifiers before 'SEXP'



 I've been able to compile this file before, so I



 I'm using Windows 7 in a 64 bits computer.



 Best regards,



 Frederico




   [[alternative HTML version deleted]]



Re: [R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread Frank Harrell
This is what the addpanel argument to plot.Predict is for, something along
the lines of

ap <- function(...) lpoints(age, weight)
plot(Predict(. . .), addpanel=ap)

Frank


David Winsemius wrote
 
 On Jun 26, 2012, at 11:29 AM, Sarah Goslee wrote:
 
 You could use points() instead of plot() for the second command.
 
 
 Ummm. Maybe not. I think that plot.Predict uses lattice
 graphics. You may need to use trellis.focus() followed by lpoints().
 Or use the + operation with suitable objects.
 
 -- 
 David.
 
 

 Sarah

 On Tue, Jun 26, 2012 at 8:37 AM, achaumont <agnes.chaumont@> wrote:
 Hello,

 I have a question about the “plot.predict” function in Frank  
 Harrell's rms
 package.
 Do you know how to superpose in the same graph the prediction curve  
 of ols
 and raw data points?
 Put most simply, I would like to combine these two graphs:

 fit_linear <- ols(y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)
 p <- Predict(fit_linear,x2,conf.int=FALSE)
 plot(p, ylim=c(-2,0.5), xlim=c(0,100))  # graph n°1

 z <- plot(x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type="p",lwd=6,col="blue")
 # graph n°2

 Thanks all,

 Agnès





 -- 
 Sarah Goslee
 http://www.functionaldiversity.org

 
 David Winsemius, MD
 West Hartford, CT
 
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University


[R] question about formatting Dates

2012-06-26 Thread Erin Hodgess
Dear R People:

I have dates as factors in the following:

 poudel.df$DATE
[1] 1/2/2011  1/4/2011  1/4/2011  1/4/2011  1/6/2011  1/7/2011  1/8/2011
[8] 1/9/2011  1/10/2011
Levels: 1/10/2011 1/2/2011 1/4/2011 1/6/2011 1/7/2011 1/8/2011 1/9/2011


I want them to be regular dates which can be sorted, etc.

But when I did this:

> as.character(poudel.df$DATE)
[1] "1/2/2011"  "1/4/2011"  "1/4/2011"  "1/4/2011"  "1/6/2011"  "1/7/2011"
[7] "1/8/2011"  "1/9/2011"  "1/10/2011"

and
> as.Date(as.character(poudel.df$DATE),"%m/%d/$Y")
[1] NA NA NA NA NA NA NA NA NA

because the dates do not have leading zeros.

There are approximately 30 years of nearly daily data in the entire set.

Any suggestions would be much appreciated.

Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] chisq.test

2012-06-26 Thread arun


Hi,

The error is due to less than 5 observations in some cells.

You can try,
fisher.test(tabele)
    Fisher's Exact Test for Count Data

data:  tabele 
p-value = 0.0998
alternative hypothesis: two.sided 

A.K.



----- Original Message -----
From: Omphalodes Verna omphalodes.ve...@yahoo.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Tuesday, June 26, 2012 2:27 PM
Subject: [R] chisq.test

Dear list!

I would like to calculate chisq.test on simple data set with 70 observations, 
but the output is ''Warning message:''

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

        tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
        dimnames(tabele) <- list(
        SEX = c("M","F"),
        HAIR = c("Brown", "Black", "Red", "Blonde"))
        addmargins(tabele)
        prop.table(tabele)
        chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.

Thanks a lot to all, OV

    [[alternative HTML version deleted]]




Re: [R] flatten lists

2012-06-26 Thread arun
Hi,

Try:

do.call(c,do.call(c,x))

x1 <- do.call(c,do.call(c,x))
x2 <- flatlist(x)
identical(x1,x2)
[1] TRUE



A.K.



----- Original Message -----
From: Jeroen Ooms jeroen.o...@stat.ucla.edu
To: Neal Fultz nfu...@gmail.com
Cc: r-help@r-project.org
Sent: Tuesday, June 26, 2012 6:23 PM
Subject: Re: [R] flatten lists

Hmm that doesn't seem to work if the original list is nested more than
2 levels deep. I should have probably given a better example:

x <- list(name="Jeroen", age=27, married=FALSE,
home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))




On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country="Netherlands", city="Utrecht"))
 unlist(x)

 This function sort of does it:

 flatlist <- function(mylist){
   lapply(rapply(mylist, enquote, how="unlist"), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?



Re: [R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread Stephanie L. Simek
Thank you both for your quick response and input. I will consider all of
your points and see what we are able to derive from there. 

Thank you again for your time and expertise.

-Stephanie

---
Stephanie L. Simek
Carnivore Ecology Lab
Forest and Wildlife Research Center
Mississippi State University
Box 9690
Mississippi State, MS 39762
Cell: (850) 591-1430
Email: ssi...@cfr.msstate.edu


-----Original Message-----
From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] 
Sent: Tuesday, June 26, 2012 4:46 PM
To: Marc Schwartz
Cc: Stephanie L. Simek; r-help@r-project.org
Subject: Re: [R] Zero inflated: is there a limit to the level of
inflation

On Tue, 26 Jun 2012, Marc Schwartz wrote:

 On Jun 26, 2012, at 2:10 PM, SSimek wrote:

 Hello,

 I have count data that illustrate the presence or absence of
 individuals in my study population. I created a grid cell across the
 study area and calculated a count value for each individual per season
 per year for each grid cell. The count value is the number of times an
 individual was present in each grid cell.  For illustration my data
 columns look something like this and are repeated for each individual:

 Cell_ID  Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
 1        160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
 1        160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
 1        160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
 1        160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
 2        169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
 2        169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
 2        169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
 2        169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
 …
 563      86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
 563      86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
 563      86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water

 1        160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
 2        169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
 5        160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
 6        170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
 …
 563      86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water

 1        160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
 1        160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
 1        160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
 1        160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
 1        160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
 2        169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
 2        169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
 2        169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
 2        169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
 …
 563      86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
 563      86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
 563      86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
 563      86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
 563      86.777099   612.69  977      4474.6   52     C     2006  SUM     Water



 I have 563 grid cells across my study area and each individual has
 1-563 cells associated for each year and each season the individual
 was monitored. Therefore my grid cells are repeated. I end up with
 71,000 records, and 925 records have a Count value > 0, which means
 70,075 records have a Count value = 0.

 I wanted to run a zero inflated poisson model to determine mixed
 effects (of parameters) with individual as the random effect. But I
 have been advised two things:

 1. I cannot run a zero inflated poisson model because my data are too
 extremely inflated (i.e. 70,075 vs 925) and

 2. I cannot run the model with each cell repeated for each
 individual. I am told the model doesn't recognize that Cell_ID #1 for
 individual A is the same Cell_ID #1 for individual B.

 Does anyone know if either or both of these points are true? I would 
 appreciate any thoughts, advice, or suggestions.

 Thanks!

 -Stephanie


 Hi Stephanie,

 Some comments:

 1. You should think about or at least be open to a zero inflated
negative binomial distribution rather than zero inflated poisson.

 2. You should at least review the vignette for the pscl CRAN package,
which provides standard fixed effects models and 

Re: [R] flatten lists

2012-06-26 Thread arun
Hi,

I hope this helps. Tested to some depth.



x1 <- list(name="Jeroen", age=27, married=FALSE,
home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))
x2 <- list(name="Jeroen", age=27, married=FALSE,
home=list(country=list(name=list(Country1="Netherlands",Country2="Spain"),
short=list("NL","SP")), city="Utrecht"))
x3 <- list(name="Jeroen", age=27, married=FALSE,
home=list(country=list(name=list(Countrygroup=
list("Netherlands","Germany"),Country2="Spain"), short=list("NL","SP")),
city="Utrecht"))


#recursive function

x4 <- lapply(do.call(c,c(x3,list(recursive=TRUE))),FUN=unlist)
x4[2] <- as.numeric(x4[2])
x4[3] <- as.logical(x4[3])
x4
$name
[1] "Jeroen"

$age
[1] 27

$married
[1] FALSE

$home.country.name.Countrygroup1
[1] "Netherlands"

$home.country.name.Countrygroup2
[1] "Germany"

$home.country.name.Country2
[1] "Spain"

$home.country.short1
[1] "NL"

$home.country.short2
[1] "SP"

$home.city
[1] "Utrecht"


identical(x4,flatlist(x3))
[1] TRUE
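
For lists of arbitrary depth, a small recursive helper avoids the manual type fix-ups. This is only a sketch and not something posted in the thread: `flatten_list` is a made-up name, and it treats anything for which `is.list()` is TRUE (data frames included) as something to descend into.

```r
# Hypothetical helper: recursively flatten a nested list to one level,
# keeping each leaf's own type instead of coercing everything to character.
flatten_list <- function(x) {
  if (!is.list(x)) return(list(x))      # wrap a leaf so c() keeps its type
  do.call(c, lapply(x, flatten_list))   # c() joins nested names with "."
}

x <- list(name = "Jeroen", age = 27, married = FALSE,
          home = list(country = list(name = "Netherlands", short = "NL"),
                      city = "Utrecht"))
flat <- flatten_list(x)
names(flat)
str(flat)
```

Because each leaf is wrapped in a one-element list before `c()` combines the pieces, `age` stays numeric and `married` stays logical, and the nested names come out as `home.country.name`, `home.country.short` and `home.city`.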


A.K.







----- Original Message -----
From: Jeroen Ooms jeroen.o...@stat.ucla.edu
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Tuesday, June 26, 2012 6:55 PM
Subject: Re: [R] flatten lists

Alright, but I need something recursive for lists with arbitrary deepness.



On Tue, Jun 26, 2012 at 3:37 PM, arun smartpink...@yahoo.com wrote:
 Hi,

 Try:

 do.call(c,do.call(c,x))

 x1 <- do.call(c,do.call(c,x))
 x2 <- flatlist(x)
  identical(x1,x2)
 [1] TRUE



 A.K.



 - Original Message -
 From: Jeroen Ooms jeroen.o...@stat.ucla.edu
 To: Neal Fultz nfu...@gmail.com
 Cc: r-help@r-project.org
 Sent: Tuesday, June 26, 2012 6:23 PM
 Subject: Re: [R] flatten lists

 Hmm that doesn't seem to work if the original list is nested more than
 2 levels deep. I should have probably given a better example:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country=list(name="Netherlands", short="NL"), city="Utrecht"))




 On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:

 x <- list(name="Jeroen", age=27, married=FALSE,
 home=list(country="Netherlands", city="Utrecht"))
 unlist(x)

 This function sort of does it:

 flatlist <- function(mylist){
   lapply(rapply(mylist, enquote, how="unlist"), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?



Re: [R] question about formatting Dates

2012-06-26 Thread R. Michael Weylandt
On Tue, Jun 26, 2012 at 10:54 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
 Dear R People:

 I have dates as factors in the following:

 poudel.df$DATE
 [1] 1/2/2011  1/4/2011  1/4/2011  1/4/2011  1/6/2011  1/7/2011  1/8/2011
 [8] 1/9/2011  1/10/2011
 Levels: 1/10/2011 1/2/2011 1/4/2011 1/6/2011 1/7/2011 1/8/2011 1/9/2011


 I want them to be regular dates which can be sorted, etc.

 But when I did this:

 > as.character(poudel.df$DATE)
 [1] "1/2/2011"  "1/4/2011"  "1/4/2011"  "1/4/2011"  "1/6/2011"  "1/7/2011"
 [7] "1/8/2011"  "1/9/2011"  "1/10/2011"

 and
 > as.Date(as.character(poudel.df$DATE),"%m/%d/$Y")

Right about ...
 ^

should be a percent instead of a dollar sign.

Also, probably can't hurt to used a named argument (but I don't think
that's the problem here)

In the future dput()-ery would be much appreciated.
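
With the percent sign in place, the conversion can be sketched as below. The three values are copied from the post; `dates` and `d` are made-up names. As far as I know, %m and %d accept one-digit months and days on input, so no padding should be needed.

```r
# Dates stored as a factor, without leading zeros, as in the original post.
dates <- factor(c("1/10/2011", "1/2/2011", "1/4/2011"))

# Convert via character, naming the format argument explicitly.
d <- as.Date(as.character(dates), format = "%m/%d/%Y")

# The result is a Date vector, so it sorts chronologically.
sort(d)
```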

Michael

 [1] NA NA NA NA NA NA NA NA NA

 because the dates do not have leading zeros.

 There are approximately 30 years of nearly daily data in the entire set.

 Any suggestions would be much appreciated.

 Sincerely,
 Erin


 --
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com



[R] A solution for question about formatting Dates

2012-06-26 Thread Erin Hodgess
Hello again:

Here is a solution to the dates without leading zeros:

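pou1 <- function(x) {
 #Note:  x is a data frame
 #Assume that Column 1 has the date
 #Column 2 has station
 #Column 3 has min
 #Column 4 has max
 library(stringr)
 w <- character(length=nrow(x))
 #split each date on "/" (as.character in case the column is a factor)
 z <- str_split(as.character(x[,1]),"/")
 for(i in 1:nrow(x)) {
   #pad month and day to two digits; the 4-digit year is left as is
   u <- str_pad(z[[i]][1:3],width=2,pad="0")
   w[i] <- paste(u,sep="",collapse="/")
   }
 a <- as.Date(w,"%m/%d/%Y")
 a
}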

This is not particularly elegant, but it does the trick.


Thanks,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] chisq.test

2012-06-26 Thread Rolf Turner

On 27/06/12 08:54, arun wrote:


Hi,

The error is due to less than 5 observations in some cells.


NO, NO, NO  It's not the observations that matter, it is
the ***EXPECTED COUNTS***.  These must all be at least
5 in order for the null distribution of the test statistic to be
adequately approximated by a chi-squared distribution.
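
The expected counts can be inspected directly from the test object. A sketch using the table from the original post (the name `expected` is made up, and the warning is suppressed only to keep the output short):

```r
tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE,
                 dimnames = list(SEX = c("M", "F"),
                                 HAIR = c("Brown", "Black", "Red", "Blonde")))

# Expected counts under independence: row total * column total / grand total.
expected <- suppressWarnings(chisq.test(tabele))$expected
expected
any(expected < 5)

# One possible alternative to the asymptotic approximation is a Monte Carlo
# p-value; the result varies from run to run unless a seed is set.
chisq.test(tabele, simulate.p.value = TRUE)
```

Both row totals are 35 out of 70, so each expected count is half its column total: 7, 4.5, 4 and 19.5. The Black and Red columns fall below 5, which is what triggers the warning.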

cheers,

Rolf Turner


You can try,
fisher.test(tabele)
 Fisher's Exact Test for Count Data

data:  tabele
p-value = 0.0998
alternative hypothesis: two.sided

A.K.



- Original Message -
From: Omphalodes Verna omphalodes.ve...@yahoo.com
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Tuesday, June 26, 2012 2:27 PM
Subject: [R] chisq.test

Dear list!

I would like to calculate chisq.test on simple data set with 70 observations, 
but the output is ''Warning message:''

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

 tabele <- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
 dimnames(tabele) <- list(
 SEX = c("M","F"),
 HAIR = c("Brown", "Black", "Red", "Blonde"))
 addmargins(tabele)
 prop.table(tabele)
 chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.

Thanks a lot to all, OV

 [[alternative HTML version deleted]]




Re: [R] A solution for question about formatting Dates

2012-06-26 Thread R. Michael Weylandt
Please don't change subject lines for follow-on comments. It messes up
threading in most readers: e.g.,
https://stat.ethz.ch/pipermail/r-help/2012-June/thread.html


Michael

On Tue, Jun 26, 2012 at 11:57 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
 Hello again:

 Here is a solution to the dates without leading zeros:

 pou1 <- function(x) {
     #Note:  x is a data frame
     #Assume that Column 1 has the date
     #Column 2 has station
     #Column 3 has min
     #Column 4 has max
     library(stringr)
     w <- character(length=nrow(x))
     z <- str_split(as.character(x[,1]),"/")
     for(i in 1:nrow(x)) {
           u <- str_pad(z[[i]][1:3],width=2,pad="0")
           w[i] <- paste(u,sep="",collapse="/")
           }
     a <- as.Date(w,"%m/%d/%Y")
     a
 }

 This is not particularly elegant, but it does the trick.


 Thanks,
 Erin


 --
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com



Re: [R] Remove empty levels in subset

2012-06-26 Thread svo
Thank you very much. The advice I followed (and which, for some reason, I do
not see here right now) was to use 'droplevels'. I needed the command for
several variables at the same time, so this was very convenient.


Hello,

Have you tried 'droplevels':
test <- data.frame(a=as.factor(rep(c("f1","f2","f3"),10)),b=rep(c(1,2,3),10))
test2 <- subset(test,test$a=="f1")
summary(test2)
  a            b
 f1:10   Min.   :1
 f2: 0   1st Qu.:1
 f3: 0   Median :1
         Mean   :1
         3rd Qu.:1
         Max.   :1
test3 <- droplevels(test2)
summary(test3)
  a            b
 f1:10   Min.   :1
         1st Qu.:1
         Median :1
         Mean   :1
         3rd Qu.:1
         Max.   :1
A.K.
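
Since the poster needed this for several variables at once: `droplevels()` applied to a whole data frame handles every factor column in one call. A small sketch, where the second factor `g` is made up for illustration:

```r
test <- data.frame(a = factor(rep(c("f1", "f2", "f3"), 10)),
                   g = factor(rep(c("x", "y", "z"), 10)),
                   b = rep(c(1, 2, 3), 10))

# Subset first, then drop the unused levels of all factor columns together.
test2 <- droplevels(subset(test, a == "f1"))

# Number of levels left in each factor column.
sapply(test2[c("a", "g")], nlevels)
```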




[R] selecting rows by maximum value of one variables in dataframe nested by another Variable

2012-06-26 Thread Miriam
How can I select the rows of a dataset that have the maximum value in one
variable, separately within each level of another variable? It is a data frame
in long format with repeated measures per subject.
I was not successful using aggregate, because one of the columns has character
values (and/or possibly for another reason).
I would like to transform something like this:
subject  time.ms  V3
1        1        stringA
1        12       stringB
1        22       stringC
2        1        stringB
2        14       stringC
2        25       stringA
…
To something like this:
subject  time.ms  V3
1        22       stringC
2        25       stringA
…
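
One way to get from the first table to the second in base R is to compare each row against its group maximum, which leaves the character column untouched. This is a sketch built on the example rows above; `dat` and `res` are made-up names.

```r
dat <- data.frame(subject = c(1, 1, 1, 2, 2, 2),
                  time.ms = c(1, 12, 22, 1, 14, 25),
                  V3 = c("stringA", "stringB", "stringC",
                         "stringB", "stringC", "stringA"),
                  stringsAsFactors = FALSE)

# ave() returns, for every row, the maximum time.ms within that row's subject;
# keeping the rows that equal their own group maximum selects whole rows.
res <- dat[dat$time.ms == ave(dat$time.ms, dat$subject, FUN = max), ]
res
```

If a subject has its maximum time more than once, all tied rows are kept.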

Thank you very much for your help!
Miriam