date:20120626

Hi Steve,

On Mon, Jun 25, 2012 at 9:47 PM, Steven Winter stevenwinte...@yahoo.com wrote:
 Given a set of latitude and longitude coordinates pairs (stored in variables 
 latitudevals and longitudevals), I would like to plot them onto the image 
 of a equirectangular world map. I would like to plot each coordinate pair 
 with a red circle, if possible. Does anyone have any suggestions as to how I 
 go about doing this, whether using R or using another program like Google 
 maps?

This might help:

library(maps)
map(world)
lon - c(-75, -70, 10)
lat - c(42, -45, 50)
points(lon, lat, col=red, pch=19)

Sarah


 Thank you,
 Steve
        [[alternative HTML version deleted]]



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] compare one field of dataframe with excel sheet using R

2012-06-26 Thread Jean V Adams

It would help if you provided an example for your data frame, and example 
for your spreadsheet, and more information on how to judge if the ppm 
values are similar.  Maybe this code will help you get started ...

# Here's an example data frame
mydf - data.frame(
compound=letters[1:10], 
ppm=abs(round(rnorm(10), 4)),
frequency=abs(round(rnorm(10), 4)))

# Here's an example data frame representing data from your spreadsheet
# You can read the data from the spreadsheet into R using the package 
XLConnect
# library(XLConnect)
# mysheet - readWorksheet(loadWorkbook(C:\\Temp\\Compounds.xlsx), 
sheet=Sheet1, startRow=1)
mysheet - data.frame(
compound=letters[sample(1:10, 100, replace=TRUE)],
libppm=abs(round(rnorm(100), 4)))

# combine the two example data frames
both - merge(mydf, mysheet)

# list the compounds in mydf that had ppm values within 0.1 of those in 
the spreadsheet
both$diff - abs(both$ppm-both$libppm)
both[both$diff0.1, ]

Jean


sathya7priya sathya7pr...@gmail.com wrote on 06/26/2012 03:34:22 AM:

 I have a data frame consisting of three columns(name of compund,ppm and
 frequency).Name contains string values .ppm and frequency contains 
numeric
 values with decimal points upto four digits.
 I have an excel sheet which is like a library.The first column contains 
the
 name of compounds and remaining column contains the ppm values of the
 compound which satisfy certain rules.The number of ppm values varies for
 each compound from 4 to 700.
 I need to compare the values of ppm from the dataframe and compare it 
with
 the ppm values in excel sheet and give the result if they are similar.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] significance level (p) for t-value in package zelig

2012-06-26 Thread Rune Haubo

My point was just that the situation in a cumulative link model is not
much different from a binomial glm - the binomial glm is even a
special case of the clm with only two response categories. And just
like summary(glm(, family=binomial)) reports z-values and computes
p-values by using the normal distribution as reference, one can do the
same in a cumulative link model by applying the same asymptotic
arguments.

In both models the variance is determined implicitly by the mean, so a
t-distribution is never involved.

Cheers,
Rune

On 25 June 2012 11:05, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
On 25/06/2012 09:32, Rune Haubo wrote:

According to standard likelihood theory these are actually not
t-values, but z-values, i.e., they asymptotically follow a standard
normal distribution under the null hypothesis. This means that you

Whose 'standard'?

It is conventional to call a value of t-like statistic (i.e. a ratio of the
form value/standard error) a 't-value'. And that is nothing to do with
'likelihood theory' (t statistics predate the term 'likelihood'!).

The separate issue is whether a t statistic is even approximately
t-distributed (and if so, on what df?), and another is if it is
asymptotically normal. For the latter you have to say what you mean by
'asymptotic': we have lost a lot of the context, but as this does not appear
to be IID univariate observations:

- 'standard likelihood theory' is unlikely to apply.

- standard asymptotics may well not be a good approximation (in regression
modelling, people tend to fit more complex models to large datasets, which
is often why a large dataset was collected).

- even for IID observations the derivation of the t distribution assumes
normality.

The difference between a t distribution and a normal distribution is
practically insignificant unless the df is small. And if the df is small,
one can rarely rely on the CLT for approximate normality

could use pnorm instead of pt to get the p-values, but an easier
solution is probably to use the clm-function (for Cumulative Link
Models) from the ordinal package - here you get the p-values
automatically.

Cheers,
Rune

On 23 June 2012 07:02, Bert Gunter gunter.ber...@gene.com wrote:

This advice is almost certainly false!

A t-statistic can be calculated, but the distribution will not
necessarily be student's t nor will the df be those of the rse. See,
for
example, rlm() in MASS, where values of the t-statistic are given without
p
values. If Brian Ripley says that p values cannot be straightforwardly
calculated by pt(), then believe it!

-- Bert

On Fri, Jun 22, 2012 at 9:30 PM, Özgür Asar oa...@metu.edu.tr wrote:

Michael,

Try

?pt

Best
Ozgur

--
View this message in context:

http://r.789695.n4.nabble.com/significance-level-p-for-t-value-in-package-zelig-tp4634252p4634271.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:

http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[[alternative HTML version deleted]]

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Rune Haubo Bojesen Christensen

Ph.D. Student, M.Sc. Eng.
Phone: (+45) 45 25 33 63
Mobile: (+45) 30 26 45 54

DTU Informatics, Section for Statistics
Technical University of Denmark, Build. 305, Room 122,
DK-2800 Kgs. Lyngby, Denmark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] graph displays

2012-06-26 Thread John Kane

Sorry I misunderstood what you wanted.   Using ggplot2 and reshape2 which I 
imagine you will have to install, this should give you what you want

library(ggplot2)
library(reshape2)

xx1  -  melt(Dataset, id = c(Source))

p  -  ggplot( xx1 , aes(variable, value, fill= Source   )) +
geom_bar(position = dodge) +
   scale_y_continuous( Scale Values) +
   scale_x_discrete(X values) +
   opts( title = Graphing Exercise)
 
p




John Kane
Kingston ON Canada


 -Original Message-
 From: ricardosousa2...@clix.pt
 Sent: Tue, 26 Jun 2012 01:24:17 -0700 (PDT)
 To: r-help@r-project.org
 Subject: Re: [R] graph displays
 
 
 
 Good morning,
 Thanks for help.
 I can explain better what I am trying to do.
 I'm trying to read data from a file, separated by a tab, with the
 following
 code.
 
 
 Dataset-read.table(C:/Users/Administrator/Desktop/R/graph.txt,sep=\t,
 quote=\,header = TRUE)
 View(Dataset)
 dput(Dataset)
 
 View(Dataset)
 dput(Dataset)
 structure(list(Source = structure(1:3, .Label = c(A, B, C
 ), class = factor), X1000s = c(47L, 37L, 17L), X600s = c(63L,
 64L, 62L), X500s = c(75L, 45L, 25L), X250s = c(116L, 11L, 66L
 ), X100s = c(125L, 25L, 12L), X50s = c(129L, 19L, 29L), X10s = c(131L,
 61L, 91L), X5s = c(131L, 131L, 171L), X3s = c(131L, 186L, 186L
 ), X1s = c(131L, 186L, 186L)), .Names = c(Source, X1000s,
 X600s, X500s, X250s, X100s, X50s, X10s, X5s, X3s,
 X1s), class = data.frame, row.names = c(NA, -3L))
 Dataset
   Source X1000s X600s X500s X250s X100s X50s X10s X5s X3s X1s
 1  A 476375   116   125  129  131 131 131 131
 2  B 3764451125   19   61 131 186 186
 3  C 1762256612   29   91 171 186 186
 
 
 the idea is to get a graph like this excel, but in R,
 as I'm still in the learning phase of the R, I have little knowledge how
 to
 do
 
 http://imageshack.us/photo/my-images/51/testlt.png/
 
 --
 View this message in context:
 http://r.789695.n4.nabble.com/graph-displays-tp4634448p4634488.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Receive Notifications of Incoming Messages
Easily monitor multiple email accounts  access them with a click.
Visit http://www.inbox.com/notifier and check it out!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rrdf package for mac not working


Please contact the package maintainer.

Best,
Uwe Ligges

On 26.06.2012 00:41, Ricardo Pietrobon wrote:

rrdf is incredibly helpful, but I've notice that the rrdf package for mac
hasn't been working for some time: http://goo.gl/5Ukpn . wondering if there
is still a plan to maintain that in the long run, or if there is some other
alternative to read RDF files.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rrdf package for mac not working




On 26.06.2012 00:41, Ricardo Pietrobon wrote:

rrdf is incredibly helpful, but I've notice that the rrdf package for mac
hasn't been working for some time: http://goo.gl/5Ukpn . wondering if there
is still a plan to maintain that in the long run, or if there is some other
alternative to read RDF files.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] MuMIn - assessing variable importance following model averaging, z-stats/p-values or CI?


Please contact the package maintainer.

Best,
Uwe Ligges

On 26.06.2012 12:46, Robertson, Andrew wrote:

Dear R users,

Recent changes to the MuMIn package now means that the model averaging command 
(model.avg) no longer returns confidence intervals, but instead returns zvalues 
and corresponding pvalues for fixed effects included in models.

Previously I have used this package for model selection/averaging following 
Greuber et al (2011) where it suggests that one should use confidence intervals 
from model averaging to assess whether your fixed effects have an affect or not 
 (If confidence intervals do not span zero then variable has an affect).

Can anyone tell me why MuMIn now gives z-stats and p-values and whether these 
should be used to assess the 'significance'/importance of variables when model 
averaging?

Heres the example code of what I'm doing

#-#
ps-lmer(tranPS~(
 Sex+
 Age.Cat2+
 TOTAL+
 Propfarm+
 Maize+
 TOTAL:Propfarm+
 Maize:TOTAL+
 Maize:Propfarm+
 (1|Socialgroup)+(1|Year)+(1|Tattoo)),REML=FALSE, data=propspec)

pss-standardize(ps,standardize.y = FALSE)

psdrg-dredge(pss)

summary(model.avg(get.models(psdrg,subset=delta2)))
#-#

REf -Grueber, C.E., Nakagawa, S., Laws, R.J.  Jamieson, I.G. (2011) Multimodel 
inference in ecology and evolution: challenges and solutions. Journal of 
evolutionary biology, 24, 699-711.

Any help would be much appreciated

Regards

Andrew Robertson
PhD student
Centre for Ecology and Conservation
University of Exeter, Cornwall Campus
Tremough, Cornwall. TR10 9EZ
UK
Tel: 01326 371852
Email: ar...@exeter.ac.uk
Web page: 
http://biosciences.exeter.ac.uk/staff/postgradresearch/andrewrobertson/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Packaging Error

2012-06-26 Thread Васильченко Александр




On 26.06.2012 08:54, Mayank Bansal wrote:

I was trying to ByteCompile a package that I made. The package compiles 
successfully with byte compile set to FALSE.
When I set ByteCompile to TRUE, I receive the following error message while 
doing R CMD INSTALL

/usr/lib/R/bin/INSTALL: line 34: 9964 Done echo 'tools:::.install_packages()' 9965 
Segmentation fault | R_DEFAULT_PACKAGES= LC_COLLATE=C ${R_HOME}/bin/R $myArgs 
--slave --args ${args}

I have not been able to understand the problem. Can someone help me understand 
the problem so that it can be fixed?



Not without your package to try it out.

Best,
Uwe Ligges



Thanks,
Mayank



This email message may contain proprietary, private and confidential 
information. The information transmitted is intended only for the person(s) or 
entities to which it is addressed. Any review, retransmission, dissemination or 
other use of, or taking of any action in reliance upon, this information by 
persons or entities other than the intended recipient is prohibited and may be 
illegal. If you received this in error, please contact the sender and delete 
the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic 
communications are free from viruses. However, given Internet accessibility, 
the Company cannot accept liability for any virus introduced by this e-mail or 
any attachment and you are advised to use up-to-date virus checking software.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] plotting two histograms on one plot with hist function

2012-06-26 Thread Blignaut, M, Mej mb...@sun.ac.za

I would like to plot two data sets (frequency (y-axis) of mean values for 
0-1(x=axis)) on a single histogram for comparison. The hist() only allow the 
overlay of two histograms, and although barplot() allows beside=TRUE, it does 
not show frequency values (like hist) but rather all of the values. Is there 
any way that I can use the hist() to plot two data sets similar to the 
barplot(). Any help or advice will be appreciated!

Kind regards,
Marguerite





  
E-pos vrywaringsklousule

Hierdie e-pos mag vertroulike inligting bevat en mag regtens geprivilegeerd 
wees en is slegs bedoel vir die persoon aan wie dit geadresseer is. Indien u 
nie die bedoelde ontvanger is nie, word u hiermee in kennis gestel dat u 
hierdie dokument geensins mag gebruik, versprei of kopieer nie. Stel ook 
asseblief die sender onmiddellik per telefoon in kennis en vee die e-pos uit. 
Die Universiteit aanvaar nie aanspreeklikheid vir enige skade, verlies of 
uitgawe wat voortspruit uit hierdie e-pos en/of die oopmaak van enige l?ers 
aangeheg by hierdie e-pos nie.

E-mail disclaimer

This e-mail may contain confidential information and may...{{dropped:11}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread achaumont

Hello, 

I have a question about the “plot.predict” function in Frank Harrell's rms
package.
Do you know how to superpose in the same graph the prediction curve of ols
and raw data points?
Put most simply, I would like to combine these two graphs:

  fit_linear - ols (y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)
 p - Predict(fit_linear,x2,conf.int=FALSE)
 plot (p, ylim =c(-2,0.5), xlim=c(0,100))  # graph n°1

 z - plot (x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type=p,lwd=6,col=blue)  
 # graph n°2

Thanks all, 

Agnès



--
View this message in context: 
http://r.789695.n4.nabble.com/rms-package-superposition-prediction-curve-of-ols-and-data-points-tp4634503.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] shapiro.test()

2012-06-26 Thread reso


Hey,
today I wanted to use the shapiro.test() on data containing 3  
numerical values per group.

It is the first time that an NA was given back for some of the groups.
In the follwing an example of code and output is shown:



shapiro.test(c(0.000637806, 0.00175561, 0.001196708))


Shapiro-Wilk normality test

data:  c(0.000637806, 0.00175561, 0.001196708)
W = 1, p-value = NA

I am not able to find the bug in our data, so I think there might be a  
problem with the shapiro.test().


I use the following technical background:

platform   x86_64-pc-linux-gnu
arch   x86_64
os linux-gnu
system x86_64, linux-gnu
status
major  2
minor  14.1
year   2011
month  12
day22
svn rev57956
language   R
version.string R version 2.14.1 (2011-12-22)


Thanks,
Judith

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Remove empty levels in subset

2012-06-26 Thread svo

Hi,

I have exactly the same question (how to remove empty levels in my subset),
but in my case the factor command does not work, because my dataframe is not
atomic

 Try this:

 test2$a - factor(test2$a)


R gives me the error message:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Do you have advice?

Thank you

--
View this message in context: 
http://r.789695.n4.nabble.com/Remove-empty-levels-in-subset-tp873967p4634499.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Intersection

Hello.
I have a problem with 2 dataframes. There are 2 columns - value and
dates. These dataframes have different dimension. Some dates coincide.
And I need to intersect them by dates and have on output two dataframes
with identical columns dates and new dimension . value have to
recieve in compliance with dates.
Regards, Aleksander.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rms package-superposition prediction curve of ols and data points

You could use points() instead of plot() for the second command.

Sarah

On Tue, Jun 26, 2012 at 8:37 AM, achaumont agnes.chaum...@live.be wrote:
 Hello,

 I have a question about the “plot.predict” function in Frank Harrell's rms
 package.
 Do you know how to superpose in the same graph the prediction curve of ols
 and raw data points?
 Put most simply, I would like to combine these two graphs:

  fit_linear - ols (y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE, y=TRUE)
 p - Predict(fit_linear,x2,conf.int=FALSE)
 plot (p, ylim =c(-2,0.5), xlim=c(0,100))              # graph n°1

 z - plot (x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type=p,lwd=6,col=blue)
 # graph n°2

 Thanks all,

 Agnès





-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Intersection

That sounds like a job for merge(), but it's hard to be sure because
you didn't provide the information requested in the posting guide.

Sarah

On Tue, Jun 26, 2012 at 11:03 AM, Васильченко Александр
vasilchenko@gmail.com wrote:
 Hello.
 I have a problem with 2 dataframes. There are 2 columns - value and
 dates. These dataframes have different dimension. Some dates coincide.
 And I need to intersect them by dates and have on output two dataframes
 with identical columns dates and new dimension . value have to
 recieve in compliance with dates.
 Regards, Aleksander.


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Remove empty levels in subset

Hi,

On Tue, Jun 26, 2012 at 8:06 AM, svo s.vanom...@uu.nl wrote:
 Hi,

 I have exactly the same question (how to remove empty levels in my subset),
 but in my case the factor command does not work, because my dataframe is not
 atomic

 Try this:

 test2$a - factor(test2$a)


 R gives me the error message:

 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

 Do you have advice?

I have two pieces of advice.

1. Don't try to use factor() on your entire data frame, but only on a
single column at a time, as shown in the example you included.

2. Provide an example of your data using something like
dput(head(mydata, 10)) so we can offer actual working code.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] data.table vs plyr reg output

2012-06-26 Thread Geoffrey Smith

Hello.  The data.table package is very helpful in terms of speed.  But I am
having trouble actually using the output from linear regression.  Is there
any way to get the data.table output to be as pretty/useful as that from
the plyr package?  Below is an example.

library('data.table');
library('plyr');

REG - data.table(ID=c(rep('Frank',5),rep('Tony',5),rep('Ed',5)),
y=rnorm(15), x=rnorm(15), z=rnorm(15));
REG;

#The ddply function from the plyr package produces very neat and useful
output;
ddply(REG, .(ID), function(x) coef(lm(y ~ x + z, data=x)));

#The data.table output is fast, but not very neat (in terms of the order of
the coefficient estimates).  Is there any way to get the data.table output
to look more like the plyr/ddply output (without making a list for each
coef and running the regression two times)?
REG[, coef(lm(y ~ x + z)), by=ID];

Thank you!  Geoff

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Intersection

2012-06-26 Thread andrija djurovic

Hi. Try with following functions:

?intersection
?%in%
?[

Perhaps someone will provide you more help if you read and follow posting
guide  
http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html

Andrija

On Tue, Jun 26, 2012 at 5:03 PM, ÷ÁÓÉÌØÞÅÎËÏ áÌÅËÓÁÎÄÒ 
vasilchenko@gmail.com wrote:

 Hello.
 I have a problem with 2 dataframes. There are 2 columns - value and
 dates. These dataframes have different dimension. Some dates coincide.
 And I need to intersect them by dates and have on output two dataframes
 with identical columns dates and new dimension . value have to
 recieve in compliance with dates.
 Regards, Aleksander.

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] shapiro.test()

2012-06-26 Thread Özgür Asar

See

?shapiro.test

...the number of non-missing values must be between 3 and 5000.

By the way, how reasonable testing normality of 3 values?

Best
ozgur

--
View this message in context: 
http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634520.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plotting two histograms on one plot with hist function

2012-06-26 Thread John Kane

Why not just plot the two histograms on the same scale in a 2 panel plot?

John Kane
Kingston ON Canada

 -Original Message-
 From: mb...@sun.ac.za
 Sent: Tue, 26 Jun 2012 15:24:55 +0200
 To: r-help@r-project.org
 Subject: [R] plotting two histograms on one plot with hist function

 I would like to plot two data sets (frequency (y-axis) of mean values for
 0-1(x=axis)) on a single histogram for comparison. The hist() only allow
 the overlay of two histograms, and although barplot() allows beside=TRUE,
 it does not show frequency values (like hist) but rather all of the
 values. Is there any way that I can use the hist() to plot two data sets
 similar to the barplot(). Any help or advice will be appreciated!

 Kind regards,
 Marguerite

 E-pos vrywaringsklousule

 Hierdie e-pos mag vertroulike inligting bevat en mag regtens
 geprivilegeerd wees en is slegs bedoel vir die persoon aan wie dit
 geadresseer is. Indien u nie die bedoelde ontvanger is nie, word u
 hiermee in kennis gestel dat u hierdie dokument geensins mag gebruik,
 versprei of kopieer nie. Stel ook asseblief die sender onmiddellik per
 telefoon in kennis en vee die e-pos uit. Die Universiteit aanvaar nie
 aanspreeklikheid vir enige skade, verlies of uitgawe wat voortspruit uit
 hierdie e-pos en/of die oopmaak van enige l?ers aangeheg by hierdie e-pos
 nie.

 E-mail disclaimer

 This e-mail may contain confidential information and may...{{dropped:11}}

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

GET FREE SMILEYS FOR YOUR IM  EMAIL - Learn more at 
http://www.inbox.com/smileys
Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and most 
webmails

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread David Winsemius



On Jun 26, 2012, at 11:29 AM, Sarah Goslee wrote:


You could use points() instead of plot() for the second command.



Ummm. Maybe not. I think think that plot.Predict uses lattice  
graphics. You may need to use trellis.focus() followed by lpoints().  
Or use the + operation with suitable objects.


--
David.




Sarah

On Tue, Jun 26, 2012 at 8:37 AM, achaumont agnes.chaum...@live.be  
wrote:

Hello,

I have a question about the “plot.predict” function in Frank  
Harrell's rms

package.
Do you know how to superpose in the same graph the prediction curve  
of ols

and raw data points?
Put most simply, I would like to combine these two graphs:

 fit_linear - ols (y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE,  
y=TRUE)

p - Predict(fit_linear,x2,conf.int=FALSE)
plot (p, ylim =c(-2,0.5), xlim=c(0,100))  # graph n°1


z - plot  
(x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type=p,lwd=6,col=blue)

# graph n°2


Thanks all,

Agnès






--
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plotting two histograms on one plot with hist function

2012-06-26 Thread ilai

On Tue, Jun 26, 2012 at 10:02 AM, John Kane jrkrid...@inbox.com wrote:

 Why not just plot the two histograms on the same scale in a 2 panel plot?


I think OP request was for comparison. Two panels may do, but why not a
barplot of the histograms in the same panel ?

barplot( rbind(
hist(rbeta(30,2,4),breaks=seq(0,1,.1),plot=F)$counts,
hist(rbeta(30,6,8),breaks=seq(0,1,.1),plot=F)$counts),
beside=T)

see str(hist(yourdata)) or ?hist

Cheers
Ilai



 John Kane
 Kingston ON Canada


  -Original Message-
  From: mb...@sun.ac.za
  Sent: Tue, 26 Jun 2012 15:24:55 +0200
  To: r-help@r-project.org
  Subject: [R] plotting two histograms on one plot with hist function
 
  I would like to plot two data sets (frequency (y-axis) of mean values for
  0-1(x=axis)) on a single histogram for comparison. The hist() only allow
  the overlay of two histograms, and although barplot() allows beside=TRUE,
  it does not show frequency values (like hist) but rather all of the
  values. Is there any way that I can use the hist() to plot two data sets
  similar to the barplot(). Any help or advice will be appreciated!
 
  Kind regards,
  Marguerite
 
 
 
 
 

  E-pos vrywaringsklousule
 
  Hierdie e-pos mag vertroulike inligting bevat en mag regtens
  geprivilegeerd wees en is slegs bedoel vir die persoon aan wie dit
  geadresseer is. Indien u nie die bedoelde ontvanger is nie, word u
  hiermee in kennis gestel dat u hierdie dokument geensins mag gebruik,
  versprei of kopieer nie. Stel ook asseblief die sender onmiddellik per
  telefoon in kennis en vee die e-pos uit. Die Universiteit aanvaar nie
  aanspreeklikheid vir enige skade, verlies of uitgawe wat voortspruit uit
  hierdie e-pos en/of die oopmaak van enige l?ers aangeheg by hierdie e-pos
  nie.
 
  E-mail disclaimer
 
  This e-mail may contain confidential information and may...{{dropped:11}}
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 
 GET FREE SMILEYS FOR YOUR IM  EMAIL - Learn more at
 http://www.inbox.com/smileys
 Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk and
 most webmails

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] shapiro.test()

2012-06-26 Thread Özgür Asar

Actually, your sample size is 3. Sorry for that.

Ozgur

--
View this message in context: 
http://r.789695.n4.nabble.com/shapiro-test-tp4634513p4634525.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] increase the usage of CPU and Memory

2012-06-26 Thread Christofer Bogaso


On 26-06-2012 16:33, Oliver Ruebenacker wrote:

  Hello Xi,

   If a program has input or output to disk or network, this may cause
it to wait and not use the available CPU.

   Output is usually buffered, but may cause delay if the buffer gets
full (I'm not sure though whether this is an issue with plenty of
memory available)

  Take care
  Oliver

On Mon, Jun 25, 2012 at 8:07 PM, Xi amzhan...@gmail.com wrote:

Dear All,

I have been searching online for help increasing my R code more efficiently
for almost a whole day, however, there is no solution to my case. So if
anyone could give any clue to solve my problem, I would be very appreciate
for you help. Thanks in advance.

Here is my issue:

My desktop is with i7-950 Quad-core CPU with 24Gb memory, and a  NVIDIA GTX
480  graphic card, and I am using a 64-bit version of R under 64-bit Windows
.

I am running a for loop to generate a 461*5 matrix data, which is coming
from the coefficients of 5 models. The loop would produce 5 values one
time, and it will run 461 times in total. I have tried to run the code
inside the loop just once, it will cost almost 10 seconds, so if
we intuitively calculate the time of the whole loop will cost, it would be
4610 seconds, equal to almost one and a half hours, which is exactly the
whole loop taking indeed. But I have to run this kinda loop for
30 data-sets!

Although I thought I am using a not-bad at all desktop, I checked the usage
of CPU and memory during my running R code, and found out the whole code
just used 15% of CPU and 10% of memory. Does anyone have the same issue
with me? or Does anyone know some methods to shorten the running time and
increase the usage of CPU and memory?

Many thanks,
Xi

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Hi Oliver, can you please give some details on what you are meaning by 
'Output is usually buffered'?


Thanks and regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Intersection

Hi,

Try this:
 dat1-data.frame(value=c(15,20,25,30,45,50),dates=c(2005-05-25,2005-06-25,2005-07-25,2005-08-25,2005-09-25,2005-10-25))
dat2-data.frame(value=c(15,20,25,50),dates=c(2005-05-25,2005-06-25,2005-07-25,2005-10-25))
 merge(dat1,dat2, by=dates)
   dates value.x value.y
1 2005-05-25  15  15
2 2005-06-25  20  20
3 2005-07-25  25  25
4 2005-10-25  50  50
or
subset(dat1,(dates %in% dat2$dates))
  value  dates
1    15 2005-05-25
2    20 2005-06-25
3    25 2005-07-25
6    50 2005-10-25

I hope this is what you meant.  You mentioned the datasets have different 
dimensions.  Not sure what you meant.

A.K.



- Original Message -
From: Васильченко Александр vasilchenko@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Tuesday, June 26, 2012 11:03 AM
Subject: [R] Intersection

Hello.
I have a problem with 2 dataframes. There are 2 columns - value and
dates. These dataframes have different dimension. Some dates coincide.
And I need to intersect them by dates and have on output two dataframes
with identical columns dates and new dimension . value have to
recieve in compliance with dates.
Regards, Aleksander.

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How To Setup hunspell in R

2012-06-26 Thread Ribbis

Do you make any progress in solving this?  I'm having the same struggle. 
Thanks.

--
View this message in context: 
http://r.789695.n4.nabble.com/How-To-Setup-hunspell-in-R-tp4541801p4634523.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Ljung-Box test (Box.test)

2012-06-26 Thread Steven Winter

I fit a simple linear model y = bX to a data set today, and that produced 24 
residuals (I have 24 data points, one for each year from 1984-2007). I would 
like to test the time-independence of the residuals of my model, and I was 
recommended by my supervisor to use the Ljung-Box test. The Box.test function 
in R takes 4 arguments: 

x a numeric vector or univariate time series. 
lag the statistic will be based on lag autocorrelation
coefficients. 
type test to be performed: partial matching is used. 
fitdf number of degrees of freedom to be subtracted if x is a series of 
residuals. 

Unfortunately, I never took a statistics class where I learned the Ljung-Box 
test, and information about it online is hard to find. What does lag mean, 
and what value would you guys recommend I use for the test? Also, what does 
fitdf represent, and what would the value for that parameter be in my case? 
Finally, the value of x is a vector of my 24 residuals, correct?

Thank you all so much. I apologize for the basic nature of the question.

Steven
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] shapiro.test()

2012-06-26 Thread peter dalgaard


On Jun 26, 2012, at 16:43 , r...@uni-potsdam.de wrote:

 Hey,
 today I wanted to use the shapiro.test() on data containing 3 numerical 
 values per group.
 It is the first time that an NA was given back for some of the groups.
 In the follwing an example of code and output is shown:
 
 
 shapiro.test(c(0.000637806, 0.00175561, 0.001196708))
 
   Shapiro-Wilk normality test
 
 data:  c(0.000637806, 0.00175561, 0.001196708)
 W = 1, p-value = NA
 
 I am not able to find the bug in our data, so I think there might be a 
 problem with the shapiro.test().

The clue is that

 diff(sort(c(0.000637806, 0.00175561, 0.001196708)))
[1] 0.000558902 0.000558902

which is either an extreme coincidence or a sign that your data are not 
independent samples from a continuous distribution. Since the normal quantiles 
are also equidistant, you get a correlation of W=1 in the QQ-plot, and 
apparently this triggers the NA p-value. 

I suppose returning p=1.0 would arguably be a better choice for this case, but 
it _is_ pretty extreme. 

-pd

 
 I use the following technical background:
 
 platform   x86_64-pc-linux-gnu
 arch   x86_64
 os linux-gnu
 system x86_64, linux-gnu
 status
 major  2
 minor  14.1
 year   2011
 month  12
 day22
 svn rev57956
 language   R
 version.string R version 2.14.1 (2011-12-22)
 
 
 Thanks,
 Judith
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] increase the usage of CPU and Memory

2012-06-26 Thread Oliver Ruebenacker

 Hello Christopher,

  If a process has data to write to hard disk, the data is usually
written to a buffer in memory, and from there it is written to the
hard disk independently of the CPU. Since writing to memory is much
faster than writing to hard disk, this allows the process to run
faster. To the process, it appears as if the data is already on disk.
If, however, the buffer runs full, an attempt by a process to write
more data will cause the process to wait until space is available in
the buffer. If a process spends time waiting, it means it does not use
all the CPU it could otherwise.

  I don't know how much input is buffered, but since only the process
knows where it will request input from next, this limits ways to
buffer input. I'm assuming though, that if you open a file and read
the first few bytes, some more bytes may be read into a buffer since
the process is likely to request them next. But in any case, input
form disk or network is almost certain to cause waiting times and
therefore decreases used CPU time.

 Take care
 Oliver

On Tue, Jun 26, 2012 at 1:53 PM, Christofer Bogaso
bogaso.christo...@gmail.com wrote:
 On 26-06-2012 16:33, Oliver Ruebenacker wrote:

      Hello Xi,

   If a program has input or output to disk or network, this may cause
 it to wait and not use the available CPU.

   Output is usually buffered, but may cause delay if the buffer gets
 full (I'm not sure though whether this is an issue with plenty of
 memory available)

      Take care
      Oliver

 On Mon, Jun 25, 2012 at 8:07 PM, Xi amzhan...@gmail.com wrote:

 Dear All,

 I have been searching online for help increasing my R code more
 efficiently
 for almost a whole day, however, there is no solution to my case. So if
 anyone could give any clue to solve my problem, I would be very
 appreciate
 for you help. Thanks in advance.

 Here is my issue:

 My desktop is with i7-950 Quad-core CPU with 24Gb memory, and a  NVIDIA
 GTX
 480  graphic card, and I am using a 64-bit version of R under 64-bit
 Windows
 .

 I am running a for loop to generate a 461*5 matrix data, which is
 coming
 from the coefficients of 5 models. The loop would produce 5 values one
 time, and it will run 461 times in total. I have tried to run the code
 inside the loop just once, it will cost almost 10 seconds, so if
 we intuitively calculate the time of the whole loop will cost, it would
 be
 4610 seconds, equal to almost one and a half hours, which is exactly the
 whole loop taking indeed. But I have to run this kinda loop for
 30 data-sets!

 Although I thought I am using a not-bad at all desktop, I checked the
 usage
 of CPU and memory during my running R code, and found out the whole code
 just used 15% of CPU and 10% of memory. Does anyone have the same issue
 with me? or Does anyone know some methods to shorten the running time and
 increase the usage of CPU and memory?

 Many thanks,
 Xi

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 Hi Oliver, can you please give some details on what you are meaning by
 'Output is usually buffered'?

 Thanks and regards,




-- 
Oliver Ruebenacker, Bioinformatics and Network Analysis Consultant
President and Founder of Knowomics
(http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine
(http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Compile C files

2012-06-26 Thread Frederico Mestre

Hello:

 

Sorry, this might look like a beginner question, but I'm just starting to
work on the C and R interface.

 

I'm trying to compile a C file (with a function) to load it to an R function
but, in the command line I keep getting a lot of errors, like:

 

C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected
declaration specifiers before 'SEXP'

 

I've been able to compile this file before, so I 

 

I'm using Windows 7 in a 64 bits computer.

 

Best regards,

 

Frederico 

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?

2012-06-26 Thread Terry Therneau

I'm finally back from vacation and looking at your email.

1. The primary mistake is in your call, where you say
 fit - survfit(mod.allison.5, newdata.1, id=Id)

This will use the character string Id as the value of the identifier, 
not the data.  The effect is exactly the same as the difference between 
print(x) and print('x').

2. In reply to John's comment that all the id values are the same.  It 
is correct.  Normally the survfit routine is used to produce multiple 
curves, one curve per line of the input data, for time-independent 
variables.   The presence of an id argument is used to tell it that 
there are multiple lines per subject in the data, e.g. time-dependent 
covariates.  So even though there is only one curve being produced we 
need an id statement to trigger the behavior.
   If you only want one curve for one individual, then individual=TRUE 
is an alternate, as John pointed out.

3. It's very important to specify the Surv object and the formula 
directly in the coxph function ...
Yes, I agree.  I always use your suggested form because it gives better 
documentation -- variable names are directly visible in the coxph call.  
I don't understand the attraction of the other form, but lot's of people 
use it.
Why did it go wrong?  Because the survfit function was evaluating
 Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin + 
age + age:stop + pro, data=newdata.1

The length of the variables will be different.  The error message comes 
from the R internals, not my program.

Terry Therneau


On 06/16/2012 08:04 AM, Jürgen Biedermann wrote:

 Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...

 I don't find a solution to use the survfit function (package:
 survival)  for a defined pattern of covariates with a Cox-Model
 including a time dependent interaction term. Somehow the definition of
 my newdata argument seems to be erroneous.
 I already googled the problem, found many persons having the same or a
 similar problem, but still no solution.
 I want to stress that my time-dependent covariate does not depend on the
 failure of an individual (in this case it doesn't seem sensible to
 predict a survivor function for an individual). Rather one of my effects
 declines with time (time-dependent coefficient).

 For illustration, I use the example of John Fox's paper Cox
 Proportional - Hazards Regression for Survival Data.
 http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf

 Do you know any help? See code below

 Thanks very much in advance
 Jürgen Biedermann

 #
 #Code

 Rossi -
 read.table(http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt;,
 header=T)

 Rossi.2 - fold(Rossi, time='week',
  event='arrest', cov=11:62, cov.names='employed')

 # see below for the fold function from John Fox

 # modeling an interaction with time (Page 14)

 mod.allison.5 - coxph(Surv(start, stop, arrest.time) ~
  fin + age + age:stop + prio,
  data=Rossi.2)
 mod.allison.5

 # Attempt to get the survivor function of a person with age=30, fin=0
 and prio=5

 newdata.1 -
 data.frame(unique(Rossi.2[c(start,stop)]),fin=0,age=30,prio=5,Id=1,arrest.time=0)
 fit - survfit(mod.allison.5,newdata.1,id=Id)

 Error message:

 Fehler in model.frame.default(data = newdata.1, id = Id, formula =
 Surv(start,  :
Variablenlängen sind unterschiedlich (gefunden für '(id)')

 -- failure, length of variables are different.

 #-
 fold - function(data, time, event, cov,
  cov.names=paste('covariate', '.', 1:ncovs, sep=),
  suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
  vlag - function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
  xlag - function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
  all.cov - unlist(cov)
  if (!is.list(cov)) cov - list(cov)
  ncovs - length(cov)
  nrow - nrow(data)
  ncol - ncol(data)
  ncov - length(cov[[1]])
  nobs - nrow*ncov
  if (length(unique(c(sapply(cov, length), length(cov.times)-1)))  1)
  stop(paste(
  all elements of cov must be of the same length and \n,
  cov.times must have one more entry than each element of
 cov.))
  var.names - names(data)
  subjects - rownames(data)
  omit.cols - if (!common.times) c(all.cov, cov.times) else all.cov
  keep.cols - (1:ncol)[-omit.cols]
  nkeep - length(keep.cols)
  if (is.numeric(event)) event - var.names[event]
  times - if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
  else data[, cov.times]
  new.data - matrix(Inf, nobs, 3 + ncovs + nkeep)
  rownames - rep(, nobs)
  colnames(new.data) - c('start', 'stop', paste(event, suffix, 
 sep=),
  var.names[-omit.cols], cov.names)
  end.row - 0
  for (i in 1:nrow){
  start.row - end.row + 1
  end.row - end.row + ncov
  start -

Re: [R] Ljung-Box test (Box.test)

2012-06-26 Thread Rui Barradas


Hello,

That's a statistics question, but it's also about using an R function.

The Ljung-Box test isn't supposed to be used in such a context, to test 
the residuals of an ols y = bX + e. It is used to test time independence 
of the original series or of the residuals of an ARMA(p, q) fit.


In both cases you are right, 'x' is a series.
'lag' can be explained as follows: you have a time series and want to 
know if the value observed today depends on what was observed in the 
past. Then, a linear regression of today on yesterday could be


X[t] = b[1]*X[t-1] + e[t], e ~ Normal(0, sigma^2)

A linear regression on two time units in the past would be

X[t] = b[1]*X[t-1] + b[2]*X[t-2] + e[t], e ~ Normal(0, sigma^2)

etc. This is a regression of the series on itself lagged by a certain 
number of time units, the present is regressed on the past. Function 
ar() fits this kind of model to a time series. In the first case, the 
order is p=1, in the second, p=2.


Now, in the first case, is there second order serial correlation? Test 
the residuals with lag=2, fitdf=1, the value of p. Third order? lag=3, 
fitdf=p=1, etc.


You are NOT fitting this type of model, so the Ljung-Box test is 
misused. Test the original series with default parameters, lag=1. If 
there is serial correlation, fit an AR (Auto-Regressive) model with 
ar(). See the help page ?ar. And see a statiscian with experience in 
time series. It's a world on its own, I haven't even mentioned 
seasonality. And almost everything else about time series.


Do ask someone near you.

Hope this helps,

Rui Barradas
Em 26-06-2012 19:01, Steven Winter escreveu:

I fit a simple linear model y = bX to a data set today, and that produced 24 
residuals (I have 24 data points, one for each year from 1984-2007). I would 
like to test the time-independence of the residuals of my model, and I was 
recommended by my supervisor to use the Ljung-Box test. The Box.test function 
in R takes 4 arguments:

x a numeric vector or univariate time series.
lag the statistic will be based on lag autocorrelation
coefficients.
type test to be performed: partial matching is used.
fitdf number of degrees of freedom to be subtracted if x is a series of 
residuals.

Unfortunately, I never took a statistics class where I learned the Ljung-Box test, and information 
about it online is hard to find. What does lag mean, and what value would you guys 
recommend I use for the test? Also, what does fitdf represent, and what would the value 
for that parameter be in my case? Finally, the value of x is a vector of my 24 residuals, correct?

Thank you all so much. I apologize for the basic nature of the question.

Steven
[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Storing whole regression results

2012-06-26 Thread Kevin Chang

Hello seasons R users,

 

Is it possible to store a complete regression result into an array? I've
already been able to save individual regression coefficients, but would like
to store the whole regression results into different arrays through a loop.

 

So that in under different quantiles regressions, I would be able to create
a loop and store the full regression result each time into a different array
for printing.

The only way I can think of is to pre-generate a whole set of arrays and
matrices to individually store each regression coefficients one at a time.

 

Thank you,

 

Kevin

 

Master of Science Student |  University of Guelph Department of Food,
Resource and Agricultural Economics 
J.D. MacLachlan Building - Room 002 Guelph, ON N1G 2W1 
Webpage:http://fare.uoguelph.ca/users/kchang01
http://fare.uoguelph.ca/users/kchang01 
Email:mailto:kchan...@uoguelph.ca kchan...@uoguelph.ca

Mobile:226-979-2813

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Storing whole regression results

You can store entire regression results in a list, then use lapply()
to retrieve individual coefficients as desired.

Lists are very powerful for managing odd data formats, and no loops needed.

Sarah

On Tue, Jun 26, 2012 at 4:19 PM, Kevin Chang kchan...@uoguelph.ca wrote:
 Hello seasons R users,



 Is it possible to store a complete regression result into an array? I've
 already been able to save individual regression coefficients, but would like
 to store the whole regression results into different arrays through a loop.



 So that in under different quantiles regressions, I would be able to create
 a loop and store the full regression result each time into a different array
 for printing.

 The only way I can think of is to pre-generate a whole set of arrays and
 matrices to individually store each regression coefficients one at a time.



 Thank you,



 Kevin




-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] chisq.test

2012-06-26 Thread Omphalodes Verna

Dear list!

I would like to calculate chisq.test on simple data set with 70 observations, 
but the output is ''Warning message:''

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example: 

        tabele - matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
        dimnames(tabela) - list(
        SEX = c(M,F),
        HAIR = c(Brown, Black, Red, Blonde))
        addmargins(tabele)
        prop.table(tabele)
        chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.

Thanks a lot to all, OV

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread SSimek

Hello, 

I have count data that illustrate the presence or absence of individuals in
my study population. I created a grid cell across the study area and
calcuated a count value for each individual per season per year for each
grid cell. The count value is the number of time an individual was present
in each grid cell.  For illustration my data columns look something like
this and are repeated for each individual:

Cell_ID Param1  Param2  Param3  Param4  COUNT   NameYearSeason  Cov
1   160.565994  729.08  15037930.3  0   AA  2010AUT 
Open
1   160.565994  729.08  15037930.3  22  AA  2011SPR 
Open
1   160.565994  729.08  15037930.3  12  AA  2009SUM 
Open
1   160.565994  729.08  15037930.3  0   AA  2010SUM 
Open
2   169.427001  491.87  1503.31 5101.09 0   AA  2010AUT 
oldHard
2   169.427001  491.87  1503.31 5101.09 16  AA  2011SPR 
oldHard
2   169.427001  491.87  1503.31 5101.09 0   AA  2009SUM 
oldHard
2   169.427001  491.87  1503.31 5101.09 0   AA  2010SUM 
oldHard
…   
563 86.777099   612.69  977 4474.6  62  AA  2010AUT 
Water
563 86.777099   612.69  977 4474.6  12  AA  2011SPR 
Water
563 86.777099   612.69  977 4474.6  55  AA  2009SUM 
Water


1   160.565994  729.08  15037930.3  0   BB  2010SUM 
Open
2   169.427001  491.87  1503.31 5101.09 72  BB  2010SUM 
oldHard
5   160.75  614.95  1503.31 2878.98 16  BB  2010SUM medHard
6   170.404998  510.58  1489.44 743.14  0   BB  2010SUM 
Water
…   
563 86.777099   612.69  977 4474.6  0   BB  2010SUM 
Water


1   160.565994  729.08  15037930.3  14  C   2005AUT 
Open
1   160.565994  729.08  15037930.3  0   C   2006AUT 
Open
1   160.565994  729.08  15037930.3  0   C   2006SPR 
Open
1   160.565994  729.08  15037930.3  56  C   2007SPR 
Open
1   160.565994  729.08  15037930.3  0   C   2006SUM 
Open
2   169.427001  491.87  1503.31 5101.09 124 C   2005AUT 
oldHard
2   169.427001  491.87  1503.31 5101.09 231 C   2006AUT 
oldHard
2   169.427001  491.87  1503.31 5101.09 889 C   2006SPR 
oldHard
2   169.427001  491.87  1503.31 5101.09 0   C   2007SPR 
oldHard
…   
563 86.777099   612.69  977 4474.6  0   C   2005
AUT Water
563 86.777099   612.69  977 4474.6  231 C   2006
AUT Water
563 86.777099   612.69  977 4474.6  185 C   2006
SPR Water
563 86.777099   612.69  977 4474.6  123 C   2007
SPR Water
563 86.777099   612.69  977 4474.6  52  C   2006
SUM Water



I have 563 grid cells across my study area and each individual has 1-563
cells associated for each year and each season the individual was monitored.
Therefore my grid cells are repeated. I end up with 71,000 records and 925
records have a Count value 0; which means 70,075 records have a Count value
= 0. 

I wanted to run a zero inflated poisson model to determine mixed effects (of
parameters) with individual as the random effect. But I have been advised
two things:

1. I cannot run a zero inflated poisson model because my data are too
extremely inflated (i.e. 70,075 vs 925) and 

2. I cannot run the model with each cell repeated for each individual. I am
told the model doesn't recognize that Cell_ID #1 for individual A is the
same Cell_ID #1 for individual B.

Does anyone know if either or both of these points are true? I would
appreciate any thoughts, advice, or suggestions. 

Thanks!

-Stephanie

--
View this message in context: 
http://r.789695.n4.nabble.com/Zero-inflated-is-there-a-limit-to-the-level-of-inflation-tp4634532.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented,

[R] Figuring out encodings of PDFs in R

2012-06-26 Thread Jonas Michaelis

Dear list,

I am currently scraping some text data from several PDFs using the
readPDF() function in the tm package. This all works very well and in most
cases the encoding seems to be latin1 - in some, however, it is not. Is
there a good way in R to check character encodings? I found the functions
is.utf8() and is.local() in the tau package but that obviously only gets me
so far.

Thanks.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] chisq.test

2012-06-26 Thread Peter Ehlers


On 2012-06-26 11:27, Omphalodes Verna wrote:

Dear list!

I would like to calculate chisq.test on simple data set with 70 observations, 
but the output is ''Warning message:''

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

 tabele- matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
 dimnames(tabela)- list(
 SEX = c(M,F),
 HAIR = c(Brown, Black, Red, Blonde))
 addmargins(tabele)
 prop.table(tabele)
 chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.


Do this:

  ct - chisq.test(tabele)
  ct$expected

If that does not give you a sufficient hint, then you need
to review the assumptions underlying the chisquare test.

Peter Ehlers



Thanks a lot to all, OV

[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] chisq.test

2012-06-26 Thread David L Carlson

The warning means that you have many cells with expected values less than 5
(4 of 8 cells in this case) so that the chi square estimate may be inflated.
The good news is that the probability of the inflated chi square is .0978
which you probably would not consider to be significant anyway. If you want
to get a simulated p value using Monte Carlo simulation (see the references
in the manual page for chisq.test), just change the call to

chisq.test(tabele, simulate.p.value=TRUE, B=2000)

When I run this five times, I get probability estimates ranging from .09795
to .1089.

Alternatively, get more data.

--
David L Carlson
Associate Professor of Anthropology
Texas AM University
College Station, TX 77843-4352


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Omphalodes Verna
 Sent: Tuesday, June 26, 2012 1:28 PM
 To: r-help@r-project.org
 Subject: [R] chisq.test
 
 Dear list!
 
 I would like to calculate chisq.test on simple data set with 70
 observations, but the output is ''Warning message:''
 
 Warning message:
 In chisq.test(tabele) : Chi-squared approximation may be incorrect
 
 
 Here is an example:
 
 tabele - matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow
 = TRUE)
 dimnames(tabela) - list(
 SEX = c(M,F),
 HAIR = c(Brown, Black, Red, Blonde))
 addmargins(tabele)
 prop.table(tabele)
 chisq.test(tabele)
 Please, give me an advice / suggestion / recommendation.
 
 Thanks a lot to all, OV
 
   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] flatten lists

2012-06-26 Thread Jeroen Ooms

I am looking for a function to flatten a list to a list of only 1
level deep. Very similar to unlist, however I don't want to turn it
into a vector because then everything will be casted to character
vectors:

x - list(name=Jeroen, age=27, married=FALSE,
home=list(country=Netherlands, city=Utrecht))
unlist(x)

This function sort of does it:

flatlist - function(mylist){
  lapply(rapply(mylist, enquote, how=unlist), eval)
}

flatlist(x)

However it is a bit slow. Is there a more native way?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plotting two histograms on one plot with hist function

2012-06-26 Thread John Kane

Oh, I had not thought of it in those terms.  It does make sense now.

   John Kane
   Kingston ON Canada

   -Original Message-
   From: ke...@math.montana.edu
   Sent: Tue, 26 Jun 2012 10:57:31 -0600
   To: jrkrid...@inbox.com
   Subject: Re: [R] plotting two histograms on one plot with hist function

   On Tue, Jun 26, 2012 at 10:02 AM, John Kane [1]jrkrid...@inbox.com wrote:

 Why not just plot the two histograms on the same scale in a 2 panel plot?

   I think OP request was for comparison. Two panels may do, but why not a
   barplot of the histograms in the same panel ?
   barplot( rbind(
   hist(rbeta(30,2,4),breaks=seq(0,1,.1),plot=F)$counts,
   hist(rbeta(30,6,8),breaks=seq(0,1,.1),plot=F)$counts),
   beside=T)
   see str(hist(yourdata)) or ?hist
   Cheers
   Ilai

 John Kane
 Kingston ON Canada
  -Original Message-
  From: [2]mb...@sun.ac.za
  Sent: Tue, 26 Jun 2012 15:24:55 +0200
  To: [3]r-help@r-project.org
  Subject: [R] plotting two histograms on one plot with hist function

  I would like to plot two data sets (frequency (y-axis) of mean values
 for
  0-1(x=axis)) on a single histogram for comparison. The hist() only allow
   the  overlay  of  two  histograms, and although barplot() allows
 beside=TRUE,
  it does not show frequency values (like hist) but rather all of the
  values. Is there any way that I can use the hist() to plot two data sets
  similar to the barplot(). Any help or advice will be appreciated!

  Kind regards,
  Marguerite

  E-pos vrywaringsklousule

  Hierdie e-pos mag vertroulike inligting bevat en mag regtens
  geprivilegeerd wees en is slegs bedoel vir die persoon aan wie dit
  geadresseer is. Indien u nie die bedoelde ontvanger is nie, word u
  hiermee in kennis gestel dat u hierdie dokument geensins mag gebruik,
  versprei of kopieer nie. Stel ook asseblief die sender onmiddellik per
  telefoon in kennis en vee die e-pos uit. Die Universiteit aanvaar nie
  aanspreeklikheid vir enige skade, verlies of uitgawe wat voortspruit uit
  hierdie e-pos en/of die oopmaak van enige l?ers aangeheg by hierdie
 e-pos
  nie.

  E-mail disclaimer

This   e-mail   may   contain   confidential   information  and
 may...{{dropped:11}}

  __
  [4]R-help@r-project.org mailing list
  [5]https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  [6]http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 GET   FREE   SMILEYS   FOR   YOUR  IMEMAIL  -  Learn  more  at
 [7]http://www.inbox.com/smileys
 Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and
 most webmails
 __
 [8]R-help@r-project.org mailing list
 [9]https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 [10]http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 _

   Free Online Photosharing - Share your photos online with your friends and
   family!
   Visit [11]http://www.inbox.com/photosharing to find out more!

References

   1. mailto:jrkrid...@inbox.com
   2. mailto:mb...@sun.ac.za
   3. mailto:r-help@r-project.org
   4. mailto:R-help@r-project.org
   5. https://stat.ethz.ch/mailman/listinfo/r-help
   6. http://www.R-project.org/posting-guide.html
   7. http://www.inbox.com/smileys
   8. mailto:R-help@r-project.org
   9. https://stat.ethz.ch/mailman/listinfo/r-help
  10. http://www.R-project.org/posting-guide.html
  11. http://www.inbox.com/photosharing
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread Marc Schwartz

On Jun 26, 2012, at 2:10 PM, SSimek wrote:

 Hello, 
 
 I have count data that illustrate the presence or absence of individuals in
 my study population. I created a grid cell across the study area and
 calcuated a count value for each individual per season per year for each
 grid cell. The count value is the number of time an individual was present
 in each grid cell.  For illustration my data columns look something like
 this and are repeated for each individual:
 
 Cell_ID   Param1  Param2  Param3  Param4  COUNT   NameYearSeason  
 Cov
 1 160.565994  729.08  15037930.3  0   AA  2010AUT 
 Open
 1 160.565994  729.08  15037930.3  22  AA  2011SPR 
 Open
 1 160.565994  729.08  15037930.3  12  AA  2009SUM 
 Open
 1 160.565994  729.08  15037930.3  0   AA  2010SUM 
 Open
 2 169.427001  491.87  1503.31 5101.09 0   AA  2010AUT 
 oldHard
 2 169.427001  491.87  1503.31 5101.09 16  AA  2011SPR 
 oldHard
 2 169.427001  491.87  1503.31 5101.09 0   AA  2009SUM 
 oldHard
 2 169.427001  491.87  1503.31 5101.09 0   AA  2010SUM 
 oldHard
 … 
 563   86.777099   612.69  977 4474.6  62  AA  2010AUT 
 Water
 563   86.777099   612.69  977 4474.6  12  AA  2011SPR 
 Water
 563   86.777099   612.69  977 4474.6  55  AA  2009SUM 
 Water
   
   
 1 160.565994  729.08  15037930.3  0   BB  2010SUM 
 Open
 2 169.427001  491.87  1503.31 5101.09 72  BB  2010SUM 
 oldHard
 5 160.75  614.95  1503.31 2878.98 16  BB  2010SUM medHard
 6 170.404998  510.58  1489.44 743.14  0   BB  2010SUM 
 Water
 … 
 563   86.777099   612.69  977 4474.6  0   BB  2010SUM 
 Water
   
   
 1 160.565994  729.08  15037930.3  14  C   2005AUT 
 Open
 1 160.565994  729.08  15037930.3  0   C   2006AUT 
 Open
 1 160.565994  729.08  15037930.3  0   C   2006SPR 
 Open
 1 160.565994  729.08  15037930.3  56  C   2007SPR 
 Open
 1 160.565994  729.08  15037930.3  0   C   2006SUM 
 Open
 2 169.427001  491.87  1503.31 5101.09 124 C   2005AUT 
 oldHard
 2 169.427001  491.87  1503.31 5101.09 231 C   2006AUT 
 oldHard
 2 169.427001  491.87  1503.31 5101.09 889 C   2006SPR 
 oldHard
 2 169.427001  491.87  1503.31 5101.09 0   C   2007SPR 
 oldHard
 … 
 563   86.777099   612.69  977 4474.6  0   C   2005
 AUT Water
 563   86.777099   612.69  977 4474.6  231 C   2006
 AUT Water
 563   86.777099   612.69  977 4474.6  185 C   2006
 SPR Water
 563   86.777099   612.69  977 4474.6  123 C   2007
 SPR Water
 563   86.777099   612.69  977 4474.6  52  C   2006
 SUM Water
 
 
 
 I have 563 grid cells across my study area and each individual has 1-563
 cells associated for each year and each season the individual was monitored.
 Therefore my grid cells are repeated. I end up with 71,000 records and 925
 records have a Count value 0; which means 70,075 records have a Count value
 = 0. 
 
 I wanted to run a zero inflated poisson model to determine mixed effects (of
 parameters) with individual as the random effect. But I have been advised
 two things:
 
 1. I cannot run a zero inflated poisson model because my data are too
 extremely inflated (i.e. 70,075 vs 925) and 
 
 2. I cannot run the model with each cell repeated for each individual. I am
 told the model doesn't recognize that Cell_ID #1 for individual A is the
 same Cell_ID #1 for individual B.
 
 Does anyone know if either or both of these points are true? I would
 appreciate any thoughts, advice, or suggestions. 
 
 Thanks!
 
 -Stephanie


Hi Stephanie,

Some comments:

1. You should think about or at least be open to a zero inflated negative 
binomial distribution rather than zero inflated poisson. 

2. You should at least review the vignette for the pscl CRAN package, which 
provides standard fixed effects models and related functions for count based 
data and importantly,

Re: [R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread Achim Zeileis


On Tue, 26 Jun 2012, Marc Schwartz wrote:


On Jun 26, 2012, at 2:10 PM, SSimek wrote:


Hello,

I have count data that illustrate the presence or absence of individuals in
my study population. I created a grid cell across the study area and
calcuated a count value for each individual per season per year for each
grid cell. The count value is the number of time an individual was present
in each grid cell.  For illustration my data columns look something like
this and are repeated for each individual:

Cell_ID Param1  Param2  Param3  Param4  COUNT   NameYearSeason  Cov
1   160.565994  729.08  15037930.3  0   AA  2010AUT 
Open
1   160.565994  729.08  15037930.3  22  AA  2011SPR 
Open
1   160.565994  729.08  15037930.3  12  AA  2009SUM 
Open
1   160.565994  729.08  15037930.3  0   AA  2010SUM 
Open
2   169.427001  491.87  1503.31 5101.09 0   AA  2010AUT 
oldHard
2   169.427001  491.87  1503.31 5101.09 16  AA  2011SPR 
oldHard
2   169.427001  491.87  1503.31 5101.09 0   AA  2009SUM 
oldHard
2   169.427001  491.87  1503.31 5101.09 0   AA  2010SUM 
oldHard
?
563 86.777099   612.69  977 4474.6  62  AA  2010AUT 
Water
563 86.777099   612.69  977 4474.6  12  AA  2011SPR 
Water
563 86.777099   612.69  977 4474.6  55  AA  2009SUM 
Water


1   160.565994  729.08  15037930.3  0   BB  2010SUM 
Open
2   169.427001  491.87  1503.31 5101.09 72  BB  2010SUM 
oldHard
5   160.75  614.95  1503.31 2878.98 16  BB  2010SUM medHard
6   170.404998  510.58  1489.44 743.14  0   BB  2010SUM 
Water
?
563 86.777099   612.69  977 4474.6  0   BB  2010SUM 
Water


1   160.565994  729.08  15037930.3  14  C   2005AUT 
Open
1   160.565994  729.08  15037930.3  0   C   2006AUT 
Open
1   160.565994  729.08  15037930.3  0   C   2006SPR 
Open
1   160.565994  729.08  15037930.3  56  C   2007SPR 
Open
1   160.565994  729.08  15037930.3  0   C   2006SUM 
Open
2   169.427001  491.87  1503.31 5101.09 124 C   2005AUT 
oldHard
2   169.427001  491.87  1503.31 5101.09 231 C   2006AUT 
oldHard
2   169.427001  491.87  1503.31 5101.09 889 C   2006SPR 
oldHard
2   169.427001  491.87  1503.31 5101.09 0   C   2007SPR 
oldHard
?
563 86.777099   612.69  977 4474.6  0   C   2005
AUT Water
563 86.777099   612.69  977 4474.6  231 C   2006
AUT Water
563 86.777099   612.69  977 4474.6  185 C   2006
SPR Water
563 86.777099   612.69  977 4474.6  123 C   2007
SPR Water
563 86.777099   612.69  977 4474.6  52  C   2006
SUM Water



I have 563 grid cells across my study area and each individual has 1-563
cells associated for each year and each season the individual was monitored.
Therefore my grid cells are repeated. I end up with 71,000 records and 925
records have a Count value 0; which means 70,075 records have a Count value
= 0.

I wanted to run a zero inflated poisson model to determine mixed effects (of
parameters) with individual as the random effect. But I have been advised
two things:

1. I cannot run a zero inflated poisson model because my data are too
extremely inflated (i.e. 70,075 vs 925) and

2. I cannot run the model with each cell repeated for each individual. I am
told the model doesn't recognize that Cell_ID #1 for individual A is the
same Cell_ID #1 for individual B.

Does anyone know if either or both of these points are true? I would
appreciate any thoughts, advice, or suggestions.

Thanks!

-Stephanie



Hi Stephanie,

Some comments:

1. You should think about or at least be open to a zero inflated negative 
binomial distribution rather than zero inflated poisson.

2. You should at least review the vignette for the pscl CRAN package, which 
provides standard fixed effects models and related functions for count based 
data and importantly, some good conceptual content:

 http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

3. Given the repeated measures framework and correlation issues you likely 
have, you should subscribe to and re-post your query to the R-sig-mixed-models 
list:

 https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

which will avail you of experts in the field.

4. There is also a draft FAQ for mixed models here:

 http://glmm.wikidot.com/faq

which I believe is maintained by Ben Bolker,

Re: [R] flatten lists

2012-06-26 Thread Neal Fultz

do.call(c, x) 

maybe?

On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:
 
 x - list(name=Jeroen, age=27, married=FALSE,
 home=list(country=Netherlands, city=Utrecht))
 unlist(x)
 
 This function sort of does it:
 
 flatlist - function(mylist){
   lapply(rapply(mylist, enquote, how=unlist), eval)
 }
 
 flatlist(x)
 
 However it is a bit slow. Is there a more native way?
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] flatten lists

2012-06-26 Thread Jeroen Ooms

Hmm that doesn't seem to work if the original list is nested more than
2 levels deep. I should have probably given a better example:

x - list(name=Jeroen, age=27, married=FALSE,
home=list(country=list(name=Netherlands, short=NL), city=Utrecht))




On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:

 x - list(name=Jeroen, age=27, married=FALSE,
 home=list(country=Netherlands, city=Utrecht))
 unlist(x)

 This function sort of does it:

 flatlist - function(mylist){
   lapply(rapply(mylist, enquote, how=unlist), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] flatten lists

2012-06-26 Thread Jeroen Ooms

Alright, but I need something recursive for lists with arbitrary deepness.



On Tue, Jun 26, 2012 at 3:37 PM, arun smartpink...@yahoo.com wrote:
 Hi,

 Try:

 do.call(c,do.call(c,x))

 x1-do.call(c,do.call(c,x))
  x2-flatlist(x)
  identical(x1,x2)
 [1] TRUE



 A.K.



 - Original Message -
 From: Jeroen Ooms jeroen.o...@stat.ucla.edu
 To: Neal Fultz nfu...@gmail.com
 Cc: r-help@r-project.org
 Sent: Tuesday, June 26, 2012 6:23 PM
 Subject: Re: [R] flatten lists

 Hmm that doesn't seem to work if the original list is nested more than
 2 levels deep. I should have probably given a better example:

 x - list(name=Jeroen, age=27, married=FALSE,
 home=list(country=list(name=Netherlands, short=NL), city=Utrecht))




 On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:

 x - list(name=Jeroen, age=27, married=FALSE,
 home=list(country=Netherlands, city=Utrecht))
 unlist(x)

 This function sort of does it:

 flatlist - function(mylist){
   lapply(rapply(mylist, enquote, how=unlist), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] chisq.test

2012-06-26 Thread David Winsemius



On Jun 26, 2012, at 2:27 PM, Omphalodes Verna wrote:


Dear list!

I would like to calculate chisq.test on simple data set with 70  
observations, but the output is ''Warning message:''


Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example:

tabele - matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4,  
byrow = TRUE)

dimnames(tabela) - list(
SEX = c(M,F),
HAIR = c(Brown, Black, Red, Blonde))
addmargins(tabele)
prop.table(tabele)
chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.


Read any introductory stats book regarding  small cell sizes:

 [,1] [,2] [,3] [,4]
[1,]   11335
[2,]3   186   21





--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?

2012-06-26 Thread Søren Højsgaard

Dear Duncan

Thanks for your suggestion, but I really need sparse matrices: I have 
implemented various graph algorithms based on adjacency matrices. For large 
graphs, storing all the 0's in an adjacency matrices become uneconomical, and 
therefore I thought I would use sparse matrices but the speed of [i,j] will 
slow down the algorithms. However, using RcppEigen it is possible to mimic 
[i,j] with a slowdown of only a factor 16 which is much better than what is 
obtained when using [i,j]:

 benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
+ columns=c(test, replications, elapsed, relative), replications=5)
   test replications elapsed relative
1   lookup(mm, `[`)50.05  1.0
2   lookup(MM, `[`)5   23.54470.8
3 lookup(MM, Xiijj)50.84 16.8

The code for producing the result is given below.

Best regards,
Søren

-

library(inline)
library(RcppEigen)
library(rbenchmark)
library(Matrix)

src - '
using namespace Rcpp;
typedef Eigen::SparseMatrixdouble MSpMat;
const MSpMat X(asMSpMat(XX_));
int i = asint(ii_)-1;
int j = asint(jj_)-1;
double ans = X.coeff(i,j);
return(wrap(ans));
'

Xiijj - cxxfunction(signature(XX_=matrix, ii_=integer, jj_=integer), 
body=src, plugin=RcppEigen)

mm - matrix(c(1,0,0,0,0,0,0,0), nr=100, nc=100)
MM - as(mm, Matrix)
object.size(mm)
object.size(MM)

lookup - function(mat, func){
  for (i in 1:nrow(mat)){
for (j in 1:ncol(mat)){
v-func(mat,i,j)
}
   }
}

benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
columns=c(test, replications, elapsed, relative), 
replications=5)











-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: 25. juni 2012 11:27
To: Søren Højsgaard
Cc: r-help@r-project.org
Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to 
be very slow. Are there faster alternatives?

On 12-06-24 4:50 PM, Søren Højsgaard wrote:
 Dear all,

 Indexing matrices from the Matrix package with [i,j] seems to be very slow. 
 For example:

   library(rbenchmark)
   library(Matrix)
   mm- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
   MM- as(mm, Matrix)
   lookup- function(mat){
 for (i in 1:nrow(mat)){
   for (j in 1:ncol(mat)){
  mat[i,j]
   }
 }
 }

   benchmark(lookup(mm), lookup(MM),  columns=c(test, replications, 
 elapsed, relative), replications=50)
 test replications elapsed relative
 1 lookup(mm)   500.01   1
 2 lookup(MM)   508.77  877

 I would have expected a small overhead when indexing a matrix from the Matrix 
 package, but this result is really surprising...
 Does anybody know if there are faster alternatives to [i,j] ?

There's also a large overhead when indexing a dataframe, though Matrix appears 
to be slower.  It's designed to work on whole matrices at a time, not single 
entries.  So I'd suggest that if you need to use [i,j] indexing, then try to 
arrange your code to localize the access, and extract a submatrix as a regular 
fast matrix first. (Or if it will fit in memory, convert the whole thing to a 
matrix just for the access.  If I just add the line

mat - as.matrix(mat)

at the start of your lookup function, it becomes several hundred times
faster.)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] mixture distribution with positive and negative probabilities

2012-06-26 Thread Yakir Gagnon

Hi!
Any ideas on which package (e.g. mixdist, flexmix, etc) how I could fit a
mixture of say 3 Gaussian functions where 2 have their proportions, means,
and sigmas, and the third has a mean, sigma but a negative proportion.
Basically I'm trying to fit a mixture model to a distribution that
I know is the sum of 3 distributions, where one inhibits the other two. Is
there such a thing?
Thanks in advance!

Yakir Gagnon
cell+1 919 886 3877
office +1 919 684 7188
Johnsen Lab
Biology Department
Box 90338
Duke University
Durham, NC 27708
BioSci Building
Room 307
http://fds.duke.edu/db/aas/Biology/postdoc/yg32
http://www.biology.duke.edu/johnsenlab/people/yakir.html

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] flatten lists

2012-06-26 Thread Bert Gunter

Frankly, I'm not sure what you mean, but presumably

unlist(yourlist, recurs=FALSE)

is not it, right?

-- Bert

On Tue, Jun 26, 2012 at 2:25 PM, Jeroen Ooms jeroen.o...@stat.ucla.eduwrote:

 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:

 x - list(name=Jeroen, age=27, married=FALSE,
 home=list(country=Netherlands, city=Utrecht))
 unlist(x)

 This function sort of does it:

 flatlist - function(mylist){
  lapply(rapply(mylist, enquote, how=unlist), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Indexing matrices from the Matrix package with [i, j] seems to be very slow. Are there faster alternatives?

2012-06-26 Thread Søren Højsgaard

Duncan,
I should probably add that I am aware that my code is not the solution and also 
that the relative gain of my code probably decreases with the problem size 
until eventually it will perform worse that [i,j] (because of copying I 
suppose). So my point is just:  It would just be nice if [i,j] was faster...
Regards
Søren

PS: For a 2000 x 2000 matrix I get:
  test replications elapsed  relative
1   lookup(mm, `[`)  514.85 1.00
2 lookup(MM, Xiijj)5  133.66 9.000673

Using the modified code

src - '
using namespace Rcpp;
typedef Eigen::MappedSparseMatrixdouble MSpMat;
const MSpMat X(asMSpMat(XX_));
int i = asint(ii_)-1;
int j = asint(jj_)-1;
double ans = X.coeff(i,j);
return(wrap(ans));
'





-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Søren Højsgaard
Sent: 27. juni 2012 01:20
To: Duncan Murdoch
Cc: r-help@r-project.org
Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to 
be very slow. Are there faster alternatives?

Dear Duncan

Thanks for your suggestion, but I really need sparse matrices: I have 
implemented various graph algorithms based on adjacency matrices. For large 
graphs, storing all the 0's in an adjacency matrices become uneconomical, and 
therefore I thought I would use sparse matrices but the speed of [i,j] will 
slow down the algorithms. However, using RcppEigen it is possible to mimic 
[i,j] with a slowdown of only a factor 16 which is much better than what is 
obtained when using [i,j]:

 benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
+ columns=c(test, replications, elapsed, relative), 
+ replications=5)
   test replications elapsed relative
1   lookup(mm, `[`)50.05  1.0
2   lookup(MM, `[`)5   23.54470.8
3 lookup(MM, Xiijj)50.84 16.8

The code for producing the result is given below.

Best regards,
Søren

-

library(inline)
library(RcppEigen)
library(rbenchmark)
library(Matrix)

src - '
using namespace Rcpp;
typedef Eigen::SparseMatrixdouble MSpMat; const MSpMat X(asMSpMat(XX_)); 
int i = asint(ii_)-1; int j = asint(jj_)-1; double ans = X.coeff(i,j); 
return(wrap(ans)); '

Xiijj - cxxfunction(signature(XX_=matrix, ii_=integer, jj_=integer), 
body=src, plugin=RcppEigen)

mm - matrix(c(1,0,0,0,0,0,0,0), nr=100, nc=100) MM - as(mm, Matrix)
object.size(mm)
object.size(MM)

lookup - function(mat, func){
  for (i in 1:nrow(mat)){
for (j in 1:ncol(mat)){
v-func(mat,i,j)
}
   }
}

benchmark(lookup(mm,`[`), lookup(MM,`[`), lookup(MM, Xiijj),
columns=c(test, replications, elapsed, relative), 
replications=5)











-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: 25. juni 2012 11:27
To: Søren Højsgaard
Cc: r-help@r-project.org
Subject: Re: [R] Indexing matrices from the Matrix package with [i, j] seems to 
be very slow. Are there faster alternatives?

On 12-06-24 4:50 PM, Søren Højsgaard wrote:
 Dear all,

 Indexing matrices from the Matrix package with [i,j] seems to be very slow. 
 For example:

   library(rbenchmark)
   library(Matrix)
   mm- matrix(c(1,0,0,0,0,0,0,0), nr=20, nc=20)
   MM- as(mm, Matrix)
   lookup- function(mat){
 for (i in 1:nrow(mat)){
   for (j in 1:ncol(mat)){
  mat[i,j]
   }
 }
 }

   benchmark(lookup(mm), lookup(MM),  columns=c(test, replications, 
 elapsed, relative), replications=50)
 test replications elapsed relative
 1 lookup(mm)   500.01   1
 2 lookup(MM)   508.77  877

 I would have expected a small overhead when indexing a matrix from the Matrix 
 package, but this result is really surprising...
 Does anybody know if there are faster alternatives to [i,j] ?

There's also a large overhead when indexing a dataframe, though Matrix appears 
to be slower.  It's designed to work on whole matrices at a time, not single 
entries.  So I'd suggest that if you need to use [i,j] indexing, then try to 
arrange your code to localize the access, and extract a submatrix as a regular 
fast matrix first. (Or if it will fit in memory, convert the whole thing to a 
matrix just for the access.  If I just add the line

mat - as.matrix(mat)

at the start of your lookup function, it becomes several hundred times
faster.)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained,

Re: [R] Compile C files

2012-06-26 Thread Duncan Murdoch


On 12-06-26 2:48 PM, Frederico Mestre wrote:

Hello:



Sorry, this might look like a beginner question, but I'm just starting to
work on the C and R interface.



I'm trying to compile a C file (with a function) to load it to an R function
but, in the command line I keep getting a lot of errors, like:


You'll need to tell us what you did before  you can expect us to 
interpret the error messages.


Duncan Murdoch





C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected
declaration specifiers before 'SEXP'



I've been able to compile this file before, so I



I'm using Windows 7 in a 64 bits computer.



Best regards,



Frederico




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Figuring out encodings of PDFs in R

2012-06-26 Thread Duncan Murdoch


On 12-06-26 3:28 PM, Jonas Michaelis wrote:

Dear list,

I am currently scraping some text data from several PDFs using the
readPDF() function in the tm package. This all works very well and in most
cases the encoding seems to be latin1 - in some, however, it is not. Is
there a good way in R to check character encodings? I found the functions
is.utf8() and is.local() in the tau package but that obviously only gets me
so far.



There are heuristics for guessing encodings, but I don't think they are 
built into R.  I think the way to do what you want is to read the PDF 
spec to find out how the strings are encoded in the source file, and 
believe that.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] RES: Compile C files

2012-06-26 Thread Frederico Mestre

Hello:

I just reinstalled R and Rtools. 

It works perfectly now.

Thanks,

Frederico 



-Mensagem original-
De: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Enviada em: quarta-feira, 27 de Junho de 2012 01:06
Para: Frederico Mestre
Cc: r-help@r-project.org
Assunto: Re: [R] Compile C files

On 12-06-26 2:48 PM, Frederico Mestre wrote:
 Hello:



 Sorry, this might look like a beginner question, but I'm just starting 
 to work on the C and R interface.



 I'm trying to compile a C file (with a function) to load it to an R 
 function but, in the command line I keep getting a lot of errors, like:

You'll need to tell us what you did before  you can expect us to interpret
the error messages.

Duncan Murdoch




 C:/Program~1/R/R-215~1.0/include/Rinternals.h:1066:1: error: expected 
 declaration specifiers before 'SEXP'



 I've been able to compile this file before, so I



 I'm using Windows 7 in a 64 bits computer.



 Best regards,



 Frederico




   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rms package-superposition prediction curve of ols and data points

2012-06-26 Thread Frank Harrell

This is what the addpanel argument to plot.Predict is for, something along
the lines of

ap - function(...) lpoints(age, weight)
plot(Predict(. . .), addpanel=ap)

Frank


David Winsemius wrote
 
 On Jun 26, 2012, at 11:29 AM, Sarah Goslee wrote:
 
 You could use points() instead of plot() for the second command.
 
 
 Ummm. Maybe not. I think think that plot.Predict uses lattice  
 graphics. You may need to use trellis.focus() followed by lpoints().  
 Or use the + operation with suitable objects.
 
 -- 
 David.
 
 

 Sarah

 On Tue, Jun 26, 2012 at 8:37 AM, achaumont lt;agnes.chaumont@gt;  
 wrote:
 Hello,

 I have a question about the “plot.predict” function in Frank  
 Harrell's rms
 package.
 Do you know how to superpose in the same graph the prediction curve  
 of ols
 and raw data points?
 Put most simply, I would like to combine these two graphs:

  fit_linear - ols (y4 ~ rcs(x2,c(5,10,15,20,60,80,90)), x=TRUE,  
 y=TRUE)
 p - Predict(fit_linear,x2,conf.int=FALSE)
 plot (p, ylim =c(-2,0.5), xlim=c(0,100))  # graph n°1

 z - plot  
 (x2,y4,ylim=c(-2,0.5),xlim=c(0,100),type=p,lwd=6,col=blue)
 # graph n°2

 Thanks all,

 Agnès





 -- 
 Sarah Goslee
 http://www.functionaldiversity.org

 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 David Winsemius, MD
 West Hartford, CT
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/rms-package-superposition-prediction-curve-of-ols-and-data-points-tp4634503p4634566.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] question about formatting Dates

2012-06-26 Thread Erin Hodgess

Dear R People:

I have dates as factors in the following:

 poudel.df$DATE
[1] 1/2/2011  1/4/2011  1/4/2011  1/4/2011  1/6/2011  1/7/2011  1/8/2011
[8] 1/9/2011  1/10/2011
Levels: 1/10/2011 1/2/2011 1/4/2011 1/6/2011 1/7/2011 1/8/2011 1/9/2011


I want them to be regular dates which can be sorted, etc.

But when I did this:

 as.character(poudel.df$DATE)
[1] 1/2/2011  1/4/2011  1/4/2011  1/4/2011  1/6/2011  1/7/2011
[7] 1/8/2011  1/9/2011  1/10/2011

and
 as.Date(as.character(poudel.df$DATE),%m/%d/$Y)
[1] NA NA NA NA NA NA NA NA NA

because the dates do not have leading zeros.

There are approximately 30 years of nearly daily data in the entire set.

Any suggestions would be much appreciated.

Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] chisq.test



Hi,

The error is due to less than 5 observations in some cells.

You can try,
fisher.test(tabele)
    Fisher's Exact Test for Count Data

data:  tabele 
p-value = 0.0998
alternative hypothesis: two.sided 

A.K.



- Original Message -
From: Omphalodes Verna omphalodes.ve...@yahoo.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Tuesday, June 26, 2012 2:27 PM
Subject: [R] chisq.test

Dear list!

I would like to calculate chisq.test on simple data set with 70 observations, 
but the output is ''Warning message:''

Warning message:
In chisq.test(tabele) : Chi-squared approximation may be incorrect


Here is an example: 

        tabele - matrix(c(11, 3, 3, 18, 3, 6, 5, 21), ncol = 4, byrow = TRUE)
        dimnames(tabela) - list(
        SEX = c(M,F),
        HAIR = c(Brown, Black, Red, Blonde))
        addmargins(tabele)
        prop.table(tabele)
        chisq.test(tabele)
Please, give me an advice / suggestion / recommendation.

Thanks a lot to all, OV

    [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] flatten lists

Hi,

Try:

do.call(c,do.call(c,x))

x1-do.call(c,do.call(c,x))
 x2-flatlist(x)
 identical(x1,x2)
[1] TRUE



A.K.



- Original Message -
From: Jeroen Ooms jeroen.o...@stat.ucla.edu
To: Neal Fultz nfu...@gmail.com
Cc: r-help@r-project.org
Sent: Tuesday, June 26, 2012 6:23 PM
Subject: Re: [R] flatten lists

Hmm that doesn't seem to work if the original list is nested more than
2 levels deep. I should have probably given a better example:

x - list(name=Jeroen, age=27, married=FALSE,
home=list(country=list(name=Netherlands, short=NL), city=Utrecht))




On Tue, Jun 26, 2012 at 3:04 PM, Neal Fultz nfu...@gmail.com wrote:
 do.call(c, x)

 maybe?

 On Tue, Jun 26, 2012 at 02:25:40PM -0700, Jeroen Ooms wrote:
 I am looking for a function to flatten a list to a list of only 1
 level deep. Very similar to unlist, however I don't want to turn it
 into a vector because then everything will be casted to character
 vectors:

 x - list(name=Jeroen, age=27, married=FALSE,
 home=list(country=Netherlands, city=Utrecht))
 unlist(x)

 This function sort of does it:

 flatlist - function(mylist){
   lapply(rapply(mylist, enquote, how=unlist), eval)
 }

 flatlist(x)

 However it is a bit slow. Is there a more native way?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Zero inflated: is there a limit to the level of inflation

2012-06-26 Thread Stephanie L. Simek

Thank you both for your quick response and input. I will consider all of
your points and see what we are able to derive from there. 

Thank you again for your time and expertise.

-Stephanie

---
Stephanie L. Simek
Carnivore Ecology Lab
Forest and Wildlife Research Center
Mississippi State University
Box 9690
Mississippi State, MS 39762
Cell: (850) 591-1430
Email: ssi...@cfr.msstate.edu


-Original Message-
From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] 
Sent: Tuesday, June 26, 2012 4:46 PM
To: Marc Schwartz
Cc: Stephanie L. Simek; r-help@r-project.org
Subject: Re: [R] Zero inflated: is there a limit to the level of
inflation

On Tue, 26 Jun 2012, Marc Schwartz wrote:

 On Jun 26, 2012, at 2:10 PM, SSimek wrote:

 Hello,

 I have count data that illustrate the presence or absence of 
 individuals in my study population. I created a grid cell across the 
 study area and calcuated a count value for each individual per season

 per year for each grid cell. The count value is the number of time an

 individual was present in each grid cell.  For illustration my data 
 columns look something like this and are repeated for each
individual:

 Cell_ID  Param1  Param2  Param3  Param4  COUNT   NameYear
Season  Cov
 1160.565994  729.08  15037930.3  0   AA  2010
AUT Open
 1160.565994  729.08  15037930.3  22  AA  2011
SPR Open
 1160.565994  729.08  15037930.3  12  AA  2009
SUM Open
 1160.565994  729.08  15037930.3  0   AA  2010
SUM Open
 2169.427001  491.87  1503.31 5101.09 0   AA  2010
AUT oldHard
 2169.427001  491.87  1503.31 5101.09 16  AA  2011
SPR oldHard
 2169.427001  491.87  1503.31 5101.09 0   AA  2009
SUM oldHard
 2169.427001  491.87  1503.31 5101.09 0   AA  2010
SUM oldHard
 ?
 563  86.777099   612.69  977 4474.6  62  AA  2010
AUT Water
 563  86.777099   612.69  977 4474.6  12  AA  2011
SPR Water
 563  86.777099   612.69  977 4474.6  55  AA  2009
SUM Water


 1160.565994  729.08  15037930.3  0   BB  2010
SUM Open
 2169.427001  491.87  1503.31 5101.09 72  BB  2010
SUM oldHard
 5160.75  614.95  1503.31 2878.98 16  BB  2010SUM
medHard
 6170.404998  510.58  1489.44 743.14  0   BB  2010
SUM Water
 ?
 563  86.777099   612.69  977 4474.6  0   BB  2010
SUM Water


 1160.565994  729.08  15037930.3  14  C   2005
AUT Open
 1160.565994  729.08  15037930.3  0   C   2006
AUT Open
 1160.565994  729.08  15037930.3  0   C   2006
SPR Open
 1160.565994  729.08  15037930.3  56  C   2007
SPR Open
 1160.565994  729.08  15037930.3  0   C   2006
SUM Open
 2169.427001  491.87  1503.31 5101.09 124 C   2005
AUT oldHard
 2169.427001  491.87  1503.31 5101.09 231 C   2006
AUT oldHard
 2169.427001  491.87  1503.31 5101.09 889 C   2006
SPR oldHard
 2169.427001  491.87  1503.31 5101.09 0   C   2007
SPR oldHard
 ?
 563  86.777099   612.69  977 4474.6  0   C
2005AUT Water
 563  86.777099   612.69  977 4474.6  231 C
2006AUT Water
 563  86.777099   612.69  977 4474.6  185 C
2006SPR Water
 563  86.777099   612.69  977 4474.6  123 C
2007SPR Water
 563  86.777099   612.69  977 4474.6  52  C
2006SUM Water



 I have 563 grid cells across my study area and each individual has 
 1-563 cells associated for each year and each season the individual
was monitored.
 Therefore my grid cells are repeated. I end up with 71,000 records 
 and 925 records have a Count value 0; which means 70,075 records 
 have a Count value = 0.

 I wanted to run a zero inflated poisson model to determine mixed 
 effects (of
 parameters) with individual as the random effect. But I have been 
 advised two things:

 1. I cannot run a zero inflated poisson model because my data are too

 extremely inflated (i.e. 70,075 vs 925) and

 2. I cannot run the model with each cell repeated for each 
 individual. I am told the model doesn't recognize that Cell_ID #1 for

 individual A is the same Cell_ID #1 for individual B.

 Does anyone know if either or both of these points are true? I would 
 appreciate any thoughts, advice, or suggestions.

 Thanks!

 -Stephanie


 Hi Stephanie,

 Some comments:

 1. You should think about or at least be open to a zero inflated
negative binomial distribution rather than zero inflated poisson.

 2. You should at least review the vignette for the pscl CRAN package,
which provides standard fixed effects models and

Re: [R] flatten lists