Re: [R] Storing and managing custom R functions for re-use

2011-07-09 Thread Abhijit Dasgupta, PhD
I think most of us are in a similar situation. I've usually kept mine in 
a file which is sourced when I start R. The main problem I have with 
this is that it clutters up my environment with a lot of stuff I don't 
need all the time. I'm in the process of creating a custom package which 
will be lazy-loaded. I believe a previous discussion of this topic 
suggested this as the preferred method.
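
For anyone wanting to go that route, here is a minimal sketch (the 
package name "myfuns" and the two functions are made up for 
illustration):

```r
# Two stand-in personal functions ("f1", "f2" are hypothetical names)
f1 <- function(x) mean(x, na.rm = TRUE)
f2 <- function(x) sd(x, na.rm = TRUE)

# Write out a package skeleton containing them
package.skeleton(name = "myfuns", list = c("f1", "f2"))

# Then add the line
#   LazyLoad: yes
# to myfuns/DESCRIPTION, and from the shell:
#   R CMD build myfuns
#   R CMD INSTALL myfuns_1.0.tar.gz
# after which library(myfuns) makes f1/f2 available without
# sourcing a big file into the global environment at startup.
```

The generated help files are only templates and need editing before 
the package passes checks, but the lazy-loading behaviour comes for 
free once the DESCRIPTION field is set.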



On 07/09/2011 07:30 AM, Simon Chamaillé-Jammes wrote:

Dear all,

sorry if this is a bit on the sidetrack for R-help.

As a regular R user I have developed quite a lot of custom R 
functions, to the point of not always remembering what I have already 
programmed, where the file is, and so on.
I was wondering what other people do in this regard. A basic file 
with all your functions, a custom R package, or functions integrated 
directly into a profile file? I'm considering that a blog with tagged 
posts may be a good solution (and really good ones could join 
R-bloggers, maybe).


If someone is happy to share what (s)he considers good practice, thanks.

simon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Confidence bands in ggplot2

2011-07-07 Thread Abhijit Dasgupta, PhD

You can easily do this with:

qplot(x = as.factor(sch), y = est, geom = 'point', colour = I('red')) +
  geom_pointrange(aes(x = as.factor(sch), y = est,
                      ymin = lower.95ci, ymax = upper.95ci)) +
  xlab('School') + ylab('Value-added') + theme_bw()
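
For a self-contained version (made-up numbers standing in for Chris's 
estimates; needs ggplot2):

```r
library(ggplot2)

# Hypothetical subset of the school estimates and standard errors
d <- data.frame(sch = c('190', '107', '290', '256', '287'),
                est = c(4.17, 2.64, 4.23, 6.12, 4.50),
                se  = c(3.17, 3.71, 4.68, 6.34, 3.90))
d$lower.95ci <- d$est - d$se * qnorm(0.975)
d$upper.95ci <- d$est + d$se * qnorm(0.975)

# geom_pointrange draws the point plus a vertical bar to ymin/ymax
p <- ggplot(d, aes(x = as.factor(sch), y = est,
                   ymin = lower.95ci, ymax = upper.95ci)) +
  geom_pointrange(colour = 'red') +
  xlab('School') + ylab('Value-added') + theme_bw()
# print(p) draws the estimates with their 95% CI bars
```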




On 07/07/2011 05:55 PM, Christopher Desjardins wrote:

Hi,
I have the following data:


> est
     sch190      sch107      sch290      sch256      sch287      sch130      sch139
 4.16656026  2.64306071  4.22579866  6.12024789  4.49624748 11.12799127  1.17353917
     sch140      sch282      sch161      sch193      sch156      sch288      sch352
 3.48197696 -0.29659410 -1.99194986 10.23489859  7.77342138  6.77624539  9.66795001
     sch368      sch225      sch301      sch105      sch353      sch291      sch179
 7.20229569  4.41989204  5.61586860  5.99460203 -2.65019242 -9.42614560 -0.25874193
     sch134      sch135      sch324      sch360         bb1
 3.26432479 10.52555091 -0.09637968  2.49668858 -3.24173545

> se
   sch190    sch107    sch290    sch256    sch287    sch130    sch139    sch140
 3.165127  3.710750  4.680911  6.335386  3.896302  4.907679  4.426284  4.266303
   sch282    sch161    sch193    sch156    sch288    sch352    sch368    sch225
 3.303747  4.550193  3.995261  5.787374  5.017278  7.820763  7.253183  4.483988
   sch301    sch105    sch353    sch291    sch179    sch134    sch135    sch324
 4.076570  7.564359 10.456522  5.705474  4.247927  5.671536 10.567093  4.138356
   sch360       bb1
 4.943779  1.935142

> sch
 [1] 190 107 290 256 287 130 139 140 282 161 193 156 288
[14] 352 368 225 301 105 353 291 179 134 135 324 360 BB


 From this data I have created 95% confidence intervals assuming a normal 
distribution.

lower.95ci <- est - se*qnorm(.975)
upper.95ci <- est + se*qnorm(.975)

What I'd like to do is plot the estimate (est) and have lines attach to the 
points located in lower.95ci and upper.95ci.  Presently I am doing the 
following:

qplot(x = as.factor(sch), y = lower.95ci) +
  geom_point(aes(x = as.factor(sch), y = upper.95ci), colour = 'black') +
  geom_point(aes(x = as.factor(sch), y = est), colour = 'red') +
  ylab('Value-Added') + xlab('School') + theme_bw()

Which creates this graph ---   
http://dl.dropbox.com/u/1501309/value_added_test.pdf

That's fine except that it doesn't connect the points vertically. Does anyone 
know how I could make the 'black' points connect to the 'red' point, i.e. show 
confidence bands?

Thanks,
Chris







Re: [R] linear regression in a data.frame using recast -- A fortunes candidate??

2011-03-16 Thread Abhijit Dasgupta, PhD

Seconded

On 03/16/2011 05:37 PM, Bert Gunter wrote:

Ha! -- A fortunes candidate?
-- Bert


If this is really a time series, then you will have serious validity
problems due to auto-correlation among non-independent units. (But if you
are just searching for a way to pull the wool over the eyes of the
statistically uninformed, then I guess there's no stopping you.)

--

David Winsemius, MD
West Hartford, CT


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Revolution Analytics reading SAS datasets

2011-02-11 Thread Abhijit Dasgupta, PhD


I'm sure the legal ground is tricky. However, OpenOffice, LibreOffice, 
and KWord have been able to open the (proprietary) MS Word doc format 
for a while now, and they are open source (and LibreOffice might even 
be GPL'd), so the algorithm is in fact "published" in Jeremy's sense, 
and has been for several years. I figure the reason for keeping the 
SAS-reading functionality proprietary is Revolution's (perfectly 
legitimate) wish to make money by differentiating their product from 
GNU R, adding features that would make people want to buy rather than 
just download from CRAN.


Within GNU R there is of course sas.get in the Hmisc package (which 
requires SAS). It should also be quite easy to write a wrapper around 
dsread, a closed-source command-line product, freely downloadable in a 
limited form, which will convert sas7bdat files to csv or tsv format 
(and SQL if you pay). This latter path doesn't require SAS locally.


I'm also sure that SAS has a way to export its datasets into R, since 
the current version of IML Studio will in fact interact with R.



On 02/10/2011 03:11 PM, Jeremy Miles wrote:

On 10 February 2011 12:01, Matt Shotwellm...@biostatmatt.com  wrote:

On Thu, 2011-02-10 at 10:44 -0800, David Smith wrote:

The SAS import/export feature of Revolution R Enterprise 4.2 isn't
open-source, so we can't release it in open-source Revolution R
Community, or to CRAN as we do with the ParallelR packages (foreach,
doMC, etc.).

Judging by the language of Dr. Nie's comments on the page linked below,
it seems unlikely this feature is the result of a licensing agreement
with SAS. Is that correct?



There was some discussion of this on the SAS email list.  People who
seem to know what they are talking about said that they would have
had to reverse-engineer it to decode the file format.  It's slightly
tricky legal ground: the file format can't be copyrighted, but
publishing the algorithm might not be allowed.  I guess if they
release it as open source, that could be construed as publishing the
algorithm. (SPSS and WPS can both open SAS files, and I'd be surprised
if SAS licensed to them.  Esp. WPS, whom SAS is (or was) suing for
all kinds of things in court in London.)

Jeremy





Re: [R] Programmatically finding number of processors by R code

2010-10-03 Thread Abhijit Dasgupta, PhD
  If you have installed multicore (for Unix/Mac), you can find the 
number of cores with multicore:::detectCores()
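
For what it's worth, in current R the (base) parallel package exports 
this directly, so no ':::' is needed; a sketch of feeding the count 
to snow:

```r
library(parallel)   # part of base R since 2.14

# Number of (logical) cores on this machine
n <- detectCores()

# Hand the count to a snow-style socket cluster, e.g. leaving one
# core free for the rest of the system:
#   cl <- snow::makeCluster(max(1, n - 1), type = "SOCK")
#   ...
#   snow::stopCluster(cl)
n
```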

On 10/3/10 1:03 PM, Ajay Ohri wrote:
 Dear List

 Sorry if this question seems very basic.

 Is there a function to programmatically find the number of processors
 in my system? I want to pass this as a parameter to snow in some
 serial-to-parallel code functions

 Regards

 Ajay



 Websites-
 http://decisionstats.com
 http://dudeofdata.com


 Linkedin- www.linkedin.com/in/ajayohri



-- 

Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgu...@araastat.com
W: http://www.araastat.com





Re: [R] Creating R objects in Java

2010-10-01 Thread Abhijit Dasgupta, PhD

 On 10/1/10 9:18 AM, lord12 wrote:

How do you call R methods from Java? I want to create a GUI using Swing in
Java that calls R methods.



Look in the documentation for the rJava package




Re: [R] R Code for paper?

2010-09-30 Thread Abhijit Dasgupta, PhD
Look at the qvalue package by Dabney and Storey, which might satisfy 
your last query.


On 09/30/2010 06:40 PM, Jim Silverton wrote:

Does anyone has the Rcode for Gilbert's 2005 paper on the discrete FDR and
Tarone's 1990 paper? And Storey's pFDR?






Re: [R] R Code for paper?

2010-09-30 Thread Abhijit Dasgupta, PhD
Reading Gilbert's paper and references, and going on the web, I see that 
Gilbert provided Fortran source code for his method as well as Tarone's. 
It might be possible to wrap this in R.



On 09/30/2010 06:40 PM, Jim Silverton wrote:

Does anyone has the Rcode for Gilbert's 2005 paper on the discrete FDR and
Tarone's 1990 paper? And Storey's pFDR?






Re: [R] speeding up regressions using ddply

2010-09-22 Thread Abhijit Dasgupta, PhD
 There has been a recent addition of parallel-processing capabilities 
to plyr (I believe v1.2 and later), along with a data-frame iterator 
construct. Both have greatly improved the performance of ddply for 
multicore/cluster computing, so we now have the niceness of plyr's 
grammar with pretty good performance. From the plyr NEWS file:


Version 1.2 (2010-09-09)
------------------------

NEW FEATURES

* l*ply, d*ply, a*ply and m*ply all gain a .parallel argument that, when TRUE,
  applies functions in parallel using a parallel backend registered with the
  foreach package:

      x <- seq_len(20)
      wait <- function(i) Sys.sleep(0.1)
      system.time(llply(x, wait))
      #  user  system elapsed
      # 0.007   0.005   2.005

      library(doMC)
      registerDoMC(2)
      system.time(llply(x, wait, .parallel = TRUE))
      #  user  system elapsed
      # 0.020   0.011   1.038



On 9/22/10 10:41 AM, Ista Zahn wrote:

Hi Alison,

On Wed, Sep 22, 2010 at 11:05 AM, Alison Macaladya...@kmhome.org  wrote:


Hi,

I have a data set that I'd like to run logistic regressions on, using ddply
to speed up the computation of many models with different combinations of
variables.

In my experience ddply is not particularly fast. I use it a lot
because it is flexible and has easy-to-understand syntax, not for its
speed.

I would like to run regressions on every unique two-variable

combination in a portion of my data set,  but I can't quite figure out how
to do using ddply.

I'm not sure ddply is the tool for this job.

The data set looks like this, with status as the

binary dependent variable and V1:V8 as potential independent variables in
the logistic regression:

m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D','L'), each = 6), 3)),
                as.data.frame(m))


You can use combn to determine the combinations you want:

Varcombos <- combn(names(x)[-1], 2)

 From there you can do a loop, something like

results <- list()
for(i in 1:dim(Varcombos)[2])
{
   log.glm <- glm(as.formula(paste("status ~ ", Varcombos[1,i], " + ",
                                   Varcombos[2,i], sep = "")),
                  family = binomial(link = "logit"),
                  na.action = na.omit, data = x)
   glm.summary <- summary(log.glm)
   aic <- extractAIC(log.glm)
   coef <- coef(glm.summary)
   results[[i]] <- list(Est1 = coef[1,2], Est2 = coef[3,2], AIC = aic[2])
   # or whatever other output here
   names(results)[i] <- paste(Varcombos[1,i], Varcombos[2,i], sep = "_")
}

I'm sure you could replace the loop with something more elegant, but
I'm not really sure how to go about it.
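
One way to avoid the explicit loop, sticking to base R (a sketch over 
the same simulated data; fit_one is a made-up helper name):

```r
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D','L'), each = 6), 3)),
                as.data.frame(m))

combos <- combn(names(x)[-1], 2)   # the 28 two-variable combinations

# Fit one logistic regression and pull out its AIC
fit_one <- function(v1, v2) {
  f   <- reformulate(c(v1, v2), response = 'status')
  fit <- glm(f, family = binomial, na.action = na.omit, data = x)
  extractAIC(fit)[2]
}

aics <- apply(combos, 2, function(cc) fit_one(cc[1], cc[2]))
names(aics) <- apply(combos, 2, paste, collapse = '_')
head(sort(aics))   # best-fitting pairs first
```

The same pattern extends to collecting coefficients: have fit_one 
return a named vector and the apply() result becomes a matrix with 
one column per combination.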


I used melt to put my data frame into a more workable format
require(reshape)
xm <- melt(x, id = 'status')

Here is the basic shape of the function I'd like to apply to every
combination of variables in the dataset:

h <- function(df)
{
  attach(df)
  log.glm <- glm(status ~ value1 + value2, family = binomial(link = "logit"),
                 na.action = na.omit)
  # What I can't figure out is how to specify 2 different variables (I've
  # put value1 and value2 as placeholders) from xm to include in the model
  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)
  coef <- coef(glm.summary)
  list(Est1 = coef[1,2], Est2 = coef[3,2], AIC = aic[2]) # or whatever other
  # output here
}

And then I'd like to use ddply to speed up the computations.

require(plyr)
output <- ddply(xm, .(variable), as.data.frame.function(h))
output


I can easily do this using ddply when I only want to use 1 variable in the
model, but can't figure out how to do it with two variables.

I don't think this approach can work. You are saying "split up xm by
variable" and then expecting to be able to reference different levels
of variable within each split, which is an impossible request.

Hope this helps,
Ista


Many thanks for any hints!

Ali




Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
  Laboratory of Tree Ring Research











Re: [R] Creating publication-quality plots for use in Microsoft Word

2010-09-15 Thread Abhijit Dasgupta, PhD

 On 9/15/10 10:38 AM, dadrivr wrote:

Hi everyone,

I am trying to make some publication-quality plots for use in Microsoft
Word, but I am having trouble creating high-quality plots that are supported
by Microsoft Word.

If I use the R plot function to create the figure, the lines are jagged, and
the picture is not of high quality (same with the jpeg(), tiff(), and png()
functions).  I have tried using the Cairo package, but it distorts my dashed
lines, and the win.metafile results in a picture of terrible quality.  The
only way I have succeeded in getting a high quality picture in a file is by
using the pdf() function to save the plot as a pdf file, but all my attempts
to convert the image in the pdf file to a TIFF or other file type accepted
by Word result in considerably degraded quality.  Do you have any
suggestions for creating publication-quality plots in R that can be placed
in Word documents?  What packages, functions (along with options), and/or
conversions would you use?  Thanks so much for your help!
Another option I've used is to export to PDF (which seems to give the 
best quality) and then use the (free) ImageMagick program to convert the 
PDF to a high-resolution PNG. This worked for some involved heatmaps that 
were submitted to a journal. ImageMagick can be downloaded directly for 
Windows or via Cygwin.


Suppose your figure is in fig1.pdf. You can use the following command 
(once ImageMagick is installed and in your path):

system("convert -density 300x300 fig1.pdf fig1.png")






Re: [R] aggregate, by, *apply

2010-09-15 Thread Abhijit Dasgupta, PhD
 I would approach this slightly differently. I would make func a 
function of x and y:

func <- function(x, y){
  m <- median(x)
  return(m > 2 & m < y)
}

Now generate tmp just as you have. Then:

require(plyr)
res <- daply(tmp, .(z), summarise, res = func(x, y))

I believe this does the trick.
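
A base-R version of the same idea, for checking (the comparison 
operators below are my reconstruction; the originals were stripped by 
the mail archive):

```r
# Reconstructed condition: median(x) > 2 and below every y in the group
func <- function(x, y){
  m <- median(x)
  m > 2 & all(m < y)
}

tmp <- data.frame(x = 1:10,
                  y = c(rep(34, 3), rep(35, 3), rep(34, 4)),
                  z = c(rep('a', 3), rep('b', 3), rep('c', 4)))

# One logical per level of z, with the whole sub-frame visible to func
res <- sapply(split(tmp, tmp$z), function(d) func(d$x, d$y))
res
```

split()/sapply() gives the function the full context of each group, 
which is exactly what aggregate() (column by column) cannot do.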

Abhijit
On 9/15/10 5:45 PM, Mark Ebbert wrote:

Dear R gurus,

I regularly come across a situation where I would like to apply a function to a 
subset of data in a dataframe, but I have not found an R function to facilitate 
exactly what I need. More specifically, I'd like my function to have a context 
of where the data it's analyzing came from. Here is an example:

### BEGIN ###
func <- function(x){
  m <- median(x$x)
  if(m > 2 & m < x$y){
    return(T)
  }
  return(F)
}

tmp <- data.frame(x = 1:10,
                  y = c(rep(34,3), rep(35,3), rep(34,4)),
                  z = c(rep('a',3), rep('b',3), rep('c',4)))
res <- aggregate(tmp, list(z), func)
### END ###

The values in the example are trivial, but the problem is that only one column 
is passed to my function at a time, so I can't determine how 'm' relates to 
'x$y'. Any tips/guidance is appreciated.

Mark T. W. Ebbert






Re: [R] Saving/loading custom R scripts

2010-09-08 Thread Abhijit Dasgupta, PhD
 You can create a .First function in your .Rprofile file (which will 
be at ~/.Rprofile). For example:

.First <- function(){
  source("Friedman-Test-with-Post-Hoc.r.txt")
}

You can also create your own package ("mylibrary") down the line (see 
the R manual on writing extensions at 
http://cran.fhcrc.org/doc/manuals/R-exts.pdf), which will be a collection 
of the custom scripts you have written, and then you can 
automatically load them using

.First <- function(){
  library(mylibrary)
}

Hope this helps.

Abhijit

On 9/8/10 3:25 AM, DrCJones wrote:

Hi,
How does R automatically load functions so that they are available from the
workspace? Is it anything like Matlab - you just specify a directory path
and it finds it?

The reason I ask is because  I found a really nice script that I would like
to use on a regular basis, and it would be nice not to have to 'copy and
paste' it into R on every startup:

http://www.r-statistics.com/wp-content/uploads/2010/02/Friedman-Test-with-Post-Hoc.r.txt

This would be for Ubuntu, if that makes any difference.

Cheers






Re: [R] Something similar to layout in lattice or ggplot

2010-09-07 Thread Abhijit Dasgupta, PhD

 Hi Thierry,

It's really the latter I want: independent plots. I use faceting quite a 
bit, but I need things like a page of plots for simulations under 
different conditions. I suppose I could still use faceting combined with 
reshape, but I'd rather not go that route if I can help it.

Abhijit

On 9/7/10 10:44 AM, ONKELINX, Thierry wrote:

Dear Abhijit,

In ggplot you can use faceting (facet_grid() or facet_wrap()) to create
subplots based on the same dataset. Or you can work with viewport() if
you want several independent plots.

HTH,

Thierry




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey



-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] Namens Abhijit Dasgupta
Verzonden: dinsdag 7 september 2010 16:38
Aan: r-help@r-project.org
Onderwerp: [R] Something similar to layout in lattice or ggplot

Hi,

Is there a function similar to the layout function in base
graphics in either lattice or ggplot? I'm hoping someone has
written a function wrapper to the appropriate commands in
grid that would make this easier :)

Abhijit




Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.



--

Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgu...@araastat.com
W: http://www.araastat.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Something similar to layout in lattice or ggplot

2010-09-07 Thread Abhijit Dasgupta, PhD

 Thank you all for the suggestions. They have all been immensely helpful.

Abhijit

On 9/7/10 10:44 AM, ONKELINX, Thierry wrote:

Dear Abhijit,

In ggplot you can use faceting (facet_grid() or facet_wrap()) to create
subplots based on the same dataset. Or you can work with viewport() if
you want several independent plots.

HTH,

Thierry










Re: [R] ggplot inside cycle

2010-08-26 Thread Abhijit Dasgupta, PhD
You haven't wrapped p in the print command, which is one of the ways to 
make sure the plot actually gets drawn when we need it.

 print(p + geom_point(aes(size = 3))) does the trick.
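
Putting that together for the loop (a sketch; assumes ggplot2 is 
installed, and uses aes_string() as one way to pass a column name held 
in a variable):

```r
library(ggplot2)

pdf('iris-plots.pdf')   # one page per plot
for (i in names(iris)[2:4]) {
  p <- ggplot(iris,
              aes_string(x = 'Sepal.Length', y = i, colour = 'Species')) +
    geom_point(size = 3)   # constant size belongs outside aes()
  print(p)                 # the explicit print() makes each page appear
}
dev.off()
```

Inside a loop (or a function, or source()) ggplot objects are not 
auto-printed the way they are at the prompt, hence the print() call.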
On 08/26/2010 06:08 AM, Petr PIKAL wrote:

Dear all

I want to save several ggplots in one pdf document. I tried this

for (i in names(iris)[2:4]) {
  p <- ggplot(iris, aes(x = Sepal.Length, y = iris[,i], colour = Species))
  p + geom_point(aes(size = 3))
}

with different variations of y input but was not successful. In past I
used qplot in similar fashion which worked

for(i in names(mleti)[7:15]) print(qplot(sito, mleti1[,i],
  facets = ~typ, ylab = i, geom = c("point", "line"),
  colour = ordered(minuty), data = mleti1))

So I wonder whether anybody has used ggplot in a loop, and how to handle
the input of variables through the loop.

Thank you

Petr






Re: [R] How to remove rows based on frequency of factor and then difference date scores

2010-08-24 Thread Abhijit Dasgupta, PhD

An answer to 1)

x = data.frame(Type = c('A','A','B','B'), ID = c(1,1,3,1),
               Date = c('16/09/2010','23/09/2010','18/8/2010','13/5/2010'),
               Value = c(8,9,7,6))
x
  Type ID       Date Value
1    A  1 16/09/2010     8
2    A  1 23/09/2010     9
3    B  3  18/8/2010     7
4    B  1  13/5/2010     6
x$Date = as.Date(x$Date, format = '%d/%m/%Y')
library(plyr)
x$uniqueID = paste(x$Type, x$ID, sep = '')
nobs = daply(x, ~uniqueID, nrow)
keep = names(nobs)[nobs > 1]
newx = x[x$uniqueID %in% keep,]

An answer to 2)

require(plyr)
ddply(newx, ~uniqueID, transform,
      newDate = as.numeric(Date - min(Date) + 1))



On 08/24/2010 01:19 PM, Chris Beeley wrote:

Hello-

A basic question which has nonetheless floored me entirely. I have a
dataset which looks like this:

Type  ID  Date        Value
A     1   16/09/2020  8
A     1   23/09/2010  9
B     3   18/8/2010   7
B     1   13/5/2010   6

There are two Types, which correspond to different individuals in
different conditions, and loads of ID labels (1:50) corresponding to
the different individuals in each condition, and measurements at
different times (from 1 to 10 measurements) for each individual.

I want to perform the following operations:

1) Delete all individuals for whom only one measurement is available.
In the dataset above, you can see that I want to delete the row Type B
ID 3, and Type B ID 1, but without deleting the Type A ID 1 data
because there is more than one measurement for Type A ID 1 (but not
for Type B ID1)

2) Produce difference scores for each of the Dates, so each individual
(Type A ID1 and all the others for whom more than one measurement
exists) starts at Date 1 and goes up in integers according to how
many days have elapsed.

I just know there's some incredibly cunning R-ish way of doing this
but after many hours of fiddling I have had to admit defeat.

I would be very grateful for any words of advice.

Many thanks,
Chris Beeley,
Institute of Mental Health, UK







Re: [R] How to remove rows based on frequency of factor and then difference date scores

2010-08-24 Thread Abhijit Dasgupta, PhD
The only problem with this is that Chris's unique individuals are a 
combination of Type and ID, as I understand it. So Type=A, ID=1 is a 
different individual from Type=B, ID=1. So we need to create a unique 
identifier per person, simplistically by uniqueID = paste(Type, ID, 
sep=''). Then, using this new identifier, everything follows.
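
In code, the fix amounts to one extra line before the counting step 
(base R, using Chris's four example rows):

```r
txt.df <- data.frame(Type = c('A', 'A', 'B', 'B'),
                     ID   = c(1, 1, 3, 1),
                     stringsAsFactors = FALSE)

# An individual is a Type x ID combination, not an ID alone
txt.df$uniqueID <- paste(txt.df$Type, txt.df$ID, sep = '')

# Count measurements per individual, then keep those with more than one
txt.df$nn <- ave(seq_along(txt.df$uniqueID), txt.df$uniqueID, FUN = length)
txt.df[txt.df$nn > 1, ]   # only the two Type A, ID 1 rows survive
```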


On 08/24/2010 01:53 PM, David Winsemius wrote:


On Aug 24, 2010, at 1:19 PM, Chris Beeley wrote:


Hello-

A basic question which has nonetheless floored me entirely. I have a
dataset which looks like this:

Type  ID  Date        Value
A     1   16/09/2020  8
A     1   23/09/2010  9
B     3   18/8/2010   7
B     1   13/5/2010   6

There are two Types, which correspond to different individuals in
different conditions, and loads of ID labels (1:50) corresponding to
the different individuals in each condition, and measurements at
different times (from 1 to 10 measurements) for each individual.

I want to perform the following operations:

1) Delete all individuals for whom only one measurement is available.
In the dataset above, you can see that I want to delete the row Type B
ID 3, and Type B ID 1, but without deleting the Type A ID 1 data
because there is more than one measurement for Type A ID 1 (but not
for Type B ID1)

2) Produce difference scores for each of the Dates, so each individual
(Type A ID1 and all the others for whom more than one measurement
exists) starts at Date 1 and goes up in integers according to how
many days have elapsed.

I just know there's some incredibly cunning R-ish way of doing this
but after many hours of fiddling I have had to admit defeat.


Not sure about terribly cunning. Let's assume your dataframe was read 
in with stringsAsFactors=FALSE and is called txt.df:



> txt.df$dt2 <- as.Date(txt.df$Date, format="%d/%m/%Y")
> txt.df
  Type ID       Date Value        dt2
1    A  1 16/09/2020     8 2020-09-16
2    A  1 23/09/2010     9 2010-09-23
3    B  3  18/8/2010     7 2010-08-18
4    B  1  13/5/2010     6 2010-05-13

> txt.df$nn <- ave(txt.df$ID, txt.df$ID, FUN=length)
> txt.df
  Type ID       Date Value        dt2 nn
1    A  1 16/09/2020     8 2020-09-16  3
2    A  1 23/09/2010     9 2010-09-23  3
3    B  3  18/8/2010     7 2010-08-18  1
4    B  1  13/5/2010     6 2010-05-13  3
> txt.df[ -which( txt.df$nn <= 1), ]
  Type ID       Date Value        dt2 nn
1    A  1 16/09/2020     8 2020-09-16  3
2    A  1 23/09/2010     9 2010-09-23  3
4    B  1  13/5/2010     6 2010-05-13  3

# Task #1 accomplished

> tapply(txt.df$dt2, txt.df$ID, function(x) x[1] - x)
$`1`
Time differences in days
[1]    0 3646 3779

$`3`
Time difference of 0 days

> unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] - x) )
  11   12   13    3
   0 3646 3779    0
> txt.df$diffdays <- unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] - x) )

> txt.df
  Type ID       Date Value        dt2 nn diffdays
1    A  1 16/09/2020     8 2020-09-16  3        0
2    A  1 23/09/2010     9 2010-09-23  3     3646
3    B  3  18/8/2010     7 2010-08-18  1     3779
4    B  1  13/5/2010     6 2010-05-13  3        0
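For what it's worth, task #2 can also be done in one pass with ave(); a 
base-R sketch on the same four rows (grouping by ID alone to mirror the 
transcript above, and measuring from each person's earliest date rather 
than their first listed one):

```r
txt <- data.frame(Type  = c("A", "A", "B", "B"),
                  ID    = c(1, 1, 3, 1),
                  Date  = c("16/09/2020", "23/09/2010", "18/8/2010", "13/5/2010"),
                  Value = c(8, 9, 7, 6),
                  stringsAsFactors = FALSE)
txt$dt2 <- as.Date(txt$Date, format = "%d/%m/%Y")

# days elapsed since each ID's earliest measurement
txt$diffdays <- ave(as.numeric(txt$dt2), txt$ID,
                    FUN = function(x) x - min(x))
```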






I would be very grateful for any words of advice.

Many thanks,
Chris Beeley,
Institute of Mental Health, UK



David Winsemius, MD
West Hartford, CT







Re: [R] How to remove rows based on frequency of factor and then difference date scores

2010-08-24 Thread Abhijit Dasgupta, PhD
The paste-y argument is my usual trick in these situations. I forget 
that tapply can take multiple grouping arguments :)


Abhijit

On 08/24/2010 02:17 PM, David Winsemius wrote:


On Aug 24, 2010, at 1:59 PM, Abhijit Dasgupta, PhD wrote:

The only problem with this is that Chris's unique individuals are a 
combination of Type and ID, as I understand it. So Type=A, ID=1 is a 
different individual from Type=B,ID=1. So we need to create a unique 
identifier per person, simplistically by uniqueID=paste(Type, ID, 
sep=''). Then, using this new identifier, everything follows.


I see your point. I agree that a tapply method should present both 
factors in the indices argument.


> new.df <- txt.df[ -which( txt.df$nn <= 1), ]
> new.df <- new.df[ with(new.df, order(Type, ID) ), ]  # and possibly 
needs to be ordered?
> new.df$diffdays <- unlist( tapply(new.df$dt2, list(new.df$ID, 
new.df$Type), function(x) x[1] - x) )

> new.df
  Type ID       Date Value        dt2 nn diffdays
1    A  1 16/09/2020     8 2020-09-16  3        0
2    A  1 23/09/2010     9 2010-09-23  3     3646
4    B  1  13/5/2010     6 2010-05-13  3        0

But do not agree that you need, in this case at least, to create a 
paste()-y index. Agreed, however, such a construction can be useful in 
other situations.
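For completeness, base R's interaction() does the same job as a paste()-y 
index without the (admittedly rare) risk of pasted labels colliding, e.g. 
Type "A" + ID 11 and a hypothetical Type "A1" + ID 1 would both paste to 
"A11". A small sketch on made-up data:

```r
dat <- data.frame(Type = c("A", "A", "B"), ID = c(1, 11, 1))

# paste()-y index: fine here, but labels can collide in principle
dat$byPaste <- paste(dat$Type, dat$ID, sep = "")

# interaction() keeps the two factors distinct, joined by "."
dat$byInter <- interaction(dat$Type, dat$ID, drop = TRUE)
```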








Re: [R] How to apply apply?!

2010-08-06 Thread Abhijit Dasgupta, PhD

For 1, an easy way is

dat <- transform(dat, CLOSE2=2*CLOSE)

For 2:

apply(dat,1,fun)
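A self-contained sketch of both steps on a two-row stand-in for the data 
frame below (note that apply() coerces a data frame to a matrix, so it is 
safest when the selected columns are all numeric):

```r
dat <- data.frame(OPEN = c(1931.2, 0), HIGH  = c(1931.2, 0),
                  LOW  = c(1931.2, 0), CLOSE = c(1931.2, 999.05))

# 1) double CLOSE and keep it as a new column, no explicit cbind()
dat <- transform(dat, CLOSE2 = 2 * CLOSE)

# 2) run an arbitrary function over each row; here, close-to-open change
dat$CHG <- apply(dat[, c("OPEN", "CLOSE")], 1,
                 function(r) r["CLOSE"] - r["OPEN"])
```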

On 08/06/2010 03:06 PM, Raghuraman Ramachandran wrote:

guRus

I have say a dataframe, d and I wish to do the following:

1) For each row, I want to take one particular value of the row and multiply
it by 2. How do I do it. Say the data frame is as below:
  OPEN   HIGH    LOW  CLOSE
1931.2 1931.2 1931.2 1931.2
     0      0      0 999.05
     0      0      0 1052.5
     0      0      0  987.8
     0      0      0  925.6
     0      0      0    866
     0      0      0 1400.2
     0      0      0  754.5
     0      0      0  702.6
     0      0      0 653.25
     0      0      0    348
     0      0      0    801
866.55 866.55 866.55 866.55
 783.1  783.1 742.25 742.25
   575    575    575    575
     0      0      0    493
   470    470    420    425
   355    360    343    360
312.05 312.05    274 280.85
257.35 257.35    197 198.75
   182 185.95    137 150.75
120.25    129   90.7 101.25
 91.85  91.85     57   66.6

How do I multiply only the close of every row using the 'apply' function?
And once multiplied how do I obtain a new table that also contains the new
2*CLOSE column (without cbind?).

2) Also, how do I run a generic function per row. Say for example I want to
calculate the Implied Volatility for each row of this data frame ( using the
Rmetrics package). How do I do that, please, using the apply function? I am
focusing on apply because I like the vectorisation concept in R and I do not
want to use a for loop etc.

Many thanks for the enlightenment,
Raghu


   






Re: [R] How to extract se(coef) from cph?

2010-08-05 Thread Abhijit Dasgupta, PhD

if the cph model fit is m1, you can try

sqrt(diag(m1$var))

This is coded in print.cph.fit (library(rms))
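The sqrt-of-diagonal pattern is not specific to cph: any fit that carries a 
variance-covariance matrix via vcov() works the same way. A base-R 
illustration with lm() (chosen only so it runs without the Design/rms 
package):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# standard errors by hand: square roots of the covariance diagonal
se_byhand <- sqrt(diag(vcov(fit)))

# they match the "Std. Error" column that summary() reports
se_summary <- coef(summary(fit))[, "Std. Error"]
```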

On 08/05/2010 04:03 PM, Biau David wrote:

Hello,

I am modeling some survival data with cph (Design). I have modeled a predictor
which showed a non-linear effect with restricted cubic splines. I would like to
retrieve the se(coef) for the other, linear, predictors. This is just to make nice
LaTeX tables automatically. I have the coefficients with coef().

How do I do that?

Thanks,

  David Biau.





   



