date:20111004

You are right, but this is difficult or impossible to really solve.

The problem is that summary() is an S3 generic (see ?UseMethod) -- so essentially
it can mean anything and do anything depending on the structure to which
it's applied. In your case, the structures were a data frame and a vector
(that it was a column of the data frame is irrelevant) and, as you noted,
different options were used for the two functions. But it could be -- and
probably does get -- much worse than that.

The ability to dispatch different methods from a single generic call based
on the structure of the object to which a function is applied is generally
viewed as a positive feature of OO languages (of which native R has some
features). But nothing's perfect.
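
A minimal sketch of the dispatch described above, using a toy generic rather than summary() itself. The names describe, describe.default and describe.data.frame are invented for illustration, and the rounding to 4 significant digits in the default method is an assumption chosen to mimic the values reported in this thread; the printed output of summary() itself may differ between R versions.

```r
# Toy S3 generic: the method that runs depends on the class of x.
describe <- function(x, ...) UseMethod("describe")

# Default method (used for plain vectors): rounds to 4 significant digits.
describe.default <- function(x, ...) signif(max(x), 4)

# Data frame method: reports the exact maximum of every column.
describe.data.frame <- function(x, ...) sapply(x, max)

vec_max <- describe(rock$area)   # dispatches to describe.default
df_max  <- describe(rock)        # dispatches to describe.data.frame
```

Here the same call yields 12210 for the vector and 12212 for the area column of the data frame, purely because two different methods were selected.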

-- Bert

On Mon, Oct 3, 2011 at 8:12 PM, Jeanne M. Spicer xn8spi...@gmail.com wrote:

 The summary function behaves inconsistently with data frame columns, e.g.

 summary(rock)   #max of area 12212, correct
 summary(rock$area)  #max of area 12210, incorrect max

 I know that
 summary(rock$area, digits=5)
 will correct the error (I DID read the manual). But my point is the
 inconsistency, because I get the correct answer without having to add the
 digits option in the first statement when referring to the full dataframe.
 This is one of the first functions that beginners use and if they have to
 RTM and tinker with options before they can get a consistent value for the
 max of an integer column, it is off-putting to say the least. At worst it
 confirms the skeptic's suspicion that open-source software is a bit flaky.
  Would it be out of line to report this to r-bugs -- at least to improve on
 the documentation?

 -jms
 r2.13.1 maclion


[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] Plotting a polygon with xyplot

2011-10-04 Thread Ken Knoblauch

markm0705 markm0705 at gmail.com writes:
 I would like to plot a string of points as a polygon in xyplot.  I'm a bit
 lost as to how to get the points plotting in the correct order.  I would
 also like some hints on how to render or fill the polygon.
 
 Script below and data file attached
 
 Thanks
 
 Markm
 
 library(lattice)
 
 # set size of the window
 windows(height=7, width=10, rescale="fixed")
 
 Data_poly <- read.table("111004_Lode_Outlines.csv", header = TRUE, sep = ",")
 
 xyplot(z~y,
   data=Data_poly,
   type="l"
 )
 
 Data file: http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv
 

Before you try this with lattice, you might spend some time
getting your abscissa values in an order that will plot the
contour in a sequential fashion.  It's not obvious how to
do this a priori.  Here is a simple-minded attempt after looking
at your graphic, just using base graphics.  Maybe, it will
be sufficient for you to tweak it a bit further for what you
want.

Data_poly <-
  read.table("http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv",
             header = TRUE, sep = ",")
par(mfrow = c(1, 2), pty = "s")
plot(z ~ y, Data_poly, type = "l")

# NOTE: the comparison operator was lost in the archive; ">" is assumed here.
fh <- with(Data_poly, which(z > 240))
D_poly <- rbind(Data_poly[fh, ], Data_poly[-rev(fh), ])
D_poly <- rbind(D_poly, Data_poly[1, ])   # repeat the first point to close the outline

plot(z ~ y, D_poly, type = "n")
with(D_poly, polygon(y, z, col = "lightblue"))

-- 
Ken Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html


Re: [R] package.skeleton generates .env = environment

2011-10-04 Thread Duncan Murdoch


On 04/10/2011 6:40 AM, pedabreu wrote:

Hello,

I am trying to create a package using package.skeleton. I use the R.oo package
to create object-oriented classes. When I use package.skeleton, it creates
the following file:

classA <-
structure(function()
{

  extend(Object(), "Class A",
         .var1 = NULL)


}
, .env = <environment>, class = c("Class", "Object"), formals = c("public",
"class"), modifiers = c("public", "class"))

Then I build it using R CMD build myPkg.

When I try to install the package, I get this error:

  /tmp/RtmpaOZ7IQ/R.INSTALL412da433/JSSbase/R/GTHeuristic.R:7:10:
unexpected '<'
6:}
7: , .env = <

Why does package.skeleton create .env = <environment>?


package.skeleton tries to deparse your code, but in some cases, that
can't be done.  As ?deparse says, "However, not all objects are
deparse-able even with this option and a warning will be issued if the
function recognizes that it is being asked to do the impossible."


What you need to do is to copy your original source code that created 
classA into the package source.  Presumably it uses some functions from 
R.oo to construct the object properly.
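
A small sketch of why the generated file cannot be parsed, assuming only base R. The object make_class is a made-up stand-in for the R.oo class constructor in the question; it simply carries an environment as an attribute.

```r
# A function carrying an environment as an attribute, similar in spirit to
# the object in the question (make_class is a made-up name).
make_class <- structure(function() NULL, .env = globalenv())

# Environments have no source representation, so deparse() emits a
# placeholder that the parser cannot read back.
env_text <- deparse(globalenv())
src_text <- paste(deparse(make_class), collapse = "\n")

# Parsing the deparsed text fails, much as the package installation did.
reparsed <- try(parse(text = src_text), silent = TRUE)
```

The placeholder text is what ends up in the skeleton file, which is why copying the original source into the package is the practical fix.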


Duncan Murdoch


Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Hadley Wickham

On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov wrote:
  I'm interested in creating a graphic -like- this:

 c <- ggplot(mtcars, aes(qsec, wt))
 c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2,
                                alpha = 0.2)

 but I need to show 2 sets of bands (with different shading) using 5%, 25%,
 75%, 95% limits that I specify and where the heavy blue line is the median.
 I don't understand how to do this with ggplot2.

Exactly what sort of limits do you want?  It sounds like maybe you are
looking for smoothed quantile regression.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


[R] Adding multiple gates/filters in densityplot

2011-10-04 Thread Michael Jahn

Hi R-Users,

I posted this question a while ago on the bioconductor mailing list but got no 
answers. Maybe here is somebody who might know a solution:

I failed at drawing multiple filters in a densityplot() using the
flowCore/flowViz packages.
I found a way to draw multiple filters in xyplot(), using the glpolygon
method within the panel function, but similar attempts for densityplot()
failed.
I could simply draw some vertical lines using panel.abline, but this
doesn't look as appealing as the original method when using a single
filter with the standard filter=xyz argument.
I bet there is a method to draw multiple gates through the panel function,
as curv1Filter can also identify multiple peaks automatically and draw
them into a densityplot...


This script works for xyplot but not for densityplot:

    library(flowCore)
    library(flowViz)


    data(GvHD)
    Filter1 <- rectangleGate(filterId="Filter1", "FSC-H" = c(0, 200))
    Filter2 <- rectangleGate(filterId="Filter1", "FSC-H" = c(300, 400))


    xyplot( `SSC-H` ~ `FSC-H` , data=GvHD[[1]],
        panel = function(...) { 
            panel.xyplot.flowset(...)
           glpolygon( Filter1 )
           glpolygon( Filter2 )
        }
    )
    

    densityplot( ~ `FSC-H`, data=GvHD[[1]],
        panel = function(...) { 
            panel.densityplot.flowset(...)
            glpolygon( Filter1 )
            glpolygon( Filter2 )
        }
    )

The glpolygon method yields not the typical look of the densityplot filters, 
but red lined gate boundaries. The desired look of the filter is a lighter 
color and dotted lines as limits.
Thank you in advance!

All the best,
Michael


--
Michael Jahn
PhD student
Helmholtz-Centre for Environmental Research
Leipzig, Germany
http://www.ufz.de/


Re: [R] Plotting a polygon with xyplot

?? Use the appropriate panel function, not panel.xyplot().
If you don't know what this means, you need to read up on lattice/trellis
graphics.

?panel.polygon
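
A minimal sketch of what a custom panel function with panel.polygon() could look like. The data frame outline is a made-up unit square, not the poster's data, and the sketch assumes the points are already ordered along the outline.

```r
library(lattice)

# Hypothetical outline: four corners of a square, already in drawing order.
outline <- data.frame(y = c(0, 1, 1, 0), z = c(0, 0, 1, 1))

# Replace the default panel function with one that draws a filled polygon.
p <- xyplot(z ~ y, data = outline,
            panel = function(x, y, ...) {
              panel.polygon(x, y, col = "lightblue", border = "black")
            })
```

At the console the plot is drawn when p is printed; inside a script or function, call print(p) explicitly.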

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[R] Question about linear mixed effects model (nlme)

2011-10-04 Thread Panagiotis

Hi,

I fitted a linear mixed-effects model to my data using the nlme package:
lme2 <- lme(distance ~ temperature*condition, random = ~ +1|trial, data)
and then ran anova.
I would like to ask whether it is possible to get the least squares means for
the interaction effect and the corresponding 95% CI, and then plot these values.

Thank you 
Panagiotis

--
View this message in context: 
http://r.789695.n4.nabble.com/Question-about-linear-mixed-effects-model-nlme-tp3871203p3871203.html
Sent from the R help mailing list archive at Nabble.com.


[R] [R-pkgs] `partykit': A Toolkit for Recursive Partytioning

2011-10-04 Thread Torsten Hothorn



New package `partykit': A Toolkit for Recursive Partytioning

The purpose of the package is to provide a toolkit with infrastructure for
representing, summarizing, and visualizing tree-structured regression and
classification models. Thus, the focus is not on _inferring_ such a
tree structure from data but to _represent_ a given tree so that
printing/plotting and computing predictions can be performed in a
standardized  way. In particular, this unified infrastructure can be
used for reading/coercing tree models from different sources
(packages `rpart', `RWeka', `PMML') yielding objects that share
functionality for `print()', `plot()', and `predict()' methods.

The impatient users will hopefully have fun with

install.packages("partykit")
library("partykit")
library("rpart")
### from ?rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
plot(as.party(fit))


Best,

Torsten & Achim

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages


Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Hadley Wickham

You need to set the labels...

Hadley

On Sat, Sep 24, 2011 at 3:49 AM, Casper Ti. Vector
caspervec...@gmail.com wrote:
 Is there any way to use expression() in legend labels with ggplot2?

 It seems that things like
 scale_shape_manual(value = c(
   x = expression(italic(x)),
   y = expression(italic(y))
 ))
 don't work.

 Thanks very much :)

 --
    Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134,
 valid from 2010 to 2013) from a key server.



Re: [R] Question about linear mixed effects model (nlme)

Below.

On Tue, Oct 4, 2011 at 7:34 AM, Panagiotis p...@hi.is wrote:

Hi,

I applied a linear mixed effect model in my data using the nlme package.
lme2 <- lme(distance ~ temperature*condition, random = ~ +1|trial, data) and then
anova.
I want to ask if it is posible to get the least squares means for the
interaction effect and the corresponding 95%ci. And then plot this values.

Uh-Oh. You may have unloosed The Wrath of Khan -- or at least of Venables.
(An explanation of this cryptic remark should follow from others, so please
do not ask me what it means if you do not know).
:-)

-- Bert

Thank you
Panagiotis


Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] how do i put two scatterplots on same graph

2011-10-04 Thread William Revelle



If the data are from one data.frame (e.g., the iris data set), then simply
label the red and white flowers with different colors, e.g., with the iris
data set:

plot(iris$Sepal.Length, iris$Sepal.Width,
     col = c("red", "blue", "black")[iris$Species],
     pch = c(16:18)[iris$Species])

Bill




On Oct 4, 2011, at 4:20 AM, Paul Hiemstra wrote:

 On 10/04/2011 06:19 AM, jricci wrote:
 Have two sets of scatterplot data
 hypothetically
 a) stem length vs number of petals in red flowers
 b) stem length vs number of petals in white flowers
 
 want to place on same scatter plot with same x,y axis but different colored
 markers
 
 How do I do this in R
 
 Hi,
 
 You could take a look at the ggplot2 package.
 
 good luck,
 Paul
 
 -- 
 Paul Hiemstra, Ph.D.
 Global Climate Division
 Royal Netherlands Meteorological Institute (KNMI)
 Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
 P.O. Box 201 | 3730 AE | De Bilt
 tel: +31 30 2206 494
 
 http://intamap.geo.uu.nl/~paul
 http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
 
 

William Revelle              http://personality-project.org/revelle.html
Professor                    http://personality-project.org
Department of Psychology     http://www.wcas.northwestern.edu/psych/
Northwestern University      http://www.northwestern.edu/
Use R for psychology         http://personality-project.org/r
It is 6 minutes to midnight  http://www.thebulletin.org


Re: [R] inconsistent behavior of summary function

On Tue, Oct 4, 2011 at 7:42 AM, Jeanne M. Spicer xn8spi...@gmail.com wrote:

 I'm not sure how returning an incorrect result is ever a 'positive' feature



It is **not** incorrect; perhaps unexpected, but that is not the same.


 but at least the documentation could more clearly warn users that this
 method behaves differently in these cases -- summary(rock[,1]) vs
 summary(rock[,1:2]) -- and that the method can and *does* return incorrect
 results without any warning messages.


What is (in)adequate in documentation is often in the mind of the beholder.

Note:
> class(rock[,1])
[1] "integer"

> class(rock[,1:2])
[1] "data.frame"

This means that different methods are dispatched, leading to the different
results. Moreover,
> summary(rock[,1,drop=FALSE])
      area
 Min.   : 1016
 1st Qu.: 5305
 Median : 7487
 Mean   : 7188
 3rd Qu.: 8870
 Max.   :12212

... and that is because
> class(rock[,1,drop=FALSE])
[1] "data.frame"

So the relevant Help file is ?"[.data.frame"





 I would encourage anyone teaching introductory R to look at the 'epicalc'
 package.  The re-vamped function 'summ' in that package returns correct
 results regardless - summ(rock), summ(rock$area).  In addition, when you
 only ask for one column you not only get the correct results, you also get a
 bonus distribution plot.

 I would like all of our students to use R, but little things like this
 are huge stumbling blocks for them.


I have no doubt that this is true. R is powerful, flexible and, as an
inevitable result, complex. To master it, honest effort is required,
probably a somewhat scarce commodity in introductory classes, especially for
non-statisticians. For that reason, there are numerous learning resources
available, to be found on CRAN. Have you looked at them? Moreover, there are
several R GUI's that attempt to shield the beginner from the initial shock,
to be found in the R-GUIs link under Other Projects. Have you considered
those?

So I think something more than righteous indignation is called for here.
Nevertheless, the bottom line is that you get what you pay for: R **IS**
hard -- but for many serious data analysts of all stripes, worth the effort.

Cheers,
Bert



 -jeanne






Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] file input with readLines




On 03.10.2011 19:19, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote:

I am using readLines to read a fairly large ASCII file.  readLines reads
a fixed number of lines, then other R code processes the data, then
readLines reads the same number of lines again, then other R code
processes the data, then 



Sort of like:



conn <- file('filename', 'r')

for (chunk in 1:10) {

  Lines <- readLines(conn, n=25)

  # process Lines

}



The code is working, but I notice that it slows down greatly as time
progresses.  It took 2 seconds to read my first chunk of data, 4 seconds
to read the next chunk, 10 after that.  The quasi-exponential trend has
slowed, thank goodness, but after about a hundred reads, the read time
for the next chunk is over a minute.  Let me stress that the number of
lines read in each chunk of data is absolutely fixed.



The only processing I am doing at the point is to parse the new data,
and rbind the results to an existing data frame.


And that may be the interesting point.
Have you tried to allocate the whole data.frame and assign into it
later? It is probably not readLines() that is slowing you down.
A minute seems to be quite a lot for reasonably sized data. How many
columns are we talking about?
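
A sketch of the pre-allocation idea, under stated assumptions: the input file is a small temporary file written by the example itself, read.table(text = ...) stands in for the real parsing step, and the names pieces and result are invented. Instead of growing a data frame with rbind() on every pass, each parsed chunk goes into a pre-allocated list slot and everything is combined once at the end.

```r
# Write a small hypothetical input file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(sprintf("%d %f", 1:100, sqrt(1:100)), path)

n_chunks   <- 4
chunk_size <- 25
pieces <- vector("list", n_chunks)   # pre-allocate one slot per chunk

conn <- file(path, "r")
for (chunk in seq_len(n_chunks)) {
  lines <- readLines(conn, n = chunk_size)
  # Parse the chunk; read.table(text = ...) stands in for the real parser.
  pieces[[chunk]] <- read.table(text = lines, col.names = c("id", "value"))
}
close(conn)

result <- do.call(rbind, pieces)     # combine once, after the loop
```

Growing an object inside the loop copies all earlier rows on every pass, which would be consistent with read times that increase as the loop progresses.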


Uwe Ligges





 Processing of new data
in no way depends on earlier data.



So, my question is why is the reading taking longer as time goes on?  Is
there a way to fix this?  Is there a better method than readLines?



Thanks.



[R] adding a dummy variable...

2011-10-04 Thread grazia

Hi all,

I have a dataset of individuals where the variable ID corresponds to the
identification of the household where the individual lives. rel.head stands
for the relationship with the household head. so rel.head=1 is the household
head, rel.head=2 is the spouse, rel.head=3 is the children.

Here is an example to see how it looks like:

df <- data.frame(ID=c(17100, 17100, 17101, 17102, 17103, 17103,
                      17104, 17104, 17104, 17105, 17105),
                 rel.head=c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))


I want to add a dummy variable that is equal to 1 when these conditions
hold simultaneously:

a) the number of rows with same ID is equal to 2
b) the variable rel.head=1 and rel.head=3


So my ideal output is:

       ID  rel.head  added.dummy
1   17100         1            1
2   17100         3            1
3   17101         1            0
4   17102         1            0
5   17103         1            0
6   17103         2            0
7   17104         1            0
8   17104         2            0
9   17104         3            0
10  17105         1            1
11  17105         3            1

Is there a simple way to do that?
Can somebody help?

Thanks in advance,
Grazia
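
One possible approach, sketched with base R's ave(); this is an illustration rather than an answer taken from the thread. It reads condition b) as "the household contains both a head (rel.head 1) and a child (rel.head 3)", and the helper names n_rows, has_head and has_child are invented.

```r
df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))

# Per-household quantities, repeated on every row of that household.
n_rows    <- ave(df$rel.head, df$ID, FUN = length)
has_head  <- ave(as.numeric(df$rel.head == 1), df$ID, FUN = max)
has_child <- ave(as.numeric(df$rel.head == 3), df$ID, FUN = max)

# 1 only when the household has exactly two rows: a head and a child.
df$added.dummy <- as.integer(n_rows == 2 & has_head == 1 & has_child == 1)
```

For the example data this gives 1 for households 17100 and 17105 and 0 elsewhere, matching the desired output above.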


Re: [R] Installation from local Compiled directory




On 03.10.2011 18:16, Sandeep Patil wrote:

Hello everyone

I have manually compiled directory of gstat in a particular folder of my
Unix system.
I want to install this and am unable to use either of the following two
commands

1. R CMD INSTALL
2. install.packages





If this is a precompiled (i.e. binary) package produced for this R 
version and this OS, then the magic is to just copy the directory into 
your library.
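
A rough sketch of that copy step done from within R. The path in pkg_dir is a placeholder, not a real location, and the sketch assumes the directory really is an installed (binary) package built for this R version and OS.

```r
# Hypothetical location of the compiled package directory: adjust as needed.
pkg_dir <- "/path/to/compiled/gstat"
lib_dir <- .libPaths()[1]              # first library on R's search path

# Copy the whole package directory into the library, then try to load it.
if (dir.exists(pkg_dir)) {
  file.copy(pkg_dir, lib_dir, recursive = TRUE)
  library(gstat)
}
```

.libPaths() lists every library R searches, so any writable entry could serve as the target.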


best,
Uwe Ligges



I do not understand how to coax above commands to locate the directory that
i have
compiled.

Please understand that i have solved a number of related issues concerning
this
installation and it is a special case where

1. I cannot use CRAN mirror to download and install
2. Install from TAR file

Essentially this is the only option i have.

Thank you

Sandeep


[R] [Workshop] Finance with R

2011-10-04 Thread Peter Ruckdeschel

The Financial Mathematics department of Fraunhofer ITWM 
is offering a two-day workshop on Finance with R:

%-
[Workshop] Finance with R
%-

Oct 20, 2011, 10:00-17:00 and
Oct 21, 2011,  9:00-16:00

Fraunhofer ITWM, Fraunhofer-Platz 1, 67663 Kaiserslautern,
Germany

%-
Scope and purpose
%-

This workshop provides an introduction to R for professionals 
and academics in Finance. 

It gives an insight into possibilities of data analysis and 
statistics with R, import of data sets, generation of graphics 
and preparation of reports, according to their relevance 
in Finance.

Besides providing insight into financial modeling in R, in 
particular we demonstrate the use of the Rmetrics family 
of R packages as well as an R bridge to the Quantlib library. 
We also cover integration of R into Excel, interaction with 
Matlab, and import from Bloomberg.

%
Benefits of attending
%

The workshop provides insight into statistical models and 
concepts in R which are useful for various problems arising 
in Finance. 

The attendees will be able to import datasets into R, analyze 
them statistically and apply concepts from time series modeling. 

In practical sessions, the attendees will learn and practice 
how to use R.

The fee for the workshop is 500 EUR.

For further details, see
http://www.itwm.fraunhofer.de/en/departments/financial-mathematics/events/2011-workshop-series.html

Peter Ruckdeschel

-- 
Dr. habil. Peter Ruckdeschel, Abteilung Finanzmathematik, F3.17
Fraunhofer ITWM, Fraunhofer Platz 1, 67663 Kaiserslautern
Telefon:  +49 631/31600-4699   Fax:  +49 631/31600-5699
E-Mail :  peter.ruckdesc...@itwm.fraunhofer.de
http://www.itwm.fraunhofer.de/abteilungen/finanzmathematik/mitarbeiterinnen/mitarbeiter/dr-peter-ruckdeschel.html


Re: [R] handling constant factors in prediction using svm

On 04.10.2011 08:53, Divyam wrote:

Hi users!

I am fitting a model with several factor variables as independents using
svm. Since there are many categorical variables, the training and test
data sets have been created using dummy.data.frame from the dummies
package. I have a factor A in the training data set with 2 levels (0, 1). In
the test set, this factor A has only 1 level (1), and hence when applying
dummy.data.frame, the variable gets dropped (and that's how I want it too).
The problem comes when I try to predict the test data: an error is
thrown saying that object A0 is not found. Is there any way to solve this
problem?

Errr, if you learned a model that predicts based on several variables,
including A0, what do you expect what happens if A0 is not given? Well,
you cannot predict. So if A0 is constant in your test cases, just supply it!

To simplify, consider a linear model y=bX+e. Now one column of X is
missing for prediction. y will be undefined, obviously.
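
One way to "just supply it" is to give the test factor the same levels as the training factor before creating dummy columns. The sketch below uses base R's model.matrix() as a stand-in for dummy.data.frame (an assumption; the dummies package is not used here), and the data frames train and test are made up.

```r
train <- data.frame(A = factor(c(0, 1, 1, 0)))
test  <- data.frame(A = factor(c(1, 1)))   # only level "1" occurs

# Give the test factor the same level set as the training factor.
test$A <- factor(test$A, levels = levels(train$A))

# model.matrix() now creates a column for every level, including A0.
dummies <- model.matrix(~ A - 1, data = test)
```

The A0 column is then present (all zeros) in the test design matrix, so a model trained with A0 has every variable it expects.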

Uwe Ligges

Thanks
Divya

--
View this message in context:
http://r.789695.n4.nabble.com/handling-constant-factors-in-prediction-using-svm-tp3870093p3870093.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] The use of period in function names and variable names

2011-10-04 Thread jdospina

Hello.

Not at all in the way you have shown. Just to improve the readability of
your code, try to avoid variable names that begin with a period
(example: .hello).

In contrast with Matlab (for example), the period in R is not used to
access an object property.
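
A short illustration of these points, with made-up names (stem.length, .hello, flower). It uses a separate environment so that the effect on ls() is easy to see.

```r
e <- new.env()
assign("stem.length", 4.2, envir = e)  # period inside a name: ordinary character
assign(".hello", 1, envir = e)         # leading period: hidden from ls() by default

visible   <- ls(e)                     # lists only the non-hidden name
all_names <- ls(e, all.names = TRUE)   # lists both names

# Components are accessed with $, not with a period.
flower <- list(stem.length = 4.2)
len <- flower$stem.length
```

Names starting with a period still work; they are simply omitted from the default ls() listing, which is one reason they can hurt readability.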

--
View this message in context: 
http://r.789695.n4.nabble.com/The-use-of-period-in-function-names-and-variable-names-tp3869913p3871407.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Jeanne M. Spicer

I'm not sure how returning an incorrect result is ever a 'positive' feature but 
at least the documentation could more clearly warn users that this method 
behaves differently in these cases -- summary(rock[,1]) vs summary(rock[,1:2]) 
-- and that the method can and does return incorrect results without any 
warning messages.   

I would encourage anyone teaching introductory R to look at the 'epicalc' 
package.  The re-vamped function 'summ' in that package returns correct results 
regardless - summ(rock), summ(rock$area).  In addition, when you only ask for 
one column you not only get the correct results, you also get a bonus 
distribution plot.  

I would like all of our students to use R, but little things like this are 
huge stumbling blocks for them. 
-jeanne 




Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Casper Ti. Vector

Hmm, that's my fault when composing this mail, but the problem was
really encountered at that time.
Nevertheless, neither can I reproduce the problem now, perhaps I just
made another mistake at that time.
Thanks all the same, and sorry for the disturbance anyway :|

On Tue, Oct 04, 2011 at 10:10:56AM -0500, Hadley Wickham wrote:
 You need to set the labels...

-- 
Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134,
valid from 2010 to 2013) from a key server.




Re: [R] Question about linear mixed effects model (nlme)

2011-10-04 Thread Ben Bolker

Bert Gunter gunter.berton at gene.com writes:

 
 Below.
 
 On Tue, Oct 4, 2011 at 7:34 AM, Panagiotis pat2 at hi.is wrote:
 
  Hi,
 
  I applied a linear mixed effects model to my data using the nlme package:
  lme2 <- lme(distance ~ temperature*condition, random = ~ +1|trial, data)
  and then anova.
  I want to ask if it is possible to get the least squares means for the
  interaction effect and the corresponding 95% CI, and then plot these values.
 
 
 Uh-Oh. You may have unloosed The Wrath of Khan -- or at least of Venables.
 (An explanation of this cryptic remark should follow from others, so please
 do not ask me what it means if you do not know).
 

   You should probably ask (a version of) this question on the
r-sig-mixed-models list instead.
  What do you mean by the least squares means for the interaction effect?
How is it different from the estimate of the interaction parameter?
You can use the predict() function if you want to calculate predicted
values for any particular combination of predictors (you probably want
to specify level=0 to get the population-level effects).  Getting 'good'
confidence intervals for mixed-effect models is surprisingly difficult.
If you are willing to ignore the uncertainty of the among-trial variance,
you can use a modification of the recipe found at http://glmm.wikidot.com/faq
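
As a rough illustration of the predict() suggestion above, here is a minimal
sketch. It uses the Orthodont data that ships with nlme, not the poster's
data, so the model formula, the newdat grid and all variable names are
assumptions made for the example. The interval uses only the fixed-effects
covariance matrix, i.e. it ignores the uncertainty in the among-group
variance, as mentioned above.

```r
library(nlme)

# Stand-in model on nlme's built-in Orthodont data (not the poster's data)
fm <- lme(distance ~ age * Sex, random = ~ 1 | Subject, data = Orthodont)

# Grid of predictor combinations for population-level predictions
newdat <- expand.grid(age = c(8, 10, 12, 14),
                      Sex = factor(levels(Orthodont$Sex),
                                   levels = levels(Orthodont$Sex)))

# level = 0 requests fixed-effects-only (population-level) predictions
newdat$pred <- as.numeric(predict(fm, newdata = newdat, level = 0))

# Approximate standard errors from the fixed-effects covariance matrix only
X  <- model.matrix(~ age * Sex, data = newdat)
se <- sqrt(diag(X %*% fm$varFix %*% t(X)))
newdat$lower <- newdat$pred - 1.96 * se
newdat$upper <- newdat$pred + 1.96 * se
newdat
```

The resulting data frame can then be plotted with any of the usual tools;
the 1.96 multiplier is a normal approximation, which is itself a
simplification.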


Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Dennis Murphy

Hi:

Here's a reproducible example:

library(ggplot2)

d <- data.frame(grp = factor(rep(c('x', 'y'), each = 5)),
                ev = rnorm(10), dv = rnorm(10))
# One expression per factor level, used as the legend labels
labl <- list(expression(italic('x')), expression(italic('y')))

ggplot(d, aes(x = ev, y = dv, shape = grp)) + geom_point() +
   scale_shape_manual('Group', breaks = levels(d$grp),
                      values = 1:2,
                      labels = labl)

HTH,
Dennis

On Tue, Oct 4, 2011 at 8:59 AM, Casper Ti. Vector
caspervec...@gmail.com wrote:
 Hmm, that's my fault when composing this mail, but the problem was
 really encountered at that time.
 Nevertheless, neither can I reproduce the problem now, perhaps I just
 made another mistake at that time.
 Thanks all the same, and sorry for the disturbance anyway :|

 On Tue, Oct 04, 2011 at 10:10:56AM -0500, Hadley Wickham wrote:
 You need to set the labels...

 --
    Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134,
 valid from 2010 to 2013) from a key server.



Re: [R] The use of period in function names and variable names

2011-10-04 Thread Steve Lianoglou

Hi,

On Tue, Oct 4, 2011 at 11:39 AM, jdospina jdosp...@gmail.com wrote:
 Hello.

 Not at all in the way you have shown. Just to improve your code
 readability, try to avoid naming your variables beginning with period
 (example: .hello).

Well, that's not exactly true.

It's common practice to name variables with a leading period if you
want them to be considered hidden, in some respect. See the
`all.names` argument to the `ls` function, for instance.
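
A small sketch of what "hidden" means here; the environment e and the two
object names are made up for the illustration.

```r
e <- new.env()
assign(".hidden", 1, envir = e)   # leading period: treated as hidden by ls()
assign("visible", 2, envir = e)

ls(e)                    # lists only the object without a leading period
ls(e, all.names = TRUE)  # also lists the object starting with a period
```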

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] Rug plot curve reversal




On 04.10.2011 13:30, Peter Minting wrote:


Dear R-help
Can anyone tell me why my curve appears the wrong way round on a rug plot?
I am using the same code as on pg 596 of the Crawley R-book.



mod <- glm(mort ~ logBd, binomial)


What is mort, what is logBd? I don't have access to the book. I have 
hidden it in my other office so that nobody can find it anymore.




par(mfrow=c(2,2))
xv <- seq(0,8,0.01)
yv <- predict(mod,list(logBd=xv),type="response")
plot(logBd,mort)
lines(xv,yv)
I've tried swapping xv and yv around but no luck.


Hopefully mort is a binary factor, i.e. with two levels. In that case 
they are at positions 1 and 2 on the y axis in plot().
yv is the response, i.e. it lies in the interval (0,1) if the binomial glm was 
successful. So it is on a different scale.


So I guess
 lines(xv,yv+1)
could help.
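
To make the scale argument concrete, here is a minimal sketch with simulated
data. The original mort and logBd were not posted, so the simulated values,
the coefficients and the level labels are all assumptions.

```r
set.seed(1)
logBd <- runif(100, 0, 8)
# Two-level factor response, simulated from a logistic relationship
mort <- factor(rbinom(100, 1, plogis(-3 + 0.8 * logBd)),
               levels = 0:1, labels = c("alive", "dead"))

mod <- glm(mort ~ logBd, family = binomial)
xv  <- seq(0, 8, 0.01)
yv  <- predict(mod, list(logBd = xv), type = "response")  # probabilities

# The factor codes are 1 and 2, while the fitted probabilities lie in (0, 1),
# so the curve is shifted up by 1 to share the same axis
plot(logBd, as.numeric(mort))
lines(xv, yv + 1)
```

An alternative would be to plot as.numeric(mort) - 1 and leave the curve
unshifted, so that the y axis reads directly as a probability.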

What else I think about The R Book can be found in my book review 
published in Statistical Papers.


Best,
Uwe Ligges







Thanks,
Pete

Re: [R] The use of period in function names and variable names

2011-10-04 Thread Duncan Murdoch


On 04/10/2011 7:04 AM, S Ellison wrote:

See para 10.3.2 'Identifiers' in the R language definition (always distributed 
with R in the html help system), or ?make.names, for a concise statement of 
what constitutes a valid variable name in R.

It's actually underscores that might give trouble with older versions, not '.'. 
But they'd have to be a lot older by R standards (pre 1.9.0).

I am not sure why there has been a recent shift away from periods and towards 
camelCase in some R packages;


Presumably the authors of those packages prefer camelCase.  I don't 
think it's any more complicated than that.


Duncan Murdoch



personally I find a period or underscore much more useful for making a variable 
name readable. And a mix of camelCase and period.breaks makes it a lot harder 
to guess which case-sensitive string to use. The number of different 
combinations of case and period I end up trying for R.Version (occasionally 
used, never quite often enough to be automatic) defies belief ;-).


S Ellison

  From: r-help-boun...@r-project.org On Behalf Of Smart Guy
  Sent: 04 October 2011 05:20
  To: r-help@r-project.org
  Subject: [R] The use of period in function names and variable names

  Hi,
   I am looking for some guidance on whether I can use the
  period(.) in function names and variable names.


Re: [R] inconsistent behavior of summary function




On 04.10.2011 16:42, Jeanne M. Spicer wrote:

I'm not sure how returning an incorrect result is ever a 'positive' feature but 
at least the documentation could more clearly warn users that this method 
behaves differently in these cases -- summary(rock[,1]) vs summary(rock[,1:2]) 
-- and that the method can and does return incorrect results without any 
warning messages.



What are you talking about? Probably it appeared prior in this thread? 
Please always cite.


Anyway, I guess you were looking for

summary(rock[,1, drop=FALSE])

rock[,1] is simplified to a vector while rock[,1:2] is still a matrix or 
data.frame (and since this is not cited, I do not know which).
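
A short sketch of the difference, using the built-in rock data from the
original post. The signif() line only mimics the 4-significant-digit
rounding described in this thread for R 2.13.1; how summary() applies
digits may differ in other R versions, so treat that part as an
illustration rather than a description of the current implementation.

```r
col_vec <- rock[, 1]                 # simplified to a plain vector
col_df  <- rock[, 1, drop = FALSE]   # still a one-column data frame

is.data.frame(col_vec)   # FALSE: summary() uses the default method
is.data.frame(col_df)    # TRUE:  summary() uses the data frame method

# Rounding the maximum to 4 significant digits reproduces the reported value
max(col_vec)
signif(max(col_vec), 4)
```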




I would encourage anyone teaching introductory R to look at the 'epicalc' 
package.  The re-vamped function 'summ' in that package returns correct results 
regardless - summ(rock), summ(rock$area).  In addition, when you only ask for 
one column you not only get the correct results, you also get a bonus 
distribution plot.

I would like all of our students to use R, but little things like this are 
huge stumbling blocks for them.


Then you told them about summary() before telling them how to deal with data 
structures correctly. And that is the most important part of learning R. 
I know from my courses that applied people do not like that, but I have 
always managed to convince them that this is the most important topic to 
learn about R.


Best,
Uwe Ligges




-jeanne




Re: [R] The use of period in function names and variable names




On 04.10.2011 18:18, Duncan Murdoch wrote:

On 04/10/2011 7:04 AM, S Ellison wrote:

See para 10.3.2 'Identifiers' in the R language definition (always
distributed with R in the html help system), or ?make.names, for a
concise statement of what constitutes a valid variable name in R.

It's actually underscores that might give trouble with older versions,
not '.'. But they'd have to be a lot older by R standards (pre 1.9.0).

I am not sure why there has been a recent shift away from periods and
towards camelCase in some R packages;


Presumably the authors of those packages prefer camelCase. I don't think
it's any more complicated than that.


I switched to camelCase when I realized that it is somewhat dangerous to 
conflict with the S3 naming convention: R CMD check rightly complained 
because I had used a name of the form "generic.class" in which "generic" or 
"class" really was the name of an existing generic or class, which I had 
not realized before.
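
A minimal sketch of the kind of accidental clash described above. The
function print.report and the class "report" are hypothetical names chosen
for the illustration.

```r
# Intended as an ordinary helper, but its name has the form generic.class
print.report <- function(x, ...) cat("helper called via S3 dispatch\n")

# Any object that happens to carry the class "report" ...
obj <- structure(list(), class = "report")

# ... is now printed by the helper, because print() dispatches on the class
out <- capture.output(print(obj))
out
```

With a camelCase name such as printReport the function could not be picked
up as a method by accident.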


Uwe





Duncan Murdoch



personally I find a period or underscore much more useful for making a
variable name readable. And a mix of camelCase and period.breaks makes
it a lot harder to guess which case-sensitive string to use. The
number of different combinations of case and period I end up trying
for R.Version (occasionally used, never quite often enough to be
automatic) defies belief ;-).


S Ellison

 From: r-help-boun...@r-project.org On Behalf Of Smart Guy
 Sent: 04 October 2011 05:20
 To: r-help@r-project.org
 Subject: [R] The use of period in function names and variable names

 Hi,
 I am looking for some guidance on whether I can use the
 period(.) in function names and variable names.


Re: [R] Problem with .C

Without knowing that C code, we cannot know. Have you read Writing R 
Extensions carefully? I.e. take care with memory allocation and printing 
as mentioned in the manual.


Uwe Ligges


On 04.10.2011 14:04, Grigory Alexandrovich wrote:

Hello,

I wrote a function in C, which works fine if called from the
main-function in C.

But as soon as I try to call this function from R like .C('foo',
as.double(x), as.integer(y)), the program crashes.

I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and
loaded it into R with dyn.load().

What can be the cause of such behaviour?
Again, the C function itself works, but not if called from R.

Thanks
Grigory Alexandrovich


Re: [R] adding a dummy variable...

2011-10-04 Thread Martyn Byng

Hi,

I am sure there are better / more efficient ways of doing this, but the
following seems to work ...

ids <- sapply(split(df,df$ID),function(x) {length(x$rel.head)==2 &&
any(x$rel.head==1) && any(x$rel.head==3)})
ids <- as.numeric(names(ids)[ids])
added.dummy <- as.numeric(df$ID%in%ids)
cbind(df,added.dummy)
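
For comparison, the same rule can be sketched with ave(), which returns one
value per row and so avoids the split/names round trip. This is only an
alternative sketch on the example data from the question; the helper names
n_rows, has_head and has_child are made up here.

```r
df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))

# Per-row household size, and whether the household has a head / a child
n_rows    <- ave(df$rel.head, df$ID, FUN = length)
has_head  <- ave(df$rel.head == 1, df$ID, FUN = any)
has_child <- ave(df$rel.head == 3, df$ID, FUN = any)

# Dummy is 1 only for two-row households made up of a head and a child
df$added.dummy <- as.numeric(n_rows == 2 & has_head & has_child)
df
```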

Martyn

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of gra...@stat.columbia.edu
Sent: 04 October 2011 16:45
To: r-help@r-project.org
Subject: [R] adding a dummy variable...

Hi all,

I have a dataset of individuals where the variable ID corresponds to the
identification of the household where the individual lives. rel.head stands
for the relationship with the household head, so rel.head=1 is the household
head, rel.head=2 is the spouse, rel.head=3 is the children.

Here is an example to see what it looks like:

df <- data.frame(ID=c(17100, 17100, 17101, 17102, 17103, 17103,
                      17104, 17104, 17104, 17105, 17105),
                 rel.head=c(1,3,1,1,1, 2, 1, 2, 3, 1, 3))


I want to add a dummy variable that is equal to 1 when these conditions
hold simultaneously:

a) the number of rows with the same ID is equal to 2
b) rel.head takes the values 1 and 3 within that ID


So my ideal output is:

   ID      rel.head   added.dummy
1  17100        1           1
2  17100        3           1
3  17101        1           0
4  17102        1           0
5  17103        1           0
6  17103        2           0
7  17104        1           0
8  17104        2           0
9  17104        3           0
10 17105        1           1
11 17105        3           1

Is there a simple way to do that?
Can somebody help?

Thanks in advance,
Grazia


Re: [R] Problem with .C

2011-10-04 Thread Jeff Newmiller

This looks like a classic case of not reading the manual, and then compounding 
it by not reading the posting guide. The manual would be the Writing R 
Extensions pdf that comes with R or you can google it. The posting guide is 
referenced at the bottom of this and every other posting on this mailing list.
There are nearly an infinite variety of errors that can lead to a crash, so 
it is really unreasonable of you to pose this question this way and expect 
constructive assistance.
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Grigory Alexandrovich alexandrov...@mathematik.uni-marburg.de wrote:

Hello,

I wrote a function in C, which works fine if called from the 
main-function in C.

But as soon as I try to call this function from R like .C('foo', 
as.double(x), as.integer(y)), the program crashes.

I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and 
loaded it into R with dyn.load().

What can be the cause of such behaviour?
Again, the C function itself works, but not if called from R.

Thanks
Grigory Alexandrovich


Re: [R] Giant font on the R plots...




On 04.10.2011 11:10, D.Emad wrote:

Hello,

I've been facing a really stupid problem... When I try to plot using
heatplot or hclust or any similar function, the labels of the x-axis (which
are the sample names) are giant and overlapping. I can't even read the
sample names!


R> heatplot
Error: object 'heatplot' not found

R> hclust(dist(USArrests), "ave")
# does not plot anything

So let me try

R> plot(hclust(dist(USArrests), "ave"))
# no x axis

Do you mean the labels on the dendrogram? These are controlled by cex 
(rather than cex.lab).
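
A minimal sketch of that suggestion on the built-in USArrests data; the
value 0.5 is just an example and would need tuning for the number of
samples.

```r
hc <- hclust(dist(USArrests), method = "average")

# cex scales the leaf labels of the dendrogram; cex.lab only affects
# the axis annotation labels
plot(hc, cex = 0.5)
```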


Uwe Ligges




I tried  cex.lab = 0.5, it helped only with the y axis and not the x-axis...
Any help please?!

--
View this message in context: 
http://r.789695.n4.nabble.com/Giant-font-on-the-R-plots-tp3870335p3870335.html
Sent from the R help mailing list archive at Nabble.com.


[R] number of analogs in significance test of MAT reconstructions using randomTF from palaeoSig

2011-10-04 Thread Jason Paul Joines

I'm trying to use the randomTF function from package palaeoSig to 
test the significance of a MAT reconstruction with nine analogs and a 
WA-PLS reconstruction with four components.  I'm probably missing 
something obvious here but how do I make sure that randomTF is testing 
the reconstruction based on the desired number of analogs / components?


In:

fitmap.wapls = WAPLS( lumapspc, lumap)
sig.wapls = randomTF( spp = sqrt( lumapspc ), env = lumapenv,
                      fos = sqrt( hcspc ), n = 999, fun = WAPLS, col = 4 )

I assume col = 4 tells randomTF to test the reconstruction based 
on the four-component WA-PLS model, as that's what the documentation 
seems to indicate.


However, in:

fitmap.mat = MAT( lumapspc, lumap, dist.method = "chord", k = 20 )
sig.mat = randomTF( spp = lumapspc, env = lumapenv, fos = hcspc,
                    n = 999, fun = MAT, col = 9 )

it seems that col = 9 does not tell randomTF to test the 
reconstruction based on the 9-analog MAT model.  If I give col a value 
other than one or two, I get a "subscript out of bounds" error.  So I 
assume the col argument in this case selects between the mean and 
weighted mean predictions.

If I pass the additional arguments k = 9 and dist.method = "chord" to 
randomTF, then the values of sig.mat$preds do not match the values 
obtained from:

predmap.mat = predict( fitmap.mat, hcspc, k = 9 )

Also, if I give randomTF a k value less than 5, I get the error "k 
out of range".  So passing k to randomTF must not be telling randomTF 
to use that number of analogs, as I would not be able to select a 
four-analog model.



Jason
===


Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Thomas . Adams

Hadley,

Thanks for responding. No, not smoothed quantile regression. If you go here: 
http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored 
squares, you can see we have 'boxplots'. What I want to express is the 
uncertainty as depicted in the example from my previous email where I can 
specify the limits calculated for the 'boxplots' using  5%, 25%,75%, 95% limits 
as we have with the 'boxplots'.

Tom

- Original Message -
From: Hadley Wickham had...@rice.edu
Date: Tuesday, October 4, 2011 10:23 am
Subject: Re: [R] Question about ggplot2 and stat_smooth
To: Thomas Adams thomas.ad...@noaa.gov
Cc: R-help forum r-help@r-project.org


 On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov wrote:
  I'm interested in creating a graphic -like- this:

  c <- ggplot(mtcars, aes(qsec, wt))
  c + geom_point() + stat_smooth(fill="blue", colour="darkblue",
                                 size=2, alpha = 0.2)

  but I need to show 2 sets of bands (with different shading) using 5%, 25%,
  75%, 95% limits that I specify and where the heavy blue line is the median.
  I don't understand how to do this with ggplot2.

 Exactly what sort of limits do you want?  It sounds like maybe you are
 looking for smoothed quantile regression.
 
 Hadley
 
 -- 
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University



Re: [R] adding a dummy variable...

2011-10-04 Thread Dennis Murphy

Hi:

Here's another way to do it with the plyr package, also not terribly
elegant. It assumes that rel.head is a factor in your original data
frame:
> str(df)
'data.frame':   11 obs. of  2 variables:
 $ ID      : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ...
 $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ...

If this is not the case in your data, then you need to modify the
function f below accordingly. (This is why use of dput() is preferred
when sending example data to R-help, BTW.)

library('plyr')
f <- function(d) {
    tvec <- factor(c(1, 3), levels = 1:3)   # target vector
    # Households without exactly two rows get a dummy of 0 and return early
    if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)}
    # Otherwise the dummy is 1 only if rel.head is exactly (1, 3)
    d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1)
    d
}

ddply(df, .(ID), f)

      ID rel.head dummy
1  17100        1     1
2  17100        3     1
3  17101        1     0
4  17102        1     0
5  17103        1     0
6  17103        2     0
7  17104        1     0
8  17104        2     0
9  17104        3     0
10 17105        1     1
11 17105        3     1

HTH,
Dennis

On Tue, Oct 4, 2011 at 8:44 AM,  gra...@stat.columbia.edu wrote:
 Hi all,

 I have a dataset of individuals where the variable ID corresponds to the
 identification of the household where the individual lives. rel.head stands
 for the relationship with the household head. so rel.head=1 is the household
 head, rel.head=2 is the spouse, rel.head=3 is the children.

 Here is an example to see how it looks like:

 df <- data.frame(ID=c(17100, 17100, 17101, 17102, 17103, 17103,
                       17104, 17104, 17104, 17105, 17105),
                  rel.head=c(1,3,1,1,1, 2, 1, 2, 3, 1, 3))


 I want to add a dummy variable that is equal to 1 when these conditions
 held simultaneously :

 a) the number of rows with same ID is equal to 2
 b) the variable rel.head=1 and rel.head=3


 So my ideal output is:

   ID      rel.head   added.dummy
 1  17100        1           1
 2  17100        3           1
 3  17101        1           0
 4  17102        1           0
 5  17103        1           0
 6  17103        2           0
 7  17104        1           0
 8  17104        2           0
 9  17104        3           0
 10 17105        1           1
 11 17105        3           1

 Is there a simple way to do that?
 Can somebody help?

 Thanks in advance,
 Grazia


[R] Reading stopwords from a csv file

2011-10-04 Thread vioravis

I am using the tm package to do text mining.

I have a huge list of stopwords (2000+) that are in a csv file. I read it as
follows:

stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)

When I try removing the stopwords using 

tr1=tm_map(tr1,removeWords,myStopwords)

I get the following error:

Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "",  : 
  internal error in compiling regexp

However, this works fine when I define myStopwords = c() instead of
reading from the csv file.
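
No cause was confirmed in this thread. One thing that might be worth
trying, on the unverified assumption that the single alternation pattern
built from 2000+ words is simply too large for the regex compiler, is to
remove the words in smaller batches. The sketch below uses only base R on a
plain character vector; remove_words_chunked and chunk_size are
hypothetical names, it is not part of tm, and it assumes the stopwords
contain no regex metacharacters.

```r
# Remove whole-word matches in batches so that each pattern stays small
remove_words_chunked <- function(text, words, chunk_size = 500) {
  words  <- unique(words[nzchar(words)])          # drop empty entries
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (ch in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(ch, collapse = "|"))
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  text
}

out <- remove_words_chunked("the cat and the dog", c("the", "and"),
                            chunk_size = 1)
out
```

If the error persists even with small batches, the next thing to check
would be the contents of the csv column itself (empty strings, stray
punctuation), but that is also a guess.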

Can someone please help me to resolve this issue?

Thank you.

Ravi

--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871697.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Dennis Murphy

Hi:

The smooth is not going to replicate the quantile estimates you get
from the 'boxplots'; the smooth is estimating a conditional mean using
loess, with confidence limits associated with uncertainty in the
estimate of the conditional mean function, which are almost certainly
going to be narrower than the corresponding quantiles of the data
distributions.  If you want to mimic the behavior in the 'boxplots', I
would save the information from them into a data frame with columns
for each quantile, assign variable names to the quantiles, melt the
corresponding data frame so that the quantile names become factor
levels (with whatever variable is used to distinguish the 'boxplots'
as the ID variable in melt()), and then use ggplot2 or lattice to plot
the corresponding sets of lines.

Here's an example:

library('plyr')
library('reshape')
library('ggplot2')

# Toy data frame
dd <- data.frame(year = rep(2000:2008, each = 500), y = rnorm(4500))

# Function to compute quantiles and return a data frame
g <- function(d) {
   qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))
   names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')
   qq   }

# Apply function to each year of data in dd:
qdf <- ddply(dd, .(year), g)
# melt to produce a factor variable whose levels are quantiles
qdfm <- melt(qdf, id = 'year')

# Use ggplot() to plot the boxplots and quantile lines:
ggplot() +
    geom_boxplot(data = dd, aes(x = factor(year), y = y)) +
    geom_line(data = qdfm, aes(x = factor(year), y = value,
                               group = variable, colour = variable),
              size = 1) +
    labs(x = 'Year', colour = 'Quantile')

The idea of superimposing the lines over the boxplots is to show that
the default method of quantile() corresponds to the quantile() method
used to generate boxplots in ggplot2.

Is that closer to what you're after? If you want, you can always use
geom_ribbon() to shade the areas between the lines and
scale_colour_manual() to manually specify the line colors. Using the
above example, here's one way, using the unmelted quantile data:

ggplot(qdf, aes(x = year, y = Q50)) +
geom_line(size = 2, color = 'navyblue') +
geom_ribbon(aes(ymin = Q25, ymax = Q75), fill = 'blue', alpha = 0.4) +
geom_ribbon(aes(ymin = Q5, ymax = Q25), fill = 'blue', alpha = 0.2) +
geom_ribbon(aes(ymin = Q75, ymax = Q95), fill = 'blue', alpha = 0.2) +
labs(x = 'Year', y = 'Y')

Dennis

On Tue, Oct 4, 2011 at 10:01 AM,  thomas.ad...@noaa.gov wrote:
 Hadley,

 Thanks for responding. No, not smoothed quantile regression. If you go here: 
 http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored 
 squares, you can see we have 'boxplots'. What I want to express is the 
 uncertainty as depicted in the example from my previous email where I can 
 specify the limits calculated for the 'boxplots' using  5%, 25%,75%, 95% 
 limits as we have with the 'boxplots'.

 Tom

 - Original Message -
 From: Hadley Wickham had...@rice.edu
 Date: Tuesday, October 4, 2011 10:23 am
 Subject: Re: [R] Question about ggplot2 and stat_smooth
 To: Thomas Adams thomas.ad...@noaa.gov
 Cc: R-help forum r-help@r-project.org


 On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov wrote:
  I'm interested in creating a graphic -like- this:

  c <- ggplot(mtcars, aes(qsec, wt))
  c + geom_point() + stat_smooth(fill="blue", colour="darkblue",
                                 size=2, alpha = 0.2)

  but I need to show 2 sets of bands (with different shading) using 5%, 25%,
  75%, 95% limits that I specify and where the heavy blue line is the median.
  I don't understand how to do this with ggplot2.

 Exactly what sort of limits do you want?  It sounds like maybe you are
 looking for smoothed quantile regression.

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University



[R] How to subset() from data frame using specific rows


  I have a data frame called chemdata with this structure:

> str(chemdata)
'data.frame':   14886 obs. of  4 variables:
 $ site    : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ...
 $ sampdate: Date, format: "1996-12-27" "1996-08-22" ...
 $ param   : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ quant   : num  0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...

  I've looked in the R Cookbook and Dalgaard's intro book without finding a
way to use wildcards (e.g., like "BC-*") rather than explicitly writing each
site ID when subsetting a data frame.

  I need to create subsets (as data frames) based on sites, but including
all sites on each stream. For example, using the initial site factor shown
above, I want a subset containing all data for sites BC-0.5, BC-1,
BC-2, BC-3, BC-4, BC-5, and BC-6.

Pointers appreciated,

Rich


Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Sarah Goslee

Hi Rich,

You can use something like this:

 testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
 grep("^A", testdata)
[1] 1 2 3
 grepl("^A", testdata)
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
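
To connect the pointer above to a data frame, here is a minimal sketch. The small chemdata below is a made-up stand-in (only the site column mirrors the structure in the question), so treat it as an illustration rather than a worked answer for the real data.

```r
# Hypothetical stand-in for chemdata; only the site column mirrors the thread.
chemdata <- data.frame(
  site  = factor(c("BC-0.5", "BC-1", "JC-1", "BC-2", "SC-3")),
  quant = c(0.06, 0.01, 0.02, 0.01, 0.03)
)

# grepl() returns one TRUE/FALSE per row; "^BC-" matches sites starting with BC-.
keep <- grepl("^BC-", chemdata$site)

# Use the logical vector as a row index, then drop factor levels no longer present.
bc <- droplevels(chemdata[keep, ])
```

The same logical vector could also be passed to subset(), e.g. subset(chemdata, grepl("^BC-", site)).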

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] How to subset() from data frame using specific rows

This isn't going to be the most elegant, but it should work:

## Get the factors as characters

ff <- as.character(chemdata$site)

## Identify those that match what you want
ff <- grepl(ff, "BC-")

## Now use this logical vector to subset

chemdata[ff, ]

Can't test, but should be good to go assuming that "BC-" entirely
identifies those sites you want. If you have other BC- things, read
through the ?regex documentation; I think it describes how to do
selective wildcards.
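
As a sketch of what a more selective pattern might look like (the site vector here is invented for illustration, including the non-matching IDs), an anchored regular expression can list exactly the allowed suffixes:

```r
# Hypothetical site IDs; BC-10, BCX-1 and JC-1 are there to show what is excluded.
sites <- c("BC-0.5", "BC-1", "BC-2", "BC-6", "BC-10", "BCX-1", "JC-1")

# ^ and $ anchor the whole string; the group lists the allowed suffixes.
# "\\." is a literal dot; [1-6] is any single digit from 1 to 6.
pattern <- "^BC-(0\\.5|[1-6])$"

wanted <- sites[grepl(pattern, sites)]
```

Without the trailing $, "BC-10" would also match, because "BC-1" is a prefix of it.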

Michael


Re: [R] How to subset() from data frame using specific rows


On Tue, 4 Oct 2011, Sarah Goslee wrote:


You can use something like this:


testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
grep("^A", testdata)
[1] 1 2 3
grepl("^A", testdata)
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE


Sarah,

  I don't see how this gives me a data frame containing only those sites I
specify. I want to plot by sites-within-streams specifying which param
factor to use.

Thanks,

Rich


Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Sarah Goslee

Hi Rich,


You asked for pointers, and didn't provide a reproducible example, so
I offered a pointer.

If you have a logical vector that specifies whether to include or omit
a row, you can use that to subset your data frame.

sitesToUse <- grepl("firstsite", mydata$mysitenames)
dataframeForThatSite <- mydata[sitesToUse, ]

If you want real worked results, you'll need to provide a reproducible
example of your own.

Sarah
-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] How to subset() from data frame using specific rows


On Tue, 4 Oct 2011, R. Michael Weylandt wrote:


This isn't going to be the most elegant, but it should work:
## Get the factors as characters
ff <- as.character(chemdata$site)

## Identify those that match what you want
ff <- grepl(ff, "BC-")


Michael,

  Apparently grep works differently in R than it does on the command line:

bf <- grep(ff, "BC-")
Warning message:
In grep(ff, "BC-") :
  argument 'pattern' has length > 1 and only the first element will be used

  I understand what you suggest but it does not appear to work for me.

Thanks,

Rich


Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Jeff Newmiller

?grep
?names
Use indexing by name [, namevector]
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.


Re: [R] adding a dummy variable...

2011-10-04 Thread baptiste auguie

Hi,

Using ddply,

ddply(df, .(ID), mutate, nrows = length(rel.head),
      test = nrows == 2 & all(rel.head %in% c(1, 3)))
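
For comparison, a base R sketch of the same idea that needs no extra package. It uses the example data from the original question; the helper name is_head_child is made up here, and the rule it encodes (exactly two rows whose relationships are exactly 1 and 3) is my reading of the stated conditions.

```r
df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))

# For each household: exactly two rows, and the relationships are exactly {1, 3}.
is_head_child <- function(r) length(r) == 2 && setequal(r, c(1, 3))

# ave() applies the function within each ID and recycles the result to every row.
df$added.dummy <- ave(df$rel.head, df$ID,
                      FUN = function(r) as.numeric(is_head_child(r)))
```

If rel.head is stored as a factor rather than a number, convert it first, e.g. with as.numeric(as.character(df$rel.head)).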

HTH,

baptiste


On 5 October 2011 06:02, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Here's another way to do it with the plyr package, also not terribly
 elegant. It assumes that rel.head is a factor in your original data
 frame:
 str(df)
 'data.frame':   11 obs. of  2 variables:
  $ ID      : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ...
  $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ...

 If this is not the case in your data, then you need to modify the
 function f below accordingly. (This is why use of dput() is preferred
 when sending example data to R-help, BTW.)

 library('plyr')
 f <- function(d) {
    tvec <- factor(c(1, 3), levels = 1:3)   # target vector
    if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)}
    # If the first if statement is FALSE, then the following code is run:
    d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1)
    d
 }

 ddply(df, .(ID), f)

      ID rel.head dummy
 1  17100        1     1
 2  17100        3     1
 3  17101        1     0
 4  17102        1     0
 5  17103        1     0
 6  17103        2     0
 7  17104        1     0
 8  17104        2     0
 9  17104        3     0
 10 17105        1     1
 11 17105        3     1

 HTH,
 Dennis

 On Tue, Oct 4, 2011 at 8:44 AM,  gra...@stat.columbia.edu wrote:
 Hi all,

 I have a dataset of individuals where the variable ID corresponds to the
 identification of the household where the individual lives. rel.head stands
 for the relationship with the household head. so rel.head=1 is the household
 head, rel.head=2 is the spouse, rel.head=3 is the children.

 Here is an example of what it looks like:

 df <- data.frame(ID=c(17100, 17100, 17101, 17102, 17103, 17103,
                       17104, 17104, 17104, 17105, 17105),
                  rel.head=c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))


 I want to add a dummy variable that is equal to 1 when these conditions
 hold simultaneously:

 a) the number of rows with the same ID is equal to 2
 b) the variable rel.head takes the values 1 and 3


 So my ideal output is:

   ID      rel.head   added.dummy
 1  17100        1           1
 2  17100        3           1
 3  17101        1           0
 4  17102        1           0
 5  17103        1           0
 6  17103        2           0
 7  17104        1           0
 8  17104        2           0
 9  17104        3           0
 10 17105        1           1
 11 17105        3           1

 Is there a simple way to do that?
 Can somebody help?

 Thanks in advance,
 Grazia


Re: [R] How to subset() from data frame using specific rows

... and, as an aside, if you had simply searched within R for (the
obvious?!)

??wildcard

you would have received the suggestion for glob2rx() in utils, which
actually would have enabled you to use a familiar wildcard expression.
However, the answers you've already received are simpler and more
straightforward.
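
A minimal sketch of the glob2rx() route, for anyone who prefers the wildcard form. The site vector is invented for illustration.

```r
# Hypothetical site IDs; "ABC-2" is included to show the effect of anchoring.
sites <- c("BC-0.5", "BC-1", "BC-6", "JC-1", "ABC-2")

# glob2rx() (in utils) translates a shell-style wildcard into a regular expression.
rx <- glob2rx("BC-*")

# The translated pattern is anchored at the start, so "ABC-2" is not matched.
bc_sites <- sites[grepl(rx, sites)]
```

Printing rx shows the regular expression that was generated, which is a handy way to learn the regex equivalent of a familiar wildcard.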

-- Bert


-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] How to subset() from data frame using specific rows

No, that was just a typo on my end:

the correct order of arguments should have been

ff <- grepl("BC-", ff)
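
A small sketch of the argument order, using a made-up vector ff. The signature is grepl(pattern, x, ...), so the pattern goes first; naming the arguments is one way to avoid mixing them up.

```r
# Hypothetical character vector of site IDs.
ff <- c("BC-0.5", "BC-1", "JC-1")

# grepl(pattern, x): the pattern comes first, the vector to search second.
grepl("BC-", ff)

# Naming the arguments makes the call independent of their order.
hits <- grepl(x = ff, pattern = "BC-")
```

With the arguments swapped, the whole vector is taken as the pattern, which is what triggers the "length > 1" warning.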


Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Hadley Wickham

 # Function to compute quantiles and return a data frame
 g <- function(d) {
   qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))
   names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')
   qq   }

You could cut out the melt step by making this return a data frame:

g <- function(df, qs = c(.05, .25, .50, .75, .95)) {
  data.frame(q = qs, quantile(df$y, qs))
}
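
A sketch of how such a function might be applied per group in base R. The data frame d, its grp column, and the column name value are assumptions made for this illustration; the earlier messages in the thread may group the data differently.

```r
# Quantiles of y as a long data frame: one row per requested probability.
g <- function(df, qs = c(.05, .25, .50, .75, .95)) {
  data.frame(q = qs, value = unname(quantile(df$y, qs)))
}

# Hypothetical data: y measured within three groups.
set.seed(1)
d <- data.frame(grp = rep(c("a", "b", "c"), each = 50), y = rnorm(150))

# Apply g to each group and stack the pieces, keeping the group label.
pieces <- lapply(split(d, d$grp), g)
out <- do.call(rbind, lapply(names(pieces),
                             function(k) cbind(grp = k, pieces[[k]])))
```

The result is already in long form (grp, q, value), which is the shape a melt step would otherwise produce.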

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


[R] Tinn-R

2011-10-04 Thread Charles McClure

I am new to R and have recently tried Tinn-R with very mixed and unexpected
results.  Can you point me to a Tinn-R tutorial on the web or a decent
reference book?

Thank you for your help;

Charles McClure
cmccl...@atrcorp.com
cfmccl...@verizon.net


Re: [R] Reading stopwords from a csv file

2011-10-04 Thread vioravis

The following for loop does the work, but it takes a good 30 minutes to run:

for(i in 1:length(myStopwords))
{
  currentWord <- myStopwords[i]
  tr1=tm_map(tr1,removeWords,currentWord)
}

Are there any faster alternatives?? Thank you.
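
One idea, sketched in base R on made-up data: remove all stop words in a single pass instead of one pass per word. The docs vector and the three stop words are hypothetical, and this works on plain character vectors rather than on a tm corpus. As far as I know, removeWords in tm is documented as taking a character vector of words, so tm_map(tr1, removeWords, myStopwords) without the loop may already be enough; I have not timed it on your data.

```r
# Hypothetical stand-ins for the stop word list and the documents.
myStopwords <- c("the", "of", "and")
docs <- c("the cat and the hat", "king of the hill")

# One alternation pattern, with \\b marking word boundaries, so a single gsub()
# pass removes every stop word instead of one pass per word.
pattern <- paste0("\\b(", paste(myStopwords, collapse = "|"), ")\\b")
cleaned <- gsub(pattern, "", docs, perl = TRUE)

# Collapse the leftover runs of spaces and trim the ends.
cleaned <- gsub("^ +| +$", "", gsub(" +", " ", cleaned))
```

Stop words containing regex metacharacters (a dot, a bracket) would need escaping before being pasted into the pattern.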

Ravi



--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html
Sent from the R help mailing list archive at Nabble.com.


[R] F-values in nested designs

2011-10-04 Thread Marcus Nunes

Hello all

I'm trying to learn how to fit a nested model in R. I found a toy
example on the internet with a dataset that has 3 areas and 4 sites
within these areas. When I use Minitab to fit a nested model to this
data, this is the ANOVA table that I get:

Nested ANOVA: y versus areas, sites

Analysis of Variance for y
Source  DF        SS       MS      F      P
areas    2    4.5000   2.2500  0.158  0.856
sites    9  128.2500  14.2500  3.167  0.012
Error   24  108.0000   4.5000
Total   35  240.7500

When I use R, this is the ANOVA table that I got:

summary(aov(y ~ areas + Error(areas%in%sites)))

Error: areas:sites
          Df Sum Sq Mean Sq F value Pr(>F)
areas      2   4.50    2.25  0.1579 0.8563
Residuals  9 128.25   14.25

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24    108     4.5
Warning message:
In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular

The results are the same, except for one F-value, and I don't
understand why. Hence, these are my questions:

1) I searched Google and couldn't find a reason for this warning in
my code. Why is this happening?

2) Why don't I get an F-value for the nested effect? I realize that R
calls it Residuals in the first part of the summary, but is there a
way to make R treat it as another factor?

INB4: if I have a nested design with treatment A and treatment B
within A, the F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I
make R give these values directly, without further coding?
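
One possible route, sketched with the data from this message: fit the nested terms as ordinary fixed terms with areas/sites, then form the two ratios MSA/MSA(B) and MSA(B)/MSE by hand from the mean squares. This is an illustration of the arithmetic, not necessarily the recommended way to analyse a nested design, and it does need a few lines of extra code.

```r
y <- c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14,
       11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4)
areas <- factor(rep(c("m1", "m2", "m3"), each = 12))
sites <- factor(rep(1:4, 9))

# areas/sites expands to areas + areas:sites, i.e. sites nested within areas.
tab <- summary(aov(y ~ areas/sites))[[1]]
ms  <- tab[["Mean Sq"]]   # order: areas, areas:sites, Residuals
dfs <- tab[["Df"]]

# Areas are tested against sites-within-areas; sites against the residual.
F_areas <- ms[1] / ms[2]
F_sites <- ms[2] / ms[3]
p_areas <- pf(F_areas, dfs[1], dfs[2], lower.tail = FALSE)
p_sites <- pf(F_sites, dfs[2], dfs[3], lower.tail = FALSE)
```

Note that the F value printed for areas in this summary table is computed against the residual mean square, so it differs from F_areas above.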

Thanks for your help.

Below is my code and information about my system.
--
y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14,
      11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4)
areas = as.factor(rep(c("m1", "m2", "m3"), each=12))
#sites = as.factor(c(rep(c(1, 2, 3, 4), 3), rep(c(5, 6, 7, 8), 3),
#                    rep(c(9, 10, 11, 12), 3)))
sites = as.factor(c(rep(c(1, 2, 3, 4), 9)))
repl  = as.factor(rep(c(1, 2, 3), each=4, 3))

summary(aov(y ~ areas + Error(areas%in%sites)))
Error: areas:sites
          Df Sum Sq Mean Sq F value Pr(>F)
areas      2   4.50    2.25  0.1579 0.8563
Residuals  9 128.25   14.25
Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24    108     4.5
Warning message:
In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular



sessionInfo()
R version 2.13.1 Patched (2011-08-25 r56798)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] car_2.0-11         survival_2.36-9    nnet_7.3-1
[4] MASS_7.3-14        lme4_0.999375-40   Matrix_0.999375-50
[7] lattice_0.19-33    nlme_3.1-102

loaded via a namespace (and not attached):
[1] grid_2.13.1   stats4_2.13.1 tools_2.13.1
--
Marcus Nunes
marcus.nu...@gmail.com


Re: [R] a question about sort and BH

On Mon, Oct 3, 2011 at 10:08 PM, chunjiang he camel...@gmail.com wrote:
 Hi,

 I have two questions I want to ask.

 1. If I have a matrix like this, and I want to find the rows whose
 values in the 3rd column are less than 0.05, how can I do it in R?
 hsa-let-7a--MBTD1    0.528239197    2.41E-05
 hsa-let-7a--APOBEC1    0.507869409    5.51E-05
 hsa-let-7a--PAPOLA    0.470451884    0.000221774
 hsa-let-7a--NF2    0.469280186    0.000231065
 hsa-let-7a--SLC17A5    0.454597978    0.000381713
 hsa-let-7a--THOC2    0.447714054    0.000479322
 hsa-let-7a--SMG7    0.444972282    0.000524129


Suppose your data is d: then try which(d[,3] < 0.05)
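
A short sketch of that suggestion on made-up data. The first two rows reuse values from the example; the third row, "hsa-let-7a--XYZ", and the column names are hypothetical.

```r
# Hypothetical data shaped like the example: name, correlation, p-value.
d <- data.frame(pair = c("hsa-let-7a--MBTD1", "hsa-let-7a--NF2", "hsa-let-7a--XYZ"),
                r    = c(0.528239197, 0.469280186, 0.101),
                p    = c(2.41e-05, 0.000231065, 0.31))

idx <- which(d[, 3] < 0.05)   # row numbers that pass the threshold
sig <- d[idx, ]               # the rows themselves
```

which() silently drops missing values, whereas indexing directly with the logical vector would return NA rows for them.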

 2. I got the p.adjust.R from R source. In the method BH, I am not clear
 with the code:
           i <- lp:1L


# Just the same as seq(lp, 1, by = -1)


           o <- order(p, decreasing = TRUE)
           ro <- order(o)
           pmin(1, cummin( n / i * p[o] ))[ro]

# pmin does parallel minimums, p[o] is the same as sort(p, decreasing = TRUE),
# and indexing by [ro] puts the output values back in the order in which
# they went in.

As an exercise, I'd suggest you get the original paper, see how the
calculation is done there, implement it in R as best you can, even if
it seems loop-y, and refine it down to R Core's implementation. One of
the best ways I know to learn to think vectorwise.
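
In the spirit of that exercise, here is one loop-y sketch of the BH adjustment, written from the vectorised code quoted in this thread rather than from the paper. The function name bh_loop is made up, and it ignores NA handling and the n argument that p.adjust supports.

```r
bh_loop <- function(p) {
  n <- length(p)
  o <- order(p)                 # positions of p from smallest to largest
  adj_sorted <- numeric(n)
  running_min <- 1              # adjusted values are capped at 1
  for (rank in n:1) {           # walk from the largest p-value down
    running_min <- min(running_min, n * p[o[rank]] / rank)
    adj_sorted[rank] <- running_min
  }
  adj <- numeric(n)
  adj[o] <- adj_sorted          # put results back in the input order
  adj
}

# Made-up p-values, compared against R's own implementation.
p <- c(0.01, 0.04, 0.03, 0.005, 0.20)
all.equal(bh_loop(p), p.adjust(p, method = "BH"))
```

Because the results come back in the input order, there should be no need to sort the p-values before calling p.adjust.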

Sorry I can't help more, but I don't know the method so I don't want to
read too much into the code and say something that I haven't thought
through (Lord knows I do that enough on this list!!)

Michael




 How to explain the first and the fourth row.
 === p.adjust.R ===
 p.adjust.methods <-
     c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")
 p.adjust <- function(p, method = p.adjust.methods, n = length(p))
 {
     ## Methods 'Hommel', 'BH', 'BY' and speed improvements contributed by
     ## Gordon Smyth sm...@wehi.edu.au.
     method <- match.arg(method)
     if(method == "fdr") method <- "BH" # back compatibility
     nm <- names(p)
     p <- as.numeric(p); names(p) <- nm
     p0 <- p
     if(all(nna <- !is.na(p))) nna <- TRUE
     p <- p[nna]
     lp <- length(p)
     stopifnot(n >= lp)
     if (n <= 1) return(p0)
     if (n == 2 && method == "hommel") method <- "hochberg"
     p0[nna] <-
         switch(method,
                bonferroni = pmin(1, n * p),
                holm = {
                    i <- seq_len(lp)
                    o <- order(p)
                    ro <- order(o)
                    pmin(1, cummax( (n - i + 1L) * p[o] ))[ro]
                },
                hommel = { ## needs n-1 >= 2 in for() below
                    if(n > lp) p <- c(p, rep.int(1, n-lp))
                    i <- seq_len(n)
                    o <- order(p)
                    p <- p[o]
                    ro <- order(o)
                    q <- pa <- rep.int( min(n*p/i), n)
                    for (j in (n-1):2) {
                        ij <- seq_len(n-j+1)
                        i2 <- (n-j+2):n
                        q1 <- min(j*p[i2]/(2:j))
                        q[ij] <- pmin(j*p[ij], q1)
                        q[i2] <- q[n-j+1]
                        pa <- pmax(pa,q)
                    }
                    pmax(pa,p)[if(lp < n) ro[1:lp] else ro]
                },
                hochberg = {
                    i <- lp:1L
                    o <- order(p, decreasing = TRUE)
                    ro <- order(o)
                    pmin(1, cummin( (n - i + 1L) * p[o] ))[ro]
                },
                BH = {
                    i <- lp:1L
                    o <- order(p, decreasing = TRUE)
                    ro <- order(o)
                    pmin(1, cummin( n / i * p[o] ))[ro]
                },
                BY = {
                    i <- lp:1L
                    o <- order(p, decreasing = TRUE)
                    ro <- order(o)
                    q <- sum(1L/(1L:n))
                    pmin(1, cummin(q * n / i * p[o]))[ro]
                },
                none = p)
     p0
 }
 


 I wrote a code to do my work in BH correction like the following:

 rm(list=ls())
 a <- read.csv("test.txt", sep="\t", header=F, quote="")
 b <- a[order(a[,3], decreasing=TRUE),]
 c <- p.adjust(b[,3], method="BH")
 b[,4] <- c
 write.table(b, "zz.txt", sep="\t")

 Is that right? Thanks for all.

 Jiang


Re: [R] How to subset() from data frame using specific rows


On Tue, 4 Oct 2011, Sarah Goslee wrote:


You asked for pointers, and didn't provide a reproducible example, so I
offered a pointer.


Sarah,

  I did not realize that your pointer was to the factor component of the
subset() command.

  I think the most parsimonious thing for me to do is to modify the database
table with a new column of the full stream name, then re-export and re-read
into R.
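
For what it's worth, a stream column could probably also be derived inside R without touching the database. The sketch below assumes that every site ID has the form STREAMCODE-number (as in BC-0.5), uses the code before the hyphen in place of the full stream name, and runs on a made-up stand-in for chemdata.

```r
# Hypothetical stand-in for chemdata; site IDs follow the STREAM-number pattern.
chemdata <- data.frame(
  site  = factor(c("BC-0.5", "BC-1", "JC-1", "JC-2", "SC-3")),
  quant = c(0.06, 0.01, 0.02, 0.01, 0.03)
)

# Strip everything from the first hyphen onward to get the stream code.
chemdata$stream <- sub("-.*$", "", as.character(chemdata$site))

# One data frame per stream, e.g. by_stream[["BC"]].
by_stream <- split(chemdata, chemdata$stream)
```

If the full stream names are needed, a small lookup table keyed on the code could be merged in afterwards.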

Thanks,

Rich


Re: [R] How to subset() from data frame using specific rows


On Tue, 4 Oct 2011, R. Michael Weylandt wrote:


No, that was just a typo on my end:
the correct order of arguments should have been
ff <- grepl("BC-", ff)


Michael,

  Thank you.

Rich


[R] joining tables

2011-10-04 Thread Jose Bustos Melo

Hello everyone,

I know this is a very basic question for you people. I'm working with many
different tables, but each one has the same variables (V1, V2, V3). The only
thing that I need to do is to put these tables together. In other words,
to create just one big table with all the cases shown in the smaller tables.
For example:

tabla1 <- data.frame(v1,v2,v3)
tabla2 <- data.frame(v1,v2,v3)
tabla3 <- data.frame(v1,v2,v3)
tabla4 <- data.frame(v1,v2,v3)

I just want to join them together in one table. By the way, there are more
than 3 million cases.
Thank you in advance!
José
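
A minimal sketch of stacking tables that share the same columns, using rbind(). The three small tables and the names todo, todo2 and tablas are made up for illustration; the real tables would take their place.

```r
# Hypothetical small tables with identical columns v1, v2, v3.
tabla1 <- data.frame(v1 = 1:2, v2 = c("a", "b"), v3 = c(0.1, 0.2))
tabla2 <- data.frame(v1 = 3:4, v2 = c("c", "d"), v3 = c(0.3, 0.4))
tabla3 <- data.frame(v1 = 5L,  v2 = "e",         v3 = 0.5)

# rbind() stacks data frames row-wise, matching columns by name.
todo <- rbind(tabla1, tabla2, tabla3)

# With many tables, collect them in a list and bind them in one call.
tablas <- list(tabla1, tabla2, tabla3)
todo2 <- do.call(rbind, tablas)
```

Binding everything in one call is usually preferable to growing the result inside a loop, which copies the accumulated table at every step.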

Re: [R] distance coefficient for amatrix with ngative valus

You are, of course, entirely correct and, once again, I tip my hat to
the erudition of those who comment on this list. My initial
formulation, for a distance on a normed space inherited from the norm,
stands trivially, but as you rightly point out, I'm excluding many
interesting and possibly useful norms.

Follies of youth and all that

Michael

On Tue, Oct 4, 2011 at 2:06 AM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
 On 04/10/11 17:05, R. Michael Weylandt wrote:

 SNIP

 More importantly, as I said in my initial response, any distance
 metric worth its salt is translation invariant.

 SNIP

 Point of order, Mr. Chairman.  (This is really *toadally* off topic;
 my apologies, but I couldn't resist --- I trained as a pure mathematician).

 A *metric* need not in general be translation invariant.  Indeed a metric
 need not be defined on a space in which translation makes any sense.

 A metric defined in terms of a *norm* (on a normed vector space) by
 rho(x,y) = ||x - y|| is of course by definition translation invariant, and
 that's what most of us think in terms of.

 But there are perfectly ``reasonable''  metrics, defined on vector spaces,
 which are not translation invariant.  Whether these are ``worth their salt''
 is I suppose a matter of taste.  (You should pardon the expression. :-) )

 A simple e.g. of a non-translation-invariant metric is

    rho(x,y) = |x - y|/(1 + |x| + |y|)

 (defined on the real line).  It is easily checked that rho(.,.) satisfies
 the four conditions that a metric must satisfy.  (Exercise for the
 interested reader.)

 Note that rho(1,2) = 1/4  but rho(2,3) = 1/6, ergo not translation
 invariant.
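
The two values above are easy to check numerically. A short sketch, with rho written as an R function and a shift of 1 chosen to match the example:

```r
# The metric from the message, defined on the real line.
rho <- function(x, y) abs(x - y) / (1 + abs(x) + abs(y))

rho(1, 2)   # 1 / (1 + 1 + 2)
rho(2, 3)   # 1 / (1 + 2 + 3)

# Shifting both points by the same amount changes rho,
# whereas the norm-based distance |x - y| is unchanged.
shift <- 1
rho_invariant  <- isTRUE(all.equal(rho(1 + shift, 2 + shift), rho(1, 2)))
norm_invariant <- isTRUE(all.equal(abs((1 + shift) - (2 + shift)), abs(1 - 2)))
```

Here rho_invariant comes out FALSE and norm_invariant TRUE, which is the point of the example.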

    cheers,

        Rolf Turner



Re: [R] joining tables