Re: [R] distance coefficient for a matrix with negative values

2011-10-04 Thread Rolf Turner

On 04/10/11 17:05, R. Michael Weylandt wrote:

SNIP

More importantly, as I said in my initial response, any distance
metric worth its salt is translation invariant.


SNIP

Point of order, Mr. Chairman.  (This is really *toadally* off topic;
my apologies, but I couldn't resist --- I trained as a pure mathematician).

A *metric* need not in general be translation invariant.  Indeed a metric
need not be defined on a space in which translation makes any sense.

A metric defined in terms of a *norm* (on a normed vector space) by
rho(x,y) = ||x - y|| is of course by definition translation invariant,
and that's what most of us think in terms of.

But there are perfectly ``reasonable''  metrics, defined on vector spaces,
which are not translation invariant.  Whether these are ``worth their salt''
is I suppose a matter of taste.  (You should pardon the expression. :-) )

A simple example of a non-translation-invariant metric is

rho(x,y) = |x - y|/(1 + |x| + |y|)

(defined on the real line).  It is easily checked that rho(.,.)
satisfies the four conditions that a metric must satisfy.  (Exercise
for the interested reader.)

Note that rho(1,2) = 1/4  but rho(2,3) = 1/6, ergo not translation 
invariant.
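To make the numbers concrete, here is a small R sketch of Rolf's example. The function name `rho` just follows his notation, and the random spot-check of symmetry and the triangle inequality is an added numerical sanity check, not a proof.

```r
# Rolf's example metric on the real line
rho <- function(x, y) abs(x - y) / (1 + abs(x) + abs(y))

rho(1, 2)  # 1/4
rho(2, 3)  # 1/6: same difference |x - y| = 1, different distance

# Numerical spot-check on random points (a sanity check only)
set.seed(1)
n <- 10000
x <- rnorm(n, sd = 5)
y <- rnorm(n, sd = 5)
z <- rnorm(n, sd = 5)
isTRUE(all.equal(rho(x, y), rho(y, x)))          # symmetry
all(rho(x, z) <= rho(x, y) + rho(y, z) + 1e-12)  # triangle inequality
```

The small tolerance only guards against floating-point rounding in the comparison.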


cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating Venn diagram with 6 sets

2011-10-04 Thread peter dalgaard

On Oct 3, 2011, at 21:25 , Mao Jianfeng wrote:

 Dear Peter,
 
 I was glad to hear your reply. That is really nice. Thanks a lot.
 
 
 ###
 # (1) The problem with the plot venneuler generated for me: sets (A,B,C,D,E,F)
 # should share 69604 elements,
 # but the plot shows nothing for this 6-way overlap.
 
 
  But vennerable cannot be installed on my MacBook.
 
 Works for me. What are the symptoms?
 
 
 ##
 # (2) I compiled the vennerable package and installed it in my R-2.13.0, but
 # the plot can only show 5 sets, and it does not look good.
 
 I have not saved the code I tested. Could you please show me your code, or
 just the plot you generated?


You said that it couldn't be installed; I just tried installing it (from the
R-Forge binary). I did not attempt to solve your problem.
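Separately from the plotting package, the 6-way overlap count quoted above can be checked directly in base R before worrying about the picture. A minimal sketch, with small made-up sets standing in for A to F:

```r
# Hypothetical stand-ins for the six sets A..F
sets <- list(A = 1:10, B = 2:12, C = 3:15,
             D = c(1:8, 20L), E = 0:9, F = 4:30)

# Elements present in every one of the sets
common <- Reduce(intersect, sets)
common          # 4 5 6 7 8
length(common)  # 5
```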

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Matrix/Vector manipulation

2011-10-04 Thread fernando.cabrera
Stylish, but the ifelse() version only includes buckets whose cumulative sum is
less than or equal to v, and ignores the remainder when v does not exactly fill,
say, the first two weight buckets.
 
 R <- c(1.2, 1.3, 1.5)
 W <- c(3, 2, 5)
 my_cumsum(4, R, W) # should take 3*1.2 + 1*1.3
[1] 4.0 
 sum(ifelse(cumsum(W) <= 4, W, 0) * R) # ignores the 1*1.3 part because 3+2 > 4
[1] 3.6

Cheers,
Fer

-Original Message-
From: David Reiner [mailto:david.rei...@xrtrading.com] 
Sent: 3. oktober 2011 17:57
To: Cabrera, Fernando Álvarez; r-help@r-project.org
Subject: RE: [R] Matrix/Vector manipulation

sum(ifelse(cumsum(W) <= v, W, 0) * R)

HTH,
David L. Reiner


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of fernando.cabr...@nordea.com
Sent: Monday, October 03, 2011 9:50 AM
To: r-help@r-project.org
Subject: [R] Matrix/Vector manipulation

Hi guys,

I have the following problem: I can't compute this with pure vector algebra and
end up resorting to recursion or for-loops.

The function my_cumsum calculates a weighted sum of ratios (R) with weights (W),
but only up to a given size/volume (v). I recurse into the vectors (from left to
right) with whatever is left of the volume after subtracting the current weight,
and stop when the remaining volume is less than or equal to the current weight.

Vectors W and R have the same length, and v is always a positive integer.

W: {w_1 w_2 .. w_m}
R: {r_1 r_2 .. r_m}

my_cumsum <- function(v, R, W) {
  if (v <= W[1])  # check the head
    v * R[1]
  else            # recurse into the tail
    W[1] * R[1] + my_cumsum(v - W[1], R[-1], W[-1])
}
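A loop-free alternative seems possible: for each bucket, compute how much of the volume v is still left when that bucket is reached, and cap it at the bucket's weight. A minimal sketch; the name `my_cumsum_vec` is hypothetical, and the recursive version is the one from the question with its operators restored.

```r
# Recursive version from the question
my_cumsum <- function(v, R, W) {
  if (v <= W[1]) v * R[1]
  else W[1] * R[1] + my_cumsum(v - W[1], R[-1], W[-1])
}

# Vectorised version: the portion of each weight bucket that v actually fills
my_cumsum_vec <- function(v, R, W) {
  filled_before <- c(0, head(cumsum(W), -1))   # volume used up before bucket i
  used <- pmin(W, pmax(v - filled_before, 0))  # amount of bucket i that is used
  sum(used * R)
}

R <- c(1.2, 1.3, 1.5)
W <- c(3, 2, 5)
my_cumsum_vec(4, R, W)  # 3*1.2 + 1*1.3 = 4.9
my_cumsum(4, R, W)      # same result from the recursion
```

One difference: if v exceeds sum(W), the recursion runs past the end of W (W[1] becomes NA) and stops with an error, whereas the vectorised version simply uses every bucket in full.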

Any help is greatly appreciated!

Fernando Alvarez

Great ideas originate in the muscles. ~ Thomas A. Edison



This e-mail and any materials attached hereto, including, without limitation, 
all content hereof and thereof (collectively, XR Content) are confidential 
and proprietary to XR Trading, LLC (XR) and/or its affiliates, and are 
protected by intellectual property laws.  Without the prior written consent of 
XR, the XR Content may not (i) be disclosed to any third party or (ii) be 
reproduced or otherwise used by anyone other than current employees of XR or 
its affiliates, on behalf of XR or its affiliates.

THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF ANY 
KIND.  TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY 
DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR 
CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE LIABLE 
FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED TO, 
DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF PROFITS 
AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON, OR 
INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY OF 
SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE.



Re: [R] how do i put two scatterplots on same graph

2011-10-04 Thread Daniel Malter
?plot
?points

You will probably need to learn some R basics, such as how to index subsets of
your data. You can find this in any introductory R manual.
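As a minimal sketch of the plot() plus points() approach, with made-up data (the data frame `flowers` and its columns are hypothetical): draw the first group with plot(), fixing both axis ranges to cover all the data, then add the second group with points() in another colour.

```r
# Hypothetical data: stem length and petal counts for red and white flowers
set.seed(1)
flowers <- data.frame(
  stem_length = c(runif(20, 10, 30), runif(20, 15, 35)),
  petals      = c(rpois(20, 5), rpois(20, 8)),
  colour      = rep(c("red", "white"), each = 20)
)

red   <- subset(flowers, colour == "red")
white <- subset(flowers, colour == "white")

# Use the range of both groups so neither set of points is clipped
xlim <- range(flowers$stem_length)
ylim <- range(flowers$petals)

plot(red$stem_length, red$petals, xlim = xlim, ylim = ylim,
     pch = 19, col = "red",
     xlab = "Stem length", ylab = "Number of petals")
points(white$stem_length, white$petals, pch = 1, col = "grey40")
legend("topleft", legend = c("red", "white"),
       pch = c(19, 1), col = c("red", "grey40"))
```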

HTH,
Daniel

--
View this message in context: 
http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870074.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to make ARFIMA forecast by using r?

2011-10-04 Thread normah
Please help. I have estimated the AR and MA parameters and the fractional
differencing parameter d, but I can't find the right command for forecasting
from an ARFIMA model. Please help.

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-make-ARFIMA-forecast-by-using-r-tp3869928p3869928.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] xts/time-series and plot questions...

2011-10-04 Thread Douglas Philips
On 2011 Oct 3, at 4:55 PM, Joshua Ulrich wrote:
 xts requires a time-based index, so there's no way to make an index
 year-free.  What you can do, is split the xts object into years,
 convert all the index values to have the same year, and merge them
 together.

Ah... of course. I had read the vignette for xts, but didn't notice the
indexing capabilities. Thank you.

 ded below converts the index values to
 a specific year.  A month-free solution would be similar.  I'd also
 recommend using plot.zoo for more complex graphs.

Great! I was able to plot the comparative data quite easily with that code.

You mentioned that a month-free solution would be similar, but I am not sure
about that part. Specifically, one thing I want to do is compare data by month
while preserving the day of the week in that comparison. For example,
2011/10/01 is a Saturday, but 2010/10/01 is a Friday. When I see both on the
same plot, I want them lined up by day of the week, rather than having the
first day of each month at the same place on the graph. (I'm not sure I'm
communicating this well; I hope this makes sense.)
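One possible way to get that alignment, sketched in base R and assuming the series' index can be converted with as.Date(): map each date to its day of the month plus the weekday offset of the 1st, so the same weekday always lands on the same x position (modulo 7). The helper name `weekday_aligned_pos` is hypothetical.

```r
# x position for a date so that weekdays line up across months and years:
# day of month + (weekday of the 1st - 1), with Monday = 1 as in format "%u"
weekday_aligned_pos <- function(d) {
  first  <- as.Date(format(d, "%Y-%m-01"))
  offset <- as.integer(format(first, "%u")) - 1
  as.integer(format(d, "%d")) + offset
}

oct2011 <- seq(as.Date("2011-10-01"), as.Date("2011-10-31"), by = "day")
oct2010 <- seq(as.Date("2010-10-01"), as.Date("2010-10-31"), by = "day")

weekday_aligned_pos(as.Date("2011-10-01"))  # a Saturday -> 6
weekday_aligned_pos(as.Date("2010-10-02"))  # also a Saturday -> 6
```

With an xts or zoo object, the idea would be to plot each year's values against weekday_aligned_pos(as.Date(index(obj))); that step is untested against the actual data.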

Thank you again!
   --Doug



[R] how do i put two scatterplots on same graph

2011-10-04 Thread jricci
I have two sets of scatterplot data, hypothetically:
a) stem length vs. number of petals in red flowers
b) stem length vs. number of petals in white flowers

I want to place them on the same scatterplot, with the same x and y axes but
differently colored markers.

How do I do this in R?

--
View this message in context: 
http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870030.html
Sent from the R help mailing list archive at Nabble.com.



[R] handling constant factors in prediction using svm

2011-10-04 Thread Divyam
Hi users!

I am fitting a model with several factor variables as independent variables
using svm. Since there are lots of categorical variables, the training and test
data sets have been created using the dummy.data.frame function from the
dummies package. I have a factor A in the training data set with 2 levels
(0, 1). In the test set, this factor A has only 1 level (1), so when applying
dummy.data.frame, the variable gets dropped (and that's how I want it too).
The problem comes when I try to predict on the test data: an error is thrown
saying that object A0 is not found. Is there any way to solve this problem?
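A common approach is to give the test-set factor the same levels as the training-set factor before building dummy columns, so that every dummy column exists in both sets. Here is a minimal sketch that swaps in base R's model.matrix() for dummy.data.frame() (how the dummies package treats unused levels is not checked here); the toy data frames are hypothetical.

```r
# Hypothetical toy data: factor A has levels 0 and 1 in training,
# but only level 1 occurs in the test set
train <- data.frame(A = factor(c("0", "1", "1", "0")))
test  <- data.frame(A = factor(c("1", "1")))

# Re-level the test factor to match training (level "0" kept, though unused)
test$A <- factor(test$A, levels = levels(train$A))

# One dummy column per level, no intercept
X_train <- model.matrix(~ A - 1, data = train)
X_test  <- model.matrix(~ A - 1, data = test)
colnames(X_test)  # "A0" "A1", matching colnames(X_train)
```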

Thanks
Divya

--
View this message in context: 
http://r.789695.n4.nabble.com/handling-constant-factors-in-prediction-using-svm-tp3870093p3870093.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Daniel Malter
I have not read the manual, but I drew 1 random normal vectors and 1
random Poisson vectors of length 1 and was unable to reproduce this
behavior. Can you provide an example (self-contained code) that reproduces
this problem?

Thanks,
Daniel


Jeanne M. Spicer wrote:
 
 The summary function behaves inconsistently with data frame columns, e.g.
 
 summary(rock)   #max of area 12212, correct
 summary(rock$area)  #max of area 12210, incorrect max
 
 I know that  
 summary(rock$area, digits=5)  
 will correct the error (I DID read the manual). But my point is the
 inconsistency, because I get the correct answer without having to add the
 digits option in the first statement when referring to the full dataframe.
 This is one of the first functions that beginners use and if they have to
 RTM and tinker with options before they can get a consistent value for the
 max of an integer column, it is off-putting to say the least. At worst it
 confirms the skeptic's suspicion that open-source software is a bit flaky. 
 Would it be out of line to report this to r-bugs -- at least to improve on
 the documentation?  
 
 -jms
 r2.13.1 maclion
 
 

--
View this message in context: 
http://r.789695.n4.nabble.com/inconsistent-behavior-of-summary-function-tp3869906p3870106.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Efficient way to do a merge in R

2011-10-04 Thread Rainer Schuermann
 Any comments are very welcome,
So I'll give it a shot, although I don't have answers, only some ideas about
which avenues I would explore (not being an expert at all):

1. I would try to be more restrictive with the columns used for merge, trying
something like
m1 <- merge( x, y, by.x = "V1", by.y = "V1", all = TRUE )

2. It may be an option to use match() directly:
indices <- match( y$V1, x$V1 )
That should give you a vector of 300,000 indices mapping the y values to their
corresponding x records. I assume that there is always one record in y matching
one record in x. You would still need to write some code to add the
corresponding y values to a new column in x.

3. If that fails, and nobody else has a better idea, I would consider using a
database engine for the job.
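A minimal sketch of the match() idea on toy data. It is a variant of point 2: it matches each row of x into y (rather than y into x), which assigns V5 in one step and leaves NA where x has no partner in y. It keys on both shared columns, V1 and V2, assuming each (V1, V2) pair occurs at most once in y. The toy data frames are hypothetical.

```r
# Toy stand-ins for the large x and the small y
x <- data.frame(V1 = 1:5, V2 = c("a", "b", "c", "d", "e"),
                V3 = 11:15, V4 = 21:25, stringsAsFactors = FALSE)
y <- data.frame(V1 = c(2L, 5L), V2 = c("b", "e"), V5 = c(0.2, 0.5),
                stringsAsFactors = FALSE)

# Combine the shared columns into one key per row
key_x <- paste(x$V1, x$V2, sep = "\r")
key_y <- paste(y$V1, y$V2, sep = "\r")

# Row of y matching each row of x (NA when there is none)
idx <- match(key_x, key_y)
x$V5 <- y$V5[idx]
x
```

Unlike merge(), which sorts on the key columns by default, this keeps the rows of x in their original order.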

Again, no expert advice, just a few ideas!
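On the side question about a counter in the quoted message below: merge() itself reports no progress, but if the work is split into chunks in a loop, utils::txtProgressBar() can show how far along it is. A minimal sketch, where the loop body is only a placeholder for one chunk of real work:

```r
n_chunks <- 50
pb <- txtProgressBar(min = 0, max = n_chunks, style = 3)
total <- 0
for (i in seq_len(n_chunks)) {
  total <- total + i          # placeholder for processing chunk i
  setTxtProgressBar(pb, i)    # advance the bar after each chunk
}
close(pb)
```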

Rgds,
Rainer


On Tuesday 04 October 2011 01:01:45 Aurélien PHILIPPOT wrote:
 Dear all,
 I am new in R and I have been faced with the following problem, that slows
 me down a lot.  I am short of ideas to circumvent it. So, any help would be
 highly appreciated:
 
 I have 2 dataframes x and y.  x is very big (70 million observations),
 whereas y is smaller (30 observations).
 All the observations of y are present in x. But y has one additional
 variable that I would like to incorporate to the dataframe x.
 
 For instance, imagine they have the following variable names:
 colnames(x) <- c("V1", "V2", "V3", "V4") and
 colnames(y) <- c("V1", "V2", "V5")
 
 -Since the observations of y are present in x, my strategy was to merge x
 and y so that the dataframe x would get the values of the variable V5 for
 the observations that are both in x and y.
 
 -So, I did the following:
 dat <- merge(x, y, all = TRUE)
 
 On a small example, it works fine. The only problem is that when I apply it
 to my big dataframe x, it really takes forever (several days and not done
 yet) and I have a very fast computer. So, I don't know whether I should
 stop now or keep on waiting.
 
 Does anyone have any idea how to perform this operation in a more efficient
 way (in terms of computation time)?
 In addition, does anyone know how to incorporate some sort of counter in a
 program to check how much work has been done at a given point in time?
 
 Any comments are very welcome,
 Thanks,
 
 Best,
 Aurelien
 


[R] shapefile kriging

2011-10-04 Thread Leynnard Rey Matillano
I'm new to R and I'm working on point shapefiles. Is there a way that you could 
interpolate a shapefile via kriging in R using an attribute? All examples on 
the internet are using txt files and CSVs. Thanks a lot.


Re: [R] Matrix/Vector manipulation

2011-10-04 Thread fernando.cabrera
Correction to my previous mail: my_cumsum(4,R,W) does not return 4.0, it 
returns 4.9!



Re: [R] Efficient way to do a merge in R

2011-10-04 Thread Joshua Wiley
On Tue, Oct 4, 2011 at 12:40 AM, Rainer Schuermann
rainer.schuerm...@gmx.net wrote:
 Any comments are very welcome,
 So I give it a shot, although I don't have answers but only some ideas which 
 avenues I would explore, not being an
 expert at all:

 1. I would try to be more restrictive with the columns used for merge, trying 
 something like
 m1 <- merge( x, y, by.x = "V1", by.y = "V1", all = TRUE )

 2. It may be an option to use match() directly:
 indices <- match( y$V1, x$V1 )
 That should give you a vector of 300,000 indices mapping the y values to 
 their corresponding x records. I assume that
 there is always one record in y matching one record in x. You would still 
 need to write some code to add the
 corresponding y values to a new column in x.

I think this idea is a good one (though even match could be slow with
70 million observations).  I believe that, related to the extraction and
assignment methods for data frames, some extra copies of data end up
being made (at least this is my understanding; experts may correct
me), so I would consider using a list.  You lose the built-in
data frame check that all variables are of the same length (same
number of rows), but I think it makes it faster to work with.  If you
know the indices in x where the y values should go and the class of y
(say numeric) then:
tmp <- vector("numeric", nrow(x))  # one slot per row of x
tmp[indices] <- y$V5
x$V5 <- tmp
rm(tmp)
gc()
and you're done.  Takes less than a minute to run on my little laptop
(8 GB RAM, 1.6 GHz dual core, only slightly faster than a netbook).


 3. If that fails, and nobody else has a better idea, I would consider using a 
 database engine for the job.

Not a bad idea for working with large datasets either.


 Again, no expert advice, just a few ideas!

 Rgds,
 Rainer






-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Rolf Turner

On 04/10/11 19:58, Daniel Malter wrote:

I have not read the manual, but I drew 1 random normal vectors and 1
random Poisson vectors of length 1 and was unable to reproduce this
behavior. Can you provide an example (self-contained code) that reproduces
this problem?

The OP *did* provide a reproducible example.  The rock data are
a built-in data set.  See ?rock.

Also the OP is correct!
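What seems to be going on, as far as can be told for R of that era: summary() on a single numeric vector rounded its results with signif() to digits = max(3, getOption("digits") - 3), i.e. 4 significant digits by default, and 12212 to 4 significant digits is 12210. summary() on a data frame calls summary() on each column with many more digits and only formats afterwards, so it showed all five. (Newer R versions reportedly moved the rounding into the print method, from R 3.4.0.) A small sketch of the arithmetic:

```r
max(rock$area)                          # 12212
default_digits <- max(3, getOption("digits") - 3)
default_digits                          # 4 with the default options(digits = 7)
signif(max(rock$area), default_digits)  # 12210, the "incorrect" max
summary(rock$area, digits = 5)          # enough digits to show 12212
```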

cheers,

Rolf Turner



Re: [R] Efficient way to do a merge in R

2011-10-04 Thread Matthew Dowle

Joshua Wiley jwiley.ps...@gmail.com wrote in message 
news:canz9z_kopuwkzb-zxr96pvulhhf2znxntxso9xnyho-_jum...@mail.gmail.com...
 On Tue, Oct 4, 2011 at 12:40 AM, Rainer Schuermann
 rainer.schuerm...@gmx.net wrote:
 Any comments are very welcome,

 3. If that fails, and nobody else has a better idea, I would consider 
 using a database engine for the job.

 Not a bad idea for working with large datasets either.

or, the data.table package
http://datatable.r-forge.r-project.org/

Matthew



Re: [R] cannot install.packages(data.table)

2011-10-04 Thread Matthew Dowle
Assuming you can install other packages OK, note that data.table depends on
R >= 2.12.0. Which version of R do you have?
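To check a version requirement like this on a given installation, base R can compare versions directly. The 2.12.0 threshold is the one stated above; current data.table releases may well require a newer R.

```r
R.version.string              # the running R version as a string
getRversion() >= "2.12.0"     # TRUE if this R meets the stated minimum
```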

_If_ that's the problem, does anyone know if anything prevents
R's error message from stating which dependency isn't satisfied? I think
I've seen users confused by this before, for other packages too.

Matthew

Emmanuel Mayssat emays...@gmail.com wrote in message 
news:cacb6zmctdrjkbftqrw+tv2owptrkgwytc_-hvvtguzwu9gq...@mail.gmail.com...
Hello,

I am new to R.
I am trying to see if R can work for me.
I need to do database-like lookups (select * from table where
name=='toto') and work with matrices (transpose, add columns, remove
rows, etc.).
It seems that the data.table package can help.
http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table

I installed R and ...

 install.packages("data.table")
Warning in install.packages("data.table") :
  argument 'lib' is missing: using '/usr/local/lib/R/site-library'
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘data.table’ is not available

 install.packages()
doesn't show the package.

where can I find it?

--
Emmanuel



Re: [R] shapefile kriging

2011-10-04 Thread Paul Hiemstra
On 10/04/2011 07:48 AM, Leynnard Rey Matillano wrote:
 I'm new to R and I'm working on point shapefiles. Is there a way that you 
 could interpolate a shapefile via kriging in R using an attribute? All 
 examples on the internet are using txt files and CSVs. Thanks a lot.

Hi,

Kriging is never done on a txt file or a csv file, but on an R object.
For gstat this is a SpatialPointsDataFrame, and for geoR this is another
type of object. CSV files, txt files (what is the difference?) and
shapefiles can all be read into SpatialPointsDataFrames. For reading
shapefiles, see the rgdal package.

Paul

-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770



Re: [R] how do i put two scatterplots on same graph

2011-10-04 Thread Paul Hiemstra
On 10/04/2011 06:19 AM, jricci wrote:
 Have two sets of scatterplot data
 hypothetically  
 a) stem length vs number of petals in red flowers
 b) stem length vs number of petals in white flowers

 want to place on same scatter plot with same x,y axis but different colored
 markers

 How do I do this in R


Hi,

You could take a look at the ggplot2 package.
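As a hedged sketch of the ggplot2 route (it requires the ggplot2 package; the data frame and column names are made up): with the data in one table and a column for flower colour, mapping that column to the point colour gives a single scatterplot with shared axes and a legend.

```r
library(ggplot2)

# Hypothetical data in "long" form: one row per flower, with a colour column
set.seed(1)
flowers <- data.frame(
  stem_length = c(runif(20, 10, 30), runif(20, 15, 35)),
  petals      = c(rpois(20, 5), rpois(20, 8)),
  colour      = rep(c("red", "white"), each = 20)
)

p <- ggplot(flowers, aes(x = stem_length, y = petals, colour = colour)) +
  geom_point() +
  scale_colour_manual(values = c(red = "red", white = "grey40")) +
  labs(x = "Stem length", y = "Number of petals")
print(p)
```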

good luck,
Paul



[R] matrix of chi-square results for all combinations of data frame

2011-10-04 Thread christiaan pauw
Hi everybody

I have a questionnaire with a lot of questions that allow more than one
option to be chosen (like a tickbox in an HTML form). The data are captured on
a mobile device and supplied in a format where every option is a separate
variable (logical). I want to develop a generic function to process these
questions. As part of the analysis, I want to make a matrix of the p-values from
the chi-square test for all combinations of options for each question. I
tried to make a data frame with all possible combinations and then use that
in a loop to get the p-values with chisq.test(). It works if I specify the
combination by hand but not in the loop. What am I doing wrong?

I would appreciate any advice. Sample code below.
Thanks in advance
Christiaan

# Sample Code

# create test data

df=data.frame(x=sample(0:1,100,replace=TRUE),x.1=sample(0:1,100,replace=TRUE),
x.2=sample(0:1,100,replace=TRUE), x.3=sample(0:1,100,replace=TRUE))

# make a data frame of all possible combinations

grd=expand.grid(colnames(df),colnames(df))


# make vector of p values

pval <- for (i in 1: length(grd[,1])){

chisq.test(df[,paste(grd$Var1[[i]])], df[,paste(grd$Var2[[i]])], correct =
TRUE)$p.value

}

# It works if I set i=3 and then run chisq.test(df[,paste(grd$Var1[[i]])],
# df[,paste(grd$Var2[[i]])], correct = TRUE)$p.value. Why does this not work
# in the loop?
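The likely reason the loop version fails: in R a for loop always evaluates to NULL, so `pval <- for (...)` stores NULL, and the p-value computed in each iteration is simply discarded. A minimal sketch of one fix, collecting the values with sapply() and reshaping them into a matrix (the seed and object names are illustrative):

```r
set.seed(42)
df <- data.frame(x   = sample(0:1, 100, replace = TRUE),
                 x.1 = sample(0:1, 100, replace = TRUE),
                 x.2 = sample(0:1, 100, replace = TRUE),
                 x.3 = sample(0:1, 100, replace = TRUE))

is.null(for (i in 1:3) i)   # TRUE: the value of a for loop is NULL

# All pairs of column names, kept as character rather than factor
grd <- expand.grid(Var1 = names(df), Var2 = names(df),
                   stringsAsFactors = FALSE)

# One p-value per pair, collected explicitly
pval <- sapply(seq_len(nrow(grd)), function(i)
  chisq.test(df[[grd$Var1[i]]], df[[grd$Var2[i]]], correct = TRUE)$p.value)

# expand.grid varies Var1 fastest, so filling by column puts Var1 in rows
pmat <- matrix(pval, nrow = ncol(df),
               dimnames = list(names(df), names(df)))
round(pmat, 3)
```

Keeping the names as character (stringsAsFactors = FALSE) avoids indexing df by factor codes, which is why the original needed paste().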

__
 sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
base

other attached packages:
[1] Hmisc_3.8-0 survival_2.35-8 prettyR_1.8-1

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.11.1    lattice_0.18-8 tools_2.11.1



Re: [R] The use of period in function names and variable names

2011-10-04 Thread S Ellison
See para 10.3.2 'Identifiers' in the R language definition (always distributed 
with R in the html help system), or ?make.names, for a concise statement of 
what constitutes a valid variable name in R.

It's actually underscores that might give trouble with older versions, not '.'. 
But they'd have to be a lot older by R standards (pre 1.9.0).
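A quick way to check a candidate name is to pass it through make.names(), which returns a name unchanged when it is already syntactically valid and otherwise repairs it. A small sketch (the candidate names are arbitrary examples):

```r
candidates <- c("my.var", "my_var", "myVar", "my var", "2ndTry", "if")

# make.names() replaces invalid characters with ".", prepends "X" where
# needed, and appends "." to reserved words
fixed <- make.names(candidates)

# TRUE where the candidate was already a valid R name
valid <- candidates == fixed
data.frame(candidates, fixed, valid)
```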

I am not sure why there has been a recent shift away from periods and towards 
camelCase in some R packages; personally I find a period or underscore much 
more useful for making a variable name readable. And a mix of camelCase and 
period.breaks makes it a lot harder to guess which case-sensitive string to 
use. The number of different combinations of case and period I end up trying 
for R.Version (occasionally used, never quite often enough to be automatic) 
defies belief ;-). 


S Ellison

 From: r-help-boun...@r-project.org On Behalf Of Smart Guy
 Sent: 04 October 2011 05:20
 To: r-help@r-project.org
 Subject: [R] The use of period in function names and variable names
 
 Hi,
  I am looking for some guidance on whether I can use the 
 period(.) in function names and variable names.



Re: [R] Quasi-Binomial simulation

2011-10-04 Thread Ben Bolker
saber fallahpour s_fallahpour at yahoo.com writes:

 
 Hi
 I want to do simulation on quasi-binomial distribution with some covariates.
 Does anyone have an idea how to do that? 
 

  There is no such thing as a quasi-binomial distribution, but if you
parameterized the beta-binomial distribution appropriately I think it
would be straightforward to generate discrete data with a specified
maximum value, a mean that was specified by an inverse-link function
and a design matrix applied to the covariates, and had a variance proportional
(but not equal) to n*p*(1-p).  For comparison, you might want to look
up the negative binomial type I as defined by Hardin and Hilbe, which
is quasi-Poisson in the same sense.

  See

  ?model.matrix
  ?plogis
  ?dbetabinom in the emdbook package (and probably elsewhere: install
the sos package and try findFn("beta-binomial"))
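Following that suggestion, here is a base-R sketch that draws a beta-distributed probability for each observation and then a binomial count. The covariate, the coefficients, the maximum count size and the dispersion parameter theta (shape1 + shape2) are all made-up values. With mean mu, the beta-binomial variance is size*mu*(1-mu)*(1 + (size-1)/(theta+1)), i.e. a constant multiple of the binomial variance when size is fixed:

```r
set.seed(42)
n_obs <- 200
size  <- 20                        # assumed maximum count per observation
x1    <- rnorm(n_obs)              # made-up covariate
X     <- model.matrix(~ x1)        # design matrix: intercept + x1
beta  <- c(-0.5, 1)                # made-up coefficients on the logit scale
mu    <- plogis(drop(X %*% beta))  # inverse logit gives the mean proportion
theta <- 5                         # made-up shape1 + shape2 (smaller = more overdispersion)

# Beta-binomial draw: a per-observation probability, then a binomial count
p <- rbeta(n_obs, shape1 = mu * theta, shape2 = (1 - mu) * theta)
y <- rbinom(n_obs, size = size, prob = p)

# A quasi-binomial fit should recover beta roughly and report a dispersion
# near the theoretical inflation factor 1 + (size - 1) / (theta + 1)
fit <- glm(cbind(y, size - y) ~ x1, family = quasibinomial)
coef(fit)
summary(fit)$dispersion
```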



[R] Matching two datasets and updating values

2011-10-04 Thread Vincy Pyne
Dear R forum

I have two data frames: category and cat_val form one data frame, and cust 
and cust_category form another.

category = c("C", "D", "B", "A")
cat_val = c(0.10, 0.25, 0.40, 0.54)
cust = c("cust_1", "cust_2", "cust_3", "cust_4", "cust_5", "cust_6", "cust_7", 
"cust_8", "cust_9", "cust_10")
cust_category = c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

Thus, I have 

> category
[1] "C" "D" "B" "A"

> cat_val
[1] 0.10 0.25 0.40 0.54

> cust
 [1] "cust_1"  "cust_2"  "cust_3"  "cust_4"  "cust_5" 
 [6] "cust_6"  "cust_7"  "cust_8"  "cust_9"  "cust_10"

> cust_category
 [1] "C" "A" "A" "A" "A" "C" "D" "B" "B" "D"

My problem is to match 'cust_category' with 'category' and accordingly select 
the value assigned to that category. In other words, the 1st element of 
cust_category is C, so it should select the value 0.10; the second element is 
A, so it should be assigned the value 0.54. So effectively I should get

cust     cust_category  cat_val
cust_1   C              0.10
cust_2   A              0.54
cust_3   A              0.54
...
cust_10  D              0.25


Kindly guide

Regards

Vincy




Re: [R] Matching two datasets and updating values

2011-10-04 Thread Petr PIKAL
Hi
 

What about merge?

a <- data.frame(category, cat_val)
b <- data.frame(cust, cust_category)
merge(a, b, by.x = "category", by.y = "cust_category")
   category cat_val    cust
1         A    0.54  cust_3
2         A    0.54  cust_4
3         A    0.54  cust_5
4         A    0.54  cust_2
5         B    0.40  cust_8
6         B    0.40  cust_9
7         C    0.10  cust_1
8         C    0.10  cust_6
9         D    0.25  cust_7
10        D    0.25 cust_10


Regards
Petr
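If the customers should stay in their original order (merge() sorts the result by the key), match() can do the lookup directly. A minimal sketch using the vectors from the question, with the quotes that the archive stripped restored:

```r
category      <- c("C", "D", "B", "A")
cat_val       <- c(0.10, 0.25, 0.40, 0.54)
cust          <- paste("cust", 1:10, sep = "_")
cust_category <- c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

# match() returns, for each customer, the position of its category in 'category'
result <- data.frame(cust, cust_category,
                     cat_val = cat_val[match(cust_category, category)])
result
```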



 



Re: [R] Parsing variable-length delimited strings into a matrix

2011-10-04 Thread jim holtman
Will this do it for you:

> x <- readLines(textConnection("A,B,C
+ B,B
+ A,AA,C
+ A,B,BB,BBB,B,B"))
> closeAllConnections()
> x.s <- strsplit(x, ',')
> # determine max length
> x.max <- max(sapply(x.s, length))
> # create character matrix
> x.mat <- matrix(
+ sapply(x.s, function(a) c(a, rep(NA, x.max - length(a))))
+ , byrow = TRUE
+ , ncol = x.max
+ )


> x.mat
     [,1] [,2] [,3] [,4]  [,5] [,6]
[1,] "A"  "B"  "C"  NA    NA   NA  
[2,] "B"  "B"  NA   NA    NA   NA  
[3,] "A"  "AA" "C"  NA    NA   NA  
[4,] "A"  "B"  "BB" "BBB" "B"  "B" 
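A more compact variant of the same padding idea uses the replacement function length<-, which extends a vector with NA; do.call(rbind, ...) then works because every piece has the same length. A sketch, assuming the input is a character vector like the one in Ben's question:

```r
vec <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")
lst <- strsplit(vec, ",")

# Longest element determines the number of columns
n_max <- max(sapply(lst, length))

# `length<-`(x, n_max) pads x with NA up to n_max; rbind stacks the rows
mat <- do.call(rbind, lapply(lst, `length<-`, n_max))
mat
```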



On Mon, Oct 3, 2011 at 11:40 AM, Benjamin Wright bj...@well.ox.ac.uk wrote:

 I'm struggling to find a way of parsing a vector of data in this sort of form:

 A,B,C
 B,B
 A,AA,C
 A,B,BB,BBB,B,B

 into a matrix (or data frame). The catch is that I don't know a priori how 
 many entries there will be in each element, nor how many characters there 
 will be. strsplit(vec, ",") gets me a list, but I can't find a way of turning 
 the list into a matrix. unlist(lst) destroys the length data and 
 do.call(rbind, lst) fails because of the uneven lengths. It is possible to 
 go through the vector element by element, but that has proved too slow for my 
 purposes.

 Is there a reasonably quick method of achieving this in a vector-oriented way?

 Cheers,

 Ben





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



[R] texi2dvi problem when compiling incorrect Latex code

2011-10-04 Thread syrvn
Hello,

I am working on a big R project using Eclipse/StatET/Texlipse. I'd like to
write a Latex document within that project but DO NOT want to Sweave it.
It's pure Latex. Via the external tools configurations I set up 2 different
versions to ensure that my latex document is processed correctly.

Version 1 (System Call):
library(tools)
setwd("${container_loc}")
file = "${resource_loc:${source_file_path}}"
try(system(paste("texi2pdf", shQuote(file)), intern=TRUE))


Version 2 (R Call):
library(tools)
setwd("${container_loc}")
texi2dvi(file = "${resource_loc:${source_file_path}}", pdf = TRUE, quiet =
FALSE)


Both versions work well as long as there is no error in my LaTeX code. As
soon as there is an error, texi2pdf / texi2dvi never finishes, because the
program waits for user input (usually just pressing the Enter key). The
problem is that R shows the output only after the whole program has finished,
so I always end up having to kill my R console.

Is there any workaround for that?

Syrvn
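One possible workaround (an untested assumption, not checked inside Eclipse/StatET) is to call pdflatex directly with its -interaction=nonstopmode option, which makes TeX write errors to the log and carry on instead of prompting; -halt-on-error additionally stops at the first error. The helper names latex_args and compile_latex are hypothetical:

```r
# Hypothetical helper: command-line arguments that stop pdflatex from
# prompting for input when the LaTeX source contains an error
latex_args <- function(file) {
  c("-interaction=nonstopmode",  # report errors in the log instead of prompting
    "-halt-on-error",            # stop at the first error
    shQuote(file))
}

# Hypothetical wrapper; assumes pdflatex is on the PATH
compile_latex <- function(file) {
  status <- system2("pdflatex", latex_args(file))  # exit status of pdflatex
  if (status != 0) warning("pdflatex failed; see the .log file for details")
  invisible(status)
}
```

In the external tool configuration one would then call compile_latex() on the file path instead of texi2dvi().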



--
View this message in context: 
http://r.789695.n4.nabble.com/texi2dvi-problem-when-compiling-incorrect-Latex-code-tp3870827p3870827.html
Sent from the R help mailing list archive at Nabble.com.



[R] Problem with .C

2011-10-04 Thread Grigory Alexandrovich

Hello,

I wrote a function in C, which works fine if called from the 
main-function in C.


But as soon as I try to call this function from R, like .C('foo', 
as.double(x), as.integer(y)), the program crashes.


I created a DLL with the command R --arch x64 CMD SHLIB foo.c and 
loaded it into R with dyn.load().


What can be the cause of such behaviour?
Again, the C function itself works, but not when called from R.

Thanks
Grigory Alexandrovich



[R] Adonis and nmds help and questions for a novice.

2011-10-04 Thread Ashley Houlden
Hi,

forgive me if someone has already posted about this, but I have had a look and 
cannot find the answer; also, I am very new to R and still getting to grips with 
it.

I have been trying to use adonis to find out if there are significant 
differences between groups in data that I have analysed with NMDS, and have been 
struggling to get this to work and to understand what is going on. I am 
looking at diversity in different soils with either woodland or grassland 
habitats.

I have run the scripts

library(vegan)
library(ecodist)
library(MASS)
mydata <- read.table("ash_data.csv", header=TRUE, sep=",", row.names="Site")

envdata_fit <- read.table("ash_env.csv", header=TRUE, sep=",", row.names="Site")

#distance matrix of samples using bray curtis
d= bcdist(mydata, rmzero=FALSE)

And then I use the distance matrix from this for adonis. Is this correct?

With this I have then run Adonis

results = adonis(d ~ wood, envdata_fit, permutations = 1000)

and get significance values that tell me whether diversity differs significantly 
between the wood and grass habitats.

However, I have been reading about combining the variables, and there seem to 
be different ways, for example

results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000)

so I get significance values for wood and soil

or

results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000)

And I get significance values for wood, soil, and the wood:soil interaction.

This seems to make sense; however, for both, if I put the variables the other way 
around (soil+wood or soil*wood) I get very different significance values, even 
accounting for the fact that they vary slightly due to the permutations. So what is 
going on, and why do the values change so much?
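vegan's documentation describes the adonis tests as based on sequential sums of squares, the same convention anova() uses for an lm fit (worth confirming in ?adonis for your version). If so, the order of terms matters whenever the factors are not balanced against each other. A base-R sketch with made-up numbers, not Ash's data, showing the effect with lm():

```r
# Made-up unbalanced design: no observation has wood = "G" with soil = "clay"
d <- data.frame(
  wood = factor(c("W", "W", "W", "G", "G", "G")),
  soil = factor(c("clay", "clay", "sand", "sand", "sand", "sand")),
  y    = c(1, 2, 4, 5, 7, 6)
)

# anova() on an lm fit uses sequential (type I) sums of squares,
# so each term is adjusted only for the terms listed before it
ss_wood_first  <- anova(lm(y ~ wood + soil, data = d))["wood", "Sum Sq"]
ss_wood_second <- anova(lm(y ~ soil + wood, data = d))["wood", "Sum Sq"]
c(first = ss_wood_first, second = ss_wood_second)
```

Here wood explains about 20.2 units of variation when entered first but only 3 when entered after soil, because the two factors overlap.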

I was also wondering: in adonis, can you nest treatments, to see the effect of soil 
after removing the effect of woodland, as you can with ANOVA?

Another general question as well: if I have more than two groups in a 
treatment, say for soil: clay, sand and loam, and I do the stats and get a 
significant value, what does it actually mean? Is it that soil generally has an 
effect, with each group separate, or that there are general differences between 
soils, where perhaps one group is very different from the other two?

Many, many thanks to anyone who can help me, as I have asked people near me who 
use R and no one is sure or uses adonis.

Ash




[R] creating subsets and calculating weights

2011-10-04 Thread Samir Benzerfa
Dear all,

probably this question sounds stupid to you, but since I'm new to R I have
some trouble with the following issue (the table below does not
represent my real data; it is just a simplified example):

My intention is to first subdivide my data into several groups of columns,
let's say for instance 3 groups, such that group1={A,B}, group2={C,D},
group3={E,F}. How can I do this for a much larger data table (about
3'000 columns)?

After that I'd like to calculate a weight for each element in the table as
follows: weight(ij) = element(ij) / sum(elements in row i within the group of column j).

So, for example:

For the first element of column A: 1/(1+2)=1/3
For the first element of column B: 2/(1+2)=2/3
For the second element of column A: 2/(2+3)=2/5
And so forth.

Table 1:

A B C D E F
1 2 3 4 5 9
2 3 4 5 6 8
3 4 5 6 7 7
4 5 6 7 8 6
5 6 7 8 9 5

I tried to do something like apply(Table1, 1, function(x)x/sum(x(i+1))) but
it returns an error (could not find function "x").
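The error arises because x(i+1) calls x as if it were a function. A sketch of the group-wise weights, assuming the groups are consecutive pairs of columns as in the example (for the real 3'000-column table the groups vector would have to be built to match the actual grouping):

```r
Table1 <- data.frame(A = 1:5, B = 2:6, C = 3:7, D = 4:8, E = 5:9, F = 9:5)

# Group label for each column: A,B -> 1; C,D -> 2; E,F -> 3 (assumed pairing)
groups <- rep(seq_len(ncol(Table1) / 2), each = 2)

# For each group, divide every element by its row sum within that group,
# then put the groups back side by side (column names are kept)
w <- do.call(cbind, lapply(split(seq_along(Table1), groups), function(j) {
  sub <- as.matrix(Table1[, j, drop = FALSE])
  sub / rowSums(sub)
}))
round(w, 3)
```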

 

I would be truly grateful for any hints.

 

Many thanks in advance

Best, Sam

 

 

 




[R] About stepwise regression problem

2011-10-04 Thread pigpigmeow
First of all, I fit this GAM:
noxd <- gam(newNOX~pressure+maxtemp+s(avetemp,bs="cr")+s(mintemp,bs="cr")+s(RH,bs="cr")+s(solar,bs="cr")+s(windspeed,bs="cr")+s(transport,bs="cr"),family=gaussian(link=log),groupD,methods="REML")

Then I type summary(noxd), which shows:

Family: gaussian 
Link function: log 

Formula:
newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(avetemp, bs = "cr") + 
s(mintemp, bs = "cr") + RH + s(solar, bs = "cr") + s(windspeed, 
bs = "cr") + s(transport, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.721513   0.049108  55.419   <2e-16 ***
pressure    0.028988   0.019434   1.492    0.140    
RH          0.005228   0.009763   0.535    0.594    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Approximate significance of smooth terms:
               edf Ref.df     F p-value   
s(maxtemp)   6.346  7.276 1.223 0.29991   
s(avetemp)   1.000  1.000 0.226 0.63562   
s(mintemp)   1.908  2.396 1.066 0.35871   
s(solar)     3.797  4.490 2.164 0.07359 . 
s(windspeed) 5.305  6.341 2.346 0.03648 * 
s(transport) 7.234  7.984 2.807 0.00884 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

R-sq.(adj) =  0.307   Deviance explained = 49.1%
GCV score = 61.136  Scale est. = 44.49 n = 105

I eliminate the term with the largest p-value, s(avetemp), then type
summary(no2d), which shows:

Family: gaussian 
Link function: log 

Formula:
newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(mintemp, bs = "cr") + 
RH + s(solar, bs = "cr") + s(windspeed, bs = "cr") + s(transport, 
bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.720973   0.048834  55.719   <2e-16 ***
pressure    0.031346   0.019040   1.646    0.104    
RH          0.006165   0.009583   0.643    0.522    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Approximate significance of smooth terms:
               edf Ref.df     F p-value   
s(maxtemp)   6.499  7.425 1.450  0.1942   
s(mintemp)   1.975  2.487 1.788  0.1655   
s(solar)     3.925  4.628 2.118  0.0770 . 
s(windspeed) 5.373  6.417 2.967  0.0101 * 
s(transport) 7.043  7.822 2.785  0.0097 **
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

R-sq.(adj) =  0.316   Deviance explained = 49.2%
GCV score = 59.746  Scale est. = 43.919    n = 105
 


I eliminate the term with the largest p-value, RH, then type
summary(no2d), which shows:

Family: gaussian 
Link function: log 

Formula:
newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(mintemp, bs = "cr") + 
s(solar, bs = "cr") + s(windspeed, bs = "cr") + s(transport, 
bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.72001    0.04859  55.974   <2e-16 ***
pressure     0.02978    0.01878   1.586    0.117    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Approximate significance of smooth terms:
               edf Ref.df     F p-value   
s(maxtemp)   6.544  7.468 1.654 0.12830   
s(mintemp)   1.952  2.460 1.697 0.18301   
s(solar)     3.977  4.686 2.869 0.02211 * 
s(windspeed) 5.381  6.425 2.641 0.01953 * 
s(transport) 7.052  7.830 3.348 0.00257 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

R-sq.(adj) =  0.321   Deviance explained =   49%
GCV score =  58.61  Scale est. = 43.591    n = 105

I then remove s(mintemp), and so on, until:

Family: gaussian 
Link function: log 

Formula:
newNO2 ~ s(windspeed, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.78159    0.04701   59.16   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Approximate significance of smooth terms:
               edf Ref.df    F p-value  
s(windspeed) 1.775  2.251 4.54  0.0101 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

R-sq.(adj) =    0.1   Deviance explained = 11.5%
GCV score = 59.348  Scale est. = 57.78 n = 105

Finally, only the s(windspeed) term remains (my significance level is 0.05).
I have some questions:

First, is the backward elimination performed correctly?

Second, is it possible to run the process (backward elimination) automatically?

Third, I found that the linear part lists Pr(>|t|) and the smoothing
part lists p-value. Do these two have the same meaning?
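To illustrate the mechanics of automating backward elimination, here is a sketch for an ordinary lm() fit that repeatedly drops the term with the largest drop1() F-test p-value. The helper name backward_eliminate and the simulated data are hypothetical; for mgcv's gam() the refit-and-test step would have to be adapted, and gam() also has a select = TRUE argument that adds an extra penalty so whole smooth terms can be shrunk out of the model, an alternative to stepwise deletion:

```r
# Hypothetical helper: drop the least significant term until all p <= alpha
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    tests <- drop1(fit, test = "F")
    p <- setNames(tests[["Pr(>F)"]], rownames(tests))
    p <- p[!is.na(p)]                  # the "<none>" row has no p-value
    if (length(p) == 0 || max(p) <= alpha) return(fit)
    worst <- names(which.max(p))       # term with the largest p-value
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

# Simulated data (made up): only x1 really affects y
set.seed(123)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n)

final <- backward_eliminate(lm(y ~ x1 + x2 + x3, data = dat))
formula(final)
```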

 


--
View this message in context: 
http://r.789695.n4.nabble.com/About-stepwise-regression-problem-tp3870217p3870217.html
Sent from the R help mailing list archive at Nabble.com.



[R] package.skeleton generates .env = environment

2011-10-04 Thread pedabreu
Hello,

I am trying to create a package using package.skeleton. I use the R.oo package to
create object-oriented classes. When I use package.skeleton, it creates
the following file:

classA <-
structure(function()
   {
 
 extend(Object(), "Class A",
.var1= NULL)
 
  
   }
, .env = <environment>, class = c("Class", "Object"), formals = c("public", 
"class"), modifiers = c("public", "class"))

Then I build the package using R CMD build myPkg.

When I try to install the package, I get this error:

  /tmp/RtmpaOZ7IQ/R.INSTALL412da433/JSSbase/R/GTHeuristic.R:7:10:
unexpected '<'
6:}
7: , .env = <

Why does package.skeleton create .env = <environment>?


Thank you





--
View this message in context: 
http://r.789695.n4.nabble.com/package-skeleton-generates-env-environment-tp3870577p3870577.html
Sent from the R help mailing list archive at Nabble.com.



[R] Giant font on the R plots...

2011-10-04 Thread D.Emad
Hello,

I've been facing a really stupid problem... When I try to plot using
heatplot or hclust or any similar function, the labels on the x-axis, which
are the sample names, are giant and overlapping. I can't even read the
sample names!

I tried cex.lab = 0.5; it helped only with the y-axis and not the x-axis...
Any help please?!
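For dendrograms, the size of the leaf labels is controlled by cex passed to plot() (plot.hclust hands it on to the label drawing), whereas cex.lab only affects the axis titles. stats::heatmap() has its own cexRow and cexCol arguments; heatplot's own arguments are not covered here. A sketch using the built-in USArrests data, drawn to a temporary PDF file:

```r
hc <- hclust(dist(USArrests))

pdf(file = tempfile(fileext = ".pdf"))  # temporary file so the sketch runs anywhere
plot(hc, cex = 0.5)                     # smaller leaf (sample) labels
heatmap(as.matrix(USArrests),
        cexRow = 0.5, cexCol = 0.8)     # row and column label sizes
dev.off()
```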

--
View this message in context: 
http://r.789695.n4.nabble.com/Giant-font-on-the-R-plots-tp3870335p3870335.html
Sent from the R help mailing list archive at Nabble.com.



[R] Rug plot curve reversal

2011-10-04 Thread Peter Minting

Dear R-help
Can anyone tell me why my curve appears the wrong way round on a rug plot?
I am using the same code as on pg 596 of the Crawley R-book.
mod <- glm(mort ~ logBd, binomial)
par(mfrow = c(2, 2))
xv <- seq(0, 8, 0.01)
yv <- predict(mod, list(logBd = xv), type = "response")
plot(logBd, mort)
lines(xv, yv)
I've tried swapping xv and yv around but no luck.
Thanks,
Pete  


[R] Plotting a polygon with xyplot

2011-10-04 Thread markm0705
Dear R helpers

I would like to plot a string of points as a polygon in xyplot.  I'm a bit
lost as to how to get the points plotting in the correct order.  I would
also like some hints on how to render or fill the polygon.

Script below and data file attached

Thanks

Markm

library(lattice)

# set size of the window
windows(height=7, width=10,rescale=c("fixed"))

Data_poly <- read.table("111004_Lode_Outlines.csv",header = TRUE,sep = ",")

xyplot(z~y,
data=Data_poly,
type="l"
)

Attached data file: 111004_Lode_Outlines.csv
http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv

--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-a-polygon-with-xyplot-tp3870788p3870788.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Merge two data frames and find common values and non-matching values

2011-10-04 Thread francesca casalino
Yes, your code did exactly what I needed.

Thank you!!
-f



Re: [R] Import in R with White Spaces

2011-10-04 Thread francesca casalino
Ok I added quoting and it did work...Not sure why, but thank you for both
your replies!
-f



[R] Correlation based on the attributes of vertices

2011-10-04 Thread Ali.Abbas
Dear all,
I have a directed graph - an igraph, to be more precise - which has some
vertex attributes (like dorm, year, etc.). Edges and the graph itself do not
have any attributes. Based on the attributes of the vertices, I'd like to
calculate correlation among the edges (e.g. how likely are people in the same
dorm to be connected?) for the whole graph. Also, I'd like to calculate
inter-attribute correlation for the whole graph (how correlated are the dorm
and year attributes?).
Could you kindly tell me how to go about it?
I thought of populating a list just like the graph edge list, and then
replacing each source and destination by its attribute value. For instance,
instead of the edge (0-1), I will replace it by (dorm_valueOf(0) -
dorm_valueOf(1)) and then run the function /cor/ over it. It does not seem
like a nice solution.
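The edge-list idea can be sketched in base R without any igraph calls: with the edges as a two-column matrix of vertex indices, indexing an attribute vector by each column gives the attribute values at both ends of every edge. The vertices, edges and attribute values below are made up:

```r
# Made-up directed edge list: column 1 = source, column 2 = target (1-based)
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 4,
                  1, 4), ncol = 2, byrow = TRUE)
dorm <- c("a", "a", "b", "b")   # categorical vertex attribute
year <- c(1, 1, 2, 3)           # numeric vertex attribute

# Share of edges whose two endpoints live in the same dorm
same_dorm <- dorm[edges[, 1]] == dorm[edges[, 2]]
mean(same_dorm)

# Correlation of a numeric attribute between the two ends of the edges
r_year <- cor(year[edges[, 1]], year[edges[, 2]])
r_year

# Cross-tabulation of two vertex attributes over all vertices
table(dorm, year)
```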
On a side note, how does one get source and destination out of an edge list
by the Edge Iterator? For example, I'd like to know the source and the
target vertices' index of the first edge, E(graph)[0]. How can I extract
this information?

Thanks!

Best Regards,
Ali
P.S. I have already posted the same question on the igraph mailing list. On
receiving no response from there, I am posting it over here.

--
View this message in context: 
http://r.789695.n4.nabble.com/Correlation-based-on-the-attributes-of-vertices-tp3870844p3870844.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Merge two data frames and find common values and non-matching values

2011-10-04 Thread francesca casalino
Sorry---I thought it worked but I think I am actually definitely doing
something wrong...

The problem might be that there are NA's and there are also duplicated
values...My fault. I can't figure out what is going wrong...
I'll be more thorough and modify the two df to mirror more what I have to
explain better:

df1 is:

Name Position location
francesca A 75
maria A 75
cristina B 36

And df2 is:

location Country
75 UK
75 Italy
56 France
56 Austria

So I thought I had to first eliminate the duplicates like this:
df1_unique <- subset(df1, !duplicated(location))
df2_unique <- subset(df2, !duplicated(location))

After doing this I get:

df1 :

Name Position location
francesca A 75
cristina B 36

And df2:

location Country
75 UK
56 France

And I would like to match on Location and the output to tell me which
records are matching in df1 and not in df2, the ones matching in both, and
the ones which are in df2 but are not matching in df1...

Name Position Location Match
francesca A 75 1
cristina B 36 0

As William suggested,


df12 <- merge(df1, cbind(df2, fromDF2=TRUE), all.x=TRUE, by="location")
df12$Match <- !is.na(df12$fromDF2)
new_common <- new[which(new$Match==TRUE),]

would give me the records that are matching, which should be correct, but I
am not getting the correct result for the non-shared elements (the variants
that are in df2 but not in df1):
df2_only <- subset(df1_unique, !(location %in% df2_unique))
df2_only <- df2_unique[-which(df2_unique$location %in% df1_unique$location),]


Neither of these works; both give me the wrong records...
My questions are:

1. How do I calculate the records from df2 which are NOT in df1?
2. Do I need to eliminate the duplicates (or is there a way to record where
they came from)?

Any help is very appreciated...
THANK YOU very much!
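%in% answers both questions without removing duplicates first, because it only asks whether each location occurs anywhere in the other data frame. In the first attempt above, location %in% df2_unique appears to compare against the whole data frame rather than its location column, and -which(...) silently returns zero rows when nothing matches, which ! avoids. A sketch with the small example data (Match coded 1/0 as in the desired output):

```r
df1 <- data.frame(Name     = c("francesca", "maria", "cristina"),
                  Position = c("A", "A", "B"),
                  location = c(75, 75, 36),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c(75, 75, 56, 56),
                  Country  = c("UK", "Italy", "France", "Austria"),
                  stringsAsFactors = FALSE)

# 1 = the location also occurs in df2, 0 = it does not
df1$Match <- as.integer(df1$location %in% df2$location)

# Records of df2 whose location does not occur in df1
df2_only <- df2[!(df2$location %in% df1$location), ]

# All matching pairs; duplicated locations give one row per combination
both <- merge(df1, df2, by = "location")
```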



Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Bert Gunter
You are right, but this is difficult or impossible to really solve.

The problem is that summary() is an S3 generic (see ?UseMethod), so essentially
it can mean anything and do anything depending on the structure to which
it's applied. In your case, the structures were a data frame and a vector
(that it was a column of the data frame is irrelevant) and, as you noted,
different options were used by the two methods. But it could be -- and
probably does get -- much worse than that.

The ability to dispatch different methods from a single generic call based
on the structure of the object to which a function is applied is generally
viewed as a positive feature of OO languages (of which native R has some
features). But nothing's perfect.
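The specific difference here appears to come from rounding: as far as I can tell, in the R version discussed summary.default() rounded its results to max(3, getOption("digits") - 3) significant digits, while summary.data.frame() called it with many more digits and rounded only when formatting. Newer R versions round only when printing, so the printed output may differ from the question. A quick check of the arithmetic:

```r
# Default significant digits used by summary.default() (4 when digits = 7)
d <- max(3, getOption("digits") - 3)

max(rock$area)             # the exact maximum
signif(max(rock$area), d)  # rounded to d significant digits, as in the vector summary
summary(rock$area, digits = 5)
```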

-- Bert

On Mon, Oct 3, 2011 at 8:12 PM, Jeanne M. Spicer xn8spi...@gmail.com wrote:

 The summary function behaves inconsistently with data frame columns, e.g.

 summary(rock)   #max of area 12212, correct
 summary(rock$area)  #max of area 12210, incorrect max

 I know that
 summary(rock$area, digits=5)
 will correct the error (I DID read the manual). But my point is the
 inconsistency, because I get the correct answer without having to add the
 digits option in the first statement when referring to the full dataframe.
 This is one of the first functions that beginners use and if they have to
 RTM and tinker with options before they can get a consistent value for the
 max of an integer column, it is off-putting to say the least. At worst it
 confirms the skeptic's suspicion that open-source software is a bit flaky.
  Would it be out of line to report this to r-bugs -- at least to improve on
 the documentation?

 -jms
 r2.13.1 maclion






-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] Plotting a polygon with xyplot

2011-10-04 Thread Ken Knoblauch
markm0705 markm0705 at gmail.com writes:
 I would like to plot a string of points as a polygon in xyplot.  I'm a bit
 lost as to how to get the points plotting in the correct order.  I would
 also like some hints on how to render or fill the polygon.
 
 Script below and data file attached
 
 Thanks
 
 Markm
 
 library(lattice)
 
 # set size of the window
 windows(height=7, width=10,rescale=c("fixed"))
 
 Data_poly <- read.table("111004_Lode_Outlines.csv",header = TRUE,sep = ",")
 
 xyplot(z~y,
   data=Data_poly,
   type="l"
 )
 Attached data file: 111004_Lode_Outlines.csv
 http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv
 

Before you try this with lattice, you might spend some time
getting your abscissa values in an order that will plot the
contour in a sequential fashion.  It's not obvious how to
do this a priori.  Here is a simple-minded attempt after looking
at your graphic, just using base graphics.  Maybe, it will
be sufficient for you to tweak it a bit further for what you
want.

Data_poly <- 
read.table("http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv",
header = TRUE, sep = ",")
par(mfrow = c(1, 2), pty = "s")
plot(z ~ y, Data_poly, type = "l")

fh <- with(Data_poly, which(z  240))
D_poly <- rbind(Data_poly[fh, ], Data_poly[-rev(fh), ])
D_poly <- rbind(D_poly, Data_poly[1, ])

plot(z ~ y, D_poly, type = "n")
with(D_poly, polygon(y, z, col = "lightblue"))
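Since the question asked for xyplot, here is a lattice version of the fill step using panel.polygon(). The points are ordered by their angle around the centroid, which only gives a sensible outline for roughly star-shaped (e.g. convex) outlines; the four made-up points stand in for the attached data:

```r
library(lattice)

# Made-up outline points in scrambled order
pts <- data.frame(y = c(0, 3, 2, 0.5), z = c(0, 2, 0, 3))

# Order the points by angle around their centroid so the outline does not cross itself
ang <- with(pts, atan2(z - mean(z), y - mean(y)))
pts <- pts[order(ang), ]

p <- xyplot(z ~ y, data = pts,
            panel = function(x, y, ...) {
              panel.polygon(x, y, col = "lightblue", border = "black")  # filled outline
              panel.xyplot(x, y, ...)                                   # points on top
            })
```

Calling print(p) draws the plot on the current device.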

-- 
Ken Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html



Re: [R] package.skeleton generates .env = environment

2011-10-04 Thread Duncan Murdoch

On 04/10/2011 6:40 AM, pedabreu wrote:

Hello,

I am trying to create a package using package.skeleton. I use the R.oo package to
create object-oriented classes. When I use package.skeleton, it creates
the following file:

classA <-
structure(function()
{

  extend(Object(), "Class A",
 .var1 = NULL)


}
, .env = <environment>, class = c("Class", "Object"), formals = c("public",
"class"), modifiers = c("public", "class"))

Then I build it using R CMD build myPkg.

When I try to install the package, I get this error:

  /tmp/RtmpaOZ7IQ/R.INSTALL412da433/JSSbase/R/GTHeuristic.R:7:10:
unexpected '<'
6:}
7: , .env =

Why does package.skeleton create .env = <environment>?


package.skeleton tries to deparse your code, but in some cases, that 
can't be done.  As ?deparse says, "However, not all objects are 
deparse-able even with this option and a warning will be issued if the 
function recognizes that it is being asked to do the impossible."


What you need to do is to copy your original source code that created 
classA into the package source.  Presumably it uses some functions from 
R.oo to construct the object properly.
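A small sketch of why the generated file cannot be parsed, using a hypothetical object (not the real R.oo class): a function that carries an environment as an attribute. deparse() can only write the environment as the placeholder `<environment>`, and that text is not valid R, which is the same parse failure R CMD INSTALL reports.

```r
# Hypothetical stand-in for an R.oo Class object: a function with an
# environment stored in an attribute (the attribute name .env is illustrative)
f <- structure(function() NULL, .env = new.env())

# deparse() cannot turn an environment into source code, so it writes a
# placeholder instead
txt <- deparse(f)
print(txt)

# Parsing the deparsed text back fails on the '<' of the placeholder
res <- try(parse(text = txt), silent = TRUE)
print(inherits(res, "try-error"))
```

This is why keeping the original source that built classA, rather than the deparsed copy, avoids the problem.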


Duncan Murdoch



Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Hadley Wickham
On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov wrote:
  I'm interested in creating a graphic -like- this:

 c <- ggplot(mtcars, aes(qsec, wt))
 c + geom_point() + stat_smooth(fill = "blue", colour = "darkblue", size = 2, alpha = 0.2)

 but I need to show 2 sets of bands (with different shading) using 5%, 25%,
 75%, 95% limits that I specify and where the heavy blue line is the median.
 I don't understand how to do this with ggplot2.

Exactly what sort of limits do you want?  It sounds like maybe you are
looking for smoothed quantile regression.
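If the limits are ones you compute yourself, one way to draw two shaded bands plus a central line is to precompute them and use geom_ribbon(). The sketch below is an assumption-laden illustration, not Hadley's method: the central curve is a loess fit standing in for the median, and the 5/25/75/95% limits are residual quantiles added to it, which gives constant-width bands. For genuinely smoothed quantile regression, ggplot2's stat_quantile() (which relies on the quantreg package) is the more direct route.

```r
library(ggplot2)

# Central curve: a loess fit of wt on qsec (stand-in for the median)
fit  <- loess(wt ~ qsec, data = mtcars)
grid <- data.frame(qsec = seq(min(mtcars$qsec), max(mtcars$qsec),
                              length.out = 100))
grid$mid <- predict(fit, newdata = grid)

# Hypothetical band limits: residual quantiles shifted onto the curve
q <- quantile(residuals(fit), c(0.05, 0.25, 0.75, 0.95))
grid$q05 <- grid$mid + q[[1]]
grid$q25 <- grid$mid + q[[2]]
grid$q75 <- grid$mid + q[[3]]
grid$q95 <- grid$mid + q[[4]]

p <- ggplot(grid, aes(x = qsec)) +
  # outer 5-95% band, lighter shading
  geom_ribbon(aes(ymin = q05, ymax = q95), fill = "blue", alpha = 0.15) +
  # inner 25-75% band, darker shading
  geom_ribbon(aes(ymin = q25, ymax = q75), fill = "blue", alpha = 0.3) +
  geom_line(aes(y = mid), colour = "darkblue") +
  geom_point(data = mtcars, aes(x = qsec, y = wt))
print(p)
```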

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] Adding multiple gates/filters in densityplot

2011-10-04 Thread Michael Jahn
Hi R-Users,

I posted this question a while ago on the Bioconductor mailing list but got no 
answers. Maybe someone here knows a solution:

I failed at drawing multiple filters in a densityplot() using the 
flowCore/flowViz packages.
I found a way to draw multiple filters in xyplot(), using the glpolygon 
method within the panel function, but similar attempts for 
densityplot failed.
I could simply draw some vertical lines using 
panel.abline, but this doesn't look as appealing as the original method 
when using a single filter with the standard filter=xyz argument.
I bet there is a way to draw multiple gates through the panel function,
as curv1Filter can also identify multiple peaks automatically and 
draw them into a densityplot...


This script works for xyplot but not for densityplot:

    library(flowCore)
    library(flowViz)


    data(GvHD)
    Filter1 <- rectangleGate(filterId = "Filter1", "FSC-H" = c(0, 200))
    Filter2 <- rectangleGate(filterId = "Filter2", "FSC-H" = c(300, 400))


    xyplot( `SSC-H` ~ `FSC-H` , data=GvHD[[1]],
        panel = function(...) { 
            panel.xyplot.flowset(...)
           glpolygon( Filter1 )
           glpolygon( Filter2 )
        }
    )
    

    densityplot( ~ `FSC-H`, data=GvHD[[1]],
        panel = function(...) { 
            panel.densityplot.flowset(...)
            glpolygon( Filter1 )
            glpolygon( Filter2 )
        }
    )

The glpolygon method does not give the typical look of the densityplot filters, 
but red-lined gate boundaries. The desired look of the filter is a lighter 
color with dotted lines as limits.
Thank you in advance!
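As a plain-lattice alternative (not flowViz's own filter rendering), several gates can be drawn in a densityplot() panel by shading each range before the density curve and marking its limits with dotted lines. This is a sketch under assumptions: the FSC values are simulated, and the gate limits are copied by hand from the two rectangleGate() calls above rather than extracted from the gate objects.

```r
library(lattice)

set.seed(42)
# Hypothetical data standing in for the FSC-H channel of GvHD[[1]]
fsc <- data.frame(FSC = c(rnorm(500, 120, 30), rnorm(500, 350, 25)))

# Gate limits taken by hand from the two rectangleGate() calls
gates <- list(c(0, 200), c(300, 400))

p <- densityplot(~ FSC, data = fsc, plot.points = FALSE,
  panel = function(x, ...) {
    lims <- current.panel.limits()
    for (g in gates) {
      # light shading over the full panel height for each gate
      panel.rect(g[1], lims$ylim[1], g[2], lims$ylim[2],
                 col = "grey90", border = NA)
      # dotted vertical lines at the gate limits
      panel.abline(v = g, lty = 3, col = "grey40")
    }
    # draw the density curve last so it sits on top of the shading
    panel.densityplot(x, ...)
  })
print(p)
```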

All the best,
Michael


--
Michael Jahn
PhD student
Helmholtz-Centre for Environmental Research
Leipzig, Germany
http://www.ufz.de/



Re: [R] Plotting a polygon with xyplot

2011-10-04 Thread Bert Gunter
?? Use the appropriate panel function, not panel.xyplot().
If you don't know what this means, you need to read up on lattice/trellis
graphics.

?panel.polygon
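A minimal lattice sketch of that suggestion, assuming the points are already in drawing order (as in Ken's reordered D_poly); the outline below is hypothetical data. The custom panel function calls panel.polygon() instead of panel.xyplot(), so the outline is filled.

```r
library(lattice)

# Hypothetical closed outline, points already in drawing order
outline <- data.frame(
  y = c(0, 2, 4, 5, 4, 2, 0),
  z = c(230, 250, 252, 240, 225, 222, 230)
)

p <- xyplot(z ~ y, data = outline,
            panel = function(x, y, ...) {
              # fill the outline instead of drawing points or lines
              panel.polygon(x, y, col = "lightblue", border = "darkblue")
            })
print(p)
```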

-- Bert

On Tue, Oct 4, 2011 at 7:14 AM, Ken Knoblauch ken.knobla...@inserm.fr wrote:





-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] Question about linear mixed effects model (nlme)

2011-10-04 Thread Panagiotis
Hi,

I applied a linear mixed effect model in my data using the nlme package.
lme2 <- lme(distance ~ temperature*condition, random = ~ 1 | trial, data) and then
anova. 
I want to ask whether it is possible to get the least squares means for the
interaction effect and the corresponding 95% CIs, and then plot these values.

Thank you 
Panagiotis



[R] [R-pkgs] `partykit': A Toolkit for Recursive Partytioning

2011-10-04 Thread Torsten Hothorn


New package `partykit': A Toolkit for Recursive Partytioning

The purpose of the package is to provide a toolkit with infrastructure for
representing, summarizing, and visualizing tree-structured regression and
classification models. Thus, the focus is not on _inferring_ such a
tree structure from data but on _representing_ a given tree so that
printing/plotting and computing predictions can be performed in a
standardized way. In particular, this unified infrastructure can be
used for reading/coercing tree models from different sources
(packages `rpart', `RWeka', `PMML') yielding objects that share
functionality for `print()', `plot()', and `predict()' methods.

Impatient users will hopefully have fun with

install.packages("partykit")
library(partykit)
library(rpart)
### from ?rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
plot(as.party(fit))


Best,

Torsten & Achim

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Hadley Wickham
You need to set the labels...

Hadley

On Sat, Sep 24, 2011 at 3:49 AM, Casper Ti. Vector
caspervec...@gmail.com wrote:
 Is there any way to use expression() in legend labels with ggplot2?

 It seems that things like
 scale_shape_manual(value = c(
   x = expression(italic(x)),
   y = expression(italic(y))
 ))
 don't work.

 Thanks very much :)

 --
    Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134,
 valid from 2010 to 2013) from a key server.







-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Question about linear mixed effects model (nlme)

2011-10-04 Thread Bert Gunter
Below.

On Tue, Oct 4, 2011 at 7:34 AM, Panagiotis p...@hi.is wrote:

 Hi,

 I applied a linear mixed effect model in my data using the nlme package.
 lme2-lme(distance~temperature*condition, random=~+1|trial, data) and then
 anova.
 I want to ask if it is posible to get the least squares means for the
 interaction effect and the corresponding 95%ci. And then plot this values.


Uh-Oh. You may have unloosed The Wrath of Khan -- or at least of Venables.
(An explanation of this cryptic remark should follow from others, so please
do not ask me what it means if you do not know).
:-)

-- Bert



 Thank you
 Panagiotis





-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how do i put two scatterplots on same graph

2011-10-04 Thread William Revelle


If the data are from one data.frame (e.g., the iris data set), then simply 
label the red and white flowers with different colors:
e.g.,

with the iris data set

plot(iris$Sepal.Length, iris$Sepal.Width, col = c("red", "blue", "black")[iris$Species], pch = c(16:18)[iris$Species])

Bill
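If the two groups live in separate data frames instead, one common base-graphics pattern is plot() for the first set with axis limits covering both, then points() for the second. The stem/petal numbers below are hypothetical placeholders for the red and white flower data.

```r
# Hypothetical measurements; replace with your own two data sets
red   <- data.frame(stem = c(10, 12, 15, 18), petals = c(5, 6, 6, 8))
white <- data.frame(stem = c(9, 11, 14, 20),  petals = c(4, 5, 7, 9))

# Shared axis limits so both sets fit on the same plot
xlim <- range(red$stem, white$stem)
ylim <- range(red$petals, white$petals)

plot(petals ~ stem, data = red, col = "red", pch = 16,
     xlim = xlim, ylim = ylim,
     xlab = "Stem length", ylab = "Number of petals")
# add the second set to the existing plot with a different marker colour
points(petals ~ stem, data = white, col = "grey40", pch = 17)
legend("topleft", legend = c("red", "white"),
       col = c("red", "grey40"), pch = c(16, 17))
```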




On Oct 4, 2011, at 4:20 AM, Paul Hiemstra wrote:

 On 10/04/2011 06:19 AM, jricci wrote:
 I have two sets of scatterplot data, hypothetically:
 a) stem length vs. number of petals in red flowers
 b) stem length vs. number of petals in white flowers
 
 I want to place them on the same scatter plot, with the same x and y axes but
 different colored markers.
 
 How do I do this in R?
 
 
 Hi,
 
 You could take a look at the ggplot2 package.
 
 good luck,
 Paul
 
 -- 
 Paul Hiemstra, Ph.D.
 Global Climate Division
 Royal Netherlands Meteorological Institute (KNMI)
 Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
 P.O. Box 201 | 3730 AE | De Bilt
 tel: +31 30 2206 494
 
 http://intamap.geo.uu.nl/~paul
 http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
 
 

William Revelle             http://personality-project.org/revelle.html
Professor                   http://personality-project.org
Department of Psychology    http://www.wcas.northwestern.edu/psych/
Northwestern University     http://www.northwestern.edu/
Use R for psychology        http://personality-project.org/r
It is 6 minutes to midnight http://www.thebulletin.org



Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Bert Gunter
On Tue, Oct 4, 2011 at 7:42 AM, Jeanne M. Spicer xn8spi...@gmail.com wrote:

 I'm not sure how returning an incorrect result is ever a 'positive' feature



It is **not** incorrect; perhaps unexpected, but that is not the same.


 but at least the documentation could more clearly warn users that this
 method behaves differently in these cases -- summary(rock[,1]) vs
 summary(rock[,1:2]) -- and that the method can and *does* return incorrect
 results without any warning messages.


What is (in)adequate in documentation is often in the mind of the beholder.

Note:
 class(rock[,1])
[1] "integer"

 class(rock[,1:2])
[1] "data.frame"

This means that different methods are dispatched, leading to the different
results. Moreover,
 summary(rock[,1,drop=FALSE])
  area
 Min.   : 1016
 1st Qu.: 5305
 Median : 7487
 Mean   : 7188
 3rd Qu.: 8870
 Max.   :12212

... and that is because
 class(rock[,1,drop=FALSE])
[1] "data.frame"

So the relevant Help file is ?"[.data.frame"





 I would encourage anyone teaching introductory R to look at the 'epicalc'
 package.  The re-vamped function 'summ' in that package returns correct
 results regardless - summ(rock), summ(rock$area).  In addition, when you
 only ask for one column you not only get the correct results, you also get a
 bonus distribution plot.

 I would like all of our students to use R, but little things like this
 are huge stumbling blocks for them.


I have no doubt that this is true. R is powerful, flexible and, as an
inevitable result, complex. To master it, honest effort is required,
probably a somewhat scarce commodity in introductory classes, especially for
non-statisticians. For that reason, there are numerous learning resources
available, to be found on CRAN. Have you looked at them? Moreover, there are
several R GUIs that attempt to shield the beginner from the initial shock,
to be found in the "R GUIs" link under "Other Projects". Have you considered
those?

So I think something more than righteous indignation is called for here.
Nevertheless, the bottom line is that you get what you pay for: R **IS**
hard -- but for many serious data analysts of all stripes, worth the effort.
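The dispatch difference Bert describes can be checked directly on the built-in rock data. In this sketch, the only change between the two calls is drop = FALSE, which keeps a one-column data frame and so sends summary() to the data-frame method instead of the default one. quantile() and mean() show the exact values underneath either printout.

```r
x1 <- rock[, 1]                 # default drop = TRUE: a plain vector
x2 <- rock[, 1, drop = FALSE]   # stays a one-column data frame

class(x1)
class(x2)

s1 <- summary(x1)   # dispatched to summary.default
s2 <- summary(x2)   # dispatched to summary.data.frame
print(s1)
print(s2)

# Unrounded values behind both printouts
quantile(x1)
mean(x1)
```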

Cheers,
Bert



 -jeanne





-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] file input with readLines

2011-10-04 Thread Uwe Ligges



On 03.10.2011 19:19, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote:

I am using readLines to read a fairly large ASCII file.  readLines reads
a fixed number of lines, then other R code processes the data, then
readLines reads the same number of lines again, then other R code
processes the data, then ...



Sort of like:



conn <- file('filename', 'r')

for (chunk in 1:10) {
  Lines <- readLines(conn, n = 25)
  # process Lines
}



The code is working, but I notice that it slows down greatly as time
progresses.  It took 2 seconds to read my first chunk of data, 4 seconds
to read the next chunk, 10 after that.  The quasi-exponential trend has
slowed, thank goodness, but after about a hundred reads, the read time
for the next chunk is over a minute.  Let me stress that the number of
lines read in each chunk of data is absolutely fixed.



The only processing I am doing at this point is to parse the new data,
and rbind the results to an existing data frame.


And that may be the interesting point.
Have you tried to allocate the whole data.frame and assign into it 
later? It is probably not readLines() that is slowing you down.
A minute seems to be quite a lot for reasonably sized data. How many 
columns are we talking about?


Uwe Ligges
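Growing a data frame with rbind() inside the loop copies all previous rows on every iteration, so each chunk costs more than the last. A closely related alternative to preallocating the data frame is to store each parsed chunk in a preallocated list and combine them once at the end. This sketch assumes hypothetical two-column comma-separated lines, simulated with textConnection() instead of the real file.

```r
# Hypothetical input: 250 lines of "a,b" integer pairs
txt  <- paste(1:250, (1:250) * 2, sep = ",")
conn <- textConnection(txt)

chunks <- vector("list", 10)   # one slot per chunk, allocated up front
for (chunk in 1:10) {
  Lines  <- readLines(conn, n = 25)
  fields <- strsplit(Lines, ",", fixed = TRUE)
  # parse this chunk only; earlier chunks are not touched
  chunks[[chunk]] <- data.frame(
    a = as.integer(vapply(fields, `[`, "", 1)),
    b = as.integer(vapply(fields, `[`, "", 2))
  )
}
close(conn)

# a single rbind at the end instead of one per chunk
result <- do.call(rbind, chunks)
```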





 Processing of new data
in no way depends on earlier data.



So, my question is why is the reading taking longer as time goes on?  Is
there a way to fix this?  Is there a better method than readLines?



Thanks.




[R] adding a dummy variable...

2011-10-04 Thread grazia
Hi all,

I have a dataset of individuals where the variable ID identifies the
household in which the individual lives. rel.head stands for the
relationship with the household head: rel.head=1 is the household head,
rel.head=2 is the spouse, and rel.head=3 is a child.

Here is an example of what it looks like:

df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))


I want to add a dummy variable that is equal to 1 when these conditions
hold simultaneously:

a) the number of rows with the same ID is equal to 2
b) those rows have rel.head=1 and rel.head=3


So my ideal output is:

   ID     rel.head  added.dummy
1  17100  1         1
2  17100  3         1
3  17101  1         0
4  17102  1         0
5  17103  1         0
6  17103  2         0
7  17104  1         0
8  17104  2         0
9  17104  3         0
10 17105  1         1
11 17105  3         1

Is there a simple way to do that?
Can somebody help?

Thanks in advance,
Grazia
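One possible approach with base R's ave(), which applies a function within each ID group and returns a value for every row. It assumes condition (b) means "the household contains both a head (rel.head = 1) and a child (rel.head = 3)"; combined with exactly two rows, that is a head plus one child.

```r
df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))

# per-row group statistics, computed within each ID
n_rows    <- ave(df$rel.head, df$ID, FUN = length)
has_head  <- ave(df$rel.head == 1, df$ID, FUN = any)
has_child <- ave(df$rel.head == 3, df$ID, FUN = any)

# 1 when the household has exactly two rows: a head and a child
df$added.dummy <- as.integer(n_rows == 2 & has_head & has_child)
df
```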



Re: [R] Installation from local Compiled directory

2011-10-04 Thread Uwe Ligges



On 03.10.2011 18:16, Sandeep Patil wrote:

Hello everyone

I have manually compiled a directory of gstat in a particular folder of my
Unix system.
I want to install it, but I am unable to use either of the following two
commands:

1. R CMD INSTALL
2. install.packages





If this is a precompiled (i.e. binary) package produced for this R 
version and this OS, then the magic is to just copy the directory into 
your library.
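Uwe's advice amounts to copying the directory into a library folder. A minimal sketch, assuming the compiled directory really is an installed-package tree built for this R version and OS; the helper name install_compiled_dir and the paths are hypothetical.

```r
# Hypothetical helper: copy a compiled package directory into a library
install_compiled_dir <- function(pkg_dir, lib_dir = .libPaths()[1]) {
  stopifnot(dir.exists(pkg_dir), dir.exists(lib_dir))
  # recursive = TRUE copies pkg_dir itself (and its contents) into lib_dir
  ok <- file.copy(pkg_dir, lib_dir, recursive = TRUE)
  if (!ok) stop("copy failed; check write permissions on ", lib_dir)
  invisible(file.path(lib_dir, basename(pkg_dir)))
}

# Usage (hypothetical path):
# install_compiled_dir("/path/to/compiled/gstat")
# library(gstat)
```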


best,
Uwe Ligges



I do not understand how to coax the above commands into locating the directory
that I have compiled.

Please understand that I have solved a number of related issues concerning
this installation, and it is a special case where I cannot

1. use a CRAN mirror to download and install
2. install from a TAR file

Essentially this is the only option I have.

Thank you

Sandeep



[R] [Workshop] Finance with R

2011-10-04 Thread Peter Ruckdeschel
The Financial Mathematics department of Fraunhofer ITWM 
is offering a two-day workshop on Finance with R:

%-
[Workshop] Finance with R
%-

Oct 20, 2011, 10:00-17:00 and
Oct 21, 2011,  9:00-16:00

Fraunhofer ITWM, Fraunhofer-Platz 1, 67663 Kaiserslautern,
Germany

%-
Scope and purpose
%-

This workshop provides an introduction to R for professionals 
and academics in Finance. 

It gives insight into the possibilities of data analysis and 
statistics with R, importing data sets, generating graphics 
and preparing reports, as relevant to Finance.

Besides giving insight into financial modeling in R, we 
demonstrate in particular the use of the Rmetrics family 
of R packages as well as an R bridge to the QuantLib library. 
We also cover integration of R into Excel, interaction with 
Matlab, and import from Bloomberg.

%
Benefits of attending
%

The workshop provides insight into statistical models and 
concepts in R which are useful for various problems arising 
in Finance. 

The attendees will be able to import datasets into R, analyze 
them statistically and apply concepts from time series modeling. 

In practical sessions, the attendees will learn and practice 
how to use R.

The fee for the workshop is 500 EUR.

For further details, see
http://www.itwm.fraunhofer.de/en/departments/financial-mathematics/events/2011-workshop-series.html

Peter Ruckdeschel

-- 
Dr. habil. Peter Ruckdeschel, Abteilung Finanzmathematik, F3.17
Fraunhofer ITWM, Fraunhofer Platz 1, 67663 Kaiserslautern
Telefon:  +49 631/31600-4699   Fax:  +49 631/31600-5699
E-Mail :  peter.ruckdesc...@itwm.fraunhofer.de
http://www.itwm.fraunhofer.de/abteilungen/finanzmathematik/mitarbeiterinnen/mitarbeiter/dr-peter-ruckdeschel.html



Re: [R] handling constant factors in prediction using svm

2011-10-04 Thread Uwe Ligges



On 04.10.2011 08:53, Divyam wrote:

Hi users!

I am fitting a model with several factor variables as independent variables
using svm. Since there are many categorical variables, the training and test
data sets were created using dummy.data.frame from the dummies package.
I have a factor A in the training data set with 2 levels (0, 1). In the
test set, factor A has only 1 level (1), and hence when applying
dummy.data.frame, the variable gets dropped (and that's how I want it too).
The problem comes when I try to predict the test data: an error is thrown
saying object A0 is not found. Is there any way to solve this
problem?


Errr, if you learned a model that predicts based on several variables, 
including A0, what do you expect to happen if A0 is not given? Well, 
you cannot predict. So if A0 is constant in your test cases, just supply it!


To simplify, consider a linear model y=bX+e. Now one column of X is 
missing for prediction. y will be undefined, obviously.
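"Just supply it" can be done mechanically: add every training column the test set lacks, filled with 0 (a constant dummy level is absent in every test row), and reorder the columns to match. The data frames below are hypothetical stand-ins for the dummy-coded training and test sets.

```r
# Hypothetical dummy-coded data: A0 is absent from the test set
train <- data.frame(A0 = c(1, 0, 1, 0), A1 = c(0, 1, 0, 1),
                    x  = c(1.2, 3.4, 2.2, 0.5))
test  <- data.frame(A1 = c(1, 1), x = c(2.0, 1.1))

# add the missing dummy columns as 0 (that level never occurs in the test rows)
missing_cols <- setdiff(names(train), names(test))
test[missing_cols] <- 0

# same column order as the training data
test <- test[names(train)]
test
```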


Uwe Ligges






Thanks
Divya



Re: [R] The use of period in function names and variable names

2011-10-04 Thread jdospina
Hello.

Not at all in the way you have shown. Just to improve your code's
readability, try to avoid naming your variables beginning with a period
(example: .hello).

In contrast with Matlab (for example), the period in R is not used to access
an object property.



Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Jeanne M. Spicer
I'm not sure how returning an incorrect result is ever a 'positive' feature but 
at least the documentation could more clearly warn users that this method 
behaves differently in these cases -- summary(rock[,1]) vs summary(rock[,1:2]) 
-- and that the method can and does return incorrect results without any 
warning messages.   

I would encourage anyone teaching introductory R to look at the 'epicalc' 
package.  The re-vamped function 'summ' in that package returns correct results 
regardless - summ(rock), summ(rock$area).  In addition, when you only ask for 
one column you not only get the correct results, you also get a bonus 
distribution plot.  

I would like all of our students to use R, but little things like this are 
huge stumbling blocks for them. 
-jeanne 





Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Casper Ti. Vector
Hmm, that was my mistake when composing this mail, but the problem was
really encountered at that time.
Nevertheless, I cannot reproduce the problem now; perhaps I just
made another mistake at that time.
Thanks all the same, and sorry for the disturbance anyway :|

On Tue, Oct 04, 2011 at 10:10:56AM -0500, Hadley Wickham wrote:
 You need to set the labels...

-- 
Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134,
valid from 2010 to 2013) from a key server.





Re: [R] Question about linear mixed effects model (nlme)

2011-10-04 Thread Ben Bolker
Bert Gunter gunter.berton at gene.com writes:

 
 Below.
 
 On Tue, Oct 4, 2011 at 7:34 AM, Panagiotis pat2 at hi.is wrote:
 
  Hi,
 
  I applied a linear mixed effect model in my data using the nlme package.
  lme2-lme(distance~temperature*condition, random=~+1|trial, data) and then
  anova.
  I want to ask if it is posible to get the least squares means for the
  interaction effect and the corresponding 95%ci. And then plot this values.
 
 
 Uh-Oh. You may have unloosed The Wrath of Khan -- or at least of Venables.
 (An explanation of this cryptic remark should follow from others, so please
 do not ask me what it means if you do not know).
 

   You should probably ask (a version of) this question on the
r-sig-mixed-models list instead.
  What do you mean by the least squares means for the interaction effect?
How is it different from the estimate of the interaction parameter?
You can use the predict() function if you want to calculate predicted
values for any particular combination of predictors (you probably want
to specify level=0 to get the population-level effects).  Getting 'good'
confidence intervals for mixed-effect models is surprisingly difficult.
If you are willing to ignore the uncertainty of the among-trial variance,
you can use a modification of the recipe found at http://glmm.wikidot.com/faq
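A sketch of that kind of recipe, on hypothetical data since the original data set was not posted: predict() with level = 0 gives the population-level mean for each temperature x condition cell, and approximate 95% intervals come from the fixed-effects covariance matrix (m$varFix). As noted above, these intervals ignore the uncertainty in the among-trial variance.

```r
library(nlme)

set.seed(1)
# Hypothetical data standing in for the poster's data set
dat <- expand.grid(temperature = factor(c("low", "mid", "high"),
                                        levels = c("low", "mid", "high")),
                   condition   = factor(c("A", "B")),
                   trial       = factor(1:6),
                   rep         = 1:3)
trial_eff <- rnorm(6, sd = 0.5)
dat$distance <- 10 + as.integer(dat$temperature) + 2 * (dat$condition == "B") +
  trial_eff[dat$trial] + rnorm(nrow(dat))

m <- lme(distance ~ temperature * condition, random = ~ 1 | trial, data = dat)

# One row per temperature x condition cell, with the fitted factor levels
newdat <- expand.grid(temperature = factor(levels(dat$temperature),
                                           levels = levels(dat$temperature)),
                      condition   = factor(levels(dat$condition),
                                           levels = levels(dat$condition)))

# Population-level (fixed effects only) predictions
newdat$fit <- as.numeric(predict(m, newdata = newdat, level = 0))

# Standard errors from the fixed-effects covariance matrix
X  <- model.matrix(~ temperature * condition, data = newdat)
se <- sqrt(diag(X %*% m$varFix %*% t(X)))
newdat$lwr <- newdat$fit - 1.96 * se
newdat$upr <- newdat$fit + 1.96 * se
print(newdat)

# Dot-and-whisker plot of the cell means with approximate 95% CIs
idx <- seq_len(nrow(newdat))
plot(idx, newdat$fit, ylim = range(newdat$lwr, newdat$upr), pch = 16,
     xaxt = "n", xlab = "", ylab = "distance")
arrows(idx, newdat$lwr, idx, newdat$upr, angle = 90, code = 3, length = 0.05)
axis(1, at = idx, labels = paste(newdat$temperature, newdat$condition, sep = ":"))
```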



Re: [R] ggplot2: expression() in legend labels?

2011-10-04 Thread Dennis Murphy
Hi:

Here's a reproducible example:

library(ggplot2)

d <- data.frame(grp = factor(rep(c('x', 'y'), each = 5)),
                ev = rnorm(10), dv = rnorm(10))
labl <- list(expression(italic('x')), expression(italic('y')))

ggplot(d, aes(x = ev, y = dv, shape = grp)) + geom_point() +
   scale_shape_manual('Group', breaks = levels(d$grp),
   values = 1:2,
   labels = labl)

HTH,
Dennis

On Tue, Oct 4, 2011 at 8:59 AM, Casper Ti. Vector
caspervec...@gmail.com wrote:
 Hmm, that's my fault when composing this mail, but the problem was
 really encountered at that time.
 Nevertheless, neither can I reproduce the problem now, perhaps I just
 made another mistake at that time.
 Thanks all the same, and sorry for the disturbance anyway :|

 On Tue, Oct 04, 2011 at 10:10:56AM -0500, Hadley Wickham wrote:
 You need to set the labels...

 --
    Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134,
 valid from 2010 to 2013) from a key server.







Re: [R] The use of period in function names and variable names

2011-10-04 Thread Steve Lianoglou
Hi,

On Tue, Oct 4, 2011 at 11:39 AM, jdospina jdosp...@gmail.com wrote:
 Hello.

 Not at all in the way you have shown. Just to improve your code
 readability, try to avoid naming your variables beginning with period
 (example: .hello).

Well, that's not exactly true.

It's common practice to name variables with a leading period if you
want them to be considered hidden, in some respect. See the
`all.names` argument to the `ls` function, for instance.
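A small illustration of that convention follows. The object names .hidden and visible are made up. The example uses a fresh environment so it does not touch the workspace.

```r
## Use a fresh environment so the example does not touch the workspace
e <- new.env()
assign(".hidden", 1, envir = e)
assign("visible", 2, envir = e)

ls(e)                    # names starting with "." are left out
ls(e, all.names = TRUE)  # ...unless all.names = TRUE
```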

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Rug plot curve reversal

2011-10-04 Thread Uwe Ligges



On 04.10.2011 13:30, Peter Minting wrote:


Dear R-help
Can anyone tell me why my curve appears the wrong way round on a rug plot?
I am using the same code as on pg 596 of the Crawley R-book.



mod <- glm(mort ~ logBd, binomial)


What is mort, what is logBd? I don't have access to the book. I have 
hidden it in my other office so that nobody can find it anymore.




par(mfrow = c(2, 2))
xv <- seq(0, 8, 0.01)
yv <- predict(mod, list(logBd = xv), type = "response")
plot(logBd, mort)
lines(xv, yv)
I've tried swapping xv and yv around but no luck.


Hopefully mort is a binary factor, i.e. one with two levels. In that case
plot() draws its values at positions 1 and 2 on the y axis.
yv is on the response (probability) scale, i.e. in the interval (0,1) if the
binomial glm was successful. So the points and the curve are on different scales.


So I guess
 lines(xv,yv+1)
could help.
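Here is a minimal sketch of the scale mismatch described above. It uses simulated data because the book's mort and logBd are not available here; the simulation is an assumption, not Crawley's data. Instead of shifting the curve by 1, you can plot the response as 0/1 so that it shares the probability scale of the fitted curve.

```r
## Simulated stand-in for the book's data (an assumption, not Crawley's data)
set.seed(42)
logBd <- runif(200, 0, 8)
mort  <- factor(rbinom(200, 1, plogis(-4 + logBd)),
                levels = 0:1, labels = c("alive", "dead"))

## Binomial GLM: the second factor level ("dead") is modelled as the success
mod <- glm(mort ~ logBd, family = binomial)

xv <- seq(0, 8, 0.01)
yv <- predict(mod, list(logBd = xv), type = "response")  # probabilities in (0, 1)

## plot() would draw a factor y at its codes 1 and 2; use 0/1 instead
plot(logBd, as.numeric(mort) - 1, ylab = "P(dead)")
lines(xv, yv)
```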

Whatever else I think about The R Book can be found in my book review
published in Statistical Papers.


Best,
Uwe Ligges







Thanks,
Pete





Re: [R] The use of period in function names and variable names

2011-10-04 Thread Duncan Murdoch

On 04/10/2011 7:04 AM, S Ellison wrote:

See para 10.3.2 'Identifiers' in the R language definition (always distributed 
with R in the html help system), or ?make.names, for a concise statement of 
what constitutes a valid variable name in R.
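As a quick check of the rules referred to above, make.names() returns a name unchanged when it is already syntactically valid. Otherwise it repairs the name, for example by prepending "X" or by appending "." to a reserved word. The candidate names below are invented for illustration.

```r
## Candidate names, some of them not syntactically valid
nms <- c("my.var", "my_var", ".hidden", "2var", "_x", "if")

## Valid names come back unchanged; invalid ones are repaired
make.names(nms)

## A quick validity test: TRUE where make.names() leaves the name alone
nms == make.names(nms)
```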

It's actually underscores that might give trouble with older versions, not '.'. 
But they'd have to be a lot older by R standards (pre 1.9.0).

I am not sure why there has been a recent shift away from periods and towards 
camelCase in some R packages;


Presumably the authors of those packages prefer camelCase.  I don't 
think it's any more complicated than that.


Duncan Murdoch



personally I find a period or underscore much more useful for making a variable 
name readable. And a mix of camelCase and period.breaks makes it a lot harder 
to guess which case-sensitive string to use. The number of different 
combinations of case and period I end up trying for R.Version (occasionally 
used, never quite often enough to be automatic) defies belief ;-).


S Ellison

  From: r-help-boun...@r-project.org On Behalf Of Smart Guy
  Sent: 04 October 2011 05:20
  To: r-help@r-project.org
  Subject: [R] The use of period in function names and variable names

  Hi,
   I am looking for some guidance on whether I can use the
  period(.) in function names and variable names.






Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Uwe Ligges



On 04.10.2011 16:42, Jeanne M. Spicer wrote:

I'm not sure how returning an incorrect result is ever a 'positive' feature but 
at least the documentation could more clearly warn users that this method 
behaves differently in these cases -- summary(rock[,1]) vs summary(rock[,1:2]) 
-- and that the method can and does return incorrect results without any 
warning messages.



What are you talking about? Presumably it appeared earlier in this thread?
Please always quote the previous message.


Anyway, I guess you were looking for

summary(rock[,1, drop=FALSE])

rock[,1] is simplified to a vector while rock[,1:2] is still a matrix or
data.frame (and since you did not quote the earlier message, I do not know which).
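Here is a short illustration of the difference, using the rock data that ships with R. The names x1 and x2 are only for illustration.

```r
## rock is in the datasets package that ships with R
x1 <- rock[, 1]                # simplified to a plain vector
x2 <- rock[, 1, drop = FALSE]  # still a one-column data frame

summary(x1)  # vector summary
summary(x2)  # data-frame summary, same layout as summary(rock[, 1:2])
```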




I would encourage anyone teaching introductory R to look at the 'epicalc' 
package.  The re-vamped function 'summ' in that package returns correct results 
regardless - summ(rock), summ(rock$area).  In addition, when you only ask for 
one column you not only get the correct results, you also get a bonus 
distribution plot.

I would like all of our students to use R, but little things like this are
huge stumbling blocks for them.


Then you told them about summary() before telling them how to deal with data
structures correctly. And that is the most important part of learning R.
I know from my courses that applied people do not like that, but I
have always managed to convince them that this is the most important topic to
learn about R.


Best,
Uwe Ligges




-jeanne








Re: [R] The use of period in function names and variable names

2011-10-04 Thread Uwe Ligges



On 04.10.2011 18:18, Duncan Murdoch wrote:

On 04/10/2011 7:04 AM, S Ellison wrote:

See para 10.3.2 'Identifiers' in the R language definition (always
distributed with R in the html help system), or ?make.names, for a
concise statement of what constitutes a valid variable name in R.

It's actually underscores that might give trouble with older versions,
not '.'. But they'd have to be a lot older by R standards (pre 1.9.0).

I am not sure why there has been a recent shift away from periods and
towards camelCase in some R packages;


Presumably the authors of those packages prefer camelCase. I don't think
it's any more complicated than that.


I switched to camelCase when I realized that it is somewhat dangerous to
conflict with S3 naming conventions: R CMD check rightly complained
because I had used a name of the form generic.class where the generic or
class part really was the name of an existing generic or class, which I had not realized
before.
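Here is a minimal sketch of the kind of clash described above. print.info, printInfo and the class "info" are hypothetical names chosen for illustration. S3 dispatch picks up any function whose name happens to have the form generic.class for every object of that class.

```r
## A helper whose name happens to have the form generic.class
print.info <- function(x, ...) {
  cat("print.info() was called\n")
  invisible(x)
}

## Any object of class "info" is now dispatched to it, intended or not
obj <- structure(list(a = 1), class = "info")
print(obj)

## A camelCase name is never treated as an S3 method
printInfo <- function(x, ...) cat("printInfo() was called\n")
```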


Uwe





Duncan Murdoch



personally I find a period or underscore much more useful for making a
variable name readable. And a mix of camelCase and period.breaks makes
it a lot harder to guess which case-sensitive string to use. The
number of different combinations of case and period I end up trying
for R.Version (occasionally used, never quite often enough to be
automatic) defies belief ;-).


S Ellison

 From: r-help-boun...@r-project.org On Behalf Of Smart Guy
 Sent: 04 October 2011 05:20
 To: r-help@r-project.org
 Subject: [R] The use of period in function names and variable names

 Hi,
 I am looking for some guidance on whether I can use the
 period(.) in function names and variable names.








Re: [R] Problem with .C

2011-10-04 Thread Uwe Ligges
Without knowing that C code, we cannot know. Have you read Writing R
Extensions carefully? In particular, take care with memory allocation and printing
as described in the manual.


Uwe Ligges


On 04.10.2011 14:04, Grigory Alexandrovich wrote:

Hello,

I wrote a function in C, which works fine if called from the
main-function in C.

But as soon as I try to call this function from R like .C('foo',
as.double(x), as.integer(y)), the program crashes.

I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and
loaded it into R with dyn.load().

What can be the cause of such behaviour?
Again, the C function itself works, but not if called from R.

Thanks
Grigory Alexandrovich





Re: [R] adding a dummy variable...

2011-10-04 Thread Martyn Byng
Hi,

I am sure there are better / more efficient ways of doing this, but the
following seems to work ...

ids <- sapply(split(df, df$ID), function(x) {
  length(x$rel.head) == 2 && any(x$rel.head == 1) && any(x$rel.head == 3)
})
ids <- as.numeric(names(ids)[ids])
added.dummy <- as.numeric(df$ID %in% ids)
cbind(df, added.dummy)
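Here is a more compact, vectorised sketch in base R using ave(). It assumes rel.head is numeric, as in the example df from the question. This is an alternative approach, not Martyn's code.

```r
## Example data from the question (rel.head numeric)
df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))

## For each household: exactly two rows, one head (1) and one child (3)?
## ave() recycles the single TRUE/FALSE over that household's rows
## (stored as 1/0 because rel.head is numeric)
flag <- ave(df$rel.head, df$ID,
            FUN = function(r) length(r) == 2 && all(sort(r) == c(1, 3)))
df$added.dummy <- as.numeric(flag)
df
```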

Martyn

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of gra...@stat.columbia.edu
Sent: 04 October 2011 16:45
To: r-help@r-project.org
Subject: [R] adding a dummy variable...

Hi all,

I have a dataset of individuals where the variable ID corresponds to the
identification of the household where the individual lives. rel.head
stands for the relationship with the household head, so rel.head=1 is the
household head, rel.head=2 is the spouse, and rel.head=3 is a child.

Here is an example of what it looks like:

df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))


I want to add a dummy variable that is equal to 1 when these conditions
hold simultaneously:

a) the number of rows with the same ID is equal to 2
b) those two rows have rel.head=1 and rel.head=3


So my ideal output is:

   ID      rel.head   added.dummy
1  17100        1           1
2  17100        3           1
3  17101        1           0
4  17102        1           0
5  17103        1           0
6  17103        2           0
7  17104        1           0
8  17104        2           0
9  17104        3           0
10 17105        1           1
11 17105        3           1

Is there a simple way to do that?
Can somebody help?

Thanks in advance,
Grazia






Re: [R] Problem with .C

2011-10-04 Thread Jeff Newmiller
This looks like a classic case of not reading the manual, and then compounding 
it by not reading the posting guide. The manual would be the Writing R 
Extensions pdf that comes with R or you can google it. The posting guide is 
referenced at the bottom of this and every other posting on this mailing list.
There is a nearly infinite variety of errors that can lead to a crash, so
it is really unreasonable of you to pose this question this way and expect
constructive assistance.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Grigory Alexandrovich alexandrov...@mathematik.uni-marburg.de wrote:

Hello,

I wrote a function in C, which works fine if called from the 
main-function in C.

But as soon as I try to call this function from R like .C('foo', 
as.double(x), as.integer(y)), the program crashes.

I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and 
loaded it into R with dyn.load().

What can be the cause of such behaviour?
Again, the C function itself works, but not if called from R.

Thanks
Grigory Alexandrovich






Re: [R] Giant font on the R plots...

2011-10-04 Thread Uwe Ligges



On 04.10.2011 11:10, D.Emad wrote:

Hello,

I've been facing a really stupid problem... When I try to plot using
heatplot or hclust or any similar function, the labels of the x-axis (which
are the sample names) are giant and overlapping. I can't even read the
sample names!


R> heatplot
Error: object 'heatplot' not found

R> hclust(dist(USArrests), "ave")
# does not plot anything

So let me try

R> plot(hclust(dist(USArrests), "ave"))
# no x axis

Do you mean the labels on the dendrogram? These are controlled by cex
(rather than cex.lab).
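Here is a minimal sketch using the USArrests data that ships with R. According to ?plot.hclust, cex passed through the ... arguments scales the leaf labels, whereas cex.lab affects the axis titles.

```r
## USArrests ships with R; its row names (US states) become the leaf labels
hc <- hclust(dist(USArrests), method = "average")

## cex shrinks the dendrogram labels; cex.lab would only change the axis title
plot(hc, cex = 0.5)
```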


Uwe Ligges




I tried cex.lab = 0.5; it helped only with the y axis and not the x-axis...
Any help please?!






[R] number of analogs in significance test of MAT reconstructions using randomTF from palaeoSig

2011-10-04 Thread Jason Paul Joines
I'm trying to use the randomTF function from package palaeoSig to 
test the significance of a MAT reconstruction with nine analogs and a 
WA-PLS reconstruction with four components.  I'm probably missing 
something obvious here but how do I make sure that randomTF is testing 
the reconstruction based on the desired number of analogs / components?


In:
fitmap.wapls = WAPLS( lumapspc, lumap)
sig.wapls = randomTF( spp = sqrt( lumapspc ), env = lumapenv, fos = 
sqrt( hcspc ), n = 999, fun = WAPLS, col = 4 )
I assume col = 4 tells randomTF to test the reconstruction based 
on the four component WA-PLS model as that's what the documentation 
seems to indicate.


However, in:
fitmap.mat = MAT( lumapspc, lumap, dist.method = "chord", k = 20 )
sig.mat = randomTF( spp = lumapspc, env = lumapenv, fos = hcspc, n = 
999, fun = MAT, col = 9 )
it seems that col = 9 does not tell randomTF to test the 
reconstruction based on the 9 analog MAT model.  If I give col a value 
other than one or two, I get a subscript out of bounds error.  So I 
assume the col argument in this case selects between the mean and 
weighted mean predictions.
If I pass additional arguments, k = 9 and dist.method = "chord" to
randomTF, then the values of sig.mat$preds do not match the values 
obtained from:

predmap.mat = predict( fitmap.mat, hcspc, k = 9 )
Also, if I give randomTF a k value less than 5, I get the error
"k out of range".  So, passing k to randomTF must not be telling randomTF
to use that number of analogs as I would not be able to select a four 
analog model.



Jason
===



Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Thomas . Adams
Hadley,

Thanks for responding. No, not smoothed quantile regression. If you go here: 
http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored 
squares, you can see we have 'boxplots'. What I want to express is the 
uncertainty as depicted in the example from my previous email where I can 
specify the limits calculated for the 'boxplots' using 5%, 25%, 75%, 95% limits
as we have with the 'boxplots'.

Tom

- Original Message -
From: Hadley Wickham had...@rice.edu
Date: Tuesday, October 4, 2011 10:23 am
Subject: Re: [R] Question about ggplot2 and stat_smooth
To: Thomas Adams thomas.ad...@noaa.gov
Cc: R-help forum r-help@r-project.org


 On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov 
 wrote:
   I'm interested in creating a graphic -like- this:
 
  c <- ggplot(mtcars, aes(qsec, wt))
  c + geom_point() + stat_smooth(fill = "blue", colour = "darkblue",
                                 size = 2, alpha = 0.2)
 
  but I need to show 2 sets of bands (with different shading) using
  5%, 25%, 75%, 95% limits that I specify and where the heavy blue line
  is the median.
  I don't understand how to do this with ggplot2.
 
 Exactly what sort of limits do you want?  It sounds like maybe you are
 looking for smoothed quantile regression.
 
 Hadley
 
 -- 
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University




Re: [R] adding a dummy variable...

2011-10-04 Thread Dennis Murphy
Hi:

Here's another way to do it with the plyr package, also not terribly
elegant. It assumes that rel.head is a factor in your original data
frame:
> str(df)
'data.frame':   11 obs. of  2 variables:
 $ ID      : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ...
 $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ...

If this is not the case in your data, then you need to modify the
function f below accordingly. (This is why use of dput() is preferred
when sending example data to R-help, BTW.)
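Here is a minimal sketch of what dput() gives you. It rebuilds the example data with factor columns, as in the str() output above. The temporary file is only there to show that dget() recreates the object exactly.

```r
## The example data with factor columns, as in the str() output above
df <- data.frame(ID = factor(c(17100, 17100, 17101, 17102, 17103, 17103,
                               17104, 17104, 17104, 17105, 17105)),
                 rel.head = factor(c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3)))

## dput() prints R code that recreates the object, factor levels included;
## this is what can be pasted into a post
dput(df)

## Round trip through a temporary file to confirm nothing is lost
tf <- tempfile(fileext = ".R")
dput(df, file = tf)
df2 <- dget(tf)
identical(df, df2)
```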

library('plyr')
f <- function(d) {
    tvec <- factor(c(1, 3), levels = 1:3)   # target vector
    if (nrow(d) != 2L) { d$dummy <- rep(0, nrow(d)); return(d) }
    # If the first if statement is FALSE, then the following code is run:
    d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1)
    d
}

ddply(df, .(ID), f)

   ID    rel.head dummy
1  17100        1     1
2  17100        3     1
3  17101        1     0
4  17102        1     0
5  17103        1     0
6  17103        2     0
7  17104        1     0
8  17104        2     0
9  17104        3     0
10 17105        1     1
11 17105        3     1

HTH,
Dennis

On Tue, Oct 4, 2011 at 8:44 AM,  gra...@stat.columbia.edu wrote:
 Hi all,

 I have a dataset of individuals where the variable ID corresponds to the
 identification of the household where the individual lives. rel.head stands
 for the relationship with the household head. so rel.head=1 is the household
 head, rel.head=2 is the spouse, rel.head=3 is the children.

 Here is an example to see how it looks like:

 df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                         17104, 17104, 17104, 17105, 17105),
                  rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))


 I want to add a dummy variable that is equal to 1 when these conditions
 hold simultaneously:

 a) the number of rows with same ID is equal to 2
 b) the variable rel.head=1 and rel.head=3


 So my ideal output is:

   ID      rel.head   added.dummy
 1  17100        1           1
 2  17100        3           1
 3  17101        1           0
 4  17102        1           0
 5  17103        1           0
 6  17103        2           0
 7  17104        1           0
 8  17104        2           0
 9  17104        3           0
 10 17105        1           1
 11 17105        3           1

 Is there a simple way to do that?
 Can somebody help?

 Thanks in advance,
 Grazia





[R] Reading stopwords from a csv file

2011-10-04 Thread vioravis
I am using the tm package to do text mining.

I have a huge list of stopwords (2000+) that are in a csv file. I read it as
follows:

stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)

When try removing the stopwords using 

tr1=tm_map(tr1,removeWords,myStopwords)

I am getting the following error:

Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "",  : 
  internal error in compiling regexp

However, this works fine when I define myStopwords = c() instead of
reading from the csv file.
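The thread does not establish the cause. One plausible explanation, offered here only as an assumption, is that joining 2000+ words into a single alternation makes the pattern too large for the regex engine. Another is that some stopwords contain regex metacharacters. The base-R sketch below escapes metacharacters and removes the words in chunks. escape_regex() and remove_words_chunked() are hypothetical helpers, not part of tm.

```r
## Hypothetical helper: escape regex metacharacters so words match literally
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)
}

## Hypothetical helper: remove whole words in chunks of `chunk_size`,
## so no single pattern has to hold the entire stopword list
remove_words_chunked <- function(text, words, chunk_size = 200) {
  words  <- unique(words[!is.na(words) & nzchar(words)])
  words  <- escape_regex(words)
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (w in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(w, collapse = "|"))
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  text
}

## Small usage example with a made-up stopword list
myStopwords <- c("the", "on", "a.b")
remove_words_chunked("the cat sat on the mat", myStopwords, chunk_size = 2)
```

Reading the file with read.csv(..., stringsAsFactors = FALSE) also avoids the detour through factors. The chunked function could then be applied to each document's text.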

Can someone please help me to resolve this issue?

Thank you.

Ravi




Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Dennis Murphy
Hi:

The smooth is not going to replicate the quantile estimates you get
from the 'boxplots'; the smooth is estimating a conditional mean using
loess, with confidence limits associated with uncertainty in the
estimate of the conditional mean function, which are almost certainly
going to be narrower than the corresponding quantiles of the data
distributions.  If you want to mimic the behavior in the 'boxplots', I
would save the information from them into a data frame with columns
for each quantile, assign variable names to the quantiles, melt the
corresponding data frame so that the quantile names become factor
levels (with whatever variable is used to distinguish the 'boxplots'
as the ID variable in melt()), and then use ggplot2 or lattice to plot
the corresponding sets of lines.

Here's an example:

library('plyr')
library('reshape')
library('ggplot2')

# Toy data frame
dd <- data.frame(year = rep(2000:2008, each = 500), y = rnorm(4500))

# Function to compute quantiles and return a data frame
g <- function(d) {
   qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))
   names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')
   qq
}

# Apply function to each year of data in dd:
qdf <- ddply(dd, .(year), g)
# melt to produce a factor variable whose levels are quantiles
qdfm <- melt(qdf, id = 'year')

# Use ggplot() to plot the boxplots and quantile lines:
ggplot() +
  geom_boxplot(data = dd, aes(x = factor(year), y = y)) +
  geom_line(data = qdfm, aes(x = factor(year), y = value,
                             group = variable, colour = variable),
            size = 1) +
  labs(x = 'Year', colour = 'Quantile')

The idea of superimposing the lines over the boxplots is to show that
the default method of quantile() corresponds to the quantile() method
used to generate boxplots in ggplot2.

Is that closer to what you're after? If you want, you can always use
geom_ribbon() to shade the areas between the lines and
scale_colour_manual() to manually specify the line colors. Using the
above example, here's one way, using the unmelted quantile data:

ggplot(qdf, aes(x = year, y = Q50)) +
geom_line(size = 2, color = 'navyblue') +
geom_ribbon(aes(ymin = Q25, ymax = Q75), fill = 'blue', alpha = 0.4) +
geom_ribbon(aes(ymin = Q5, ymax = Q25), fill = 'blue', alpha = 0.2) +
geom_ribbon(aes(ymin = Q75, ymax = Q95), fill = 'blue', alpha = 0.2) +
labs(x = 'Year', y = 'Y')

Dennis

On Tue, Oct 4, 2011 at 10:01 AM,  thomas.ad...@noaa.gov wrote:
 Hadley,

 Thanks for responding. No, not smoothed quantile regression. If you go here: 
 http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored 
 squares, you can see we have 'boxplots'. What I want to express is the 
 uncertainty as depicted in the example from my previous email where I can 
 specify the limits calculated for the 'boxplots' using 5%, 25%, 75%, 95%
 limits as we have with the 'boxplots'.

 Tom

 - Original Message -
 From: Hadley Wickham had...@rice.edu
 Date: Tuesday, October 4, 2011 10:23 am
 Subject: Re: [R] Question about ggplot2 and stat_smooth
 To: Thomas Adams thomas.ad...@noaa.gov
 Cc: R-help forum r-help@r-project.org


 On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams thomas.ad...@noaa.gov
 wrote:
   I'm interested in creating a graphic -like- this:
 
  c <- ggplot(mtcars, aes(qsec, wt))
  c + geom_point() + stat_smooth(fill = "blue", colour = "darkblue",
                                 size = 2, alpha = 0.2)
 
  but I need to show 2 sets of bands (with different shading) using
  5%, 25%, 75%, 95% limits that I specify and where the heavy blue line
  is the median.
  I don't understand how to do this with ggplot2.

 Exactly what sort of limits do you want?  It sounds like maybe you are
 looking for smoothed quantile regression.

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University






[R] How to subset() from data frame using specific rows

2011-10-04 Thread Rich Shepard

  I have a data frame called chemdata with this structure:


> str(chemdata)

'data.frame':   14886 obs. of  4 variables:
 $ site    : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ...
 $ sampdate: Date, format: "1996-12-27" "1996-08-22" ...
 $ param   : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ quant   : num  0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...

  I've looked in the R Cookbook and Dalgaard's intro book without finding a
way to use wildcards (e.g., like BC-*) instead of explicitly writing each site ID
when subsetting a data frame.

  I need to create subsets (as data frames) based on sites, but including
all sites on each stream. For example, using the initial site factor shown
above, I want a subset containing all data for sites BC-0.5, BC-1,
BC-2, BC-3, BC-4, BC-5, and BC-6.

Pointers appreciated,

Rich



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Sarah Goslee
Hi Rich,

You can use something like this:

> testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
> grep("^A", testdata)
[1] 1 2 3
> grepl("^A", testdata)
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
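To get from that logical vector to a data frame, you can pass it to subset(). Here is a minimal sketch on a hypothetical miniature of chemdata; the rows are invented, and only the BC- prefix comes from the question.

```r
## Hypothetical miniature of chemdata: rows invented, only the "BC-" prefix is from the question
chem <- data.frame(site  = factor(c("BC-0.5", "BC-1", "BC-6", "SC-1", "SC-2")),
                   param = factor(c("As", "As", "Ca", "As", "Cl")),
                   quant = c(0.06, 0.01, 0.01, 0.02, 0.03))

## grepl() gives one TRUE/FALSE per row; subset() keeps the TRUE rows
bc <- subset(chem, grepl("^BC-", site))

## Drop the now-unused site levels so later plots and tables ignore them
bc <- droplevels(bc)

## Conditions can be combined, e.g. one stream and one parameter
bc_as <- subset(chem, grepl("^BC-", site) & param == "As")
```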

Sarah

On Tue, Oct 4, 2011 at 2:39 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
  I have a data frame called chemdata with this structure:

 str(chemdata)

 'data.frame':   14886 obs. of  4 variables:
  $ site    : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ...
  $ sampdate: Date, format: "1996-12-27" "1996-08-22" ...
  $ param   : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ quant   : num  0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...

  I've looked in the R Cookbook and Dalgaard's intro book without finding a
 way to use wildcards (e.g., like BC-*) instead of explicitly writing each site ID
 when subsetting a data frame.

  I need to create subsets (as data frames) based on sites, but including
 all sites on each stream. For example, using the initial site factor shown
 above, I want a subset containing all data for sites BC-0.5, BC-1,
 BC-2, BC-3, BC-4, BC-5, and BC-6.

 Pointers appreciated,

 Rich


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread R. Michael Weylandt
This isn't going to be the most elegant, but it should work:

## Get the factors as characters

ff <- as.character(chemdata$site)

## Identify those that match what you want
ff <- grepl("BC-", ff)

## Now use this logical vector to subset

chemdata[ff, ]

I can't test it, but it should be good to go, assuming that the string "BC-"
identifies exactly the sites you want. If you have other "BC-" things, read
through the ?regex documentation; I think it describes how to do
selective wildcards.

Michael

On Tue, Oct 4, 2011 at 2:39 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
  I have a data frame called chemdata with this structure:

 str(chemdata)

 'data.frame':   14886 obs. of  4 variables:
  $ site    : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ...
  $ sampdate: Date, format: "1996-12-27" "1996-08-22" ...
  $ param   : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ quant   : num  0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...

  I've looked in the R Cookbook and Dalgaard's intro book without finding a
 way to use wildcards (e.g., like BC-*) instead of explicitly writing each site ID
 when subsetting a data frame.

  I need to create subsets (as data frames) based on sites, but including
 all sites on each stream. For example, using the initial site factor shown
 above, I want a subset containing all data for sites BC-0.5, BC-1,
 BC-2, BC-3, BC-4, BC-5, and BC-6.

 Pointers appreciated,

 Rich



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Rich Shepard

On Tue, 4 Oct 2011, Sarah Goslee wrote:


You can use something like this:


testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
grep("^A", testdata)

[1] 1 2 3

grepl("^A", testdata)

[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE


Sarah,

  I don't see how this gives me a data frame containing only those sites I
specify. I want to plot by sites-within-streams specifying which param
factor to use.

Thanks,

Rich



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Sarah Goslee
Hi Rich,

On Tue, Oct 4, 2011 at 2:58 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
 On Tue, 4 Oct 2011, Sarah Goslee wrote:

 You can use something like this:

 testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
 grep("^A", testdata)

 [1] 1 2 3

 grepl("^A", testdata)

 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

 Sarah,

  I don't see how this gives me a data frame containing only those sites I
 specify. I want to plot by sites-within-streams specifying which param
 factor to use.


You asked for pointers, and didn't provide a reproducible example, so
I offered a pointer.

If you have a logical vector that specifies whether to include or omit
a row, you can use that to subset your data frame.

sitesToUse <- grepl("firstsite", mydata$mysitenames)
dataframeForThatSite <- mydata[sitesToUse, ]

If you want real worked results, you'll need to provide a reproducible
example of your own.
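Because Rich wants to plot one param at a time, the same logical vector can be combined with a second condition. A sketch using subset(), again on invented toy data standing in for chemdata:

```r
# Hypothetical toy data standing in for chemdata
chemdata <- data.frame(
  site  = factor(c("BC-0.5", "BC-1", "NC-2", "BC-6")),
  param = factor(c("As", "Ca", "As", "As")),
  quant = c(0.06, 0.01, 0.02, 0.01)
)

# subset() evaluates its condition inside the data frame, so a grepl()
# match on site (coerced from factor to character) can be combined with
# a test on param using &
bc_as <- subset(chemdata, grepl("^BC-", site) & param == "As")
bc_as
```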

Sarah
-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Rich Shepard

On Tue, 4 Oct 2011, R. Michael Weylandt wrote:


This isn't going to be the most elegant, but it should work:
## Get the factors as characters
ff <- as.character(chemdata$site)

## Identify those that match what you want

ff <- grepl(ff, "BC-")


Michael,

  Apparently grep works differently in R than it does on the command line:

bf <- grep(ff, "BC-")
Warning message:
In grep(ff, "BC-") :
  argument 'pattern' has length > 1 and only the first element will be used

  I understand what you suggest but it does not appear to work for me.

Thanks,

Rich



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Jeff Newmiller
?grep
?names
Use indexing by name [, namevector]
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Rich Shepard rshep...@appl-ecosys.com wrote:

I have a data frame called chemdata with this structure:

 str(chemdata)
'data.frame':   14886 obs. of 4 variables:
$ site : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ...
$ sampdate: Date, format: "1996-12-27" "1996-08-22" ...
$ param : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ...
$ quant : num 0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...

I've looked in the R Cookbook and Dalgaard's intro book without finding a
way to use wildcards (e.g., like BC-*) or explicitly writing each site ID
when subsetting a data frame.

I need to create subsets (as data frames) based on sites, but including
all sites on each stream. For example, using the initial site factor shown
above, I want a subset containing all data for sites BC-0.5, BC-1,
BC-2, BC-3, BC-4, BC-5, and BC-6.

Pointers appreciated,

Rich



Re: [R] adding a dummy variable...

2011-10-04 Thread baptiste auguie
Hi,

Using ddply,

library(plyr)
ddply(df, .(ID), mutate, nrows = length(rel.head),
      test = nrows == 2 & all(rel.head %in% c(1, 3)))
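For comparison, a base-R sketch without plyr. This is a different technique from the plyr answers: it uses ave() to compute one flag per household and recycle it over that household's rows, and it assumes rel.head is numeric, as in Grazia's original df, rather than a factor:

```r
df <- data.frame(ID = c(17100, 17100, 17101, 17102, 17103, 17103,
                        17104, 17104, 17104, 17105, 17105),
                 rel.head = c(1, 3, 1, 1, 1, 2, 1, 2, 3, 1, 3))

# For each household: 1 if it has exactly two rows and those rows are
# a head (1) and a child (3), else 0; ave() recycles the scalar result
# over all rows of that household
df$added.dummy <- ave(df$rel.head, df$ID, FUN = function(r)
  as.numeric(length(r) == 2 && setequal(r, c(1, 3))))
df
```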

HTH,

baptiste


On 5 October 2011 06:02, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Here's another way to do it with the plyr package, also not terribly
 elegant. It assumes that rel.head is a factor in your original data
 frame:
 str(df)
 'data.frame':   11 obs. of  2 variables:
  $ ID      : Factor w/ 6 levels 17100,17101,..: 1 1 2 3 4 4 5 5 5 6 ...
  $ rel.head: Factor w/ 3 levels 1,2,3: 1 3 1 1 1 2 1 2 3 1 ...

 If this is not the case in your data, then you need to modify the
 function f below accordingly. (This is why use of dput() is preferred
 when sending example data to R-help, BTW.)

 library('plyr')
 f <- function(d) {
    tvec <- factor(c(1, 3), levels = 1:3)   # target vector
    if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)}
    # If the first if statement is FALSE, then the following code is run:
       d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1)
       d
   }

 ddply(df, .(ID), f)

      ID rel.head dummy
 1  17100        1     1
 2  17100        3     1
 3  17101        1     0
 4  17102        1     0
 5  17103        1     0
 6  17103        2     0
 7  17104        1     0
 8  17104        2     0
 9  17104        3     0
 10 17105        1     1
 11 17105        3     1

 HTH,
 Dennis

 On Tue, Oct 4, 2011 at 8:44 AM,  gra...@stat.columbia.edu wrote:
 Hi all,

 I have a dataset of individuals where the variable ID corresponds to the
 identification of the household where the individual lives. rel.head stands
 for the relationship with the household head. So rel.head=1 is the household
 head, rel.head=2 is the spouse, and rel.head=3 is a child.

 Here is an example of what it looks like:

 df <- data.frame(ID=c(17100, 17100, 17101, 17102, 17103, 17103,
                     17104, 17104, 17104, 17105, 17105),
  rel.head=c(1,3,1,1,1, 2, 1, 2, 3, 1, 3))


 I want to add a dummy variable that is equal to 1 when these conditions
 hold simultaneously:

 a) the number of rows with the same ID is equal to 2
 b) those two rows have rel.head=1 and rel.head=3


 So my ideal output is:

   ID      rel.head   added.dummy
 1  17100        1           1
 2  17100        3           1
 3  17101        1           0
 4  17102        1           0
 5  17103        1           0
 6  17103        2           0
 7  17104        1           0
 8  17104        2           0
 9  17104        3           0
 10 17105        1           1
 11 17105        3           1

 Is there a simple way to do that?
 Can somebody help?

 Thanks in advance,
 Grazia



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Bert Gunter
... and, as an aside, if you had simply searched within R for (the
obvious?!)

??wildcard

you would have received the suggestion for glob2rx() in utils, which
actually would have enabled you to use a familiar wildcard expression.
However, the answers you've already received are simpler and more
straightforward.
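Following that suggestion, a sketch of glob2rx() from utils turning the BC-* wildcard from the original question into a regular expression. The site IDs are invented:

```r
sites <- c("BC-0.5", "BC-1", "NC-2", "ABC-3")

# glob2rx() translates a shell-style wildcard into a regex; with the
# default trim.tail = TRUE the trailing ".*$" is dropped
rx <- utils::glob2rx("BC-*")
rx

# The resulting pattern is anchored at the start, so "ABC-3" is excluded
grepl(rx, sites)
```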

-- Bert

On Tue, Oct 4, 2011 at 12:03 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

 Hi Rich,

 On Tue, Oct 4, 2011 at 2:58 PM, Rich Shepard rshep...@appl-ecosys.com
 wrote:
  On Tue, 4 Oct 2011, Sarah Goslee wrote:
 
  You can use something like this:
 
  testdata <- c("A1", "A2", "A3", "B1", "B2", "B3")
  grep("^A", testdata)
 
  [1] 1 2 3
 
  grepl("^A", testdata)
 
  [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
 
  Sarah,
 
   I don't see how this gives me a data frame containing only those sites I
  specify. I want to plot by sites-within-streams specifying which param
  factor to use.


 You asked for pointers, and didn't provide a reproducible example, so
 I offered a pointer.

 If you have a logical vector that specifies whether to include or omit
 a row, you can use that to subset your data frame.

 sitesToUse <- grepl("firstsite", mydata$mysitenames)
 dataframeForThatSite <- mydata[sitesToUse, ]

 If you want real worked results, you'll need to provide a reproducible
 example of your own.

 Sarah
 --
 Sarah Goslee
 http://www.functionaldiversity.org





-- 
Men by nature long to get on to the ultimate truths, and will often be
impatient with elementary studies or fight shy of them. If it were possible
to reach the ultimate truths without the elementary studies usually prefixed
to them, these would not be preparatory studies but superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread R. Michael Weylandt
No, that was just a typo on my end:

the correct order of arguments should have been

ff <- grepl("BC-", ff)
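With the arguments in that order, grep() and grepl() differ only in what they return. A small sketch on invented site IDs:

```r
sites <- c("BC-0.5", "NC-2", "BC-1")

# Pattern first, then the vector to search
grep("BC-", sites)                 # positions of the matching elements
grepl("BC-", sites)                # one TRUE/FALSE per element

# value = TRUE returns the matching strings themselves
grep("BC-", sites, value = TRUE)
```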

On Tue, Oct 4, 2011 at 3:07 PM, Rich Shepard rshep...@appl-ecosys.com wrote:
 On Tue, 4 Oct 2011, R. Michael Weylandt wrote:

 This isn't going to be the most elegant, but it should work:
 ## Get the factors as characters
 ff <- as.character(chemdata$site)

 ## Identify those that match what you want

 ff <- grepl(ff, "BC-")

 Michael,

  Apparently grep works differently in R than it does on the command line:

 bf <- grep(ff, "BC-")
 Warning message:
 In grep(ff, "BC-") :
  argument 'pattern' has length > 1 and only the first element will be used

  I understand what you suggest but it does not appear to work for me.

 Thanks,

 Rich



Re: [R] Question about ggplot2 and stat_smooth

2011-10-04 Thread Hadley Wickham
 # Function to compute quantiles and return a data frame
 g <- function(d) {
   qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))
   names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')
   qq
 }

You could cut out the melt step by making this return a long-format data frame:

g <- function(d, qs = c(.05, .25, .50, .75, .95)) {
  data.frame(q = qs, y = quantile(d$y, qs))
}
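A usage sketch, assuming (as in the quoted function) that the data frame has a numeric column y. The data and the grouping column x are invented, and the per-group results are stacked with base R rather than ddply():

```r
# Hypothetical toy data: a numeric column y and a grouping column x
set.seed(1)
d <- data.frame(x = rep(c("a", "b"), each = 50), y = rnorm(100))

# Long-format quantiles: one row per requested probability; unname()
# stops the "5%", "25%", ... labels from becoming row names
g <- function(d, qs = c(.05, .25, .50, .75, .95)) {
  data.frame(q = qs, y = unname(quantile(d$y, qs)))
}

# Apply g() to each group and stack the results
res <- do.call(rbind, lapply(split(d, d$x), function(s) cbind(x = s$x[1], g(s))))
res
```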

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] Tinn-R

2011-10-04 Thread Charles McClure
I am new to R and have recently tried Tinn-R with very mixed and unexpected
results.  Can you point me to a Tinn-R tutorial on the web or a decent
reference book?

Thank you for your help;

Charles McClure
cmccl...@atrcorp.com
cfmccl...@verizon.net



Re: [R] Reading stopwords from a csv file

2011-10-04 Thread vioravis
The following for loop does the work, but it takes a good 30 minutes to run:

for(i in seq_along(myStopwords))
{
  currentWord <- myStopwords[i]
  tr1 <- tm_map(tr1, removeWords, currentWord)
}

Are there any faster alternatives? Thank you.
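One likely speedup, offered as an assumption rather than a benchmark: tm's removeWords() accepts a character vector of words, so the loop can probably be replaced by a single tm_map(tr1, removeWords, myStopwords) call. The base-R sketch below shows the underlying idea, building one alternation regex so that every stopword is removed in a single gsub() pass. The stopwords and documents are invented, and words containing regex metacharacters would need escaping first:

```r
# Hypothetical stopword list and documents, for illustration only
myStopwords <- c("the", "a", "of")
docs <- c("the rate of flow", "a sample of the water")

# One alternation pattern removes every stopword in a single gsub() pass;
# \\b restricts matches to whole words, so "the" does not hit "there"
pattern <- paste0("\\b(", paste(myStopwords, collapse = "|"), ")\\b")
cleaned <- gsub(pattern, "", docs, perl = TRUE)

# Collapse the runs of blanks left behind and trim the ends
cleaned <- trimws(gsub("\\s+", " ", cleaned))
cleaned
```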

Ravi



--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html
Sent from the R help mailing list archive at Nabble.com.



[R] F-values in nested designs

2011-10-04 Thread Marcus Nunes
Hello all

I'm trying to learn how to fit a nested model in R. I found a toy
example on the internet: a dataset with 3 areas and 4 sites within
each area. When I use Minitab to fit a nested model to these data,
this is the ANOVA table that I got:

Nested ANOVA: y versus areas, sites

Analysis of Variance for y
Source  DF        SS       MS      F      P
areas    2    4.5000   2.2500  0.158  0.856
sites    9  128.2500  14.2500  3.167  0.012
Error   24  108.0000   4.5000
Total   35  240.7500

When I use R, this is the ANOVA table that I got:

summary(aov(y ~ areas + Error(areas%in%sites)))

Error: areas:sites
          Df Sum Sq Mean Sq F value Pr(>F)
areas      2   4.50    2.25  0.1579 0.8563
Residuals  9 128.25   14.25

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24    108     4.5
Warning message:
In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular

The results are the same, except for one F-value, and I don't
understand why. Hence, these are my questions:

1) I searched Google and can't find a reason for this warning in
my code. Why is this happening?

2) Why don't I get an F-value for the nested effect? I realize that R
calls it Residuals in the first part of the summary, but is there a
way to make R treat it as another factor?

INB4: if I have a nested design with treatment A and treatment B
within A, the F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I
make R give these values directly, without further coding?

Thanks for your help.

Below is my code and information about my system.
--
y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14,
11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4)
areas = as.factor(rep(c("m1", "m2", "m3"), each=12))
#sites = as.factor(c(rep(c(1, 2, 3, 4), 3), rep(c(5, 6, 7, 8), 3),
#  rep(c(9, 10, 11, 12), 3)))
sites = as.factor(c(rep(c(1, 2, 3, 4), 9)))
repl  = as.factor(rep(c(1, 2, 3), each=4, 3))

summary(aov(y ~ areas + Error(areas%in%sites)))

summary(aov(y ~ areas + Error(areas%in%sites)))
Error: areas:sites
          Df Sum Sq Mean Sq F value Pr(>F)
areas      2   4.50    2.25  0.1579 0.8563
Residuals  9 128.25   14.25
Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24    108     4.5
Warning message:
In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular
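A sketch of one way to get both F-values, reusing the y, areas and sites vectors from the question (redefined here so the block runs on its own). Fitting areas/sites as ordinary fixed terms makes aov() test both against the within-site residual, which reproduces Minitab's F for sites; the F for areas is then recomputed by hand with MS(sites within areas) as the denominator, as the question suggests. This is a manual workaround, not a built-in aov() option:

```r
# Data from the question, redefined so this block runs on its own
y <- c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14,
       11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4)
areas <- factor(rep(c("m1", "m2", "m3"), each = 12))
sites <- factor(rep(1:4, 9))

# areas/sites expands to areas + areas:sites; with no Error() term every
# F uses the within-site residual mean square as its denominator
fit <- aov(y ~ areas/sites)
tab <- summary(fit)[[1]]
tab

# For areas, the nested design calls for MS(sites within areas) as the
# denominator instead, so recompute that F and its p-value by hand
ms  <- tab[["Mean Sq"]]
dfs <- tab[["Df"]]
F_areas <- ms[1] / ms[2]
p_areas <- pf(F_areas, dfs[1], dfs[2], lower.tail = FALSE)
c(F = F_areas, p = p_areas)
```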



sessionInfo()
R version 2.13.1 Patched (2011-08-25 r56798)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] car_2.0-11 survival_2.36-9nnet_7.3-1
[4] MASS_7.3-14lme4_0.999375-40   Matrix_0.999375-50
[7] lattice_0.19-33nlme_3.1-102

loaded via a namespace (and not attached):
[1] grid_2.13.1   stats4_2.13.1 tools_2.13.1
--
Marcus Nunes
marcus.nu...@gmail.com



Re: [R] a question about sort and BH

2011-10-04 Thread R. Michael Weylandt
On Mon, Oct 3, 2011 at 10:08 PM, chunjiang he camel...@gmail.com wrote:
 Hi,

 I have two questions want to ask.

 1. If I have a matrix like this, and I want to figure out the rows whose
 value in the 3rd column are less than 0.05. How can I do it with R.
 hsa-let-7a--MBTD1    0.528239197    2.41E-05
 hsa-let-7a--APOBEC1    0.507869409    5.51E-05
 hsa-let-7a--PAPOLA    0.470451884    0.000221774
 hsa-let-7a--NF2    0.469280186    0.000231065
 hsa-let-7a--SLC17A5    0.454597978    0.000381713
 hsa-let-7a--THOC2    0.447714054    0.000479322
 hsa-let-7a--SMG7    0.444972282    0.000524129


Suppose your data is d: then try which(d[, 3] < 0.05)
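A small sketch using three of the rows posted above plus one hypothetical row (made up here so that something fails the cutoff), showing both the row numbers and the rows themselves. If the data were read into an all-character matrix, column 3 would need as.numeric() first, otherwise the comparison is done on strings:

```r
d <- data.frame(
  pair = c("hsa-let-7a--MBTD1", "hsa-let-7a--APOBEC1", "hsa-let-7a--PAPOLA",
           "hypothetical--GENE"),                    # last row is invented
  r    = c(0.528239197, 0.507869409, 0.470451884, 0.10),
  p    = c(2.41e-05, 5.51e-05, 0.000221774, 0.30)
)

which(d[, 3] < 0.05)   # row numbers with p < 0.05
d[d[, 3] < 0.05, ]      # the matching rows themselves
```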

 2. I got the p.adjust.R from R source. In the method BH, I am not clear
 with the code:
           i <- lp:1L


# Just the same as seq(lp, 1 , by = -1)


           o <- order(p, decreasing = TRUE)
           ro <- order(o)
           pmin(1, cummin( n / i * p[o] ))[ro]

# pmin() takes parallel (elementwise) minimums; p[o] is the same as
# sort(p, decreasing = TRUE); and indexing by [ro] puts the adjusted
# values back in the order in which the p-values went in.
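In the spirit of the exercise suggested below, a loop version of the BH branch. Here i plays the same role as lp:1L, running_min replaces cummin(), and writing into adj[o[k]] replaces the final [ro] reordering. The p-values are invented and the sketch assumes there are no missing values:

```r
# Step-up BH adjustment written as an explicit loop (a teaching sketch)
bh_loop <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)    # visit p-values from largest to smallest
  adj <- numeric(n)
  running_min <- Inf
  for (k in seq_len(n)) {
    i <- n - k + 1                     # rank of p[o[k]] in increasing order
    running_min <- min(running_min, n / i * p[o[k]])
    adj[o[k]] <- min(1, running_min)   # store at the original position
  }
  adj
}

p <- c(0.01, 0.04, 0.03, 0.20, 0.005)
all.equal(bh_loop(p), p.adjust(p, method = "BH"))
```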

As an exercise, I'd suggest you get the original paper, see how the
calculation is done there, implement it in R as best you can, even if
it seems loop-y, and refine it down to R Core's implementation. One of
the best ways I know to learn to think vectorwise.

Sorry I can't help more, but I don't know the method, so I don't want to
read too much into the code and say something that I haven't thought
through (Lord knows I do that enough on this list!!)

Michael




 How to explain the first and the fourth row.
 p.adjust.R===
 p.adjust.methods <-
    c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")
 p.adjust <- function(p, method = p.adjust.methods, n = length(p))
 {
    ## Methods 'Hommel', 'BH', 'BY' and speed improvements contributed by
    ## Gordon Smyth sm...@wehi.edu.au.
    method <- match.arg(method)
    if(method == "fdr") method <- "BH" # back compatibility
    nm <- names(p)
    p <- as.numeric(p); names(p) <- nm
    p0 <- p
    if(all(nna <- !is.na(p))) nna <- TRUE
    p <- p[nna]
    lp <- length(p)
    stopifnot(n >= lp)
    if (n <= 1) return(p0)
    if (n == 2 && method == "hommel") method <- "hochberg"
    p0[nna] <-
  switch(method,
        bonferroni = pmin(1, n * p),
        holm = {
     i <- seq_len(lp)
     o <- order(p)
     ro <- order(o)
     pmin(1, cummax( (n - i + 1L) * p[o] ))[ro]
        },
        hommel = { ## needs n-1 >= 2 in for() below
     if(n > lp) p <- c(p, rep.int(1, n-lp))
     i <- seq_len(n)
     o <- order(p)
     p <- p[o]
     ro <- order(o)
     q <- pa <- rep.int( min(n*p/i), n)
     for (j in (n-1):2) {
         ij <- seq_len(n-j+1)
         i2 <- (n-j+2):n
         q1 <- min(j*p[i2]/(2:j))
         q[ij] <- pmin(j*p[ij], q1)
         q[i2] <- q[n-j+1]
         pa <- pmax(pa,q)
     }
     pmax(pa,p)[if(lp < n) ro[1:lp] else ro]
        },
        hochberg = {
     i <- lp:1L
     o <- order(p, decreasing = TRUE)
     ro <- order(o)
     pmin(1, cummin( (n - i + 1L) * p[o] ))[ro]
        },
        BH = {
     i <- lp:1L
     o <- order(p, decreasing = TRUE)
     ro <- order(o)
     pmin(1, cummin( n / i * p[o] ))[ro]
        },
        BY = {
     i <- lp:1L
     o <- order(p, decreasing = TRUE)
     ro <- order(o)
     q <- sum(1L/(1L:n))
     pmin(1, cummin(q * n / i * p[o]))[ro]
        },
        none = p)
    p0
 }
 


 I wrote a code to do my work in BH correction like the following:

 rm(list=ls())
 a <- read.csv("test.txt", sep="\t", header=F, quote="")
 b <- a[order(a[,3], decreasing=TRUE),]
 c <- p.adjust(b[,3], method="BH")
 b[,4] <- c
 write.table(b, "zz.txt", sep="\t")

 Is that right? Thanks for all.
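A sketch suggesting the sort step is not needed: p.adjust() returns the adjusted values in the same order as its input, so they can be attached to the data directly. The data frame below is an invented stand-in for test.txt, with the p-values in column 3:

```r
# Hypothetical stand-in for the file read above: column 3 holds p-values
a <- data.frame(V1 = c("x1", "x2", "x3", "x4"),
                V2 = c(0.53, 0.51, 0.47, 0.45),
                V3 = c(0.0400, 0.0010, 0.0300, 0.2000))

# Adjusted values come back aligned with the input rows
a$BH <- p.adjust(a[, 3], method = "BH")

# Sorting first gives the same adjusted value for each row
b <- a[order(a[, 3], decreasing = TRUE), ]
all.equal(p.adjust(b[, 3], method = "BH"), b$BH)
```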

 Jiang



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Rich Shepard

On Tue, 4 Oct 2011, Sarah Goslee wrote:


You asked for pointers, and didn't provide a reproducible example, so I
offered a pointer.


Sarah,

  I did not realize that your pointer was to the factor component of the
subset() command.

  I think the most parsimonious thing for me to do is to modify the database
table with a new column of the full stream name, then re-export and re-read
into R.

Thanks,

Rich



Re: [R] How to subset() from data frame using specific rows

2011-10-04 Thread Rich Shepard

On Tue, 4 Oct 2011, R. Michael Weylandt wrote:


No, that was just a typo on my end:
the correct order of arguments should have been
ff <- grepl("BC-", ff)


Michael,

  Thank you.

Rich



[R] joining tables

2011-10-04 Thread Jose Bustos Melo
Hello everyone,

I know this is a very basic question for you. I'm working with many
different tables, but every one has the same variables (V1, V2, V3). The only
thing I need to do is to put these tables together; in other words,
create one big table with all the cases shown in the smaller tables.
For example:

tabla1 <- data.frame(v1, v2, v3)
tabla2 <- data.frame(v1, v2, v3)
tabla3 <- data.frame(v1, v2, v3)
tabla4 <- data.frame(v1, v2, v3)

I just want to join them together into one table. By the way, there are more than 3
million cases.
Thank you in advance!
José


Re: [R] distance coefficient for amatrix with ngative valus

2011-10-04 Thread R. Michael Weylandt
You are, of course, entirely correct and, once again, I tip my hat to
the erudition of those who comment on this list. My initial
formulation, for a distance on a normed space inherited from the norm,
stands trivially, but as you rightly point out, I'm excluding many
interesting and possibly useful metrics.

Follies of youth and all that.

Michael

On Tue, Oct 4, 2011 at 2:06 AM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
 On 04/10/11 17:05, R. Michael Weylandt wrote:

 SNIP

 More importantly, as I said in my initial response, any distance
 metric worth its salt is translation invariant.

 SNIP

 Point of order, Mr. Chairman.  (This is really *toadally* off topic;
 my apologies, but I couldn't resist --- I trained as a pure mathematician).

 A *metric* need not in general be translation invariant.  Indeed a metric
 need not be defined on a space in which translation makes any sense.

 A metric defined in terms of a *norm* (on a normed vector space) by
 rho(x,y) = ||x - y|| is of course by definition translation invariant, and
 that's what most of us think in terms of.

 But there are perfectly ``reasonable''  metrics, defined on vector spaces,
 which are not translation invariant.  Whether these are ``worth their salt''
 is I suppose a matter of taste.  (You should pardon the expression. :-) )

 A simple e.g. of a non-translation-invariant metric is

    rho(x,y) = |x - y|/(1 + |x| + |y|)

 (defined on the real line).  It is easily checked that rho(.,.) satisfies
 the four conditions that a metric must satisfy.  (Exercise for the interested
 reader.)

 Note that rho(1,2) = 1/4  but rho(2,3) = 1/6, ergo not translation
 invariant.
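Rolf's numbers are easy to check numerically. A small R sketch; the random triangle-inequality check is only a spot-check, not a substitute for the exercise:

```r
# Rolf Turner's example metric on the real line
rho <- function(x, y) abs(x - y) / (1 + abs(x) + abs(y))

rho(1, 2)   # 1/4
rho(2, 3)   # 1/6: same gap of 1, different distance

# Random spot-check of the triangle inequality (evidence, not a proof)
set.seed(42)
x <- runif(1e4, -10, 10)
y <- runif(1e4, -10, 10)
z <- runif(1e4, -10, 10)
ok <- all(rho(x, z) <= rho(x, y) + rho(y, z) + 1e-12)
ok
```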

    cheers,

        Rolf Turner




Re: [R] joining tables

2011-10-04 Thread R. Michael Weylandt
Perhaps rbind?
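A minimal sketch of the rbind() suggestion, with small invented tables standing in for tabla1 to tabla4 (the column names follow the question; the values are made up):

```r
# Hypothetical small tables with identical columns
make_tab <- function(k) data.frame(v1 = k, v2 = k * 10, v3 = letters[k])
tabla1 <- make_tab(1:2)
tabla2 <- make_tab(3:4)
tabla3 <- make_tab(5:6)
tabla4 <- make_tab(7:8)

# rbind() matches columns by name and stacks the rows
big <- rbind(tabla1, tabla2, tabla3, tabla4)

# With many tables held in a list, do.call() avoids typing each name
tabs <- list(tabla1, tabla2, tabla3, tabla4)
big2 <- do.call(rbind, tabs)
identical(big, big2)
dim(big)
```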

Michael

On Tue, Oct 4, 2011 at 3:48 PM, Jose Bustos Melo jbustosm...@yahoo.es wrote:
 Hello everyone,

 I know this is a very basic question for you. I'm working with many
 different tables, but every one has the same variables (V1, V2, V3). The only
 thing I need to do is to put these tables together; in other words,
 create one big table with all the cases shown in the smaller tables.
 For example:

 tabla1 <- data.frame(v1, v2, v3)
 tabla2 <- data.frame(v1, v2, v3)
 tabla3 <- data.frame(v1, v2, v3)
 tabla4 <- data.frame(v1, v2, v3)

 I just want to join them together into one table. By the way, there are more than 3
 million cases.
 Thank you in advance!
 José


Re: [R] Adonis and nmds help and questions for a novice.

2011-10-04 Thread Gavin Simpson
On Tue, 2011-10-04 at 08:45 +, Ashley Houlden wrote:
 Hi,
 
 Forgive me if someone has already posted about this, but I have had a
 look and cannot find the answer. Also, I am very new to R and have been
 getting to grips with it.
 
 I have been trying to use Adonis to find out if there are significant
 differences between groups in data that I have analysed with NMDS, and
 have been struggling with getting this to work and understanding what
 is going on.  I am looking at diversity in different soils with either
 woodland or grassland habitats.
 
 I have run the scripts
 
 library(vegan)
 library(ecodist)
 library(MASS)
 mydata <- read.table("ash_data.csv", header=TRUE, sep=",",
                      row.names="Site")
 
 envdata_fit <- read.table("ash_env.csv", header=TRUE, sep=",",
                           row.names="Site")
 
 #distance matrix of samples using bray curtis
 d= bcdist(mydata, rmzero=FALSE)
 
 And then I use the distance matrix from this for adonis? Is
 this correct?
 
 With this I have then run Adonis
 
 results = adonis(d ~ wood, envdata_fit, permutations = 1000)
 
 and get significant values to see if sig diff in diversity between
 wood and grass habitat.
 
 However, I have been reading about combining the variables, and there
 seem to be different ways, for example
 
 results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000)
 
 so get sig values for Wood and soil
 
 or
 
 results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000)
 
 And I get sig values for wood, soil, and wood soil interaction.
 
 This seems to make sense; however, for both, if I put the variables the
 other way around (soil+wood or soil*wood) I get very different sig
 values, even accounting for the fact they vary slightly due to the
 permutations. So what is going on, and why do the values change so
 much?

You can isolate the effects due to different permutations being used by
setting a seed via set.seed().

As ?adonis says, sequential sums of squares are used. If there is
imbalance in your design it isn't surprising that the results are not
invariant to the ordering of terms in the formula.
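The same order dependence can be seen in an ordinary linear model, which is a convenient way to build intuition without vegan. A sketch on invented, deliberately unbalanced data; this is an analogy to adonis(), not a call to it:

```r
# Invented, unbalanced two-factor data (not Ashley's data)
dat <- data.frame(
  wood = factor(c("wood", "wood", "wood", "wood", "grass", "grass", "grass", "wood")),
  soil = factor(c("clay", "clay", "sand", "clay", "sand", "sand", "clay", "sand")),
  resp = c(3.1, 2.4, 5.0, 2.9, 6.2, 5.8, 3.5, 4.7)
)

# anova() on an lm fit also uses sequential (Type I) sums of squares
a1 <- anova(lm(resp ~ wood + soil, data = dat))
a2 <- anova(lm(resp ~ soil + wood, data = dat))

# The SS for wood changes with its position in the formula...
c(first = a1["wood", "Sum Sq"], second = a2["wood", "Sum Sq"])
# ...while the total explained by both terms together is the same
c(sum(a1[1:2, "Sum Sq"]), sum(a2[1:2, "Sum Sq"]))
```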

 I was also wondering: in Adonis, can you nest treatments, to see the effect
 of soil after removing the effect of woodland, as you can with anova?

Not 100% sure what you mean by nested, but adonis() uses the full
functionality of R's formula interface. See the R manual for details
or ?formula. ?adonis also has details of how you might test a nested
design in the Details section - this might not be what you want but it
does allow you to test for an effect of one variable by conditioning the
permutations on another.

 Another general question as well: if I have more than two groups in a
 treatment, say for soil: clay, sand and loam, and I do the stats and get
 a significant value, what does it actually mean? Is it that soil
 generally has an effect, with each group separate, or that there are
 general differences between soils, which may be one group being very
 different to the other two?

The permutation test tests at the level of the factor, not pairwise
comparisons of the levels within the factor. So you get information on
Soil, not on Clay, Sand, Loam levels. This is the same as you would get
if you did anova(mod) where mod was a linear model with a factor
predictor.

betadisper(), the sister function to adonis() that tests for differences
of multivariate dispersions rather than differences of multivariate means,
does allow the sorts of pairwise tests you are thinking of, but we haven't
implemented this in adonis() yet, I'm afraid.

HTH

G

 Many, many thanks to anyone who can help me, as I have asked people who
 use R near me and no-one is sure or uses Adonis.
 
 Ash


-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Adonis and nmds help and questions for a novice.

2011-10-04 Thread Gavin Simpson
On Tue, 2011-10-04 at 08:45 +, Ashley Houlden wrote:
 Hi,
snip /
 #distance matrix of samples using bray curtis
 d= bcdist(mydata, rmzero=FALSE)

In addition, you don't necessarily need ecodist for the bray curtis
distance. vegdist() in vegan will compute this for you.

Not that there is anything wrong with ecodist I hasten to add - just
that you can do this all in vegan if you wanted.

G

 And then I use the distance matrix from this in adonis. Is this
 correct?
 
 With this I have then run Adonis
 
 results = adonis(d ~ wood, envdata_fit, permutations = 1000)
 
 and get significance values to see whether there is a significant
 difference in diversity between the wood and grass habitats.
 
 However, I have been reading about combining the variables, and there seem
 to be different ways, for example
 
 results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000)
 
 so I get significance values for wood and soil
 
 or
 
 results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000)
 
 and I get significance values for wood, soil, and the wood:soil interaction.
 
 This seems to make sense. However, for both, if I put the variables the
 other way around (soil+wood or soil*wood) I get very different significance
 values, even accounting for the fact that they vary slightly due to the
 permutations. So what is going on, and why do the values change so much?
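A likely explanation: adonis() tests terms sequentially (each term after the ones listed before it, like Type I sums of squares), so when wood and soil are not balanced across the samples, the order of the terms changes the results. Current versions of vegan also offer adonis2(..., by = "margin") for marginal tests. The univariate sketch below uses lm() and anova() on a made-up, unbalanced wood/soil layout (the data are assumptions, not Ash's) to show the same effect.

```r
set.seed(1)
# Hypothetical unbalanced layout: soil types are not spread evenly
# across the two habitats, so wood and soil are partly confounded
wood <- factor(c(rep("wood", 7), rep("grass", 5)))
soil <- factor(c("clay", "clay", "clay", "clay", "sand", "sand", "loam",
                 "clay", "sand", "sand", "loam", "loam"))
y <- rnorm(12) + 2 * (wood == "wood")   # made-up response

a1 <- anova(lm(y ~ wood + soil))   # wood tested first
a2 <- anova(lm(y ~ soil + wood))   # wood tested after soil

# Sequential sums of squares for 'wood' differ between the two orders...
c(first = a1["wood", "Sum Sq"], second = a2["wood", "Sum Sq"])

# ...although both models fit the data identically
all.equal(sum(a1[["Sum Sq"]]), sum(a2[["Sum Sq"]]))
```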
 
 I was also wondering: in adonis, can you nest treatments, to see the effect
 of soil after removing the effect of woodland, as you can with anova?
 
 Another general question: if I have more than two groups in a treatment,
 say clay, sand and loam for soil, and I get a significant value, what does
 it actually mean? Is it that soil generally has an effect, with each group
 separate, or that there are general differences between soils, where maybe
 one group is very different from the other two?
 
 Many, many thanks to anyone who can help; I have asked people near me who
 use R, and no-one is sure or uses adonis.
 
 Ash
 
 




Re: [R] Tinn-R

2011-10-04 Thread David Scott

On 5/10/2011 7:25 a.m., Charles McClure wrote:

I am new to R and have recently tried Tinn-R with very mixed and unexpected
results.  Can you point me to a Tinn-R tutorial on the web or a decent
reference book?

Thank you for your help;

Charles McClure
cmccl...@atrcorp.com
cfmccl...@verizon.net



There is a free eBook on Tinn-R available from Rmetrics:

https://www.rmetrics.org/ebooks-tinnr

It was written by the authors of Tinn-R.

Please consider a donation to the Rmetrics Association.



--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018



Re: [R] Adonis and nmds help and questions for a novice.

2011-10-04 Thread Sarah Goslee
On Tue, Oct 4, 2011 at 4:41 PM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:
 On Tue, 2011-10-04 at 08:45 +0000, Ashley Houlden wrote:
 Hi,
 snip /
 #distance matrix of samples using bray curtis
 d= bcdist(mydata, rmzero=FALSE)

 In addition, you don't necessarily need ecodist for the bray curtis
 distance. vegdist() in vegan will compute this for you.

 Not that there is anything wrong with ecodist I hasten to add - just
 that you can do this all in vegan if you wanted.

Because ecodist is awesome. :) But there's no need to mix and match; vegan
and ecodist do many of the same things (for historical reasons).

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] ggplot2: changing default colors of boxplot

2011-10-04 Thread Brian Smith
Hi,

I want to change the default colors that appear in a boxplot. For example,
the following code (from the package documentation):

===
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot(aes(fill = factor(am)))

===

Gives the default colors. What do I need to do to modify this so that:

1. Change the colors from green and red to blue and black
2. Only have the outline of the boxplot colored (and not fill in the box)


thanks,
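One way to get both, as a sketch assuming a reasonably current ggplot2: map am to the outline colour instead of the fill, and set the two colours by hand with scale_colour_manual(). geom_boxplot() then keeps its default fill (white unless the theme says otherwise), so only the outlines are coloured.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg))

# 'colour' controls the box outlines; 'fill' is left at its default.
# Unnamed manual values are matched to the factor levels in order:
# am = 0 -> blue, am = 1 -> black.
p2 <- p +
  geom_boxplot(aes(colour = factor(am))) +
  scale_colour_manual(values = c("blue", "black"))

# Typing p2 at the console (or print(p2) in a script) draws the plot
```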



Re: [R] a question about sort and BH

2011-10-04 Thread chunjiang he
Dear Michael,

Thanks very much.

Jiang
On Tue, Oct 4, 2011 at 2:39 PM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 On Mon, Oct 3, 2011 at 10:08 PM, chunjiang he camel...@gmail.com wrote:
  Hi,
 
  I have two questions.
 
  1. I have a matrix like this, and I want to find the rows whose value in
  the 3rd column is less than 0.05. How can I do this in R?
  hsa-let-7a--MBTD1     0.528239197  2.41E-05
  hsa-let-7a--APOBEC1   0.507869409  5.51E-05
  hsa-let-7a--PAPOLA    0.470451884  0.000221774
  hsa-let-7a--NF2       0.469280186  0.000231065
  hsa-let-7a--SLC17A5   0.454597978  0.000381713
  hsa-let-7a--THOC2     0.447714054  0.000479322
  hsa-let-7a--SMG7      0.444972282  0.000524129
 

 Suppose your data is d: then try which(d[,3] < 0.05)
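A small sketch of that suggestion, assuming the data have been read into a data frame with the p-values in column 3 (three rows are copied from the question; the fourth, non-significant row is made up so the filter has something to drop). It also shows one pitfall: if the data sit in a character matrix, the column must be converted to numbers first, otherwise `<` compares strings.

```r
d <- data.frame(
  pair = c("hsa-let-7a--MBTD1", "hsa-let-7a--APOBEC1",
           "hsa-let-7a--PAPOLA", "hypothetical--ROW"),
  r    = c(0.528239197, 0.507869409, 0.470451884, 0.1),
  p    = c(2.41e-05, 5.51e-05, 0.000221774, 0.2),
  stringsAsFactors = FALSE
)

idx <- which(d[, 3] < 0.05)   # row numbers with p < 0.05
sig <- d[d[, 3] < 0.05, ]     # the rows themselves

# as.matrix() on mixed columns gives a character matrix, so convert
# column 3 back to numbers before comparing
m <- as.matrix(d)
idx_m <- which(as.numeric(m[, 3]) < 0.05)
```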

  2. I got p.adjust.R from the R source. For the BH method, I don't
  understand this code:
    i <- lp:1L


 # Just the same as seq(lp, 1 , by = -1)


    o <- order(p, decreasing = TRUE)
    ro <- order(o)
    pmin(1, cummin( n / i * p[o] ))[ro]

 # pmin does parallel minimums, p[o] is the same as sort(p, decreasing =
 TRUE), and indexing by [ro] puts the output values back in the order
 they went in.
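A quick check of that reading on a made-up vector p (the values are arbitrary):

```r
p  <- c(0.01, 0.04, 0.03, 0.005, 0.2)   # arbitrary example p-values
o  <- order(p, decreasing = TRUE)       # indices from largest to smallest p
ro <- order(o)                          # inverse permutation of o

# p[o] is p sorted in decreasing order
stopifnot(identical(p[o], sort(p, decreasing = TRUE)))

# Indexing the sorted values by ro restores the original order
stopifnot(identical(p[o][ro], p))

# lp:1L pairs each sorted value with its rank (largest p has rank lp)
lp <- length(p)
i  <- lp:1L
cbind(rank = i, p_sorted = p[o])
```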

 As an exercise, I'd suggest you get the original paper, see how the
 calculation is done there, implement it in R as best you can, even if
 it seems loop-y, and refine it down to R Core's implementation. One of
 the best ways I know to learn to think vectorwise.
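Following that suggestion, here is a loop-based sketch written from the usual statement of the Benjamini-Hochberg adjustment (my own illustration, not R Core's code; it assumes no missing p-values and n equal to length(p)), checked against p.adjust():

```r
# With p-values sorted increasingly, p_(1) <= ... <= p_(m),
#   adjusted p_(k) = min over j >= k of min(1, m / j * p_(j))
bh_loop <- function(p) {
  m  <- length(p)
  o  <- order(p)          # ascending order
  ps <- p[o]
  q  <- numeric(m)
  running <- 1            # starting at 1 plays the role of pmin(1, ...)
  for (k in m:1) {        # walk from the largest p-value down
    running <- min(running, m / k * ps[k])
    q[k] <- running
  }
  out <- numeric(m)
  out[o] <- q             # put results back in the input order
  out
}

p <- c(0.01, 0.04, 0.03, 0.005, 0.2)     # arbitrary example p-values
all.equal(bh_loop(p), p.adjust(p, method = "BH"))
```

The running minimum from the top is exactly what cummin() does on the decreasingly sorted values in R Core's version.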

 Sorry I can't help more, but I don't know the method, so I don't want to
 read too much into the code and say something that I haven't thought
 through (Lord knows I do that enough on this list!!)

 Michael

 


  How should I interpret the first and the fourth lines?
  p.adjust.R ===
  p.adjust.methods <-
      c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr",
        "none")
  p.adjust <- function(p, method = p.adjust.methods, n = length(p))
  {
      ## Methods 'Hommel', 'BH', 'BY' and speed improvements contributed by
      ## Gordon Smyth sm...@wehi.edu.au.
      method <- match.arg(method)
      if(method == "fdr") method <- "BH" # back compatibility
      nm <- names(p)
      p <- as.numeric(p); names(p) <- nm
      p0 <- p
      if(all(nna <- !is.na(p))) nna <- TRUE
      p <- p[nna]
      lp <- length(p)
      stopifnot(n >= lp)
      if (n <= 1) return(p0)
      if (n == 2 && method == "hommel") method <- "hochberg"
      p0[nna] <-
        switch(method,
          bonferroni = pmin(1, n * p),
          holm = {
            i <- seq_len(lp)
            o <- order(p)
            ro <- order(o)
            pmin(1, cummax( (n - i + 1L) * p[o] ))[ro]
          },
          hommel = { ## needs n-1 >= 2 in for() below
            if(n > lp) p <- c(p, rep.int(1, n-lp))
            i <- seq_len(n)
            o <- order(p)
            p <- p[o]
            ro <- order(o)
            q <- pa <- rep.int( min(n*p/i), n)
            for (j in (n-1):2) {
              ij <- seq_len(n-j+1)
              i2 <- (n-j+2):n
              q1 <- min(j*p[i2]/(2:j))
              q[ij] <- pmin(j*p[ij], q1)
              q[i2] <- q[n-j+1]
              pa <- pmax(pa,q)
            }
            pmax(pa,p)[if(lp < n) ro[1:lp] else ro]
          },
          hochberg = {
            i <- lp:1L
            o <- order(p, decreasing = TRUE)
            ro <- order(o)
            pmin(1, cummin( (n - i + 1L) * p[o] ))[ro]
          },
          BH = {
            i <- lp:1L
            o <- order(p, decreasing = TRUE)
            ro <- order(o)
            pmin(1, cummin( n / i * p[o] ))[ro]
          },
          BY = {
            i <- lp:1L
            o <- order(p, decreasing = TRUE)
            ro <- order(o)
            q <- sum(1L/(1L:n))
            pmin(1, cummin(q * n / i * p[o]))[ro]
          },
          none = p)
      p0
  }
  
 
 
  I wrote the following code to do the BH correction:
 
  rm(list=ls())
  a <- read.csv("test.txt", sep="\t", header=FALSE, quote="")
  b <- a[order(a[,3], decreasing=TRUE),]
  c <- p.adjust(b[,3], method="BH")
  b[,4] <- c
  write.table(b, "zz.txt", sep="\t")
 
  Is that right? Thanks for all.
 
  Jiang
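On the "is that right?" question: sorting before calling p.adjust() is harmless but not needed, because p.adjust() returns the adjusted values in the same order as its input. A sketch on made-up p-values (not Jiang's file):

```r
set.seed(1)
p <- runif(10, 0, 0.1)   # made-up p-values

# Adjust in the original order
adj <- p.adjust(p, method = "BH")

# Sort first (as in the posted script), then adjust
o <- order(p, decreasing = TRUE)
adj_sorted <- p.adjust(p[o], method = "BH")

# Same adjusted values, only listed in a different order
stopifnot(isTRUE(all.equal(adj[o], adj_sorted)))
```

So a[, 4] <- p.adjust(a[, 3], method = "BH") on the unsorted data would give the same per-row results.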
 
 




Re: [R] inconsistent behavior of summary function

2011-10-04 Thread Brian Diggs

I'm going to put on my fire suit and wade in (see inline)

On 10/4/2011 8:11 AM, Bert Gunter wrote:

On Tue, Oct 4, 2011 at 7:42 AM, Jeanne M. Spicer xn8spi...@gmail.com wrote:


I'm not sure how returning an incorrect result is ever a 'positive' feature


It is **not** incorrect; perhaps unexpected, but that is not the same.



You are technically correct -- the best kind of correct -- Futurama

The results (using the built-in data set rock)

> summary(rock["area"])
  area
 Min.   : 1016
 1st Qu.: 5305
 Median : 7487
 Mean   : 7188
 3rd Qu.: 8870
 Max.   :12212
> summary(rock[["area"]])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1016    5305    7487    7188    8870   12210

differ for exactly the reason you say (dispatching to different methods 
of summary), and the different values of max are both correct given the 
documentation.  However, let's walk through what it takes to show that.


In the help page for summary, an option digits is described, which has 
the default value max(3, getOption("digits")-3).  Executing this (or 
getOption("digits") alone and doing the math) results in the default 
value of digits being 4 (at least for me; and I do not believe that I 
have changed the option).


So what is this option used for?  In the documentation, it says: 
"integer, used for number formatting with signif() (for summary.default) 
or format() (for summary.data.frame)."  Let's assume that we realize 
that rock["area"] is a data frame, which would be handled by 
summary.data.frame, and rock[["area"]] is a vector, and further 
determine that summary.default is what will handle it (having not found 
summary.vector or summary.integer).


Let's dive into the help page for signif and format, since they are 
listed as relevant to the use of digits in the two different cases.


signif tells us that digits is "integer indicating the number of ... 
significant digits (signif) to be used."  Looking at Details, the last 
sentence says "Each element of the vector is rounded individually, 
unlike printing."  So in the case of a vector, each value is separately 
rounded to 4 significant digits (the max of 12212 is rounded to 12210).


format tells us that digits is "how many significant digits are to be 
used for numeric and complex x. ... This is a suggestion: enough decimal 
places will be used so that the smallest (in magnitude) number has this 
many significant digits, and also to satisfy nsmall."


So the difference is that if it is a vector, each part (min, quartiles, 
mean, and max) is rounded to 4 significant digits individually, while if 
it is a column of a data frame, the set is collectively rounded so that 
the smallest has 4 significant digits and the rest are carried out to 
the same decimal place.
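The contrast can be reproduced directly on the rock data, which ships with R. This sketch recomputes the six statistics the summary reports with quantile() and mean(), and assumes the default options(digits = 7), so that digits works out to 4.

```r
# Default digits for summary(): 4 when getOption("digits") is 7
dig <- max(3, getOption("digits") - 3)

# Min, 1st Qu., Median, Mean, 3rd Qu., Max of rock$area
qq <- quantile(rock$area, names = FALSE)
x  <- c(qq[1:3], mean(rock$area), qq[4:5])

# Vector case (signif): each value rounded on its own,
# so the maximum 12212 becomes 12210
signif(x, dig)

# Data frame case (format): one common number of decimals, chosen so the
# smallest value keeps 'dig' significant digits; 12212 is printed intact
format(x, digits = dig)
```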


Some points:

1) Both of these functions are in base, so I would expect the same 
behavior using the same (default) arguments.  Yes, the key word is 
expect.  Hopefully I have demonstrated that I understand why they 
differ.  I would not anticipate rounding, and when only one value has 
only one digit rounded, it is not really obvious that it happened.  (As 
compared to, say, summary(10000*rock$area), if I knew the data was not 
all rounded to the nearest 10,000).  So this is not just a matter of 
realizing that different methods are being dispatched, but reading 
through three different help pages (at least three, assuming I started 
at the right place and realized which other two were the relevant ones) 
to see that the end results are presented differently WHICH I WOULD NOT 
REALIZE THAT I EVEN NEED TO DO.


2) rock$area is an integer vector, so even if I realize that rounding 
would be done on floating point numbers, I would not expect (yes, again, 
expect) that integers would need to be rounded to some lesser number 
of significant digits.


3) The documentation for summary is actually wrong about digits for the 
case of summary.data.frame.  Consider:

> summary(rock["area"], digits=17)
  area
 Min.   : 1016.0000000000000
 1st Qu.: 5305.2500000000000
 Median : 7487.0000000000000
 Mean   : 7187.7291666700003
 3rd Qu.: 8869.5000000000000
 Max.   :12212.0000000000000

In particular, note the mean.  It is wrong (mathematically incorrect AND 
not consistent with the documentation).

> dput(mean(rock["area"]))
structure(7187.72916666667, .Names = "area")

Why?  Internally, summary.data.frame calls summary.default on 
rock[["area"]] with a hard-coded digits value of 12, then takes this 
value and formats it with 17 digits of precision as requested.  That's 
why there are the four zeros in the middle (the last digit being 
numerical imprecision due to binary representation of floating point 
values).
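The two steps can be mimicked directly with signif() and format(). This is a sketch that assumes, as the post states for the R version of the time, that summary.data.frame rounds to 12 significant digits before formatting; as far as I know, later versions of R changed how summary.default rounds, so calling summary() itself may now print differently.

```r
# Step 1: round the mean to 12 significant digits
m   <- mean(rock$area)          # rock ships with R (datasets package)
m12 <- signif(m, 12)

# Step 2: format the rounded value with the requested 17 digits, which
# shows the zeros introduced by step 1 followed by binary
# floating-point noise
txt <- format(m12, digits = 17)
print(txt)

# The formatted value is not the mean itself
m - m12
```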


4) summary.default does not necessarily honor the number of significant 
digits either:


> for(i in 1:9) print(summary(rock[["area"]], digits=i))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1000    5000    7000    7000    9000   10000
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1000    5300    7500    7200    8900   12000
 
