from:"Stephan Kolassa"

Re: [R] time series processing - count of datestamp delta's, per group

2014-03-23 Thread Stephan Kolassa


Hi Martin,

it sounds like you want the difference between the first and the last 
observation per user, not, e.g., all the date differences between 
successive observations of each separate user. Correct me if I'm wrong. 
That said, let's build some toy data:


set.seed(1)
dataset - data.frame(User=sample(LETTERS[1:5],100,replace=TRUE),
Date=sample(as.Date(2014-01-01)+0:364,100,replace=TRUE))

Now we can calculate these differences and plot a histogram or tabulate:

foo - with(dataset,by(Date,User,function(xx)diff(range(xx
hist(foo)
table(foo)

The key here is really the by() function, which calculates a function 
(here an anonymous function function(xx)diff(range(xx))) applied to 
some data (here dataset$Date) separately for each level of a grouping 
factor (here dataset$User).


HTH,
Stephan


On 23.03.2014 01:32, Martin Tomko wrote:

Apologies if the question is a but naïve, I am a novice in time series data 
handling in R

I have the following type of data, in a long format ( as called by the 
spacetime vignette – the table contains also space, not noted here):

User |  Date | Otherdata |
A | 01/01/2014 | aa
A | 01/01/2014 | bb
A | 01/01/2014 | cc
B | 01/01/2014 | aa
B | 05/01/2014 | cc
A | 07/01/2014 | aa
C | 05/02/2014 | xx
C | 20/02/2014 | yy

Etc
[A,B,C,…] are user Ids (some strings).
Date is converted into a Date format (2013-10-15)

The table is sorted by User and then by Date, and is over 800K records long. 
There are about 20K users.

User |  Date | Otherdata |
A | 2014-01-01 | aa
A | 2014-01-01  | bb
A | 2014-01-01  | cc
A | 2014-01-07  | aa
B | 2014-01-01  | aa
B | 2014-01-05  | cc
C | 2014-02-05  | xx
C | 2014-02-20  | yy

I want to:
Get a frequency table ( and ultimately plot) of the count of differences (in 
days) between records of a user. Meaning, I would first get the unique days 
recorded:

A | 2014-01-01
A | 2014-01-07
B | 2014-01-01
B | 2014-01-05
C | 2014-02-05
C | 2014-02-20

And then want to run the differences between timestamps within a group defined 
by the user, in days:
A| 6
B| 4
C|15

Imagining that I have tens of thousands of records, I then want the table with 
the counts of differences ( across all users) ( in our case it would be 6, 4 
and 15, all counte = 1)
IN the larger sample, something like this:
DeltaDays | Count
1 | 150
2 | 320
…
N | X

I know there are all sorts of packages for time analysis, but I could not find 
a simple function like this (incl searching here 
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
 ). I assume that something working on a simple data frame would be sufficient, 
but I am happy ( prefer?) to use TS. I would appreciate any hints. The ultimate 
analysis involves also space, so hints in the direction of space-time are 
welcome. Ultimately, I would like to separate records for each user into a 
dataset that can be handled separately, but splitting it into a large number of 
files does not seem wise. Any hint also appreciated.

Thanks,
Martin



[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Citing Package Contributing Authors

2013-08-26 Thread Stephan Kolassa


Hi,

it usually is a good idea to look at the output of citation() (which, 
however, also often is auto-generated) or at the authors listed in 
package vignettes.


And thanks for citing R package authors. When I review papers, I often 
have to remind authors of this...


Best
Stephan

On 26.08.2013 21:56, Charles Determan Jr wrote:

Greetings,

I am familiar with the function cite('packageName') which provides the
output generated from the DESCRIPTION file.  In most cases this is
sufficient but I was wondering if there are contributing authors (in
addition to the primary) also listed on the CRAN page.  Is there a proper
way to account for them or are they generally not listed?

Regards,



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Saving model and other objects from caret

2013-02-07 Thread Stephan Kolassa


Have you looked at ?save and ?load?

As I already wrote here: 
http://stackoverflow.com/questions/14761496/saving-and-loading-a-model-in-r


Best,
Stephan



On 07.02.2013 22:33, James Jong wrote:

Say I train a model in caret, e.g.:

RFmodel - train(X,Y,method='rf',trControl=myCtrl,tuneLength=1)

How can I save this to disk and load it later in R?

How about an object of the class resamples?

resamps - resamples(
 list(   RF = RFmodel,
 SVM = SVMmodel,
 KNN = KNNmodel,
 NN = NNmodel
 ))

Thanks,

James

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problems with stepAIC

2012-09-29 Thread Stephan Kolassa


Hi Katja,

try fitting the original model using ML (not REML) with the parameter 
method = ML:


PModell1 -lme(sqrt(Earthwormsm.2)~ 
Treatment+Pflanzenfrischmasse+aBodenfeuchte+bBodenfeuchte+Gfrischmasse+Ltrockenmasseanteil+KCN+I+Eindringtiefe,
random=~1|Block/Treatment/Cluster/Patch, data=Test1,na.action = na.omit, 
method = ML)


Good luck!
Stephan


On 29.09.2012 09:36, Katja wrote:

Dear help community,

I'm a R-beginner and use it for my master thesis.
I've got a mixed model and want to analyse it with lme. There are a lot
Cofactors that coult be relevant. To extract the important ones I want to do
the stepAIC, but always get an error warning.

Structure of my data:
data.frame':   72 obs. of  54 variables:
  $ Block  : Factor w/ 3 levels A,B,C: 1 1 1 1 1 1
1 1 1 1 ...
  $ Treatment  : Factor w/ 3 levels 1,2,3: 1 1 1 1 1 1
1 1 2 2 ...
  $ Paddock: Factor w/ 9 levels A1,A2,A3,..: 1 1 1
1 1 1 1 1 2 2 ...
  $ Cluster: Factor w/ 4 levels 1,2,3,4: 1 1 2 2
3 3 4 4 1 1 ...
  $ Growth : Factor w/ 2 levels s,t: 1 2 1 2 1 2 1 2
1 2 ...
  $ Patch  : Factor w/ 72 levels A1_1s,A1_1t,..: 1 2
3 4 5 6 7 8 9 10 ...
  $ Lufttemperatur : num  17.3 17.3 17.3 17.3 17.3 ...
  $ NSmm   : num  0 0 0 0 0 0 0 0 0 0 ...
  $ Luftfeuchte: num  79.2 79.2 79.2 79.2 79.2 ...
  $ Windstärke : num  0.96 0.96 0.96 0.96 0.96 0.96 0.96 0.96
1.27 1.27 ...
  $ Bodentemperatur: num  18.1 18.1 18.1 18.1 18.1 ...
  $ Earthwormsm.2  : num  0 0 0 20 24 60 12 20 0 4 ...
  $ Biomassgm.2: num  0 0 0 24.5 17.8 ...
  $ meanBiomassg   : num  0 0 0 1.227 0.743 ...
  $ Species: num  0 0 0 1 2 2 0 0 0 1 ...
  $ Juvenilem.2: num  0 0 0 4 4 1 2 3 0 0 ...
  $ JuvenileRate   : num  NA NA NA 80 80 ...
  $ AdultsRate : num  NA NA NA 20 20 ...
  $ Pflanzenfrischmasse: num  139 803 189 739 261 ...
  $ Pflanzentrockenmasseanteil : num  27.8 29.3 28.7 40 27.7 ...
  $ NAnteilGesamt  : num  2.38 3.35 2.47 2.21 2.55 4.45 2.33 2.23
2.3 1.68 ...
  $ CAnteilGesamt  : num  43.9 45 45 43.6 43.8 ...
  $ aBodenfeuchte  : num  24.4 21.8 14.7 18.2 23.9 ...
  $ bBodenfeuchte  : num  17.3 15.6 NA NA 19.1 ...
  $ cBodenfeuchte  : num  NA NA NA NA NA ...
  $ Gfrischmasse   : num  84.6 739.3 130.8 566.8 94 ...
  $ Gtrockenmasse  : num  25.6 215.4 41.7 239.4 30.5 ...
  $ Gtrockenmasseanteil: num  30.2 29.1 31.9 42.2 32.5 ...
  $ GtrockenmasseanteilanGesamt: num  78.8 98.3 86 96 76.5 ...
  $ GN : num  2.1 3.35 2.31 2.17 2.28 ...
  $ GC : num  43.8 45 44.9 43.6 43.5 ...
  $ GCN: num  20.9 13.4 19.4 20.1 19.1 ...
  $ Lfrischmasse   : num  12.1 0 0 0 19 ...
  $ Ltrockenmasse  : num  2.56 0 0 0 3.92 1.84 0 0 2.2 0 ...
  $ Ltrockenmasseanteil: num  21.2 0 0 0 20.7 ...
  $ LtrockenmasseanteilanGesamt: num  7.89 0 0 0 9.83 1.06 0 0 3.73 0 ...
  $ LN : num  3.96 NA NA NA 4.09 ...
  $ LC : num  43.1 NA NA NA 45.6 ...
  $ LCN: num  10.9 NA NA NA 11.1 ...
  $ Kfrischmasse   : num  20.16 8.96 38.48 56.24 31.2 ...
  $ Ktrockenmasse  : num  4.32 3.64 6.8 10 5.44 ...
  $ Ktrockenmasseanteil: num  21.4 40.6 17.7 17.8 17.4 ...
  $ KTrockenmasseanteilanGesamt: num  13.32 1.66 14.01 4.01 13.64 ...
  $ KN : num  3.09 3.11 3.43 3.37 2.95 ...
  $ KC : num  45 48.3 46 44 44.3 ...
  $ KCN: num  14.6 15.5 13.4 13.1 15 ...
  $ I  : num  2.12 1.4 0.92 0.49 1.03 0.53 1.36 0.51
1.1 2.31 ...
  $ II : num  2.61 3.97 2.82 NA 1.94 1.98 2.84 1.66
2.38 1.99 ...
  $ III: num  2.24 3.51 3.17 NA 2.95 3.12 2.16 3.65
3.14 2.83 ...
  $ IV : num  2.52 2.91 3.11 NA 1.97 4.39 3.22 4.61
2.26 2.77 ...
  $ V  : num  2.38 2.91 2.61 NA 3.34 3.32 3.21 NA
2.44 2.26 ...
  $ VI : num  2.01 2.61 2.61 NA 3.61 3.57 NA NA 3.06
2.18 ...
  $ VII: num  NA 3.73 2.61 NA 3.25 0.27 NA NA NA 3.04
...
  $ Eindringtiefe  : int  64 80 80 8 80 61 55 32 73 80 ...

model fo the first parameter I want to check:

PModell1 -lme(sqrt(Earthwormsm.2)~
Treatment+Pflanzenfrischmasse+aBodenfeuchte+bBodenfeuchte+Gfrischmasse+Ltrockenmasseanteil+KCN+I+Eindringtiefe,
random=~1|Block/Treatment/Cluster/Patch, data=Test1,na.action = na.omit)

Everytime I try stepAIC with different settings:


Modell2- stepAIC(PModell1, method=lme)
Fehler in extractAIC.lme(fit, scale, k = k, ...) :

   AIC für REML-Näherung undefiniert

Re: [R] percentile of a given value: is there a reverse quantile function?

2012-03-03 Thread Stephan Kolassa


?ecdf

Best,
Stephan


On 03.03.2012 13:37, drflxms wrote:

Dear all,

I am familiar with obtaining the value corresponding to a chosen
probability via the quantile function.
Now I am facing the opposite problem I have a value an want to know it's
corresponding percentile in the distribution. So is there a function for
this as well?

Thank you for your support in advance, Felix

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plot many x and y

2011-06-06 Thread Stephan Kolassa


Hi Alex,

could you be a little more specific as to what exactly you mean by 
plotting many x's and y's with one legend per plot?


Please note what appears at the bottom of every R-help mail:
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

 and provide commented, minimal, self-contained, reproducible code.

Following this piece of advice usually increases your chances for a 
helpful answer.


Best,
Stephan


Am 06.06.2011 21:46, schrieb Alaios:

Dear all

could you please plot many x's and y's with one legend per plot?

I would like to thank you in advance for your help

Best Regards
Alex.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Force the for loop to stop

2011-06-01 Thread Stephan Kolassa


Hi,

you could set a dummy variable to FALSE outside the outermost loop. If 
the break condition is met in the inner loop, set the dummy variable to 
TRUE before breaking and test its truth status in the outer loop.


HTH
Stephan


Am 01.06.2011 21:25, schrieb Salih Tuna:

Hi,
I am looking for a command in R that would force the for loop to stop after
it finds what it is looking for.
As an example
for(i in 1:5){
for(j in 3:6){
   if(i==j)
 # do something...
break;
}
}
And i don't want the loop to execute once i = 3 and stop.

Is there a way to do this?

best,
salih

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Force the for loop to stop

2011-06-01 Thread Stephan Kolassa


Hi Salih,

here you go:


dummy - FALSE
for ( ii in 1:5 ) {
  for ( jj in 3:6 ) {
cat(ii=,ii,; jj=,jj,\n,sep=)
if ( ii == jj ) {
  dummy - TRUE
  break
}
  }
  if ( dummy ) break
}
###

Note that I am using ii and jj as loop indices, not i and j. 
This makes it a lot easier to search for the loop counter in more 
complex scripts - if you just search for i, most of your hits will be 
something else than the loop counter.


HTH,
Stephan


Am 01.06.2011 22:06, schrieb Salih Tuna:

Hi Stephan,
Thanks a lot. But i am not very good at R yet so i dont know how to set the
dummy variable to FALSE. Can you please help me with that as well?

On Wed, Jun 1, 2011 at 8:34 PM, Stephan Kolassastephan.kola...@gmx.dewrote:


Hi,

you could set a dummy variable to FALSE outside the outermost loop. If the
break condition is met in the inner loop, set the dummy variable to TRUE
before breaking and test its truth status in the outer loop.

HTH
Stephan


Am 01.06.2011 21:25, schrieb Salih Tuna:


Hi,
I am looking for a command in R that would force the for loop to stop
after
it finds what it is looking for.
As an example
for(i in 1:5){
for(j in 3:6){
   if(i==j)
 # do something...
break;
}
}
And i don't want the loop to execute once i = 3 and stop.

Is there a way to do this?

best,
salih

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.








__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problems Dating....

2011-06-01 Thread Stephan Kolassa


Hi Nat,

I guess something like
as.Date(as.character(3/4/2007),format=%d/%m/%Y)
should work - as.character() coerces the factors to characters, which 
the as.Date() function can work with, given the right format argument.


HTH
Stephan


Am 01.06.2011 22:59, schrieb Struckmeier, Nathanael:

I'm trying to convert a column in a data frame with dates from a
Factor type to a Date Object but I am encountering and error. (I am
having trouble plotting an x,y scatter and I suspect it's something with
my data format). I have a table with two columns and 8,000 rows.


dsort=read.delim(C:\\Documents and Settings\\E066582\\My

Documents\\R\\R-2.13.0\\bin\\dsort.txt)



dsort#name of
data.frame


colnames(dsort)[1] #name of column 1


[1] Date


colnames(dsort)[2] #name of column 2


[1] Qty


class(dsort$Date) #checked data type of column

Date and it came back as a factor

[1] factor


Date2=as.Date(dsort$Date)#attempt at changing the data type from a

factor to a date object (see error below).

Error in charToDate(x) :

   character string is not in a standard unambiguous format



Dates in my table are listed in 3/4/2007 format.

StatBat2




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Applying a function to a subset

2011-05-29 Thread Stephan Kolassa


Or just include is.na=TRUE in the definition of kurtosis():

kurtosis-function(x) {
  m4-sum((x-mean(x,na.rm=TRUE))^4,na.rm=TRUE)/length(x)
  s4-var(x,na.rm=TRUE)^2
  m4/s4 - 3 }

HTH
Stephan


Am 29.05.2011 11:34, schrieb Jim Holtman:

kurtosis(fem[!is.na(fem)])

Sent from my iPad

On May 29, 2011, at 4:54, gaiarridogaiarr...@usal.es  wrote:


Here´s my problem,
i have developed the function kurtosis (using R-book as a guide) with this
commands:
kurtosis-function(x) {
m4-sum((x-mean(x))^4)/length(x)
s4-var(x)^2
m4/s4 - 3 }

Then create the object fem, which is the difference between the count of a
trait in the left side of the body minus the right side. It´s a numeric
variable. But i´ve got no data for some individuals, so in the variable
fem i got numbers and smoe NA.
When i try to apply kurtosis to fem, the NA, don´t allow me to do that,
that`s what i get:


kurtosis(fem)

[1] NA

I tried with simple tricks, but i´ve got no results:

kurtosis(fem,na.rm=T)

Error en kurtosis(fem, na.rm = T) :
  el argumento(s) no fue utilizado(s) (na.rm = T)

What can i do? the values of fem vary from -4 to 9.

Thanks very much

kurtosis(fem[fem!=NA])

[1] NA


-
Mario Garrido Escudero
PhD student
Dpto. de Biología Animal, Ecología, Parasitología, Edafología y Qca. Agrícola
Universidad de Salamanca
--
View this message in context: 
http://r.789695.n4.nabble.com/Applying-a-function-to-a-subset-tp3558567p3558567.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset

2011-05-25 Thread Stephan Kolassa


Dear all,

may I suggest the acronym IOTT for the inter-ocular trauma test?

Now we just need someone to implement iot.test(). I assume it will 
appear on CRAN within the next 24 hours.


Looking forward to yet another base package,
Stephan



Am 25.05.2011 23:36, schrieb Greg Snow:

How can anyone overlook the intra-ocular trauma test (or sometimes
called the inter-ocular concussion test).  But the i-o trauma test
needs either a small data set or an appropriate graph of the data (or
can you look at a dataset of a hundred columns and a million rows and
do an intra-ocular trauma test?).  We were not told the size of the
dataset or enough information to know what type of graph to make.

You do make a good point though that with minimal additional
information the intra-ocular trauma test can be useful (well if it is
significant, there are many datasets that fail the intra-ocular
trauma test, but still yield interesting results after careful
study).  And for any dataset that has a significant intra-ocular
trauma test result, that should trump the results of
SnowsCorrectlySizedButOtherwiseUselessTestOfAnything.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] minimum distance between line segments

2011-03-11 Thread Stephan Kolassa


Hi,

this sounds like a standard problem in Computational Geometry - I guess 
game developers have to deal with something like this all the time. You 
may want to look at a textbook or two.


An article with the promising title On fast computation of distance 
between line segments can be found here:

http://dx.doi.org/10.1016/0020-0190(85)90032-8

I found that one via Wikipedia:
http://en.wikipedia.org/wiki/Proximity_problems

Good hunting!
Stephan


Am 08.03.2011 22:55, schrieb Darcy Webber:

Dear R helpers,

I think that this may be a bit of a math question as the more I
consider it, the harder it seems. I am trying to come up with a way to
work out the minimum distance between line segments. For instance,
consider 20 random line segments:

x1- runif(20)
y1- runif(20)
x2- runif(20)
y2- runif(20)

plot(x1, y1, type = n)
segments(x1, y1, x2, y2)

Inititally I thought the solution to this problem was to work out the
distance between midpoints (it quickly became apparent that this is
totally wrong when looking at the plot). So, I thought that perhaps
finding the minimum distance between each of the lines endpoints AND
their midpoints would be a good proxy for this, so I set up a loop
that uses pythagoras to work out these 9 distances and find the
minimum. But, this solution is obviously flawed as well (sometimes
lines actually intersect, sometimes the minimum distances are less
etc). Any help/dection on this one would be much appreciated.

Thanks in advance,
Darcy.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Anyone know a forum for stats advice?

2011-03-04 Thread Stephan Kolassa


Hi Clare,

you want to go here:
http://stats.stackexchange.com/questions

HTH
Stephan


Am 04.03.2011 12:08, schrieb Clare Embling:

Hi,

I know this forum is for R-related issues, but the question I have is a statistical 
question  I was wondering if anyone could recommend a good statistics forum 
where I can ask the question?  My question is relating to bootstrapping of binary 
data (ecology data) - I can give more detail, but wasn't sure I could address the 
question here as it is more statistical based than R based (though all the analysis 
is done in R).

Thanks in advance
Clare

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Comparing decimal numbers

2011-02-01 Thread Stephan Kolassa


Hi,

this is R FAQ 7.31.
http://cran.r-project.org/doc/FAQ/R-FAQ.html

HTH,
Stephan


Am 01.02.2011 14:49, schrieb mlancee:


Hi,

I have a seemingly easy question that has been keeping be busy for quite a
while. The problem is the following:

0.1 + 0.1 + 0.1 == 0.3
[1] FALSE

Why is this false? Another example is

0.2 + 0.1 == 0.3
[1] FALSE

or

0.25 + 0.05 == 0.20 + 0.10
[1] FALSE

However, I do get TRUE if I use integers, or for example the following

0.1 + 0.1 == 0.2
[1] TRUE

It is probably something very basic, but I did not manage to find the
answer.

Thanks,
Michael


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Uniform Distribution

2010-09-08 Thread Stephan Kolassa


Hi Alex,

help.search(uniform)

HTH,
Stephan


Am 08.09.2010 15:36, schrieb Alaios:

Hello,
I would like to uniformly distribute values from 0 to 200. Can someone help me
find the appropriate uniform distribution generator?

I would like to thank you in advance for your help.


Best Regards
Alex




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] repeated measurements ANOVA

2010-09-07 Thread Stephan Kolassa


Hi Alex,

I'm slightly unclear as to why you would want to restructure your nice 
six-column data.frame (why six? One column for the data and four for the 
factors should make five, shouldn't it? I guess you have a subject ID in 
one column?) into some monstrosity which I assume you would fill with 
lots of indicator variables. R does all this for you, just do something like


library(nlme)
lme(response~factor1+factor2+factor3+factor4,random=~1|ID,data=dataset)

assuming that your data.frame is called dataset with column names 
response, factor1, ..., factor4 and ID (and that the above is the model 
you want). Take a look at the help page for lme() and the Orthodont data 
set, which is used as an example in the lme() help page. And next time, 
send along a snippet of your data.frame, that would help us help you.


HTH
Stephan


Am 07.09.2010 20:19, schrieb Walther, Alexander:

Dear list,

i am setting up a GLM for a repeated measurement ANOVA using the lm and
ANOVA function. my design contains four factors with 5, 5, 2 and 2 (=
14) levels, respectively. the data are stored in a data.frame with six
columns, one for the data themselves and the remainings for the factors
where strings indicate the factor levels in each row. now i would like
to restructure this data.frame using cbind which yields a 100 x 14
array. so far i only included two subjects in the analysis and the 100
rows emerge because each subject contributes 50 values. for the ANOVA
however, it seems to me that i should create a multi-dimensional array
where each dimension accounts for one specific factor and its levels. is
it possible to do this in R? if so, does the lm or ANOVA function
necessitates this type of array or is there yet another way to continue?


Best

Alex

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R time series analysis

2010-09-05 Thread Stephan Kolassa


Hi,

basically, you know 5 periods later. If you use a good error measure, 
that is.


I am a big believer in AIC for model selection. I believe that arima() 
also gives you the AIC of a fitted model, or try AIC(arima1).


Other ideas include keeping a holdout sample or some such.

I'd recommend looking at a time series textbook.

HTH,
Stephan


Am 05.09.2010 22:37, schrieb lord12:


How do you evaluate the predictive models? For example if I have:

arima1 = arima(training, order = c(1,1,1))
arima2 = arima(training, order = c(0,0,0))
x.fore = predict(arima1, n.ahead=5)
x.fore1 = predict(arima2, n.ahead = 5)

How do I know which arima model is better for prediction?


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] predict.lm, matrix in formula and newdata

2010-08-17 Thread Stephan Kolassa

Dear all,

I am stumped at what should be a painfully easy task: predicting from an lm 
object. A toy example would be this:

XX - matrix(runif(8),ncol=2)
yy - runif(4)
model - lm(yy~XX)
XX.pred - data.frame(matrix(runif(6),ncol=2))
colnames(XX.pred) - c(XX1,XX2)
predict(model,newdata=XX.pred)

I would have expected the last line to give me the predictions from the model 
based on the new data given in XX.pred... but all I get are in-sample fits 
along with a warning 'newdata' had 3 rows but variable(s) found have 4 rows. 
Why would predict.lm worry about the number of rows in the model matrix?

Unfortunately, ?predict.lm does not seem to be helpful, and neither RSiteSearch 
nor rseek.org have been useful. I'm sure that I am making an elementary error 
somewhere (am I misunderstanding the lm(yy~XX) part?) and would appreciate a 
gentle nudge in the right direction.

Thank you,
Stephan

-- 
GMX DSL SOMMER-SPECIAL: Surf  Phone Flat 16.000 für nur 19,99 ¿/mtl.!*

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problems with normality req. for ANOVA

2010-08-02 Thread Stephan Kolassa


Hi,

simulating would still require you to operationalize the lack of 
normality. Are the tails too heavy? Is the distribution skewed? Does it 
have multiple peaks? I suspect that the specific choices you would make 
here would *strongly* influence the result.


My condolences on the client you are facing.

Good luck,
Stephan (ex-BAH)



Bert Gunter wrote:


You could try sensitivity analyses via simulation, though.

On Mon, Aug 2, 2010 at 11:31 AM, wwreith reith_will...@bah.com
wrote:


The problem I have is that I am expected, by my client, to find a
similiar formula that states which way the p-value would be pushed
by a lack of normality.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help with a problem

2010-07-18 Thread Stephan Kolassa


Hi all,

zoo::rollmean() is a nice idea. But if I understand Mike correctly, he 
wants 5 out of any 7 consecutive logicals to be TRUE, where these 5 do 
not necessarily need to be consecutive themselves. (remaining open 
question: could, e.g., the condition on c1 be TRUE for rows 1,2,3,4,5 
and on c2 for rows 3,4,5,6,7, or would it need to be TRUE for the same 
rows?). Then something like this would make sense:


any(rollmean(dat$c1=100,7)=5/7-.01  rollmean(dat$c2=8,7)=5/7-.01))

or

any(rollmean(dat$c1=100,7)=5/7-.01  dat$c2=8,7)=5/7-.01))

depending on the open question above.

The -.01 above may be necessary in light of FAQ 7.31.

HTH,
Stephan



Joshua Wiley schrieb:

Hi Michael,

The days in your example do not look continuous (at least from my
thinking), so you may have extra requirements in mind, but take a look
at this code.  My general thought was first to turn each column into a
logical vector (c1 = 100 and c2 = 8).  Taking advantage of the fact
that R treats TRUE as 1 and FALSE as 0, compute a rolling mean.  If
(and only if) 5 consecutive values are TRUE, the mean will be 1.  Next
I added the rolling means for each column, and then tested whether any
were 2 (i.e., 1 + 1).

Cheers,

Josh

###
#Load required package
library(zoo)

#Your data with ds converted to Date
#from dput()
dat -
structure(list(ds = structure(c(14702, 14729, 14730, 14731, 14732,
14733, 14734, 14735, 14736, 14737, 14738, 14739, 14740, 14741,
14742, 14743, 14744), class = Date), c1 = c(100L, 11141L, 3L,
7615L, 6910L, 5035L, 3007L, 4L, 8335L, 2897L, 6377L, 3177L, 7946L,
8705L, 9030L, 8682L, 8440L), c2 = c(0L, 15L, 16L, 14L, 17L, 3L,
15L, 14L, 17L, 13L, 17L, 17L, 15L, 0L, 16L, 16L, 1L)), .Names = c(ds,
c1, c2), row.names = c(NA, -17L), class = data.frame)

#Order by ds
dat - dat[order(dat$ds), ]

yourvar - 0

#Test that 5 consecutive values from c1 AND c2 meet requirements
if(any(
 c(rollmean(dat$c1 = 100, 5) + rollmean(dat$c2 = 8, 5)) == 2)
   ) {yourvar - 1}

###

On Sat, Jul 17, 2010 at 2:38 PM, Michael Hess mlh...@med.umich.edu wrote:

Sorry for not being clear.

In the dataset there are around 100 or so days of data (in the case also rows 
of data)

I need to make sure that the person meets that c1 is at least 100 AND c2 is at 
least 8 for 5 of 7 continuous days.

I will play with what I have and see if I can find out how to do this.

Thanks for the help!

Michael


Stephan Kolassa  07/17/10 4:50 PM 

Mike,

I am slightly unclear on what you want to do. Do you want to check rows
1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all*
rows you are looking at, and same for c2?

You can sort your data like this:
data - data[order(data$ds),]

Type ?order for help. But also do this for added enlightenment...:

library(fortunes)
fortune(dog)

Next, your analysis on the sorted data frame. As I said, I am not
entirely clear on what you are looking at, but the following may solve
your problem with choices 1 to 7 and any one above.

foo - 0
for ( ii in 1:(nrow(data)-8) ) {
  if (any(data$c1[ii+seq(0,6)]=100)  any(data$c2[ii+seq(0,6)]=8)) {
foo - 1
break
  }
}

The variable foo should contain what you want it to. Look at ?any
(and, if this does not do what you want it to, at ?all) for further info.

No doubt this could be vectorized, but I think the loop is clear enough.

Good luck!
Stephan



Michael Hess schrieb:

Hello R users,

I am a researcher at the University of Michigan looking for a solution to an R 
problem.  I have loaded my data in from a mysql database and it looks like this


data

   ds c1 c2
1  2010-04-03100   0
2  2010-04-30  11141  15
3  2010-05-01  3  16
4  2010-05-02   7615  14
5  2010-05-03   6910  17
6  2010-05-04   5035  3
7  2010-05-05   3007  15
8  2010-05-06   4  14
9  2010-05-07   8335  17
10 2010-05-08   2897  13
11 2010-05-09   6377  17
12 2010-05-10   3177  17
13 2010-05-11   7946  15
14 2010-05-12   8705  0
15 2010-05-13   9030  16
16 2010-05-14   8682  16
17 2010-05-15   8440  15


What I am trying to do is sort by ds, and take rows 1,7, see if c1 is at least 
100 AND c2 is at least 8. If it is not, start with check rows 2,8 and if not 
there 3,9until it loops over the entire file.   If it finds a set that 
matches, set a new variable equal to 1, if never finds a match, set it equal to 
0.

I have done this in stata but on this project we are trying to use R.  Is this 
something that can be done in R, if so, could someone point me in the correct 
direction.

Thanks,

Michael Hess
University of Michigan
Health System

**
Electronic Mail is not secure, may not be read every day, and should not be 
used for urgent or sensitive issues

Re: [R] Help with a problem

2010-07-17 Thread Stephan Kolassa


Mike,

I am slightly unclear on what you want to do. Do you want to check rows 
1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all* 
rows you are looking at, and same for c2?


You can sort your data like this:
data - data[order(data$ds),]

Type ?order for help. But also do this for added enlightenment...:

library(fortunes)
fortune(dog)

Next, your analysis on the sorted data frame. As I said, I am not 
entirely clear on what you are looking at, but the following may solve 
your problem with choices 1 to 7 and any one above.


foo - 0
for ( ii in 1:(nrow(data)-8) ) {
  if (any(data$c1[ii+seq(0,6)]=100)  any(data$c2[ii+seq(0,6)]=8)) {
foo - 1
break
  }
}

The variable foo should contain what you want it to. Look at ?any 
(and, if this does not do what you want it to, at ?all) for further info.


No doubt this could be vectorized, but I think the loop is clear enough.

Good luck!
Stephan



Michael Hess schrieb:

Hello R users,

I am a researcher at the University of Michigan looking for a solution to an R 
problem.  I have loaded my data in from a mysql database and it looks like this


data

   ds c1 c2
1  2010-04-03100   0
2  2010-04-30  11141  15
3  2010-05-01  3  16
4  2010-05-02   7615  14
5  2010-05-03   6910  17
6  2010-05-04   5035  3
7  2010-05-05   3007  15
8  2010-05-06   4  14
9  2010-05-07   8335  17
10 2010-05-08   2897  13
11 2010-05-09   6377  17
12 2010-05-10   3177  17
13 2010-05-11   7946  15
14 2010-05-12   8705  0
15 2010-05-13   9030  16
16 2010-05-14   8682  16
17 2010-05-15   8440  15


What I am trying to do is sort by ds, and take rows 1,7, see if c1 is at least 
100 AND c2 is at least 8. If it is not, start with check rows 2,8 and if not 
there 3,9until it loops over the entire file.   If it finds a set that 
matches, set a new variable equal to 1, if never finds a match, set it equal to 
0.

I have done this in stata but on this project we are trying to use R.  Is this 
something that can be done in R, if so, could someone point me in the correct 
direction.

Thanks,

Michael Hess
University of Michigan
Health System

**
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Can anybody help me understand AIC and BIC and devise a new metric?

2010-07-12 Thread Stephan Kolassa

Hi,

one comment: Claeskens and Hjort define AIC as 2*log L - 2*p for a model
with likelihood L and p parameters; consequently, they look for models
with *maximum* AIC in model selection and averaging. This differs from
the vast majority of authors (and R), who define AIC as -2*log L + 2*p
and search for the model with *minimum* AIC. Their definition of BIC is
similarly the negative of normal BIC.

I would compare this to defining \pi as the base of the natural
logarithm and e as the ratio of a circle's circumference to its
diameter: of course, you can do perfectly valid mathematics with your
own definitions, but it is a recipe for confusion.

Anyone who only reads Claeskens and Hjort, fires up R and selects the
model with the maximum AIC from the candidate models is in for some
*nasty* surprises.

Worse, as far as I see, Claeskens and Hjort nowhere mention that they
are using a definition that is diametrically opposed to what is
(overwhelmingly) common, and they do not comment on this.

However, Claeskens and Hjort managed to publish a book, which I have yet
to do, so it is quite possible that there is a major flaw in my
thinking. If so, I haven't found it yet, and I would be very grateful if
somebody pointed out what I misunderstand.

Otherwise, I would be *very* careful indeed about basing my analysis
strategy on their book, although the rest of the content is very helpful
indeed - you only need to remember where to switch signs and change
maximize to minimize etc.

For AIC and BIC novices, I would recommend going with Burnham
Anderson, which Kjetil cited below.

Best,
Stephan

Kjetil Halvorsen schrieb:

You should have a look at:

Model Selection and
Model Averaging
Gerda Claeskens
K.U. Leuven
Nils Lid Hjort
University of Oslo

Among other this will explain that AIC and BIC really aims at different goals.

On Mon, Jul 5, 2010 at 4:20 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

On Mon, Jul 5, 2010 at 7:35 AM, LosemindL comtech@gmail.com wrote:

Hi all,

Could anybody please help me understand AIC and BIC and especially why do
they make sense?

Any good text that discusses model selection in detail will have some
discussion of
AIC and BIC. Frank Harrell's book 'Regression Modeling Strategies' comes
immediately
to mind, along with Hastie, Tibshirani and Friedman (Elements of Statistical
Learning)
and Burnham and Anderson's book (Model Selection and Multi-Model Inference),
but
there are many other worthy texts that cover the topic. The gist is that AIC
and BIC
penalize the log likelihood of a model by subtracting different functions of
its number
of parameters. David's suggestion of Wikipedia is also on target.

Furthermore, I am trying to devise a new metric related to the model
selection in the financial asset management industry.

As you know the industry uses Sharpe Ratio as the main performance
benchmark, which is the annualized mean of returns divided by the
annualized
standard deviation of returns.

I didn't know, but thank you for the information. Isn't this simply a
signal-to-noise
ratio quantified on an annual basis?

In model selection, we would like to choose a model that yields the highest
Sharpe Ratio.

However, the more parameters you use, the higher Sharpe Ratio you might
potentially get, and the higher risk that your model is overfitted.

I am trying to think of a AIC or BIC version of the Sharpe Ratio that
facilitates the model selection...

You might be able to make some progress if you can express the (penalized)
log likelihood as a function of the Sharpe ratio. But if you have several
years of
data in your model and the ratio is computed annually, then isn't it a
random
variable rather than a parameter? If so, it changes the nature of the
problem, no?
(Being unfamiliar with the Sharpe ratio, I fully recognize that I may be
completely
off-base in this suggestion, but I'll put it out there anyway :)

BTW, you might find the R-sig-finance list to be a more productive resource
in
this problem than R-help due to the specialized nature of the question.

HTH,
Dennis

Anybody could you please give me some pointers?

Thanks a lot!
--
View this message in context:
http://r.789695.n4.nabble.com/Can-anybody-help-me-understand-AIC-and-BIC-and-devise-a-new-metric-tp2278448p2278448.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to define a function in R

2010-07-07 Thread Stephan Kolassa


Hi,

I recommend that you look at the following help pages and experiment a 
little (maybe create a toy directory with only three or four files with 
a few lines each):


?files
?dir
?grep
?strsplit

Good luck!
Stephan


jd6688 schrieb:

Here are what i am going to accomplish:

I have 400 files named as xxx.txt. the content of the file looks like the
following:

namecount

1. aaa 100
2. bbb2000
3. ccc300
4. ddd   3000


more that 1000 rows in each files.

these are the areas i need help:
1. how can i only read in the files with the string patterns ggg or fff as
part of the file names?
  for instance, I only need the file names with the ggg or fff in it
 x_ggg_y_1.txt
 _fff__xxx.txt

i don't need to read in the files, such as _aaa_.txt

2.how cam rename the files:

  for instance: x_ggg_y_1.txt==changed to ggg1a.txt


3.after the files read in, how can i only keep the rows with the aaa and
bbb, everything elses show be removed from the files, but the files still
remain the same file name?

   for instance, in the x_ggg_y_1.txt file, it shouls looks like:
 namecount

1. aaa100
2. bbb2000
3. aaa300
4. bbb400


Thanks so lot, I am very new to R, I am looking forward to any helps from
you.


On Tue, Jul 6, 2010 at 7:19 PM, Nordlund, Dan (DSHS/RDA) [via R] 
ml-node+2280308-38709657-312...@n4.nabble.comml-node%2b2280308-38709657-312...@n4.nabble.com

wrote:



  -Original Message-

From: [hidden 
email]http://user/SendEmail.jtp?type=nodenode=2280308i=0[mailto:

r-help-boun...@r-

project.org] On Behalf Of jd6688
Sent: Tuesday, July 06, 2010 3:49 PM
To: [hidden email]http://user/SendEmail.jtp?type=nodenode=2280308i=1
Subject: [R] how to define a function in R


1. how to write a R script?
2.How to write a SAS like macro/generic process to process multiple
files by
using the same funstion in R?

Thanks in advance

Don't thank me too soon. :-)  Your question is equivalent to me going to
SAS-L and asking someone to teach me SAS macro language.  You need to
provide more information about what your task actually is.  Where are these
file names that you want to process coming from?  What do you want to do
with them?

The basic approach would be to put the file names in a list and then pass
the list items one at a time to your function.  But how to do that really
depends on what you are trying to do.  You might also read the posting guide
listed at the bottom of every posting.

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


__
[hidden email] http://user/SendEmail.jtp?type=nodenode=2280308i=2mailing 
list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
 View message @
http://r.789695.n4.nabble.com/how-to-define-a-function-in-R-tp2280290p2280308.html
To unsubscribe from how to define a function in R, click here (link removed) 
=.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] process of stepwise selection

2010-06-30 Thread Stephan Kolassa


Hi Elaine,

in general, stepwise selection is a very bad idea:

Whittingham, M. J.; Stephens, P. A.; Bradbury, R. B.  Freckleton, R. P. 
Why do we still use stepwise modelling in ecology and behaviour? Journal 
of Animal Ecology, 2006, 75, 1182-1189


HTH
Stephan


elaine kuo schrieb:

Dear list,

I wanna select the significant variables relative to bird distribution,
using stepwise method.
However, the result is always the best-fit model.

Please kindly suggest if it is possible to show the selection process.
Thank you

Elaine

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Finding Lower Envelope of Points on a Plot?

2010-06-30 Thread Stephan Kolassa


Hi,

one possibility would be to calculate the convex hull using chull(). I 
believe that the hull points are returned by chull() in a clockwise 
order (?), so the points between the rightmost and the leftmost point in 
the chull() result are the lower half of the convex hull. Remove these 
points from the original dataset (a variant of peeling convex hulls) 
and iterate until you have removed your prespecified percentage of 
points - these all lie outside of the final lower hull (though the 
percentage will, of course, only be approximated, but you should be able 
to modify this to taste).


HTH
Stephan


David Winsemius schrieb:


On Jun 30, 2010, at 2:05 PM, Asha Sharma wrote:


Hi,

I am looking for a way to find the lower envelope of points on a plot, 
preferably specifying what percentage of points should be allowed to 
lie outside the envelope. There must be a straightforward way to do 
this, but I do not seem to be able to find it. I would greatly 
appreciate any help.


You probably want something like the lower half of the convex hull. 
You should find quite a bit of code with your favorite r search engine 
on the topic of convex hull.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Finding Lower Envelope of Points on a Plot?

2010-06-30 Thread Stephan Kolassa


Hi,

To be honest, what you mean by a (linear) lower envelope becomes less 
and less clear to me. Take any cloud of 2-d points. The convex hull 
consists of a number of these points - or alternatively, of a number of 
line segments between successive ones of these points. Thus, every one 
of these segments defined by a pair of points on the bottom half of the 
convex hull would, if prolonged to infinity, define a line with all your 
points  in the half-space above the line, which sounds a lot like what 
you seem to look for - in which case, the lower envelope would not be 
well-defined.


Or does your cloud of points have two points A and B such that all 
points are to the right of A and to the left of B and above the line AB? 
In this case, you could call the segment AB the lower envelope in your 
sense, but finding such A and B (if they exist) is trivial: look for the 
leftmost point, if there are multiple ones of these, take the bottom 
one, that's A, and the same for B.


Life would be easier for us if you provided a small reproducible example 
and told us what you expect the result to be.


HTH
Stephan

Asha Sharma schrieb:

Hi,

Thanks to both of you for taking the time to answer my question. I was 
maybe not very clear in the way I framed my question. By plot, I meant 
an x-y plot with a cloud of points which should have a linear lower 
envelope. Is there a way to both plot as well as get the parameters of 
the lower envelope (intercept, slope, etc.) and to also set the 
percentage of points outside it?


Thanks!

Asha


Stephan Kolassa wrote:

Hi,

one possibility would be to calculate the convex hull using chull(). I 
believe that the hull points are returned by chull() in a clockwise 
order (?), so the points between the rightmost and the leftmost point 
in the chull() result are the lower half of the convex hull. Remove 
these points from the original dataset (a variant of peeling convex 
hulls) and iterate until you have removed your prespecified 
percentage of points - these all lie outside of the final lower hull 
(though the percentage will, of course, only be approximated, but you 
should be able to modify this to taste).


HTH
Stephan


David Winsemius schrieb:


On Jun 30, 2010, at 2:05 PM, Asha Sharma wrote:


Hi,

I am looking for a way to find the lower envelope of points on a 
plot, preferably specifying what percentage of points should be 
allowed to lie outside the envelope. There must be a straightforward 
way to do this, but I do not seem to be able to find it. I would 
greatly appreciate any help.


You probably want something like the lower half of the convex hull. 
You should find quite a bit of code with your favorite r search 
engine on the topic of convex hull.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] What is wrong with this code?

2010-06-30 Thread Stephan Kolassa


Hi,

your problem is called string matching. Search for that term on 
rseek.org, there are a couple of functions and packages. And Wikipedia 
can tell you everything you ever wanted to know about string matching 
(and more).


HTH,
Stephan


Cable, Samuel B Civ USAF AFMC AFRL/RVBXI schrieb:



Anyway, I still wouldn't mind some advice on character
matching.
Thanks.


If so we need a reproducible example of what you are doing.


OK, let's say I have three strings.  Str1=abc.  Str2=abcd.
Str3=efgh.

I want to compare Str1 and Str2 in such a way that R detects that Str2
indeed contains Str1.  I want something that returns a TRUE value.
Then I want to compare Str1 and Str3 so that R detects that Str3 does
not contain Str1.  I don't want it to give me an error because it can't
find Str1.  I just want to get a FALSE value.  Is there a way to do
this that's less messy than what I came up with?  Thanks again.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Exponential Smoothing: Forecast package

2010-06-29 Thread Stephan Kolassa


Hi Phani,

something like this looks promising:

#

library(forecast)
library(Mcomp)

MAPE.for.Holt - function (x,series,bignum=10e6) {
  foo - 
try(ets(series$x,model=AAN,damped=FALSE,alpha=x[1],beta=x[2],restrict=FALSE),silent=TRUE)

  if ( class(foo) == try-error ) {
bignum
  } else {
mean(abs(fitted(foo)-series$x)/series$x,na.rm=TRUE)
  }
}

bar - optim(par=c(.1,.1),fn=MAPE.for.Holt,series=M3[[1]])

#

At least it converges. However, I have had problems with parameters 
leaving the allowed space (that's what the try() and the bignum is for), 
and even with convergence, some unrealistically big smoothing constants 
resulted, which in turn were not very stable for varying starting 
parameters...


HTH,
Stephan


phani kishan schrieb:

Hey,
Thanks for the tip Stephan. But you could tell me how to pass the series to
the function calling ets?
Initially I planned to do it this way:

wrapper-function(x)
{
alpha-x[1]
beta-x[2]
ph-x[3]
series-x[4]
foofit-ets(series,model=AZZ,alpha=alpha,beta=beta,phi=phi,additive.only=T,opt.crit=c(mse))
accuracy(foofit)[5]  ##for MAPE
}

I then planned to use the optim using
optim(c(alpha,beta,phi,series),wrapper)

What I hoped to do is also select MAPE as a criteria for selection of my
alpha and beta.
However I shouldn't pass my series in this form right? As it would be
optimized too in the process? Could you suggest a way around this.
And I did find a way around could this allow me to set MAPE as a criteria?

Phani



On Tue, Jun 29, 2010 at 12:47 AM, Stephan Kolassa stephan.kola...@gmx.dewrote:


Hi Phani,

to get the best Holt's model, I would simply wrap a suitable function
calling ets() within optim() and optimize for alpha and beta - the values
given by ets() without constraints would probably be good starting values,
but you had better start the optimization with a variety of starting values
to make sure you don't end up in a local minimum.

I know of no comparison just between Holt and Brown - but you could use the
above methods and the M3 Competition dataset (in Mcomp) to look how the two
methods compare on a (more or less) benchmark dataset.

HTH
Stephan


phani kishan schrieb:

 Hey,

I am using the ets() function in the forecast package to find out the best
fit parameters for my time-series. I have about 50 sets of time series
data.

I'm currently using the function as follows:

ets(x,model=AZZ,opt.crit=mse)


As to my observation about 5-10 of them have been identified by ets to
have
a trend and an alpha, beta values have been thrown up - which have been
same
in all these cases. When I read up online it came up as a Brown's double
exponential smoothing as opposed to Holt's exponential smoothing (where
alpha and beta differ). I am guessing this is happening as AIC/AICc/BIC
select a model based on accuracy as well as a weight on number of
parameters
(1 in case of brown's, 2 in case of holt's). Now if I want to see results
of
the best parameters from the Holt's method, how should I go about it?

And is there any study comparing the accuracy of brown's double
exponential
model versus holt's exponential model?

Thanks in advance,
Phani







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Exponential Smoothing: Forecast package

2010-06-28 Thread Stephan Kolassa


Hi Phani,

to get the best Holt's model, I would simply wrap a suitable function 
calling ets() within optim() and optimize for alpha and beta - the 
values given by ets() without constraints would probably be good 
starting values, but you had better start the optimization with a 
variety of starting values to make sure you don't end up in a local minimum.


I know of no comparison just between Holt and Brown - but you could use 
the above methods and the M3 Competition dataset (in Mcomp) to look how 
the two methods compare on a (more or less) benchmark dataset.


HTH
Stephan


phani kishan schrieb:

Hey,
I am using the ets() function in the forecast package to find out the best
fit parameters for my time-series. I have about 50 sets of time series data.

I'm currently using the function as follows:

ets(x,model=AZZ,opt.crit=mse)


As to my observation about 5-10 of them have been identified by ets to have
a trend and an alpha, beta values have been thrown up - which have been same
in all these cases. When I read up online it came up as a Brown's double
exponential smoothing as opposed to Holt's exponential smoothing (where
alpha and beta differ). I am guessing this is happening as AIC/AICc/BIC
select a model based on accuracy as well as a weight on number of parameters
(1 in case of brown's, 2 in case of holt's). Now if I want to see results of
the best parameters from the Holt's method, how should I go about it?

And is there any study comparing the accuracy of brown's double exponential
model versus holt's exponential model?

Thanks in advance,
Phani



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reading file

2010-06-01 Thread Stephan Kolassa


Have you set the correct working directory?

?setwd
?getwd

HTH
Stephan


Robert Tsutakawa schrieb:
I am trying to read a source program into a mac pro laptop, which uses 
Snow Leopard.  R is unable to find the file containing my source 
program.  I'm using the function  source( file name).  I need some 
examples or detailed instructions.  I have no problem reading the file 
using PC.

Bob

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] create new variable: percentile value of variable in data frame

2010-05-28 Thread Stephan Kolassa


Hi Jon,

does the empirical cumulative distribution function do what you want?

dat$q.score - ecdf(dat$score)(dat$score)
?ecdf

HTH
Stephan


Jonathan Beard schrieb:

Hello all,

Thanks in advance for you attention.
I would like to generate a third value that represents the quantile
value of a variable in a data frame.


# generating data

x - as.matrix(seq(1:30))
y - as.matrix(rnorm(30, 20, 7))
tmp1 - cbind(x,y)
dat - as.data.frame(tmp1)
colnames(dat) - c(id, score)
dat

#  finding percentiles of score

qs - as.matrix(quantile(dat$score, type=3, probs = seq(0,1,.1)))
colnames(qs) - c( score)
qs

#  is there a way to put the quantile value for a value of 'score'
into a new variable,
#  such that the new data frame would have three variables: id, score
and q.score?

##  running R version 2.8.1 (2008-12-22) on Vista


Thanks so much!

-Jon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] shapiro.test

2010-05-27 Thread Stephan Kolassa


Hi,

David Winsemius schrieb:
snip

This would imply that ozon is a list or dataframe.


snip


And you tried to give the whole list to a function that only wants a 
vector.


And whenever you suspect that your data types clash, try str() to find 
out just what kind of thing your data is. Here: str(ozon)


HTH,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] adding column to data frame conditionally

2010-05-27 Thread Stephan Kolassa


Hi Kristina,

Thierry's solution is certainly the correct one in terms of keeping 
within R's philosophy... but I personally find a series of conditional 
assignments easier to understand - see below for an example.


HTH,
Stephan

#
# Example

freqg - data.frame(mat=c(1,1,2,2),flank=c(1,2,1,2))

freqg$condition[freqg$mat==1  freqg$flank==1] - 1
freqg$condition[freqg$mat==2  freqg$flank==1] - 2

freqg
  mat flank condition
1   1 1 1
2   1 2NA
3   2 1 2
4   2 2NA

#

ONKELINX, Thierry schrieb:

Dear Kristina,

Use ifelse(). Note that you must use '==' to test for equality. '=' is
an assignment.

Freqg$condition - with(freqg, ifelse(mat==1  flank==1, 1, ifelse(mat
== 2  flank == 1, 2, NA)))

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie  Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics  Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  


-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] Namens Kristina Schmitz

Verzonden: donderdag 27 mei 2010 11:33
Aan: r-help@r-project.org
Onderwerp: [R] adding column to data frame conditionally

Dear all and thanks in advance for helping me with a rather 
stupid question:
I imported a data set (freqg) into R consisting of 14 
variables. Now a want to compute a variable and add it in an 
additional column to my data frame. The value of this new 
variable (condition) depends on the values of two other 
variables (mat and flank) already included in the data frame.

For example:
if mat=1 and flank=1 - condition=1
if mat=2 and flank=1 - condition=2
...

What I got is this code, which doesn't really work (it 
results in a new column called condition which takes only 
the value TRUE):


freqg-transform(freqg,condition=(mat=1)(flank=1))

Thanks in advance and kind regards!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.diag

2010-05-27 Thread Stephan Kolassa


Hi Helga,

did you load the boot library, which contains glm.diag(), by calling 
library(boot)?


HTH
Stephan


Helga margrete holmestad schrieb:
I have made a poisson regression by using glm. 
Then I want to check if I have over-dispersion in my model. I tried to use 
glm.diag(fit.1), but then R told me it couldn't find the function glm.diag. 


Does anybody know why? Or is there an other methode to check if there is 
over-dispersion in my model
 		 	   		  
_



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] From date to week day

2010-05-24 Thread Stephan Kolassa


Or:
weekdays(as.Date(2010-05-24))

HTH,
Stephan


Wu Gong schrieb:

?strptime will helps.



d - as.Date(01/05/2007,%m/%d/%Y)
format(d, %A, %b %d, %Y)

[1] Friday, Jan 05, 2007

-
A R learner.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] function rep

2010-04-09 Thread Stephan Kolassa


Hi,

this is FAQ 7.31: pb and pr are floating-point numbers that are coerced 
to integer for rep(), and this does not always work the way you want.


HTH
Stephan


Covelli Paolo schrieb:

Hi,

I've got the following code:

p - 0.34
pb - p*100
pr - (1-p)*100

A - rep(0,pb)  # a vector with 34 zeros
B - rep(1,pr)   # a vector with 66 ones

Now if I type
length(A), R answer correctly 34

but if I type
length(B), R answer 65 instead of 66.

I don't understand why it happens. Can anyone help me? Thanks in advance.

Paolo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help in histogram

2010-04-07 Thread Stephan Kolassa


Hi,

first get the densities without plotting the histogram:

foo - hist(x, plot=FALSE)

then plot the histogram and feed the rounded densities, converted to 
character, to the labels argument (instead of just labels=TRUE):


hist(x, freq=F,  xlab='',ylab=Percent of Total, col=skyblue,
labels=as.character(round(foo$density,2)), right=FALSE,main=Position of 
Hypothetical Protein)


HTH,
Stephan


Changbin Du schrieb:

x- sample(1:14, 319, rep=T)
hist(x, freq=F,  xlab='',ylab=Percent of Total, col=skyblue,
labels=TRUE, right=FALSE,main=Position of Hypothetical Protein)

Is there is way to round the labels to 2 decimal digits, for example, 0.088
is changed to 0.09.

Thanks!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] interpretation of p values for highly correlated logistic analysis

2010-03-31 Thread Stephan Kolassa


Hi Claus,

welcome to the wonderful world of collinearity (or multicollinearity, as 
some call it)! You have a near linear relationship between some of your 
predictors, which can (and in your case does) lead to extreme parameter 
estimates, which in some cases almost cancel out (a coefficient of +/-40 
on a categorical variable in logistic regression is a lot, and the 
intercept and two of the roman parameter estimates almost cancel out) 
but which are rather unstable (hence your high p-values).


Belsley, Kuh and Welsch did some work on condition indices and variance 
decomposition proportions, and variance inflation factors are quite 
popular for diagnosing multicollinearity - google these terms for a bit, 
and enlightenment will surely follow.


What can you do? You should definitely think long and hard about your 
data. Should you be doing separate regressions for some factor levels? 
Should you drop a factor from the analysis? Should you do a categorical 
analogue of Principal Components Analysis on your data before the 
regression? I personally have never done this, but correspondence 
analysis has been recommended as a discrete alternative to PCA on this 
list, see a couple of books by M. J. Greenacre.


Best of luck!
Stephan


claus orourke schrieb:

Dear list,

I want to perform a logistic regression analysis with multiple
categorical predictors (i.e., a logit) on some data where there is a
very definite relationship between one predicator and the
response/independent variable. The problem I have is that in such a
case the p value goes very high (while I as a naive newbie would
expect it to crash towards 0).

I'll illustrate my problem with some toy data. Say I have the
following data as an input frame:

   roman animal colour
1  alphadog black
2   betacat white
3  alphadog black
4  alphacat black
5   betadog white
6  alphacat black
7  gammadog white
8  alphacat black
9  gammadog white
10  betacat white
11 alphadog black
12 alphacat black
13 gammadog white
14 alphacat black
15  betadog white
16  betacat black
17 alphacat black
18  betadog white

In this toy data you can see that roman:alpha and roman:beta are
pretty good predictors of colour

Let's say I perform logistic analysis directly on the raw data with
colour as a response variable:


options(contrasts=c(contr.treatment,contr.poly))
anal1 - glm(data$colour~data$roman+data$animal,family=binomial)


then I find that my P values for each individual level coefficient approach 1:

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept)   -41.65   19609.49  -0.0020.998
data$romanbeta 42.35   19609.49   0.0020.998
data$romangamma43.74   31089.48   0.0010.999
data$animaldog 20.48   13866.00   0.0010.999

while I expect the p value for roman:beta to be quite low because it
is a good predictor of colour:white

On the other hand, if I then run an anova with a Chi-sq test on the
result model, I find as I would expect that 'roman' is a good
predictor of colour.


anova(anal1,test=Chisq)

Analysis of Deviance Table

Model: binomial, link: logit

Response: data$colour

Terms added sequentially (first to last)


Df Deviance Resid. Df Resid. Dev P(|Chi|)
NULL   1724.7306
data$roman   2  19.323915 5.4067 6.366e-05 ***
data$animal  1   1.587614 3.81910.2077
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Can anyone please explain why my p value is so high for the individual levels?

Sorry for what is likely a stupid question.

Claus

p.s., when I run logistic analysis on data that is more 'randomised'
everything comes out as I expect.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reshaping data

2010-03-31 Thread Stephan Kolassa


Hi Kim,

look at the reshape() command with direction=wide. Or at the reshape 
package.


HTH,
Stephan


Kim Jung Hwa schrieb:

Hi All,

Can someone help me reshape following data:

Var1 Var2 Val
A X 1
A Y 2
A Z 3
B X 4
B Y 5
B Z 6

to some kind of matrix/tabular format (preferably as a matrix), may be like

Var1 X Y Z
A 1 2 3
B 4 5 6

Any help would be greatly appreciated,
Kim

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ODD and EVEN numbers

2010-03-31 Thread Stephan Kolassa


(foo %% 2) == 0

See ?%%

HTH
Stephan


tj schrieb:

Hi,

anyone here who knows how to determine if an integer is odd or even in
R?
Thanks.


tj


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Summing data based on certain conditions

2010-03-31 Thread Stephan Kolassa


?by may also be helpful.

Stephan


Steve Murray schrieb:

Dear all,

I have a dataset of 1073 rows, the first 15 which look as follows:


data[1:15,]

date year month day rammday thmmday
1   3/8/1988 1988 3   81.430.94
2  3/15/1988 1988 3  152.860.66
3  3/22/1988 1988 3  225.063.43
4  3/29/1988 1988 3  29   18.76   10.93
5   4/5/1988 1988 4   54.492.70
6  4/12/1988 1988 4  128.574.59
7  4/16/1988 1988 4  16   31.18   22.18
8  4/19/1988 1988 4  19   19.67   12.33
9  4/26/1988 1988 4  263.141.79
10  5/3/1988 1988 5   3   11.516.33
11 5/10/1988 1988 5  105.642.89
12 5/17/1988 1988 5  17   37.46   20.89
13 5/24/1988 1988 5  249.869.81
14 5/31/1988 1988 5  31   13.008.63
15  6/7/1988 1988 6   70.430.00


I am looking for a way by which I can create monthly totals of rammday 
(rainfall in mm/day; column 5) by doing the following:

For each case where the month value and the year are the same (e.g. 3 and 1988, 
in the first four rows), find the mean of the the corresponding rammday values 
and then times by the number of days in that month (i.e. 31 in this case).

Note however that the number of month values in each case isn't always the same 
(e.g. in this subset of data, there are 4 values for month 3, 5 for month 4 and 
5 for month 5). Also the months will of course recycle for the following years, 
so it's not simply a case of finding a monthly total for *all* the 3s in the 
whole dataset, just those associated with each year in turn.

How would I go about doing this in R?

Any help will be gratefully received.

Many thanks,

Steve


 		 	   		  
_

We want to hear all your funny, exciting and crazy Hotmail stories. Tell us now

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Finding common an unique elements in character vectors

2010-03-29 Thread Stephan Kolassa


Hi Thomas,

%in% does the trick:

vector_1 - c(Belgium, Spain, Greece, Ireland, Luxembourg, 
Netherlands,Portugal)

vector_2 - c(Denmark, Luxembourg)
vector_1[!(vector_1 %in% vector_2)]

HTH,
Stephan


Thomas Jensen schrieb:

Dear R-list,

I have a problem which I think is quite basic, but so far google has not
helped me.

I have two vectors like this:

vector_1 - c(Belgium, Spain, Greece, Ireland, Luxembourg, Netherlands,
Portugal)

vector_2 - c(Denmark, Luxembourg)

I would like to find the elements in vector_1 that are not in vector_2

so that i get a vector with these countries: Belgium, Spain, Greece,
Ireland, Netherlands, Portugal.

Thanks a lot,

Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Creating dataframe of all possible variable combinations

2010-03-25 Thread Stephan Kolassa


Hi Mike,

the following works for me:

SITE - ordered(c(101,102,103,104))
WDAY - 
ordered(c(MON,TUE,WED,THR,FRI),levels=c(MON,TUE,WED,THR,FRI))

TOD - ordered(c(MORN,AFTN),levels=c(MORN,AFTN))

foo - expand.grid(SITE=SITE,WDAY=WDAY,TOD=TOD)
foo[order(foo$SITE),]

If this doesn't solve your problem, perhaps you could give us a minimal 
code snippet that demonstrates what exactly you are doing without 
getting what you are looking for?


Bye,
Stephan


Hosack, Michael schrieb:

Hello,

I need to create a dataframe containing all possible combinations of
three variables: SITE (101,102,103,104), WDAY (MON,TUE,WED,THR,FRI),
and TOD (MORN, AFTN). There should be a total of 40 unique
combinations in my dataframe. I used expand.grid() successfully(?) to
create my dataframe, but then when I went to order it by SITE, the
resultant dataframe only contained four rows, one for each site.
There must be something about this function that I don't understand.
Any advice would be appreciated.

Thank you,

Mike

__ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Arima forecasting

2010-03-19 Thread Stephan Kolassa


Hi Matteo,

just use forecast.Arima() with h=2 to get forecasts up to 2 steps ahead. 
R will automatically use forecast.Arima() if you call forecast() with an 
Arima object.


library(forecast)
model - auto.arima(AirPassengers)
forecast(model,h=2)

HTH,
Stephan


Matteo Bertini schrieb:

Hello everyone,

I'm doing some benchmark comparing Arima [1] and SVR on time series data.

I'm using an out-of-sample one-step-ahead prediction from Arima using
the fitted method [2].

Do someone know how to have a two-steps-ahead forecast timeseries from Arima?


Thanks,
Matteo Bertini

[1] http://robjhyndman.com/software/forecast
[2] AirPassengers example on page 5

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] for() loop

2010-03-14 Thread Stephan Kolassa


Hi Martin,

it is slightly unclear to me what you are trying to achieve... are you 
trying to tabulate how often each value appears in datjan[,4]? Then 
table(datjan[,4]) may be what you want.


HTH
Stephan


Schmidt Martin schrieb:

Hello

I'm working with R since a few month and have still many trivial 
questions - I guess! Here is what I want:


I have this matrix:

dim(datjan)

[1] 899   4

The first 10 rows looks as follows:

datjan[1:10,]

 V1 V2 V3 V4
1  1961  1  1 24
2  1961  1  2 24
3  1961  1  3 24
4  1961  1  4 24
5  1961  1  5 24
6  1961  1  6 27
7  1961  1  7 27
8  1961  1  8 27
9  1961  1  9 27
10 1961  1 10 27

I tried now to create a for() loop, which gives me the sum of the 30 
different classes (1:30!) in [,4].


for(i in 1:30){
sum(datjan[,4]==i)
}

R is then actually calculating the sum of i which certainly doesn't 
exist and results in a 0 value


t1-sum(datjan[,4]==1)
t2-sum(datjan[,4]==2)
.until '30'
This way its working, but I won't find a end by doing all this by hand, 
because there are many other matrix waiting.


So, how can I make work that loop??

thanks for helping me

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is this a bug (or a feature) in hist(x)$density ??

2010-03-13 Thread Stephan Kolassa


Hi Tal,

basically, by summing over the (pointwise) density, you are 
approximating the integral over the density (which should be around 1) 
- but to really do a rectangular approximation, you will of course need 
to multiply each function value by the width of the corresponding 
rectangle. I'd recommend reading through Wikipedia's article on the 
Riemann integral, you may be enlightened.

http://en.wikipedia.org/wiki/Riemann_integral

HTH,
Stephan


Tal Galili schrieb:

Hi all,

A friend send me a question on why does this:

x-rpois(100,1)
sum( hist(x)$density )

Gives out 2

I tried this:

sum( hist(x, freq =T)$density )

It didn't help.

Then he came back with the following insight:

# with breaks
b-c(0,0.9,1:8)
sum(hist(x,breaks=b)$density) # Much more then 2
# but if we add weights according to the interval length
sum(hist(x,breaks=b)$density * diff(b))
# it works



What do you think ?


Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Making multiple columns to a single column

2010-03-13 Thread Stephan Kolassa


Hi Hyo,

how about as.vector(ttx1) or as.vector(t(ttx1))?

HTH
Stephan


Hyo Lee schrieb:

Hi guys,

I have a very simple question.
I'm trying to make multiple columns to a single column.

For example,

*ttx1* is a 46*72 matrix.

so, I tried this.

*d1=ttx1[,1]
d2=ttx1[,2]
...
d72=ttx1[,72]*

now, d1, d2, ...,d72 become a 46*1 matrix.
And then.. I tried..

*dd=rbind(d1, d2, ..., d72)*

I thought *dd* should be 3312*1 matrix; but it becomes 72*46.
I really wanted to make it a  single column (3312*1).

Do you know what is wrong in this code? Or, do you have a better idea in
making multiple columns to a single column?

Thank you so much.

-Hyo

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Randomly sampling subsets of dataframe variable

2010-03-12 Thread Stephan Kolassa


Hi Mike,

take an index vector that selects Monday and Tuesday out of each week, 
and then run a restricted random permutation on this vector which only 
permutes indices within each week. rperm() is in the sna package.


library(sna)
foo - rep(c(TRUE,TRUE,FALSE,FALSE,FALSE),26)
your.data[foo[rperm(rep(seq(1,26),each=5))],]

HTH,
Stephan


Hosack, Michael schrieb:

Fellow R users,

I am stumped on what would seem to be something fairly simple. 
I have a dataframe that has a variable named 'WEEK' that takes 
the numbers 1:26 (26 week time-period) with each number repeated 
five times consecutively (once for each weekday, Monday through 
Friday). Ex. 123.2626262626. I would like to
randomly extract two weekdays per five day week for each of 
26 weeks and store this data as a separate dataframe. I have
been unable to get the sample function to work properly. 
I have also tried using the runif function to assign random 
numbers to each row of my dataframe, sort the dataframe first 
by week number then by random number value, and finally select 
the first two elements from each week subset (26 weeks total,

giving 52 randomly selected values).  I can't figure out how
to select the first two elements. My goal is to randomly 
select two weekdays per week (without replacement) for each of 
26 consecutive weeks. Any advice would be greatly appreciated.


Thank you,

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Shade area under curve

2010-03-09 Thread Stephan Kolassa


Hi,

use dnorm() for the density and polygon() to shade the area underneath, 
with suitably many x values so your density looks smooth.


HTH,
Stephan


claytonmccandless schrieb:

I want to shade the area under the curve of the standard normal density.
Specifically color to the left of -2 and on. How might i go about doing
this?

Thanks


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data.frame question

2010-03-07 Thread Stephan Kolassa


Hi,

data.frame(x=x,y=as.numeric(x%in%y))

HTH,
Stephan


joseph schrieb:

hello

can you show me how to create a data.frame from two factors x and y. column 1 
should be equal to x and column 2 is 1 if it is common to y and 0 if it is not.

x=factor(c(A,B,C,D,E,F,G))
y=factor(c(B,C,G))

the output should look like this:
A0
B1
C1
D0
E0
F0
G1

Thanks 
Joseph Dhahbi



  
	[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Equation for model generated by auto.arima

2010-03-05 Thread Stephan Kolassa


Hi,

In that case, I'd recommend reading a good book on time series analysis. 
Forecasting: Methods and Applications by Makridakis, Wheelwright and 
Hyndman is very accessible. Alternatively, there are probably tons of 
webpages on ARIMA, so google around.


Best,
Stephan

testuser schrieb:

Thanks for the reply, Stephan. I don't want to use R to predict the future
value. I am looking to write the logic in a programming language like Java
to predict future values using the model coefficients generated by R. For
this, I would like to know what formula to use to estimate the value at any
time t. I looked at the forecast package but cannot find how to calculate
the value. Any help is appreciated.

Thanks


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Equation for model generated by auto.arima

2010-03-04 Thread Stephan Kolassa


Hi,

the help page for arima() suggests looking at predict.Arima(), so take a 
look at ?predict.Arima(). You will probably not use the coefficients, 
but just feed it the output from arima(). And take a look at 
auto.arima() in the forecast package.


HTH
Stephan


testuser schrieb:

I would like to know how to use the coefficients generated by the ARIMA model
to predict future values. What formula should be used with the coeffcients
to determine the future values.

Thanks


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Exporting Graphs

2010-02-21 Thread Stephan Kolassa


Hi Karthik,

I think you will need to do something like

jpeg(histograms.jpg)
hist(rnorm(100))
dev.off()

HTH
Stephan

Karthik schrieb:

Hello Tal,
This is the code.



hist(rnorm(100))
jpeg(histogram.jpeg)

---

Even when I decrease the quality, I still have the same problem.



hist(rnorm(100))
jpeg(histogram.jpeg,quality=30)



Thank you for taking a look.
Karthik

On Sun, Feb 21, 2010 at 11:32 AM, Tal Galili tal.gal...@gmail.com wrote:

Hi Karthik,
Please give a sample code of what it is that you are doing that is causing this.
Also, have a look at:
?pdf
Or
?png

Cheers,
Tal

Contact 
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | 
www.r-statistics.com (English)
--




On Sun, Feb 21, 2010 at 9:27 PM, Karthik kwr...@gmail.com wrote:

I am a beginner to R, and am working on exporting graphs. Even when I
reduce the quality, it takes 30 or 40 minutes to export the graph.
Does anyone have suggestions on how to make it faster?

Thank you!
Karthik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GARCH AIC

2010-01-21 Thread Stephan Kolassa


Hi David,

str(g) gives you a ton of output, and the @fit slot has a $ics 
component, part of which has the promising name of AIC...


(g...@fit)$ics[1]

HTH,
Stephan


David Rubins schrieb:

Hi,

Is there anyway to extract the AIC and BIC from the summary statistics
in fGarch package?

g - garchfit(data)
summary(g) will display the AIC and BIC, but I can't find a way to
store those in a variable (in order to loop through different
parameters, investigating their effect).

Thank you,
David Rubins

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Comparing dates in dataframes

2010-01-16 Thread Stephan Kolassa


Hi,

it looks like when you read in your data.frames, you didn't tell R to 
expect dates, so it treats the Date columns as factors. Judicious use of 
 something along these lines before doing your comparisons may help:


arr$Date - as.Date(as.character(arr$Date),format=something)

Then again, it may be possible to do the actual merging using merge().

HTH
Stephan


James Rome schrieb:

I have two data frames. One (arr) has all arrivals to an airport for a
year, and the other (gw) has the dates and quarter hour of the day when
the weather is good. arr has a Date and quarter hour column.


names(arr)
 [1] Date weekday  hour monthminute 
 [6] quarter  ICAO Flight   AircraftType
Tail   
[11] Arrived  STA  Runway   FromTo  
Delay  
[16] Operator gw 


I added the gw column to arr and initialized it to all FALSE


names(gw)
 [1] Date   minute hour   quarter  
 [5] Efficiency.Val Weekly.Avg Arrival.ValWeekly.Avg.1 
 [9] Departure.Val  Weekly.Avg.2   Num.of.HoldRunway   
[13] Weather 


First point of confusion:

gw[1,1]

[1] 1/1/09
353 Levels: 1/1/09 1/1/10 1/10/09 1/10/10 1/11/09 1/11/10 1/12/09 ... 9/9/09
Why do I get 353 levels?

I am trying to identify the quarter hours with good weather in the arr
data frame. What I want to do is to go through the rows in gw, and to
set arr$gw to TRUE if arr$Date and arr$quarter match those in the gw row.

So I tried
gooddates = function(all, good) {
   la = length(all)   # All the flights
  lw = length(good)  # The good 15-minute periods
  for(j in 1:lw) {
d=good$Date[j]
q=good$quarter[j]
all[all$DateTime==d  all$quarter==q,17]=TRUE
  }
}

but when I run this, I get
Error in Ops.factor(all$DateTime, d) :
  level sets of factors are different

I know the level sets are different, that is what I am trying to find.
But I think I am comparing single elements from the data frames.

So what am I doing wrong? And there ought to be a better way to do this.

Thanks in advance,
Jim Rome

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] bootstrapping

2010-01-15 Thread Stephan Kolassa


Hi Aaron,

try the argument statistic=mean. Then boot() will give you the mean 
turn angle in your actual data (which appears to be 6 degrees, judging 
from what you write), as well as the means of the bootstrapped data. 
Then you can get (nonparametric) bootstrap CIs by 
quantile(boot$t,probs=c(.025,.975)). As far as I can see, there is 
really no need to look at sd().


A more interesting question would be how to deal with the fact that 
-180=+180, there may be something to think about here...


HTH,
Stephan


aaron.fo...@students.tamuk.edu schrieb:

Hi All,

 


I'm new to R so please bear with me.  I have a dataset with 337 turn angles 
ranging from -180 to 180 degrees.  I need to bootstrap (sample with replacement) 
1,000 times to create expected average turn angle with 95% CIs.  The code is 
pretty straightforward (-boot(data =, statistic = ,R =)) but I am unsure how 
to input my observed mean (6 degrees) and standard deviation (66 degrees) into the 
statistic component.   I realize there is a 'function' code but I can't seem to 
carry the results over to the 'boot' code.

 


Thanks,

Aaron M. Foley
PhD Candidate
Caesar Kleberg Wildlife Research Institute
Texas AM University - Kingsville
Cousins Hall, Room 201
Kingsville, TX 78363



 		 	   		  
	[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] = returns wrong result? Why

2010-01-13 Thread Stephan Kolassa


Hi Trafim,

take a look at FAQ 7.31.

HTH
Stephan


Trafim Vanishek schrieb:

Dear all,

Does anybody know the probable reason why = gives false when it should give
true?
These two variables are of the same type, and everything works in the cycle
but then it stops when they are equal.

this is the output result

Rk[47] = RB[21]

[1] FALSE


Rk[47]

[1] 0.002842007


RB[21]

[1] 0.002842007

Thanks a lot.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Operating on each row of data frame

2010-01-13 Thread Stephan Kolassa


Hi,

does this do what you want?

d - cbind(d,apply(d[,c(2,3,4)],1,mean),apply(d[,c(2,3,4)],1,sd))

HTH,
Stephan


Abhishek Pratap schrieb:

Hi All

I have a data frame in which there are 4 columns .

Column 1 : name

Column 2-4 : values

I would like to calculate mean/Standard error  of values in column 2-4 and
store them in column 5,6 respectively.



I have done the following but doesn't seem to work

mean_N_SE -function(x)
{

name - x[1]
vals - c(x[2:4])
temp_mean - mean(vals)
SE -  sqrt(var(x)/length(x))

}

apply(d,1,mean_N_SE) where d = data frame.


Can someone help me with this.

Thanks!
-Abhi

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Reading a file with mixed cyrillic/latin characters

2010-01-12 Thread Stephan Kolassa


Dear useRs,

I am trying to read a tab-delimited Unicode text file containing both 
latin and cyrillic characters and failing miserably. The file looks like 
this (I hope it comes across right):


A   B   C
3   foo ФОО
5   bar БАР

read.table(foo.txt,sep=\t,header=TRUE)

I am guessing that I can use the fileEncoding argument to read.table() 
to read this, but I can find no list of supported values of 
fileEncoding, and fileEncoding=Unicode gives an error.


The FAQ and the FAQ for Windows don't help. I have searched both the 
list archives and RSeek and am still seeking enlightenment. I am running 
R 2.10.1 on Windows XP, sessionInfo() below.


Cheers
Stephan


R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252 
LC_MONETARY=German_Germany.1252 LC_NUMERIC=C

[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Beginer data.frame

2010-01-12 Thread Stephan Kolassa


Hi Jean-Baptiste,

two points:

1) Your variable df is a *local* variable which you define in your 
function myfunc(), so it is not known outside myfunc(). When you ask 
is.data.frame(df), R looks at the global definition of df - which is the 
density function of the F distribution. To make your function run 
(especially interactively) will require a major rewrite.


2) Generally, using variable names that are already used as R objects 
(like df in your example) is a bad idea. For an example of the 
problems you can run into, see 1) above.


3) Loops are not the R way. Depending on what you want to do with the 
subset of your data.frame, you may want to do something like this:


x.df[substr(x.df$Code,1,1)==R,]

Look at ?substr to learn more - this function is vectorized, meaning 
that it takes a vector input and returns a vector output. Look at 
section 2.7 in An introduction to R.


Good luck!
Stephan


Jean-Baptiste Combes schrieb:

Hello,

I use R 2.10, and I am new in R (I used to use SAS and lately Stata), I am
using XP.

I have a data which has a data.frame format called x.df (read from a csv
file). I want to take from this data observations for which the variable
Code starts with an R. I took all the Code and put them into a vector
vec-grep(R[A-Z][A-Z],x.df$Code,value=TRUE)

Then I created a function that is supposed to take all the lines in the my
data x.df for which Code equals one value of vec. See the code below
where I created a loop to do that.


myfunc-function(data,var2,var1)

+ {
+ i=1
+ while (i632){
+ line-subset(data,var2==var1[i])
+ if (i==1){
+ df-line
+ df-data.frame(df)
+ }
+ else {
+ line-data.frame(line)
+ df-rbind(df,line)
+ }
+ i-i+1
+ }
+ fix(df)
+ }

The results of my program higly depend on the few last lines of the program.
If I put fix(df), as above, the function opens a window with my data and
it seems a sensible results (I have not checked in details but I barely have
what I am suppose to get).

myfunc-function(data,var2,var1)

...
+ }
+ df-data.frame(df)
+ print(is.data.frame(df))
+ }

myfunc(x.df,x.df$Code,vec)

[1] TRUE

print(is.data.frame(df))

[1] FALSE

In the case above I ask whether or not the df is a data.frame and the
answer is true, when the program has ended, I ask again and the answer is
false.

Could anyone tell me what to do to get this data and could anyone tell me
why those differences in the results?


as.data.frame(df)

Erreur dans as.data.frame.default(df) :
  impossible de convertir automatiquement la classe  function en un
tableau de données (data.frame)

[[alternative HTML version deleted]]





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] coerce vector into array - change filling sequence

2010-01-12 Thread Stephan Kolassa


Hi,

you can permute array dimensions using aperm():

x - 1 : 24
z - array(x, dim=c(6,2,2))
y - aperm(z,perm=c(3,2,1))
y[1,1,]

HTH,
Stephan


Kohleth Chia schrieb:

Dear all,

When I coerce a vector into a multi dimensional array, I would like R to start 
filling the array along the last dimension, then the 2nd last etc.
Let's jump straight into an example.

x - 1 : 24
y - array(dim=c(2,2,6))

I would like to have:
y[1,1,1] = 1
y[1,1,2] = 2
...
y[1,1,6] = 6
y[1,2,1] = 7
y[1,2,2] = 8
...
y[2,1,1] = 13
...
y[2,2,1] = 19

if I do y- array(x, dim=c(2,2,6)), i think I will get
y[1,1,1] = 1
y[2,1,1] = 2
(or something not I want) instead.

Of course, I need a fast solution, as I am actually dealing with array of much 
larger size.
Any input will be appreciated
Thanks a lot
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question regarding if statement in while loop

2009-12-26 Thread Stephan Kolassa


Hi Stephanie,

it sounds like R's exception handling may help, something like this:

foo - try(eblest(i, dir5, sterr5, weight5, aux5))
if ( class(foo) == try-error ) next

Take a look at ?try.

HTH,
Stephan


Stephanie Coffey schrieb:

Hi all,
I'm running R version 2.9.2 on a PC.

I'm having a problem with a loop, and have tried using an if statement
within to fix it, but to no avail.
Any advice would be appreciated.

Here is my code:
*
eblest - function(i,dir, sterr, weight, aux) {
n - nrow(dir)
Y - as.matrix(dir[,i], ncol=1)
sigma2ei - as.matrix(sterr[,i]^2, ncol=1)
w - as.matrix((weight[,3])*(sigma2ei), ncol=1)
X - as.matrix(subset(aux, select=c(3,5:7,9:10,13)))
a - EBLUP.area(Y,cbind(w,1),sigma2ei,n)#The EBLUP.area
function is a function already in R.
}
  # It gives a bunch of output, some of what I need.

#THIS IS THE LOOP I'M HAVING A PROBLEM WITH:
results - data.frame(length=nrow(dir5))
i - 3
while (i =some number) {
eblest(i, dir5, sterr5, weight5, aux5)
out - cbind(i, a$EBLUP, a$mse)
results - cbind(results, out)
i - i+1
}
***
I have tried running the eblest function for a specific set of input
(i, dir5, etc as in the function) This function runs ok.
However, sometimes eblest does not create the expected output,
sometimes due to the solution being singular or other reasons.
I'm not so concerned about that at this point - it's a function of the data.

My problem is that, as you can see, after eblest runs, the out
pieces of information are bound together in the results matrix.
When the eblest does not run correctly, the loop stops.  For example,
when i=3 and i=4, eblest runs ok, and I get a maxtrix with the
following columns:

iEBLUP se.EBLUP i EBLUP se.EBLUP
3   xy4abh

...but when it loops back around to start i=5, I get an error because
the solution is singular.  But I don't want the loop to stop (I have
~100 of these i's for which I need to execute this function.

So I would like to set an if condition that will cause the loop to
step ahead (in this case, to 6), and continue looping...Something like
if exists(out)=FALSE next, else...  However, when I tried that,
the results matrix was empty created at all.

In case it helps, out has a numeric mode, and is a matrix...

I've written these functions and loops using online help and examples
I've found on websites, but clearly I'm missing something.

Thanks for any help you can give!!

~Stephanie Coffey

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] boot() with an array-valued statistic?

2009-12-25 Thread Stephan Kolassa


Dear guRus,

is there a version of boot() that deals with array data and array-valued 
statistics? For example:


foo - array(rnorm(120),dim=c(3,5,8))
means - apply(foo, MARGIN=c(2,3), FUN=mean)

means contains the means over the first dimension of foo: a 5x8 array. 
Now I would like to bootstrap this array, perhaps with something along 
the lines of some array.boot(), with an additional MARGIN parameter:


means.boot - array.boot(foo, statistic=mean, MARGIN=c(2,3), R=1000)

I would like means.boot to contain (e.g., in analogy to boot(), in a 
component $t) an array of dimensions 1000x5x8, containing  componentwise 
means of sampled slices of foo, e.g., to get confidence intervals like this:


apply(means.boot$t,MARGIN=c(2,3),FUN=quantile,probs=c(.975,.025))

I have been playing around with plyr and various flavors of apply, and 
searching only yielded lots of hits for boot.array(), which is something 
completely different...


Any help would be greatly appreciated!

Cheers
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plotting

2009-12-21 Thread Stephan Kolassa


Hi teo,

try lines() instead of points().

HTH
Stephan

teo schrieb:

Hi:
Could you please guide me how to plot weekly.t2. Below are my values and I
am only able to plot weekly t1 but failing to add a second line
representng weekly t2.
Thanks.



weekly.t1
 [1] 228.5204 326.1224 387.2449 415.4082 443.6735 454.6939 451.5306 475.1020
483.2653 483.0612 488.9362 488.3673 493.4694 515.3061 509.7959
[16] 501.0204 513.5714 516.5306 509.4898 516.4286 509.7959 498.6735 492.9592
473.6735 487.7551 482.9293

weekly.t2

 [1]  71.57143  95.41667 110.11905 120.65476 124.3 130.11905 132.44048
137.38095 141.60714 135.35714 131.89873 126.7 125.47619
[14] 133.57143 134.34524 137.38095 134.04762 144.40476 148.27381 147.08333
146.42857 137.5 127.14286 127.20238 130.41667 127.67857


plot(c(1:26), weekly.t1, type=l, xlab=Week since treatment start, ylab=
Daily Treatment Dose, main=RIOTT doses, col=blue, axes = FALSE)
points(c(1:26), weekly.t2, type=l, col=red)
axis(side=1, at= c(1:26), as.character(c(1:26)))
axis(side=2, at = seq(from=70, to=700, by=10))


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Generating permutations that always include one specific element

2009-12-19 Thread Stephan Kolassa


Hi Ray,

First possibility: just select those combinations that contain AL:
combos.with.AL - possible.combos[rowSums(possible.combos==AL)0,]

Second possibility: create all 3-combos *without* AL:
bands.without.AL - c(B, DB, DG, G, K, LB, LG, MG, O, 
P, PI, PK, PU, R, V, W, Y)

possible.combos.without.AL - permutations(17, 3, bands.without.AL)

Then insert AL in the first, second, third and fourth position of this 
matrix:

foo - cbind(AL,possible.combos.without.AL)
combos.with.AL - 
rbind(foo,foo[,c(2,1,3,4)],foo[,c(2,3,1,4)],foo[,c(2,3,4,1)])


The first one is easier to understand but requires you to first build 
the big object possible combos, most of which you discard. May be a 
problem in larger instances.


HTH,
Stephan


Raymond Danner schrieb:

Dear R community,

I am trying to create a matrix of permutations of a vector:
bands - c(AL, B, DB, DG, G, K, LB, LG, MG, O, P,
PI, PK, PU, R, V, W, Y)

Each permutation must be 4 characters long. permutations() from the gtools
package does this easy enough:
possible.combos - permutations(18, 4, bands)

However, ³AL² must be one of the elements in each permutation.
Any ideas?

Thanks in advance,
Ray 



[[alternative HTML version deleted]]





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] integer(0) and NA do not equal FALSE

2009-12-19 Thread Stephan Kolassa


Hi Jonathan,

grep() returns  a vector giving either the indices of the elements of 
'x' that yielded a match or, if 'value' is 'TRUE', the matched elements 
of 'x' (quoting from the help page, see ?grep).


So you probably want to test whether this vector is empty or not - in 
other words, whether it has length zero or not:


if( length(grep(hi, hop, fixed = TRUE))  0 )
   print('yes, your substring is in your string') else
   print('no, your substring is not in your string')

(And you can remove the fixed = TRUE if you are only interested in 
whether the expression matches or not.)


One off-topic point: you want your else at the end of the second line, 
not the beginning of the third. R evaluates line by line, and when it 
gets to the end of the second line and doesn't see your else, it has 
no way of knowing that the if is not yet finished. I've found that 
liberal use of curly braces makes life much easier.


HTH,
Stephan



Jonathan schrieb:

Hi,
   A noobie question:  I'm simply trying to run a conditional statement that
evaluates if a substring is found within a larger string.  I find that if it
IS found, my function returns TRUE (great!), but if not, the condition does
not evaluate to FALSE.

ex):

if( grep(hi, hop, fixed = TRUE) )
  print('yes, your substring is in your string')
else print('no, your substring is not in your string')

alternatively, I could replace grep with pmatch:

if (pmatch('hi','hop'))
  print('yes, your substring is in your string')
else print('no, your substring is not in your string')


The first example, using grep, returns logical(0).  The second, using
pmatch, returns NA.  Any idea how to convert either of those to FALSE, or
else a different function that would do the trick?

Thanks,
Jon

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] forecasting

2009-12-17 Thread Stephan Kolassa


?predict

HTH
Stephan

DispersionMap schrieb:


I have some data that i ran a regression on and got the usual r output, with
the a intercept and b coefficient for my independent variable.

i want to forecast the number of future events using these parameters.


What function / packages in R would you recommend to be used in good effect
for these purposes??


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Literature analysis

2009-12-11 Thread Stephan Kolassa


Hi,

from what I understand, you may be interested in text mining, so perhaps 
you want to look at the tm package.


Then again, depending on what you are really trying to do, you may be 
better served with perl, awk and similar tools than with R...


HTH,
Stephan


Schwan schrieb:
Dear all, 


i am new in R. I am writing a review paper about batteries. However, i
am interested in analyzing all the papers by keywords, author,
references and year.
This could be done by refviz a software, which is only running on
windows machines and which is not free.

So my question to you is, is it somehow possible to write a script that
can do all of this work?

And if yes, with what i should start?

Thanks a lot in advance,

Schwan 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R Packages Crack the 3,000 Mark!

2009-11-25 Thread Stephan Kolassa


Hi Bob,

Muenchen, Robert A (Bob) wrote:


Does anyone have a program that graphs the growth of R packages? I don't
know if that historical data is around.


John Fox had a slide on this in his useR 2008 talk The Social 
Organization of the R Project (page 7), with package counts up to March 
2008. As Source of Data he gave https://svn.r-project.org/R/branches/. 
I've been digging around in there but really have no idea how he found 
the relevant data there.


I'd be quite interested in this, too, so if you find out anything please 
drop me a line...


Best,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] modeling and forecasting commodity time series?

2009-10-29 Thread Stephan Kolassa


Hi Luna,

you may want to look at the IIF website, http://www.forecasters.org

They have a mailing list for forecasters - you may get more of a 
response there than on a dedicated R list.


HTH,
Stephan


Luna Moon schrieb:

Hi all,

Could anybody please shed some lights on me about good books/literature
about modeling and forecasting financial time series in the commodity space?


Thanks so much!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] exclude data for boxplot stats using mathematical operator

2009-10-26 Thread Stephan Kolassa


Try this:

boxplot.stats(x[x[,2]0,2],do.conf=FALSE)

HTH,
Stephan

e-letter schrieb:

Readers,

I have a data set as follows:

1,1
2,2
3,3
4,4
5,3
6,2
7,-10
8,-9
9,-3
10,2
11,3
12,4
13,5
14,4
15,3
16,2
17,1

I entered this data set using the command 'read.csv'. I want to
exclude values fewer than zero in column 2 so then I tried the
following command:

boxplot.stats(x[0,2],do.conf=FALSE)
Error: syntax error, unexpected GT, expecting ',' in boxplot.stats(x[

Any help please?

Yours,

rhelp at conference.jabber.org
r 251

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] State Space models in R

2009-09-12 Thread Stephan Kolassa


Hi,

Rob Hyndman's forecast package does exponential smoothing forecasting 
based on state space models (and lots of other stuff, ARIMA et al.). 
It's not exactly the companion package to his book, but it comes close.


The book's (Forecasting with Exponential Smoothing - The State Space 
Approach) webpage is here:

http://www.exponentialsmoothing.net/

HTH,
Stephan



Giovanni Petris schrieb:

Hello everybody,

I am writing a review paper about State Space models in R, and I would
like to cover as many packages as I reasonably can. 


So far I am familiar with the following tools to deal with SS models:

* StructTS, Kalman* (in stats)
* packages dse[1-2] 
* package sspir

* package dlm

I would like to have some input from users who work with SS models:
are there any other packages for SS models that I am missing?, which
package do you use and why?, what do you think are advantages/
disadvantages of the package you use?

Of course I do have my own preferences (biased, of course) and
opinions about the different packages, but I would also like to
summarize in my paper the feedback I get from the R community.

Thank you in advance.

Best,
Giovanni Petris



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] ggplot2: deterministic position_jitter geom_line with position_jitter

2009-09-12 Thread Stephan Kolassa


Dear guRus,

I am starting to work with the ggplot2 package and have two very dumb 
questions:


1) deterministic position_jitter - the jittering is stochastic; is there 
any way to get a deterministic jittering? For instance:


example.data - 
data.frame(group=c(foo,bar,foo,bar,foo,bar),x=c(1,1,2,2,3,3),y=c(1,1,0,2,1,1))

set.seed(2009)
qplot(x,y,data=example.data,shape=group,position=position_jitter(w=0.1,h=0))

For x=1, the foo point is to the left of the bar point, and for x=3 the 
other way around. I would like to have all foo points at 
seq(1,3)-epsilon and all bar points at seq(1,3)+epsilon. Do I need to 
manually modify example.data$x groupwise for this?


2) geom_line with position_jitter - when I call multiple geoms with 
position_jitter, each geom gets its own jittering. For example 
(continuing with example.data above):


set.seed(2009)
qplot(x,y,data=example.data,geom=c(point,line),shape=group,position=position_jitter(w=0.1,h=0))

The lines do not connect the points - is there any way to have the 
geom_line connect all the foo points on the one hand and all the bar 
points on the other hand?


--

What I've done: searched through HW's book, googled, searched RSeek. For 
point 2) above, I tried using multiple layers and resetting the seed in 
between, to wit:


pp - ggplot(example.data,aes(x,y,shape=group))
set.seed(2009)
pp - pp+layer(geom=point,position=position_jitter(w=0.1,h=0))
set.seed(2009)
pp - pp+layer(geom=line,position=position_jitter(w=0.1,h=0))
print(pp)

This doesn't do what I want, either...

Why I'm doing this: example.data actually contains group means across a 
covariate x, and they need to be plotted as dots plus error bars 
(psychologists' convention), so use boxplots instead is a perfectly 
correct reply to my questions which unfortunately does not help me.


Thanks for your time!
Stephan

--

sessionInfo():

R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
 [1] grid  grDevices datasets  splines   graphics  stats tcltk 
utils methods   base


other attached packages:
 [1] ggplot2_0.8.3   reshape_0.8.3   plyr_0.1.9  proto_0.3-8 
svSocket_0.9-43 svMisc_0.9-48   TinnR_1.0.3 R2HTML_1.59-1 
Hmisc_3.7-0

[10] survival_2.35-4

loaded via a namespace (and not attached):
[1] cluster_1.12.0  lattice_0.17-25 tools_2.9.2

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Normalize data set

2009-09-03 Thread Stephan Kolassa


Are you looking for reshape()?

HTH,
Stephan


Edward Chen schrieb:

Hi all,

I have a mxn matrix that consists of 28077 rows of features and 30 columns
of samples. I want to normalize each row for the samples for each feature.
I have tried normalize and scale functions but they don't seem to work out
the way I want to.

Thank you




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plotting the graph of density with an unknown distribution

2009-08-22 Thread Stephan Kolassa


Hi,

Try kernel smoothing via the density() function. And take a look at ecdf().

HTH,
Stephan


sendona essile schrieb:

How can I plot the graph of a density of a sample with an unknown distribution? 
I can provide any sample size which is required. I want to have a smooth 
density graph of my data.


  New Email addresses available on Yahoo!
Get the Email name you#39;ve always wanted on the new @ymail and @rocketmail. 
Hurry before someone else does!


[[alternative HTML version deleted]]





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] splitting a string up

2009-08-21 Thread Stephan Kolassa


strsplit(1041281__2009_08_20_.lev, split=_)[[1]][1]

HTH,
Stephan

stephen sefick schrieb:

x - 1041281__2009_08_20_.lev

I would like to split this string up and only extract the leading numbers.

1041281

to use as a label for a data column in a bigger for loop function to
read in data.
regards,



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Several simple but hard tasks to do with R

2009-08-20 Thread Stephan Kolassa


Hi Rakknar,

I believe that the menu command File - Save to File (or some such, I 
use the German version) in the R GUI for Windows (I'm unclear on your 
OS) has not yet been suggested. This writes a file containing the entire 
R session. Does this create the kind of log you are looking for?


HTH,
Stephan


Rakknar schrieb:

This is why sink() has the split=TRUE option, which logs output to a
connection and sends it to the screen.
The R
sink(analysis.log,split=TRUE)
is almost identical to the Stata
log using analysis.log

with sink() to close the log and flush to disk at the end of the session.

 I've already used that and it doesn't work. It only register the output,
not the inputs.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Why are there small circles in my plot

2009-08-19 Thread Stephan Kolassa


Hi,

Mao Jianfeng schrieb:

plot(c(min(myda$traits),max(myda$traits)),c(-0.03,0.5), xlab='State',
ylab='ylab')


Here, you are plotting two data points: (min(myda$traits),-0.03) and 
(max(myda$traits),0.5). Try this:


 plot(c(min(myda$traits),max(myda$traits)),c(-0.03,0.5), xlab='State',
 ylab='ylab',type='n')

HTH,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Collinearity in Linear Multiple Regression

2009-07-22 Thread Stephan Kolassa


Hi Alex,

I personally have had more success with the (more complicated) 
collinearity diagnostics proposed by Belsley, Kuh  Welsch in their book 
Regression Diagnostics than with Variance Inflation Factors. See also:


Belsley, D. A. A Guide to using the collinearity diagnostics. 
Computational Economics, 1991, 4, 33-50


However, I know of no R package that implements these diagnostics. 
Anyway, it's not hard to do so oneself.


Good luck!
Stephan


Alex Roy schrieb:

Dear all,
  How can I test for collinearity in the predictor data set
for multiple linear regression.

Thanks

Alex

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Collinearity in Linear Multiple Regression

2009-07-22 Thread Stephan Kolassa


Hi Tim,

Variance proportions (and condition indices) are exactly the tools
described in Belsley, Kuh  Welsch, Regression Diagnostics - see my 
previous post. Good to see I'm not the only one to use them! BKW also 
describe in detail how to calculate all this using SVD, so you don't 
need to use SAS...


And I certainly agree that a problematic system means that you need to 
do more work - probably either collect more data or refine your research 
agenda, as collinearity may just be inherent in the independent 
variables you have been collecting.


Best,
Stephan


Tim Paysen schrieb:

Actually, the CI index and VIF are just a start.  It is best to look
at what they call a matrix of variance proportions (found in SAS
and a few other places...)--which hardly anyone understands
(including the SAS folks).  It is a matrix of estimates of what the
variences of the regression coefficients would be if you could figure
them out in the first place.  It shows which factors dominate over
others IN THE PARTICULAR SETUP you are analyzing.  The matrix is
often calculated using eigenvalues, but is best done with Singular
Value Decomposition techniques (you don't have to have a square
matrix, and you maintain better precision).  Analysts will say that
it can display an unstable system -- which is correct, but they
generally say that, if its true, you have bad data and should throw
it out--or collect more.  I suggest care, because it may be
illustrating the nature of the system you are studying.

The only decent reference that I know of is a little book (hard to
read) that I can't remember off the top of my head.  Have to look it
up.

Timothy E. Paysen, Phd Research Forester (ret.)




 From: John Sorkin
jsor...@grecc.umaryland.edu To: Alex Roy alexroy2...@gmail.com;
r-help@r-project.org Sent: Tuesday, July 21, 2009 4:19:11 AM Subject:
Re: [R] Collinearity in Linear Multiple Regression

I suggest you start by doing some reading about Condition index (CI)
and variation inflation factor (VIF). Once you have reviewed the
theory, a search of search.r-project.org (under the help menu in a
windows-based R installation) for VIF will help you obtain values for
VIF, c.f. http://finzi.psych.upenn.edu/R/library/HH/html/vif.html 
John


John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics 
University of Maryland School of Medicine Division of Gerontology 
Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) 
Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913

(Please call phone number above prior to faxing)


Alex Roy alexroy2...@gmail.com 7/21/2009 7:01 AM 
Dear all, How can I test for collinearity in the predictor data set 
for multiple linear regression.


Thanks

Alex

[[alternative HTML version deleted]]

__ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.


Confidentiality Statement: This email message, including any
attachments, is for th...{{dropped:11}}






__ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] quadratic programming

2009-05-04 Thread Stephan Kolassa


Hi,

The CRAN Task View on Optimization may help:
http://stat.ethz.ch/CRAN/web/views/Optimization.html

HTH,
Stephan


barbara.r...@uniroma1.it schrieb:

Devo risolvere un problema di minimo vincolato con vincoli di uguaglianza e un 
altro con vincoli di uguaglianza e disuguaglianza.
Cosa posso utilizzare?
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Deleting rows or cols that do not meet cut off

2009-04-21 Thread Stephan Kolassa


f[rowSums(f=1)0,colSums(f=1)0]

Judging from your result, you want less than or equal to 1.

HTH,
Stephan


Crosby, Jacy R schrieb:

How can I delete both rows and columns that do not meet a particular cut off 
value.
Example:

d - rbind(c(0,1,6,4),

+ c(2,5,   7,5),
+ c(3,6,1,6),
+ c(4,4,4,4))

f - as.matrix(d)
f

 [,1] [,2] [,3] [,4]
[1,]0164
[2,]2575
[3,]3616
[4,]4444

I would like to delete all rows and columns that do not contain at least one 
element with a value less than 1. So I'd end up with:


f

 [,1] [,2] [,3]
[1,]016
 [3,]361

Note: 1 is an arbitrary cut-off value.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Deleting rows or cols that do not meet cut off

2009-04-21 Thread Stephan Kolassa


Hi Wacek,

Wacek Kusnierczyk schrieb:

... or gain a bit on performance by doing the threshold comparison on
the whole matrix just once at once:

dd = d = 1
d[apply(d, 1, any), apply(d, 2, any)]


d[apply(dd, 1, any), apply(dd, 2, any)]

Or not?

Cheers,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] bug when subtracting decimals?

2009-04-20 Thread Stephan Kolassa


Hi,

Gavin Simpson wrote:


I bemoan the apparent inability of those asking such questions to use 
the resources provided to solve these problems for themselves...


Looking at all the people who quite obviously do NOT read the posting 
guide and provide commented, minimal, self-contained, reproducible 
code, I wonder whether the mailing list could be configured to reply to 
each new mail (not replies in a thread) with an automated mail like this:


Have you read the posting guide and the FAQs? If you do not get a reply 
within two days, you may want to look at both and think about 
reformulating your query. Oh, and while you are at it, look through the 
archives, a lot of questions have already been asked and answered before.


Just a thought,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Citing R

2009-04-15 Thread Stephan Kolassa


citation()

HTH,
Stephan

Tom Backer Johnsen schrieb:
What is the correct citation to R in BibTeX format?  I have looked in 
the R pages but so far without any luck.


Tom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Time series forecasting

2009-04-05 Thread Stephan Kolassa

Hi Perry,

my impression after a very cursory glance: this looks like noise.
Perhaps you should think a little more about your series - what kind of
seasonality there could be (is this weekly data? or monthly?), whether
the peaks and troughs could be due to some kind of external driver,
whether you really have count data, that kind of thing.

Until then, there is little else to do than to use a very simple method,
e.g. forecast the last observation (random walk) or the mean of the
observations (white noise), or the median. All of these benchmarks can
be surprisingly hard to beat.

If you have seasonality but no external influence, you could look at
smoothing methods, they are nice to interpret and usually perform very
well. I'd recommend Hyndman et al., Forecasting with Exponential
Smoothing: The State Space Approach and the accompanying forecast R
package, mainly with the ets() function.

You could also look at arima(). I fitted an ARIMA model to your data,
and as expected, it returned a simple mean (not that I would recommend
blindly fitting ARIMA to just any data):

Call:
arima(x = foo[, 2])

Coefficients:
intercept
7.2333
s.e. 0.8009

sigma^2 estimated as 19.25: log likelihood = -86.93, aic = 177.85

And for count data, you could use some variants of ARIMA, e.g., INAR.

HTH,
Stephan

pg...@gol.com schrieb:

Dear all:

I'm a newbie and an amateur seeking help with forecasting the next in a non-stationary time series, with constraints of 1 (low) and 27 (high) applicable to all.

What I need help with is the solution concept. The series has 439 observations
as of last week. I'd like to analyze obs 1 - 30 (which are historical and
therefore invariate), to solve for 31.

The history:
Obs 12
21
31
416
59
66
77
811
911
101
1112
1214
1313
142
154
165
1714
186
194
207
215
228
237
2415
2511
263
274
286
298
304
31?? (a known)

For backtesting of forecasting accuracy, I can use either a sliding window ( 1
- 30 to solve for 31, 2 - 31 to solve for 32, 3 - 32 to solve for 33, etc.) OR
a cumulative window (1 - 30 to solve for 31, 1 - 31 to solve for 32, 1 - 32 to
solve for 33, etc.), whichever works better. I can also supply different
windows if deemed appropriate, e.g., 50 or 75 or 100 obs, whatever, in either
configuration.

The 30 obs window is selected for this list query only so as not to take up too
much message space.

Query: How would you solve for ob 31 in the above series, with the constraints stated? (If you need a longer history, say, 50 obs or more, I can supply it off-list.) I've tried all the relevant Excel functions with no success, and suspect that the solution lies in some combination of them. Here I defer to the collective wisdom of you all.

Once the correct concept is established, I can proceed to set it up in R for
this and other similar series.

Many TIA and regards,
Perry E. Gary
Tokyo
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to generate natural cubic spline in R?

2009-03-31 Thread Stephan Kolassa


Hi,

if you are looking for *natural* cubic splines (linear beyond the outer 
knots), you could use rcs() in Frank Harrell's Design package.


HTH,
Stephan


David Winsemius schrieb:

If one enters:

??spline

... You get quite a few matches. The one in the stats functions that 
probably answers your specific questions is:


splinefun {stats}  R Documentation
Interpolating Splines   Description
Perform cubic (or Hermite) spline interpolation of given data points, 
returning either a list of points obtained by the interpolation or a 
function performing the interpolation.


splinefun returns a function with formal arguments x and deriv, the 
latter defaulting to zero. This function can be used to evaluate the 
interpolating cubic spline (deriv=0), or its derivatives (deriv=1,2,3) 
at the points x, where the spline function interpolates the data points 
originally specified. This is often more useful than spline.


Perhaps you need to review from you basic intro material regarding
help.search(text) # or
??text# possibilities.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Does R support double-exponential smoothing?

2009-03-31 Thread Stephan Kolassa


Hi,

ets() in Hyndman's forecast package allows you to specify which one of 
the many smoothing variants (additive/multiplicative season, damped 
trend, additive/multiplicative errors) you want.


HTH,
Stephan


minben schrieb:

I want to use double-exponential smoothing to forecast time series
datas,but I couldn't find it in the document,does R support this
method?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] model selection using ANOVA

2009-03-31 Thread Stephan Kolassa


Hi Alina,

your approach sounds problematic - you can always get a smaller RSS if 
you add terms to your model, so your approach will always go for larger 
models, and you will end up overfitting. Consider information criteria, 
e.g., AIC or BIC, which penalize larger models. References for AIC are 
Burnham  Anderson; other people prefer BIC.


Then you can do something like

models - list()
AICs - rep(NA, n)
models[[1]] - lm(...); AICs[1] - AIC(model[[1]])
...
models[[n]] - lm(...); AICs[n] - AIC(model[[n]])
which.min(AICs)

depending on your specific needs.

HTH,
Stephan


Alina Sheyman schrieb:

I've created a number of models using lm and now want to pick one with the
smallest standard error or the smallest RSS,
I can get a list of RSS using anova function, but is the any way I can then
select one with the smallest RSS from the list?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to generate natural cubic spline in R?

2009-03-31 Thread Stephan Kolassa


Hi David,

David Winsemius schrieb:
The splinefun documentation indicates that natural is one of the types 
of cubic spline options available.


That sounds good, didn't know that... rcs() has the advantage of coming 
with a book (Harrell's Regression Modeling Strategies).


Does rcs actually do fitting? Such would not be my expectation on 
reading the documentation and I do not see any examples of such 
functionality in the help pages.


Nope, but you can include rcs() within fitting functions, 
lm(foo~rcs(bar,3)), which makes more sense to me than having a spline 
function fit... Looks like better encapsulation to me.


Best,
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] add variable in for loop

2009-03-28 Thread Stephan Kolassa


Hi Paul,

you do *not* want to do this, it takes too long and may lead to rounding 
errors. Vectorize everything, e.g., use sum(meanrotation). And look into 
?apply, and google for the R Inferno.


And no, there is no +=...

Good luck!
Stephan

pgseye schrieb:

Hi,

I'm learning to write some basic functions in R. For some data I have I'd
like to be able to add a variable to itself after each iteration in a for
loop to obtain a grandtotal for that variable so I can calculate a mean.

test-function(data){
for (i in 1:80){

meanrotation-(abs(data[i,3]-data[i,2])+abs(data[i,4]-data[i,2])+abs(data[i,5]-data[i,2])+abs(data[i,6]-data[i,2]))/4
cat(i,meanrotation,\n)
#total+=meanrotation
}
#print (total/80)
}

In perl there's an assignment operator variable+=variable2. Is there
anything like this in R to do as illustrated in the code above.

thanks a lot,

Paul

Edit - I guess the other way to do this which I just realised is to assign
the output of the function to a vector and then do a summary(), but I don't
know how to do this either - help is appreciated


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] text matching and substitution

2009-03-28 Thread Stephan Kolassa


Hi Simeon,

?gsub

HTH,
Stephan

simeon duckworth schrieb:

I am trying to simplify a text variable by matching and replacing it with a
string in another vector

so for example in
colours - paste(letters,colours(),stuff,LETTERS)

find and replace with (red,blue,green,gray,yellow,other)  -
irrespective of case

its a large dataset, so i'd like to be able to do this as efficiently as
possible.

thanks for any help

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] text matching and substitution

2009-03-28 Thread Stephan Kolassa


Hi Simeon,

I'm slightly unclear on what exactly you are trying to achieve... Are 
you trying to replace every entry of colours which *contains* red by 
red, dropping the rest of the entry? And same with blue?


A short example before  after would be helpful...

Best,
Stephan


simeon duckworth schrieb:

thanks stephan.  i'd been trying to make gsub work, but couldnt make it
replace the whole expression.  so i'd resorted to trying to loop with grep -
but with two problems.   firstly, i cant seem to make the loop 'remember'
the substitutions it makes (see below).  secondly, it feels like this is a
really inefficient way of doing something quite simple anyhow.

colours - as.character(paste(letters,colours(),stuff,LETTERS))
target - c(red,blue,green,gray)
new.colour -colours
for (i in length(target)) {
x - target[i]
new.colour[grep((x),new.colour)] - x
return(new.colour)
}




On Sat, Mar 28, 2009 at 9:45 AM, Stephan Kolassa stephan.kola...@gmx.dewrote:


Hi Simeon,

?gsub

HTH,
Stephan

simeon duckworth schrieb:


I am trying to simplify a text variable by matching and replacing it with
a
string in another vector

so for example in
colours - paste(letters,colours(),stuff,LETTERS)

find and replace with (red,blue,green,gray,yellow,other)  -
irrespective of case

its a large dataset, so i'd like to be able to do this as efficiently as
possible.

thanks for any help

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] text matching and substitution

2009-03-28 Thread Stephan Kolassa


Hi Simeon,

I played around a little with Vectorize and mapply, but I couldn't make 
it work :-( So, my best guess would be a simple loop like this:


result - as.character(paste(letters,colours(),stuff,LETTERS))
target - c(red,blue,green,gray)
for ( new.color in target ) { result[grep(new.color,result)] - new.color }

Best of luck,
Stephan


simeon duckworth schrieb:

stephan

sorry for not being clear - but thats exactly what i want.

i'd like to replace every complex string that contains red with just
red, and then so on with blue, yellow etc

my data is of the form

x xx xx x  red xx xxx xx
xx xxx xxx xx  blue xx xx xx xx x
x xx  xx xx xx  red
red xx xx xx xx xx
xx xx xx xx xx xx
xx x x x x 

which i'd like to replace with
red
blue
red
other
other

thanks


On Sat, Mar 28, 2009 at 2:38 PM, Stephan Kolassa stephan.kola...@gmx.dewrote:


Hi Simeon,

I'm slightly unclear on what exactly you are trying to achieve... Are you
trying to replace every entry of colours which *contains* red by red,
dropping the rest of the entry? And same with blue?

A short example before  after would be helpful...

Best,
Stephan


simeon duckworth schrieb:

 thanks stephan.  i'd been trying to make gsub work, but couldnt make it

replace the whole expression.  so i'd resorted to trying to loop with grep
-
but with two problems.   firstly, i cant seem to make the loop 'remember'
the substitutions it makes (see below).  secondly, it feels like this is a
really inefficient way of doing something quite simple anyhow.

colours - as.character(paste(letters,colours(),stuff,LETTERS))
target - c(red,blue,green,gray)
new.colour -colours
for (i in length(target)) {
   x - target[i]
   new.colour[grep((x),new.colour)] - x
   return(new.colour)
   }




On Sat, Mar 28, 2009 at 9:45 AM, Stephan Kolassa stephan.kola...@gmx.de

wrote:

 Hi Simeon,

?gsub

HTH,
Stephan

simeon duckworth schrieb:

 I am trying to simplify a text variable by matching and replacing it

with
a
string in another vector

so for example in
colours - paste(letters,colours(),stuff,LETTERS)

find and replace with (red,blue,green,gray,yellow,other)  -
irrespective of case

its a large dataset, so i'd like to be able to do this as efficiently as
possible.

thanks for any help

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ApEn (Approximate Entropy), Total Corr, Information Interaction

2009-03-26 Thread Stephan Kolassa


Hi Vishal,

re 1]: Ben Bolker very kindly shared an R reimplementation of Kaplan's 
Matlab code a little while ago:

http://www.nabble.com/Approximate-Entropy--to21144062.html#a21149402

Best wishes
Stephan


Vishal Belsare schrieb:

Is there any existing implementation in R/S of :

1] Pincus  Kalman's approximate entropy (ApEn) measure
2] Total Correlation / Multiinformation
3] Information Interaction

A search doesn't quite reveal anything, but I'd be keen to not
reinvent in case someone has worked on it. Many thanks in
anticipation.


Best,

Vishal Belsare

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Stephan Kolassa


Hi Juliet,

Juliet Hannah schrieb:


One simple thing to try would be to form categories



Simple but problematic. Frank Harrell put together a wonderful page 
detailing all the issues with categorizing continuous data:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

So: keep your data continuous.

Apart from that, I would second John's recommendation to try to get 
samples at the same point in time (and, if it is cortisol, stay away 
from smokers etc.).


Best wishes
Stephan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Stephan Kolassa


Hi Juliet,

Juliet Hannah schrieb:


I should have emphasized, I do not intend to categorize -- mainly
because of all the discussions I have seen on R-help arguing against
this.


Sorry that we all jumped on this ;-)


I just thought it would be problematic to include the variable by
itself. Take other variables, such as a genotype or BMI. If we measure
this variable the next day, it would be the same. However, a hormone's
level would not be the same. I thought this error must be accounted
for somehow.


You are quite correct that fluctuating hormone levels are a problem 
(although, strictly speaking, measuring BMI and even genotyping will not 
yield exactly the same results the next day, measurement error is always 
present). And there may be methods dealing with this, but I don't know 
of any.


If you have any idea about the variability of your hormone, you could 
always take your data, perturb the hormone levels and run the analysis 
again to get a feeling for the stability of your results. This is quite 
ad hoc, but if I were the reviewer, a perturbation analysis like this 
would greatly reassure me. However, I recently worked with hormones and 
had exactly your problem, and we couldn't find any published data on 
day-to-day variability, so this was not an option - we finally went 
ahead and simply plugged the measurements into R.


Good luck!
Stephan



Thanks again!

Regards,

Juliet

On Sat, Mar 7, 2009 at 1:21 PM, Jonathan Baron ba...@psych.upenn.edu wrote:

If you form categories, you add even more error, specifically, the
variation in the distance between each number and the category
boundary.

What's wrong with just including it in the regression?

Yes, the measure X1 will account for less variance than the underlying
variable of real interest (T1, each individual's mean, perhaps), but
X1 could still be useful in two ways.  One, it might be a significant
predictor of the dependent variable Y despite the error.  Two, it
might increase the sensitivity of the model to other predictors (X2,
X3...) by accounting for what would otherwise be error.

What you cannot conclude in this case (when you measure a predictor
with error) is that the effect of (say) X2 is not accounted for by its
correlation with T1.  Some people try to conclude this when X2 remains
a significant predictor of Y when X1 is included in the model.  The
trouble is that X1 is an error-prone measure of T1, so the full effect
of T1 is not removed by inclusion of X1.

Jon

On 03/07/09 12:49, Juliet Hannah wrote:

Hi, This is not an R question, but I've seen opinions given on non R
topics, so I wanted
to give it a try. :)

How would one treat a variable that was measured once, but is known to
fluctuate a lot?
For example, I want to include a hormone in my regression as an
explanatory variable. However, this
hormone varies in its levels throughout a day. Nevertheless, its levels differ
substantially between individuals so that there is information there to use.

One simple thing to try would be to form categories, but I assume
there are better ways to handle this. Has anyone worked with such data, or could
anyone suggest some keywords that may be helpful in searching for this
topic. Thanks
for your input.

Regards,

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to comment in R

2009-02-11 Thread Stephan Kolassa


Hi Mihai,

one (very bad style) way would be

if (FALSE) {
  comment
  comment
  comment
}

But putting a # in front of every line is easier to spot in the code.

HTH,
Stephan


mihai.mira...@bafin.de schrieb:

Hi everybody,

I use for the moment # at the begining of each line for comments.

Is there any possibility to comment more than one line, like something which 
shows the beggingng and the end of the comment? Or is there a possibility to 
comment only a part of a line?

Thanks,

Mihai


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R system command on windows

2009-02-05 Thread Stephan Kolassa


?shell

HTH,
Stephan

Aurelie Labbe, Dr. schrieb:

Hi,
 
I am trying to use the R command system  under windows (XP). If I try the simple command system(mkdir toto) to create a directory toto, it tells me that it cannot find the command mkdir...

Does anybody knows how  it works ? Is it a path problem ? Maybe the answer is 
simple: I am a R user under Unix/Linux and I don't know well R under windows.
 
Thanks !

Aurelie

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] subset problem (reducing from six to two levels)

2009-02-05 Thread Stephan Kolassa


Hi,

does this help?

http://www.nabble.com/factor-question-to18638814.html#a18638814

HTH,
Stephan


Ine schrieb:

Hi all,
I have got a seemingly simple problem (I am an R starter) with subsetting my
data set, but cannot figure out the solution: I want to subset a data set
from six to two levels, so that all analyses are done only with these two
remaining levels.
I tried

TOTAL-read.delim('total.csv',header=T)
SUBSET.OF.TOTAL-subset(TOTAL, FactorX %in% c(Level1,Level2))
attach(SUBSET.OF.TOTAL)

but R does not eliminate the remaining levels of FactorX, just assigns 'not
available' to the data. Like this, the other levels still show up in plots
etc., but without data entries. Anybody got a solution how to subset the
data so that I eliminate the other levels completely?

Thanks a lot for the help,



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 2 >

1 - 100 of 148 matches

Mail list logo