Re: [R] Recursive partitioning algorithms in R vs. alia

2009-06-20 Thread Tobias Verbeke

Wensui Liu wrote:


well, how difficult is it to code random forest with SAS macro + PROC SPLIT?
if you lack SAS programming skills, then you are correct that
you have to wait for 8 years :-)


It is true that one can use the macro language to obtain some control flow 
that the plain SAS language and its PROCs are missing, and for manipulating 
matrices there is even a third language (IML), but my customers prefer 
to leverage community-tested open source implementations as building 
blocks rather than spending unnecessary resources writing things from 
scratch in their corner.



i don't know how much SAS experience you have. as far as i know, both
bagging and boosting have been implemented in SAS EM for a while,
together with other cutting-edge modeling tools such as SVM / nnet.


Fair enough, but whenever you need ensemble methods for survival 
data, or would like to escape the bias in variable importance in the presence
of categorical predictors, you will (1) not be able to take something off 
the shelf and (2) not be able to programmatically tweak the SAS EM procedures
(as they are not exposed but locked in the GUI), so there again your
only option is to implement things from scratch.
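
For illustration (a minimal sketch, assuming the party package, which
implements conditional inference forests with unbiased variable importance
and also handles censored responses):

library(party)   # party is an assumption here for illustration
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 50))
varimp(cf)       # permutation-based, unbiased variable importance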

Best,
Tobias


On Fri, Jun 19, 2009 at 4:18 PM, Tobias
Verbeke <tobias.verb...@openanalytics.be> wrote:

Wensui Liu wrote:


in terms of the richness of features and ability to handle large
data (which is normal in banking), SAS EM should be on top of the others.

Should be? That is not at all my experience.
SAS EM is very much lagging behind current
research. You will find variants of random forests
in R that will not be in SAS for the next 8 years,
to give just one example.


however, it is not cheap.
in terms of algorithms, the SPLIT procedure in SAS EM can do
CHAID/CART/C4.5, if i remember correctly.

These are techniques of the 80s and 90s
(which proves my point). CART is in rpart and
an implementation of C4.5 can be accessed
through RWeka. For the oldest one (CHAID, 1980),
there might be an implementation soon:

http://r-forge.r-project.org/projects/chaid/

but again there have been quite some improvements
in the last decade as well:

http://cran.r-project.org/web/views/MachineLearning.html
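
For instance (a hedged sketch on the built-in iris data; J48 in RWeka is
the open-source C4.5-style learner):

library(rpart)
fit.cart <- rpart(Species ~ ., data = iris)   # CART
library(RWeka)
fit.c45  <- J48(Species ~ ., data = iris)     # C4.5 (J48)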

HTH,
Tobias


On Fri, Jun 19, 2009 at 2:35 PM, Carlos J. Gil
Bellosta <c...@datanalytics.com> wrote:

Dear R-helpers,

I had a conversation with a guy working in a business intelligence
department at a major Spanish bank. They rely on recursive partitioning
methods to rank customers according to certain criteria.

They use both SAS EM and Salford Systems' CART. I have used the rpart
package in the past, but I could not provide any kind of feature comparison
or the like, as I have no access to any installation of the first two
proprietary products.

Does anybody have experience with them? Is there any public benchmark
available? Is there any very good --although solely technical-- reason
to pay hefty software licences? How would the algorithms implemented in
rpart compare to those in SAS and/or CART?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com



Re: [R] Leave One Out Cross Validation

2009-06-20 Thread muddz

Hi Uwe,

My apologies.

Please can someone guide me as to what I am doing wrong in the code. I started
my code as such:

# ypred is my leave-one-out estimator of x
cvhfunc <- function(y,x,h){
ypred <- 0
for (i in 1:n){
for (j in 1:n){
if (j!=i){
ypred <- ypred+(y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h)
}
}}
ypred

#CVh is a 
cvh <- 0
cvh <- cvh+((1/n)*(y-ypred)^2
cvh
}
test2 <- cvhfunc(ydat,xdat,.2); test2

#I was experimenting with the following data:
library(datasets)
data(faithful)
ydat <- faithful$eruptions; ydat; plot(ydat); par(new=T)
xdat <- faithful$waiting; xdat; plot(xdat, col="blue")

# I want to minimize the CV function with respect to h. Thanks.




Uwe Ligges-3 wrote:
 
 See the posting guide:
 If you provide commented, minimal, self-contained, reproducible code 
 some people may be willing to help on the list.
 
 Best,
 Uwe Ligges
 
 
 muddz wrote:
 Hi All,
 
 I have been trying to get this LOO-Cross Validation method to work on R
 for
 the past 3 weeks but have had no luck. I am hoping someone would kindly
 help
 me. 
 
 Essentially I want to code the LOO-Cross Validation for the 'Local
 Constant'
 and 'Local Linear' constant estimators. I want to find optimal h,
 bandwidth.
 
 Thank you very much!
 -M
 

 
 
 




[R] Plotting Cumulative Hazard Functions with Strata

2009-06-20 Thread j0645073

Hello:

So I've fit a survival curve to a set of data using
kmfit <- survfit(Surv(int, event) ~ factor(cohort))

this factor variable, cohort, has four levels, so naturally the strata
variable has 4 values.

I can use this data to estimate the hazard rate
haz <- n.event/n.risk

and calculate the cumulative hazard function by
H--log(haz)

Now, I would like to plot this cumulative hazard function by time and
differentiate the curves by strata. For the life of me I cannot seem to do
it.

Any help would be greatly appreciated.

Regards,

j


[R] Maximum number of tables combined with rbind

2009-06-20 Thread gug

Hello,

I have been using read.table to read data files into R, then rbind to
combine them into one table.  The column headings are all identical, so it
should be straightforward, and it seems to be working well so far.

My question is: What is the maximum number of tables that can be combined
with rbind?

Is it driven by the number of rows in the tables, by a constraint with the
syntax of rbind itself, by my PC's memory, or by some other constraint?  At
this point the data files I am reading in are approx 2,500 rows x 150
columns.

Thanks,

Guy Green


Re: [R] Splitting Data by Row

2009-06-20 Thread Jim Lemon

Ross Culloch wrote:

Hello fellow R users!

I wonder if someone can help with what i think should be a simple question
but i can't seem to find the answer or work it out.

My data set is as such:

Day Time  ID  Behaviour
  1    9  A1          2
  1   10  A2          3
 ..   ..  ..         ..
  4   10  A1         10
  4   11  A2          1
 ..   ..  ..         ..
 30    1  B1         14
 30    2  C3          4

So basically I have data for several days, for several times, for several
IDs and for several Behaviours.

What I want to do is get an activity budget per ID from these data, e.g.:

data <- tapply(Behaviour, list(ID, Behaviour), length)


This will give me a count of the number of times an ID does a certain
behaviour across all the data, which is great - but I want to work out seasonal
and diurnal activity budgets too, therefore I need to break the data down
not just by ID but by day and time, too. I've searched on here and found
nothing I could adapt to my data - it may be that I can't quite see how the
code would work and I've overlooked something of importance!

If anyone can point me in the right direction I'd be most grateful!

  

Hi Ross,
The brkdn function in the prettyR package does this, and I am currently 
working on a more efficient version of this that will drive an improved 
hierobarp function in the plotrix package.
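
For comparison, a base-R sketch of the same kind of breakdown (the data
frame below is hypothetical, with columns named as in the original post):

dat <- data.frame(Day = rep(1:2, each = 4),
                  Time = rep(9:10, 4),
                  ID = rep(c("A1", "A2"), 4),
                  Behaviour = sample(1:3, 8, replace = TRUE))
# counts per ID and Behaviour, split further by Day
with(dat, tapply(Behaviour, list(ID, Behaviour, Day), length))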


Jim



Re: [R] Leave One Out Cross Validation

2009-06-20 Thread David Winsemius


On Jun 19, 2009, at 7:45 PM, muddz wrote:



Hi Uwe,

My apologies.

Please can someone guide me as to what I am doing wrong in the code. I
started my code as such:

# ypred is my leave one out estimator of x


Estimator of x? Really?


cvhfunc <- function(y,x,h){
ypred <- 0
for (i in 1:n){   # how did you define n? It's not in your parameter list
   for (j in 1:n){
   if (j!=i){
ypred <- ypred + (y[i]*k( (x[j]-x[i])/h ) ) /
                 k( (x[j]-x[i])/h )

At this point you are also using k as a function. Have you at any  
point defined k?


Also, multiplication by the ratio of two identical numbers would
generally give a result of y[i] for that second term. Unless, of
course, any x[j] = x[i], in which case you will throw an error and
everything will grind to a halt.


It might help if you now explained what you think CV should be.


}
  }   }
ypred # not sure what that will do inside the function. If  
it's there for debugging you may want to print(ypred)


#CVh is a


# Yes? CVh is a what ?


 cvh <- 0
 cvh <- cvh+((1/n)*(y-ypred)^2   # n again. R will still not know what that is.

 cvh


# ypred is a scalar, while y is a vector, so cvh will be a vector. Is  
that what you want?



 }



test2 <- cvhfunc(ydat,xdat,.2); test2

#I was experimenting with the following data:
library(datasets)
data(faithful)
ydat <- faithful$eruptions; ydat; plot(ydat); par(new=T)
xdat <- faithful$waiting; xdat; plot(xdat, col="blue")

# I want to minimize the CV function with respect to h. Thanks.




Uwe Ligges-3 wrote:


See the posting guide:
If you provide commented, minimal, self-contained, reproducible code
some people may be willing to help on the list.

Best,
Uwe Ligges


muddz wrote:

Hi All,

I have been trying to get this LOO-Cross Validation method to work  
on R

for
the past 3 weeks but have had no luck. I am hoping someone would  
kindly

help
me.

Essentially I want to code the LOO-Cross Validation for the 'Local
Constant'
and 'Local Linear' constant estimators. I want to find optimal h,
bandwidth.

Thank you very much!
-M










David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] modifying sound package function plot.Sample

2009-06-20 Thread rajesh j
Hi,
I'm trying to modify this function. I want to remove the existing x-axis (the
tick marks and the values below each tick) and make it dynamic so that I can
choose whether I want the x-axis at the top or bottom, but I can't seem to
change that. Can somebody help me?

plot.Sample <- function(x, xlab=NULL, ylab=NULL, ...){
  sampletest <- is.Sample(x, argname='x')
  if (!sampletest$test) stop(sampletest$error)
  s <- loadSample(x, filecheck=FALSE)
  s <- normalize(s)
  if (channels(s)==1) {
#if (is.null(ylab)) ylab <- "waveform"

plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, axes=FALSE, ...)

#axis(1)
#axis(2, at=c(-1,0,1), labels=as.character(c(-1,0,1)))
#abline(h=0)
#abline(h=c(-1,1))
#lines(par()$usr[1:2], y=c(rep(par()$usr[3],2)), xpd=TRUE)


-- 
Rajesh.J

[[alternative HTML version deleted]]



Re: [R] Plotting Cumulative Hazard Functions with Strata

2009-06-20 Thread David Winsemius


On Jun 20, 2009, at 12:23 AM, j0645073 wrote:



Hello:

So I've fit a survival curve to a set of data using
kmfit <- survfit(Surv(int, event) ~ factor(cohort))

this factor variable, cohort has four levels so naturally the strata
variable has 4 values.

I can use this data to estimate the hazard rate
haz <- n.event/n.risk


That's not valid, but it's close. Try instead:


haz <- kmfit$n.event/kmfit$n.risk


The problem will be to properly categorize the vectors in the survfit  
object with your covariate factor, cohort. (Strata is a somewhat  
different term in the language of Cox models.)


and calculate the cumulative hazard function by
H--log(haz)


Since that is not valid R syntax at all and you say you want  
cumulative, then why aren't you summing the hazards?




Now, I would like to plot this cumulative hazard function by time and
differentiate the curves by strata. For the life of me I cannot seem
to do it.

If you are being asked to do this from scratch and show your work for
homework grading purposes, then by all means, carry on. If not, then
you should be trying:

 plot(kmfit, fun = "cumhaz")
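
For example (a hedged sketch; 'mydata' and its columns stand in for your
data, and the col/legend choices are just one way to separate the strata):

library(survival)
kmfit <- survfit(Surv(int, event) ~ factor(cohort), data = mydata)
plot(kmfit, fun = "cumhaz", col = 1:4,
     xlab = "Time", ylab = "Cumulative hazard")
legend("topleft", legend = names(kmfit$strata), col = 1:4, lty = 1)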


Any help would be greatly appreciated.

Regards,

j





David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Maximum number of tables combined with rbind

2009-06-20 Thread David Winsemius


On Jun 20, 2009, at 1:02 AM, gug wrote:



Hello,

I have been using read.table to read data files into R, then rbind to
combine them into one table.  The column headings are all identical,  
so it

should be straightforward, and it seems to be working well so far.

My question is: What is the maximum number of tables that can be  
combined

with rbind?

Is it driven by the number of rows in the tables, by a constraint  
with the

syntax of rbind itself, by my PC's memory,


Yes, and the OS. "PC" is not a useful designation for an OS.


or by some other constraint?  At
this point the data files I am reading in are approx 2,500 rows x 150
columns.


That should not be any problem. On an Intel-Mac w/ 8 GB there is no  
problem with dataframes that are 500 times that size.


If this is on a Windows box, there is an RW-FAQ on the topic.




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] modifying sound package function plot.Sample

2009-06-20 Thread David Winsemius


On Jun 20, 2009, at 7:05 AM, rajesh j wrote:


Hi,
I'm trying to modify this function. I want to remove the existing x-axis
(the tick marks and the values below each tick) and make it dynamic so
that I can choose whether I want the x-axis at the top or bottom, but I
can't seem to change that. Can somebody help me?

plot.Sample <- function(x, xlab=NULL, ylab=NULL, ...){
 sampletest <- is.Sample(x, argname='x')
 if (!sampletest$test) stop(sampletest$error)
 s <- loadSample(x, filecheck=FALSE)
 s <- normalize(s)
 if (channels(s)==1) {
#if (is.null(ylab)) ylab <- "waveform"

plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, axes=FALSE, ...)


If I understand you correctly, then try (untested):

plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab,
     xaxt="n", ...)


Although I'm not sure you can use that ellipsis in that context.



#axis(1)
#axis(2,at=c(-1,0,1),labels=as.character(c(-1,0,1)))
#abline(h=0)
#abline(h=c(-1,1))
#lines(par()$usr[1:2],y=c(rep(par()$usr[3],2)),xpd=TRUE)


The locator() function would let you get user input that could be used
to drive alternative specifications to the axis() function.



--
Rajesh.J




David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] [R-pkgs] package JM -- version 0.3-0

2009-06-20 Thread Dimitris Rizopoulos

Dear R-users,

I'd like to announce the release of the new version of package JM (soon 
available from CRAN) for the joint modelling of longitudinal and 
time-to-event data using shared parameter models. These models are 
applicable in mainly two settings: first, when focus is on the 
time-to-event outcome and we wish to account for the effect of a 
time-dependent covariate measured with error; second, when focus is on 
the longitudinal outcome and we wish to correct for nonrandom dropout.


New features include:

  * a relative risk model with a piecewise-constant baseline risk 
function is now available for the event outcome, using option 
'piecewise-PH-GH' in the 'method' argument of jointModel().


  * several types of residuals are supported for the longitudinal and 
time-to-event outcomes. Moreover, for the longitudinal outcome there is 
also the option to compute multiple-imputation-based residuals, as 
described in Rizopoulos, Verbeke and Molenberghs (Biometrics 2009, to 
appear).


  * the Weibull submodel for the time-to-event outcome is now available 
under both the relative risk and accelerated failure time formulations.


  * this new version of the package features new and more robust 
algorithms for numerical integration and optimization -- these updates 
could lead to different results, especially for the survival part, 
compared to the previous version of the package.
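
A hedged sketch of the new 'piecewise-PH-GH' option (following the
package's usual interface and the aids data that ships with JM; the
details are illustrative):

library(JM)
fitLME <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)
fitCox <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
fitJM  <- jointModel(fitLME, fitCox, timeVar = "obstime",
                     method = "piecewise-PH-GH")
summary(fitJM)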



As always, any kind of feedback (e.g., questions, suggestions, 
bug-reports, etc.) is more than welcome.


Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] MS-VAR introduction

2009-06-20 Thread matifou

Actually there aren't, as of now, many packages for Markov regime-switching
models in R, but I just saw that a talk about them will be given at the
useR! 2009 conference; maybe that can help you, though the authors don't
mention whether they use an existing package or developed new functionality.

http://www2.agrocampus-ouest.fr/math/useR-2009/abstracts/pdf/Fontdecaba_SanchezEspigares_Munoz.pdf

Matthieu

Dear Henrique,

I think that R is not actually the best statistical tool to model MS-VAR.
Indeed, the package msvar only allows a simple specification of the model.
One tool I have used is Ox with the package MSVAR built by Krolzig.
This package allows a large variety of model specifications: you can choose
the number of regimes, the regime dependence, etc. You can find more
details on his site:
http://www.krolzig.co.uk/index.html?content=/msvar.html
However, it would be of great interest to develop such a package in R. Maybe
soon...

Best regards,

Sandrine Lunven
Economist
TAC financial
www.tac-financial.com




Dear R community,

I'm starting to learn the MS-VAR methodology and I would like to know what I
need to download (e.g. packages) to make MS-VAR estimations using R.

Best,
Henrique C. de Andrade
Doutorando em Economia Aplicada
Universidade Federal do Rio Grande do Sul
www.ufrgs.br/ppge



[R] R-project mailing lists (and SVN) - timeout Sunday 21 June

2009-06-20 Thread Martin Maechler
Dear R users,

unfortunately, tomorrow (Sunday) there will be a longish outage
of our mail services, notably affecting the
R-foo and R-SIG-bar mailing lists @r-project.org, i.e.,
those hosted by the Stats / Math Department of ETH, stat.math.ethz.ch.

Note that svn.r-project.org will also suffer from the
outage.
This should be a once-in-8-years event, hopefully, and we are
sorry for any inconvenience.

Well, enjoy Sunday off-work :-)


--official-text --

There will be maintenance work on the central computer services
in our server room:

Sunday, 21st June, 9 a.m. - 5 p.m. (MEST = GMT+2)

During this time, we have to switch off all of our servers since the main
power supply will be turned off.
Services like email or web will be down.

This interruption will probably take less than 8 hours. Please check the website

http://www.math.ethz.ch

As soon as this site is up again, all the servers and services should
be back online (with a small delay).

-

Martin Maechler,  ETH Zurich
  http://stat.ethz.ch/people/maechler
  Seminar für Statistik, ETH Zürich, SWITZERLAND



Re: [R] Maximum number of tables combined with rbind

2009-06-20 Thread gug

Thanks David - you're right, "PC" is not very informative. I am using XP
Home with 4GB, though I don't think XP accesses more than 3GB.

From following through the FAQs and memory functions (e.g.
memory.limit(size = NA)) it looks like R is getting about 1535Mb at the
moment.


David Winsemius wrote:
 
 My question is: What is the maximum number of tables that can be 
 combined
 with rbind?
 Is it driven by the number of rows in the tables, by a constraint  with
 the
 syntax of rbind itself, by my PC's memory,
 
 Yes, and the OS. PC is not a useful designation for an OS.
 
 or by some other constraint?  At this point the data files I am reading
 in are approx 2,500 rows x 150
 columns.
 
 That should not be any problem. On an Intel-Mac w/ 8 GB there is no  
 problem with dataframes that are 500 times that size.
 
 If this is on a Windows box, there is an RW-FAQ on the topic.
 
 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT
 
 



[R] string splitting and testing for enrichment

2009-06-20 Thread Iain Gallagher
Hi List

I have data in the following form:

Gene      TFBS
NUDC      PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) CACBINDINGPROTEIN(149) T3R(167) HLF(191)
RPA2      STAT4(57) HEB(251)
TAF12     PAX3(53) YY1(92) BRCA(99) GLI(101)
EIF3I     NERF(10) P300(10)
TRAPPC3   HIC1(3) PAX5(17) PAX5(110) NRF1(119) HIC1(122)
TRAPPC3   EGR(26) ZNF219(27) SP3(32) EGR(32) NFKAPPAB65(89) NFKAPPAB(89) RFX(121) ZTA(168)
NDUFS5    WHN(14) ATF(57) EGR3(59) PAX5(99) SF1(108) NRSE(146)
TIE1      NRSE(129)

I would like to test the 2nd column (each value has letters followed by numbers 
in brackets) here for enrichment via fisher.test.

To that end I am trying to create two factors made up of column 1 (Gene) and 
column 2 (TFBS) where each Gene would have several entries matching each TFBS.

My main problem just now is that I can't split the TFBS column into separate 
strings (at the moment that 2nd column is all one string for each Gene).

Here's where I am just now:

test <- as.character(dataIn[,2]) # convert the 2nd column from factor to character
test2 <- unlist(strsplit(test[1], ' ')) # split the first element into individual
strings (only the first element just now, because I'm just trying to get things
working)
test3 <- unlist(strsplit(test2, '\\([0-9]\\)')) # get rid of numbers and brackets

now this does not behave as I hoped - it gives me:

> test3
[1] "PPARA"                  "HNF4(20)"               "HNF4(96)"
[4] "AHRARNT(104)"           "CACBINDINGPROTEIN(149)" "T3R(167)"
[7] "HLF(191)"

i.e. it only removes the numbers and brackets from the first entry and not the
others.

Could someone point out my mistake please?

Once I have all the TFBS (letters only) for each Gene I would then count how 
often a TFBS occurs and use this data in a fisher.test, testing for enrichment 
of TFBS in the list I have. I'm rather muddled here, though, and would 
appreciate advice on whether this is the right approach.

Thanks

Iain

> sessionInfo()
R version 2.9.0 (2009-04-17) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 








Re: [R] Maximum number of tables combined with rbind

2009-06-20 Thread David Winsemius
You may be able to get more space than what you currently have, but my
understanding is that XP in a default configuration can only address
2.5 GB. You also need to consider that you will need open memory in
order to do anything useful. I am not aware of limitations imposed by
rbind per se. Tables (if you are using the R terminology) are actually
special forms of arrays.


I just ran this test and it used about half of my 10 GB according to
my system monitor:


> mtx <- matrix(1:(2500*150), ncol=150)
> for (i in 1:10) { mtx <- rbind(mtx, mtx); print(2^i) }
[1] 2
[1] 4
[1] 8
[1] 16
[1] 32
[1] 64
[1] 128
[1] 256
[1] 512
[1] 1024

Running it again only 8 times and looking at the fraction of occupied
memory with Apple's Activity Monitor makes me think you could get 256
such tables into your space without too much difficulty, but the
number could easily be twice that, since the pointer overhead in my 64-
bit system will be greater than yours. You will probably even have
room to do something useful. The option of leaving the data in a
database should not be overlooked if you need more space. sqldf is an
available package:


http://code.google.com/p/sqldf/
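
For what it's worth, a hedged sketch of combining many such files with a
single rbind over a list, rather than pairwise ('data' and the file
pattern are hypothetical):

files  <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
tabs   <- lapply(files, read.table, header = TRUE)
bigtab <- do.call(rbind, tabs)   # one rbind over the whole list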

Other informative functions include object.size() and gc(). After  
doubling that 8-fold copy of mtx to get a nine-fold doubling, I see:


> str(mtx)
 int [1:1280000, 1:150] 1 2 3 4 5 6 7 8 9 10 ...

> object.size(mtx)
[1] 768000200

> gc()
            used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    114618   6.2     350000   18.7    350000   18.7
Vcells  96121989 733.4  309848649 2364.0 576124419 4395.5

 Good luck;

David


On Jun 20, 2009, at 8:49 AM, gug wrote:



Thanks David - you're right, PC is not very informative.  I am  
using XP

Home with 4GB, though I don't think XP accesses more than 3GB.


From following through the FAQ's and memory functions (e.g.
memory.limit(size = NA)) it looks like R is getting about 1535Mb  
at the

moment.


David Winsemius wrote:



My question is: What is the maximum number of tables that can be
combined
with rbind?
Is it driven by the number of rows in the tables, by a constraint   
with

the
syntax of rbind itself, by my PC's memory,


Yes, and the OS. PC is not a useful designation for an OS.

or by some other constraint?  At this point the data files I am  
reading

in are approx 2,500 rows x 150
columns.


That should not be any problem. On an Intel-Mac w/ 8 GB there is no
problem with dataframes that are 500 times that size.

If this is on a Windows box, there is an RW-FAQ on the topic.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] How to import timestamps from emails into R

2009-06-20 Thread Thomas Levine
Here's what I get:
> head(tt)
[1] "2008-02-20 03:09:51 EST" "2008-02-20 12:12:57 EST"
[3] "2008-03-05 09:11:28 EST" "2008-03-05 17:59:40 EST"
[5] "2008-03-09 09:00:09 EDT" "2008-03-29 15:57:16 EDT"

But I can't figure out how to plot this now. plot(tt) does not appear to be
univariate. I get the same plot with plot(as.Date(tt)), which would make
sense if time is used because of the range of the dates and the
insignificance of the times of day.
> head(as.Date(tt))
[1] "2008-02-20" "2008-02-20" "2008-03-05" "2008-03-05" "2008-03-09"
[6] "2008-03-29"

plot(tt) and plot(as.Date(tt)) give something like year as a function of the
rest of the date. Here they are
[image: tt.png]
[image: as.Date.tt.png]
Here are the addresses
http://thomaslevine.org/time/tt.png
http://thomaslevine.org/time/as.Date.tt.png

Tom

On Fri, Jun 19, 2009 at 6:21 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

 Try this:


 Lines <- "Sun, 14 Jun 2009 07:33:00 -0700
 Sun, 14 Jun 2009 08:35:10 -0700
 Sun, 14 Jun 2009 21:26:34 -0700
 Mon, 15 Jun 2009 19:47:47 -0700
 Wed, 17 Jun 2009 21:50:41 -0700"

 # L <- readLines("myfile.txt")
 L <- readLines(textConnection(Lines))
 tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")



 On Fri, Jun 19, 2009 at 6:06 PM, Thomas Levinethomas.lev...@gmail.com
 wrote:
  I am analysing occurrences of a phenomenon by time, and each of these
  timestamps taken from email headers represents one occurrence. (The last
  number is the time zone.) I can easily change the format.
 
  Sun, 14 Jun 2009 07:33:00 -0700
  Sun, 14 Jun 2009 08:35:10 -0700
  Sun, 14 Jun 2009 21:26:34 -0700
  Mon, 15 Jun 2009 19:47:47 -0700
  Wed, 17 Jun 2009 21:50:41 -0700
 
  I've found documentation for a plethora of ways of importing time data,
 but
  I can't decide how to approach it. Any ideas on what may be the cleanest
  way? The only special concern is that I'll want to plot these data by
 date
  and time, meaning that I would rather not bin all of the occurrences from
  one day.
 
  The time zone isn't important as these are all local times; the time zone
  only changes as a function of daylight savings time, so I probably
 shouldn't
  use it at all.
 
  Tom
 




[R] error ellipse

2009-06-20 Thread Alexandru T Codilean

Dear All,

I have a data set with the following structure:

[A], [a], [B], [b]

where [A] and [B] are measurements and [a] and [b] are the associated  
uncertainties. I produce [B]/[A] vs. [A] plots in R and would like to  
show uncertainties  as error ellipses (rather than error bars). Would  
this be relatively easy to do in R?

I would appreciate any help on this
Thanks a lot

Tibi







Re: [R] a difficulty in boot package

2009-06-20 Thread Uwe Ligges



Seunghee Baek wrote:

Hi,
I have a problem in programming a bootstrap.
I don't know why it shows the error message below.
Please see my code:
#'st' is my original dataset.
#the functions 'fml.mlogl', 'pcopula.fam4', 'ltd', 'invltd' are already defined


boot.OR <- function(data, i)
{
E = data[i,]
ml1 <- glm(c_VAsex90_bf ~ trt, family=binomial, data=E)
ml2 <- glm(c_VAsex90_bm ~ trt, family=binomial, data=E)
marg.covariates <- cbind(rep(1,length(E$trt)), E$trt)
dep.covariates <- cbind(rep(1,length(E$age_avr)), E$age_avr)
start <- c(ml1$coef, ml2$coef, 0, 0)
fml1 <- optim(start, fml.mlogl, control=c(maxit=1), hessian=F)
x <- (1+exp(fml1$par[1]))^(-1)
y <- (1+exp(fml1$par[3]))^(-1)
b <- exp(fml1$par[5]+fml1$par[6]*43)+1
p00 <- ltd(-log(pcopula.fam4(exp(-invltd(x,b)), exp(-invltd(y,b)), b)), b)
p1 <- exp(fml1$par[1])/(1+exp(fml1$par[1]))
p2 <- exp(fml1$par[3])/(1+exp(fml1$par[3]))
OR <- p00*(p00+p1+p2-1)/(1-p2-p00)/(1-p1-p00)
OR
}

set.seed(101)

boot(st,boot.OR,R=500)

##
It gives the following error message:
Error in fn(par, ...) : object "dep.covariates" not found



Since you have not specified all functions, I can only guess that you need 
dep.covariates in function fml.mlogl() in your optim() call. In that 
case you need to pass it as an argument, otherwise it is not on the 
search path, given R's scoping rules.


Hence, for example, define

  fml.mlogl <- function(..., dep.covariates)

and call it with

  optim(, dep.covariates=dep.covariates)
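
A tiny self-contained illustration of the scoping point (hypothetical
objective function; extra named arguments are passed through to fn by
optim's '...'):

f <- function(par, extra) sum((par - extra)^2)
optim(c(0, 0), f, extra = c(1, 2))   # 'extra' reaches f via optim's ...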



Best,
Uwe Ligges




I hope you can help me in this problem.

Thanks,
Becky



Re: [R] string splitting and testing for enrichment

2009-06-20 Thread Gabor Grothendieck
Try this. We read in the data and split TFBS on "(", ") ", or ")",
giving s, and reform s into a matrix, prepending the Gene name as
column 1. Convert that to a data frame and make the third
column numeric.

Lines <- "Gene,TFBS
NUDC,PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) CACBINDINGPROTEIN(149) T3R(167) HLF(191)
RPA2,STAT4(57) HEB(251)
TAF12,PAX3(53) YY1(92) BRCA(99) GLI(101)
EIF3I,NERF(10) P300(10)
TRAPPC3,HIC1(3) PAX5(17) PAX5(110) NRF1(119) HIC1(122)
TRAPPC3,EGR(26) ZNF219(27) SP3(32) EGR(32) NFKAPPAB65(89) NFKAPPAB(89) RFX(121) ZTA(168)
NDUFS5,WHN(14) ATF(57) EGR3(59) PAX5(99) SF1(108) NRSE(146)
TIE1,NRSE(129)"

DF <- read.csv(textConnection(Lines), as.is = TRUE)

s <- strsplit(DF$TFBS, "\\(|\\) |\\)")
f <- function(i) cbind(DF[i, "Gene"], matrix(s[[i]], nc = 2, byrow = TRUE))
DF2 <- as.data.frame(do.call(rbind, lapply(seq_along(s), f)))
DF2[[3]] <- as.numeric(DF2[[3]])
View(DF2)


On Sat, Jun 20, 2009 at 10:28 AM, Iain
Gallagher <iaingallag...@btopenworld.com> wrote:
 Hi List

 I have data in the following form:

 Gene    TFBS
 NUDC     PPARA(1) HNF4(20) HNF4(96) AHRARNT(104) CACBINDINGPROTEIN(149) 
 T3R(167) HLF(191)
 RPA2     STAT4(57) HEB(251)
 TAF12     PAX3(53) YY1(92) BRCA(99) GLI(101)
 EIF3I     NERF(10) P300(10)
 TRAPPC3     HIC1(3) PAX5(17) PAX5(110) NRF1(119) HIC1(122)
 TRAPPC3     EGR(26) ZNF219(27) SP3(32) EGR(32) NFKAPPAB65(89) NFKAPPAB(89) 
 RFX(121) ZTA(168)
 NDUFS5     WHN(14) ATF(57) EGR3(59) PAX5(99) SF1(108) NRSE(146)
 TIE1     NRSE(129)

 I would like to test the 2nd column (each value has letters followed by 
 numbers in brackets) here for enrichment via fisher.test.

 To that end I am trying to create two factors made up of column 1 (Gene) and 
 column 2 (TFBS) where each Gene would have several entries matching each TFBS.

 My main problem just now is that I can't split the TFBS column into separate 
 strings (at the moment that 2nd column is all one string for each Gene).

 Here's where I am just now:

 test <- as.character(dataIn[,2]) # convert the 2nd column from factor to
 character
 test2 <- unlist(strsplit(test[1], ' ')) # split the first element into
 individual strings (only the first element just now, because I'm just trying
 to get things working)
 test3 <- unlist(strsplit(test2, '\\([0-9]\\)')) # get rid of numbers and
 brackets

 now this does not behave as I hoped - it gives me:

 > test3
 [1] "PPARA"                  "HNF4(20)"               "HNF4(96)"
 [4] "AHRARNT(104)"           "CACBINDINGPROTEIN(149)" "T3R(167)"
 [7] "HLF(191)"

 ie it only removes the numbers and brackets from the first entry and not the 
 others.

 Could someone point out my mistake please?

 Once I have all the TFBS (letters only) for each Gene I would then count how 
 often a TFBS occurs and use this data for a fisher.test testing for 
 enrichment of TFBS in the list I have. I'm a rather muddled here though and 
 would appreciate advice on whether this is the right approach.

 Thanks

 Iain

 sessionInfo()
 R version 2.9.0 (2009-04-17)
 x86_64-pc-linux-gnu

 locale:
 LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base











Re: [R] modifying sound package function plot.Sample

2009-06-20 Thread Uwe Ligges

Change it as in:

plot.Sample <- function(..., xaxloc = 1){

and in the body replace axis(1) by axis(xaxloc);

then you can call plot(., xaxloc=3) in order to plot the x-axis at 
the top (while bottom is still the default).
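
A self-contained illustration of the idea with base graphics (a sketch,
not the sound package code itself):

demo_axis <- function(y, xaxloc = 1, ...) {
  plot(y, type = "l", axes = FALSE, ...)
  axis(xaxloc)   # side 1 = bottom (default), side 3 = top
  axis(2)
  box()
}
demo_axis(sin(seq(0, 10, 0.1)), xaxloc = 3)  # x-axis drawn on top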


Uwe Ligges






David Winsemius wrote:


On Jun 20, 2009, at 7:05 AM, rajesh j wrote:


Hi,
I'm trying to modify this function. I want to remove the existing
x-axis (the tick marks and the values below each tick) and make it
dynamic so that I can choose whether I want the x-axis at the top or
bottom, but I can't seem to change that. Can somebody help me?

plot.Sample <- function(x, xlab=NULL, ylab=NULL, ...){
 sampletest <- is.Sample(x, argname='x')
 if (!sampletest$test) stop(sampletest$error)
 s <- loadSample(x, filecheck=FALSE)
 s <- normalize(s)
 if (channels(s)==1) {
#if (is.null(ylab)) ylab <- "waveform"

plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, axes=FALSE, ...)



If I understand you correctly, then try (untested):

plot(sound(s)[1,], type="l", col="black", xlab=xlab, ylab=ylab, xaxt="n", ...)

Although I'm not sure you can use that ellipsis in that context.



#axis(1)
#axis(2,at=c(-1,0,1),labels=as.character(c(-1,0,1)))
#abline(h=0)
#abline(h=c(-1,1))
#lines(par()$usr[1:2],y=c(rep(par()$usr[3],2)),xpd=TRUE)


The locator function would let you get user input that could be used to 
drive alternative specifications to the axis() function



--
Rajesh.J




David Winsemius, MD
Heritage Laboratories
West Hartford, CT





Re: [R] How to import timestamps from emails into R

2009-06-20 Thread Thomas Levine
This produces a plot where the x-axis is the index and the y-axis is time. It
has all of the time information on the same axis, allowing me to plot
cumulative occurrences by time (my original plan) if the times are sorted,
which they should be.

I think I'll end up using some variant of plot(tt,seq_along(tt)), putting
the time axis along the bottom.
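
Something like this, say (a sketch, with tt built as above):

tt <- sort(tt)
plot(tt, seq_along(tt), type = "s",
     xlab = "Date", ylab = "Cumulative occurrences")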

Thanks

Tom

On Sat, Jun 20, 2009 at 11:15 AM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

 Try this:

 plot(seq_along(tt), tt)


 On Sat, Jun 20, 2009 at 10:55 AM, Thomas Levinethomas.lev...@gmail.com
 wrote:
  Here's what I get
  head(tt)
  [1] 2008-02-20 03:09:51 EST 2008-02-20 12:12:57 EST
  [3] 2008-03-05 09:11:28 EST 2008-03-05 17:59:40 EST
  [5] 2008-03-09 09:00:09 EDT 2008-03-29 15:57:16 EDT
 
  But I can't figure out how to plot this now. plot(tt) does not appear to
 be
  univariate. I get the same plot with plot(as.Date(tt)), which would make
  sense if time is used because of the range of the dates and the
  insignificance of the times of day.
  head(as.Date(tt))
  [1] 2008-02-20 2008-02-20 2008-03-05 2008-03-05 2008-03-09
  [6] 2008-03-29
 
  plot(tt) and plot(as.Date(tt)) give something like year as a function of
 the
  rest of the date. Here they are
 
 
  Here are the addresses
  http://thomaslevine.org/time/tt.png
  http://thomaslevine.org/time/as.Date.tt.png
 
  Tom
 
  On Fri, Jun 19, 2009 at 6:21 PM, Gabor Grothendieck
  ggrothendi...@gmail.com wrote:
 
  Try this:
 
 
  Lines <- "Sun, 14 Jun 2009 07:33:00 -0700
  Sun, 14 Jun 2009 08:35:10 -0700
  Sun, 14 Jun 2009 21:26:34 -0700
  Mon, 15 Jun 2009 19:47:47 -0700
  Wed, 17 Jun 2009 21:50:41 -0700"

  # L <- readLines("myfile.txt")
  L <- readLines(textConnection(Lines))
  tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")
 
 
 
  On Fri, Jun 19, 2009 at 6:06 PM, Thomas Levinethomas.lev...@gmail.com
  wrote:
   I am analysing occurrences of a phenomenon by time, and each of these
   timestamps taken from email headers represents one occurrence. (The
 last
   number is the time zone.) I can easily change the format.
  
   Sun, 14 Jun 2009 07:33:00 -0700
   Sun, 14 Jun 2009 08:35:10 -0700
   Sun, 14 Jun 2009 21:26:34 -0700
   Mon, 15 Jun 2009 19:47:47 -0700
   Wed, 17 Jun 2009 21:50:41 -0700
  
   I've found documentation for a plethora of ways of importing time
 data,
   but
   I can't decide how to approach it. Any ideas on what may be the
 cleanest
   way? The only special concern is that I'll want to plot these data by
   date
   and time, meaning that I would rather not bin all of the occurrences
   from
   one day.
  
   The time zone isn't important as these are all local times; the time
   zone
   only changes as a function of daylight savings time, so I probably
   shouldn't
   use it at all.
  
   Tom
  
 
 





Re: [R] typo in Lomb-Scargle periodogram implementation in spec.ls() from cts package?

2009-06-20 Thread Uwe Ligges

Please report problems in the code to the package maintainer (CCing).

Best,
Uwe Ligges



Mikhail Titov wrote:

Hello!

I tried to contact the author of the package, but I got no reply. That is why I 
write it here. This might be useful for those who are using cts for spectral 
analysis of non-uniformly spaced data.

In file spec.ls.R from cts_1.0-1.tar.gz, lines 59-60 are written as

pgram[k, i, j] <- 0.5 * ((sum(x[1:length(ti)] * cos(2 * pi * freq.temp[k] * (ti - tao))))^2/sum((cos(2 * pi * freq.temp[k] * (ti - tao)))^2) + (sum(x[1:length(ti)] * sin(2 * pi * freq.temp[k] * (ti - tao))))^2 ===)=== /sum((sin(2 * pi * freq.temp[k] * (ti - tao)))^2)

Is there a misplaced bracket (shown like ===)===)? Should it be like the
following?

pgram[k, i, j] <- 0.5 * ((sum(x[1:length(ti)] * cos(2 * pi * freq.temp[k] * (ti - tao))))^2/sum((cos(2 * pi * freq.temp[k] * (ti - tao)))^2) + (sum(x[1:length(ti)] * sin(2 * pi * freq.temp[k] * (ti - tao))))^2/sum((sin(2 * pi * freq.temp[k] * (ti - tao)))^2) ===)===



Here is a quick reference:
http://en.wikipedia.org/wiki/Least-squares_spectral_analysis#The_Lomb.E2.80.93Scargle_periodogram
The one-half coefficient was not applied to the entire expression.

I also find the next lines (61-62) weird:

pgram[1, i, j] <- 0.5 * (pgram[2, i, j] + pgram[N, i, j])

First of all, such things should not be in the for loop. Second, I don't quite 
understand the meaning of it.

P.S. Should I use tapering on my data? If I just fit sine and cosine, I
may not need it; however, for FFT, windowing is a must. What about Lomb-Scargle?

Mikhail





Re: [R] How to import timestamps from emails into R

2009-06-20 Thread Gabor Grothendieck
If that is the situation, then plot(tt) in your post could not have been
what you wanted in any case, e.g. plot(10:20).

On Sat, Jun 20, 2009 at 11:49 AM, Thomas Levine <thomas.lev...@gmail.com> wrote:
 This produces the x-axis is the index, and the y-axis is time. It has all of
 the time information on the same axis, allowing me to plot cumulative
 occurrences by time (my original plan) if the times are sorted, which they
 should be.

 I think I'll end up using some variant of plot(tt,seq_along(tt)), putting
 the time axis along the bottom.

 Thanks

 Tom

 On Sat, Jun 20, 2009 at 11:15 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 Try this:

 plot(seq_along(tt), tt)


 On Sat, Jun 20, 2009 at 10:55 AM, Thomas Levinethomas.lev...@gmail.com
 wrote:
  Here's what I get
  head(tt)
  [1] 2008-02-20 03:09:51 EST 2008-02-20 12:12:57 EST
  [3] 2008-03-05 09:11:28 EST 2008-03-05 17:59:40 EST
  [5] 2008-03-09 09:00:09 EDT 2008-03-29 15:57:16 EDT
 
  But I can't figure out how to plot this now. plot(tt) does not appear to
  be
  univariate. I get the same plot with plot(as.Date(tt)), which would make
  sense if time is used because of the range of the dates and the
  insignificance of the times of day.
  head(as.Date(tt))
  [1] 2008-02-20 2008-02-20 2008-03-05 2008-03-05 2008-03-09
  [6] 2008-03-29
 
  plot(tt) and plot(as.Date(tt)) give something like year as a function of
  the
  rest of the date. Here they are
 
 
  Here are the addresses
  http://thomaslevine.org/time/tt.png
  http://thomaslevine.org/time/as.Date.tt.png
 
  Tom
 
  On Fri, Jun 19, 2009 at 6:21 PM, Gabor Grothendieck
  ggrothendi...@gmail.com wrote:
 
  Try this:
 
 
   Lines <- "Sun, 14 Jun 2009 07:33:00 -0700
   Sun, 14 Jun 2009 08:35:10 -0700
   Sun, 14 Jun 2009 21:26:34 -0700
   Mon, 15 Jun 2009 19:47:47 -0700
   Wed, 17 Jun 2009 21:50:41 -0700"

   # L <- readLines("myfile.txt")
   L <- readLines(textConnection(Lines))
   tt <- as.POSIXct(L, format = "%a, %d %b %Y %H:%M:%S")
 
 
 
  On Fri, Jun 19, 2009 at 6:06 PM, Thomas Levinethomas.lev...@gmail.com
  wrote:
   I am analysing occurrences of a phenomenon by time, and each of these
   timestamps taken from email headers represents one occurrence. (The
   last
   number is the time zone.) I can easily change the format.
  
   Sun, 14 Jun 2009 07:33:00 -0700
   Sun, 14 Jun 2009 08:35:10 -0700
   Sun, 14 Jun 2009 21:26:34 -0700
   Mon, 15 Jun 2009 19:47:47 -0700
   Wed, 17 Jun 2009 21:50:41 -0700
  
   I've found documentation for a plethora of ways of importing time
   data,
   but
   I can't decide how to approach it. Any ideas on what may be the
   cleanest
   way? The only special concern is that I'll want to plot these data by
   date
   and time, meaning that I would rather not bin all of the occurrences
   from
   one day.
  
   The time zone isn't important as these are all local times; the time
   zone
   only changes as a function of daylight savings time, so I probably
   shouldn't
   use it at all.
  
   Tom
  
 
 





Re: [R] loading .Rdata files

2009-06-20 Thread jim holtman
Also use try() to capture the error and continue.
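
For instance (a minimal sketch; 'files' is a hypothetical vector of .Rdata
paths):

for (f in files) {
  res <- try(load(f), silent = TRUE)
  if (inherits(res, "try-error")) next   # empty/corrupt file: skip it
}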

On Sat, Jun 20, 2009 at 12:57 AM, Erin Hodgess <erinm.hodg...@gmail.com> wrote:

 Dear R People:

 I'm loading several thousand .Rdata files in sequence.

 If one of them is empty, the function crashes.

 I am thinking about using system(wc ) etc., and strsplit for the
 results, but was wondering if there is a more clever way via a file
 type command, please.

 Thanks,
 Erin


 --
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




Re: [R] how to apply the dummy coding rule in a dataframe with complete factor levels to another dataframe with incomplete factor levels?

2009-06-20 Thread Kingsford Jones
Hi Sean,

The levels attribute of a factor can contain levels that are not
represented in the data.  So, in your example we can get the desired
result by adding the missing levels via the levels argument to the
factor function:

> dfB = data.frame(f1=factor(c('a','b','b'), levels=c('a','b','c')),
+                  f2=factor(c('aa','bb','bb'), levels=c('aa','bb','cc')))
> model.matrix(~f1+f2, data=dfB)
  (Intercept) f1b f1c f2bb f2cc
1           1   0   0    0    0
2           1   1   0    1    0
3           1   1   0    1    0
attr(,"assign")
[1] 0 1 1 2 2
attr(,"contrasts")
attr(,"contrasts")$f1
[1] "contr.treatment"

attr(,"contrasts")$f2
[1] "contr.treatment"
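
Equivalently (a small sketch, assuming dfA and dfB as defined in your
example below), you can borrow the complete level sets from dfA:

dfB$f1 <- factor(dfB$f1, levels = levels(dfA$f1))
dfB$f2 <- factor(dfB$f2, levels = levels(dfA$f2))
model.matrix(~ f1 + f2, data = dfB)   # same columns as matA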



hth,
Kingsford Jones



On Fri, Jun 19, 2009 at 10:13 PM, Sean Zhang <seane...@gmail.com> wrote:
 Dear R helpers:

 Sorry to bother for a basic question about model.matrix.
 Basically, I want to apply the dummy coding rule in a dataframe with
 complete factor levels to another dataframe with incomplete factor levels.
 I used model.matrix, but could not get what I want.
 The following is an example.

 #Suppose I have two dataframe A and B
 dfA=data.frame(f1=factor(c('a','b','c')), f2=factor(c('aa','bb','cc')))
 dfB =data.frame(f1=factor(c('a','b','b')), f2=factor(c('aa','bb','bb')))
 #dfB's factor variables have less number of levels

 #use model.matrix on dfA
 (matA <- model.matrix(~f1+f2, data=dfA))
 #use model.matrix on dfB
 (matB <- model.matrix(~f1+f2, data=dfB))
 #I actually would like to dummy-code dfB using the dummy coding rule defined in
 model.matrix(~f1+f2, data=dfA)
 #matB_wanted is below
 (matB_wanted <- rbind(c(1,0,0,0,0), c(1,1,0,1,0), c(1,1,0,1,0)))
 colnames(matB_wanted) <- colnames(matA)
 matB_wanted
 Can someone kindly show me how to get matB_wanted?
 Many thanks in advance!

 -Sean





[R] png() resolution problem {was Silhouette ...}

2009-06-20 Thread Martin Maechler
Hallo Sebastian,

>>>>> "SP" == Sebastian Pölsterl <s...@k-d-w.org>
>>>>>     on Sun, 14 Jun 2009 14:04:52 +0200 writes:

SP Hello Martin,
SP I plotting the silhouette of a clustering and storing it as png. When I 
SP try to store the image as png the bars are missing. The bars are 
plotted 
SP when I use x11 or postscript as device. In addition, it seems to work 
SP when I use a smaller matrix (e.g. ruspini).

SP Would be great if you have look at this issue.

Hmm, I've been at a conference in Italy...
The silhouette plot only uses standard R plotting functions,
so any problem with it exposes problems in standard R
graphics.

-- Such a message should really go to R-help.
to which I CC now.

--

 library(cluster)
 nmat <- matrix(rnorm(2500*300), ncol=300, nrow=2500)
 rmat <- matrix(rchisq(1000, 300, 50), ncol=300, nrow=1000)
 mat <- rbind(nmat, rmat)
 pr <- pam(mat, 2)
 sil <- silhouette(pr)
 png("sil.png")
 #postscript("sil.ps")
 plot(sil)
 dev.off()

--

Anyway, I can confirm the problem,
but of course, it has not much to do with the silhouette
function, but rather with the png() device which produces a
bitmap, and the lines you draw are too fine (in the bitmap
resolution) and so are rounded to invisible.

You can reproduce the problem much more simply:

set.seed(1); x <- rlnorm(5000)

png("bar.png"); barplot(x, col="gray", border=0, horiz=TRUE); dev.off()
system("eog bar.png &")

## which is also empty; and the complete analogue, replacing
## png [bitmap] with pdf [vector graphic]:

pdf("bar.pdf"); barplot(x, col="gray", border=0, horiz=TRUE); dev.off()
system("evince bar.pdf &")

## gives a very nice plot, into which you can zoom and see all details.



Now in principle you should be able to use  png() with a much
higher resolution than the default one,
but replacing the above
png("bar.png")
with
png("bar.png", res = 1200)

did not help, as we now get the infamous
Error in plot.new() : figure margins too large

Other R-help readers will be able to make the png() example work
for such cases, where you need so many lines.
{but let's stick with barplot(*, border=0, *)}
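One approach that does seem to work is to scale the pixel dimensions up
together with 'res', since png() sizes its canvas in pixels; a sketch
(not tested against the original silhouette example):

set.seed(1); x <- rlnorm(5000)
## 7 x 7 inches at 1200 dpi = 8400 x 8400 pixels:
png("bar-hires.png", width = 7*1200, height = 7*1200, res = 1200)
barplot(x, col = "gray", border = 0, horiz = TRUE)
dev.off()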

Regards,
Martin Maechler, ETH Zurich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Leave One Out Cross Validation

2009-06-20 Thread muddz

Hi David,

Thanks and I apologize for the lack of clarity.

##n is defined as the length of xdat
n <- length(xdat)

#I defined 'k' as the Gaussian kernel function
k <- function(v) {1/sqrt(2*pi)*exp(-v^2/2)} #Gaussian kernel

#I believe ypred in my case was the leave-one-out estimator (I think it's
the variable x, in other words xdat, the independent variable). I could be
wrong, but from the text of Davidson and MacKinnon, that's what I got out
of it.

 cvhfunc <- function(y,x,h){
   ypred <- 0
   for (i in 1:n){
     for (j in 1:n){
       if (j!=i){
         ypred <- ypred + (y[i]*k( (x[j]-x[i])/h )) /
                          k( (x[j]-x[i])/h )


#I totally hear you on the fact that ypred will just equal y[i]; however,
it should not be the case, because for each x[i] I should be running over
all x[j] except when x[i]=x[j]. And I should be multiplying the numerator
of ypred by (y[i]*k((x[j]-x[i])/h)). However, it's not doing that.

#I believe CV should be the following: it is used to determine the optimal
bandwidth. Steps:
(1) The process involves running over x, [j] times for each x[i], except
for when x[j]=x[i]. This is similar to drawing a histogram and finding how
large/small the bins should be. ypred should be a vector of n x 1.
(2) The second step is similar to a variance measure, where we take the
difference of y and (1), square it, and then sum over all n.

       }
     }
   }
   ypred # not sure what that will do inside the function. If
         # it's there for debugging you may want to print(ypred)

   cvh <- 0
   cvh <- cvh + ((1/n)*(ydat-ypred)^2


# ypred is a scalar, while y is a vector, so cvh will be a vector. Is
# that what you want?

   }
# I was not sure about this: "ypred # not sure what that will do inside
# the function. If it's there for debugging you may want to print(ypred)."


#As a test, run with h=.2; test values of h = 1.4, 15, 30, 50 should show
#different and good results (i.e. no NaN, massively small numbers, etc.)
 test2 <- cvhfunc(ydat,xdat,.2); test2


Thanks.
-Mudasir


David Winsemius wrote:
 
 
 On Jun 19, 2009, at 7:45 PM, muddz wrote:
 

 Hi Uwe,

 My apologies.

 Please, can I be guided as to what I am doing wrong in the code? I
 started my code as such:

 # ypred is my leave one out estimator of x
 
 Estimator of x? Really?
 
 cvhfunc <- function(y,x,h){
   ypred <- 0
   for (i in 1:n){   #how did you define n? It's not in your
                     #parameter list
     for (j in 1:n){
       if (j!=i){
         ypred <- ypred + (y[i]*k( (x[j]-x[i])/h )) /
                          k( (x[j]-x[i])/h )
 
 At this point you are also using k as a function. Have you at any  
 point defined k?
 
 Also multiplication of the ratio of two identical numbers would  
 generally give a result of y[i] for that second term. Unless, of  
 course, any x[j] = x[i] in which case you will throw an error and  
 everything will grind to a halt.
 
 It might help if you now explained what you think CV should be.
 
       }
     }
   }
   ypred # not sure what that will do inside the function. If
         # it's there for debugging you may want to print(ypred)

 #CVh is a
 
 # Yes? CVh is a what ?
 
  cvh <- 0
  cvh <- cvh + ((1/n)*(y-ypred)^2  # n again. R will still not know
                                   # what that is.
  cvh
 
 # ypred is a scalar, while y is a vector, so cvh will be a vector. Is  
 that what you want?
 
  }
 
 test2 <- cvhfunc(ydat,xdat,.2); test2

 #I was experimenting with the following data:
 library(datasets)
 data(faithful)
 ydat <- faithful$eruptions; ydat; plot(ydat); par(new=TRUE)
 xdat <- faithful$waiting; xdat; plot(xdat, col="blue")

 # I want to minimize the CV function with respect to h. Thanks.




 Uwe Ligges-3 wrote:

 See the posting guide:
 If you provide commented, minimal, self-contained, reproducible code
 some people may be willing to help on the list.

 Best,
 Uwe Ligges


 muddz wrote:
 Hi All,

 I have been trying to get this LOO-Cross Validation method to work  
 on R
 for
 the past 3 weeks but have had no luck. I am hoping someone would  
 kindly
 help
 me.

 Essentially I want to code the LOO-Cross Validation for the 'Local
 Constant'
 and 'Local Linear' constant estimators. I want to find optimal h,
 bandwidth.

 Thank you very much!
 -M



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 -- 
 View this message in context:
 http://www.nabble.com/Leave-One-Out-Cross-Validation-tp24025738p24120380.html
 

[R] Decision trees with rpart

2009-06-20 Thread Rafael Marconi Ramos
Dear colleagues in R,

I am using rpart to build decision trees, but which decision-tree
algorithm does rpart use? J48, ID3, C4.5?

Thanks

Rafael Marconi Ramos


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Customize axis labels in xyplot

2009-06-20 Thread nmset

Hello,

I'm plotting an xyplot where a continuous variable recorded every minute is
plotted on y, and time expressed as HH:MM:SS on x, as follows:

xaxis=list(tick.number=12,rot=90)
lst=list(x=xaxis)
xyplot(upt$LOAD_1 ~ upt$TIME, data=upt, type=c('g','p', 'r'), scales=lst)

On the x-axis, every time label is drawn, and the labels quickly become
unreadable as they overlap each other.

I wanted to limit the number of labels with 'tick.number=12', but it does
not work, as the x values are not treated as a numerical sequence.

How can I limit the number of ticks and labels for a time series expressed
as HH:MM:SS ?

One way might be to convert to minutes from an origin, though the result
would be less expressive.
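For example, something along these lines (an untested sketch; it assumes
upt$TIME is a date-time class such as POSIXct):

upt$mins <- as.numeric(difftime(upt$TIME, min(upt$TIME), units = "mins"))
xyplot(LOAD_1 ~ mins, data = upt, type = c('g','p','r'),
       scales = list(x = list(tick.number = 12, rot = 90)))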

Thank you for any help.

-- 
View this message in context: 
http://www.nabble.com/Customize-axis-labels-in-xyplot-tp24126788p24126788.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)

2009-06-20 Thread Stavros Macrakis
(I am replacing R-devel and r-bugs with r-help as addressees.)

On Sat, Jun 20, 2009 at 9:45 AM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

 So if I request a calculation of 0.3-0.1-0.1-0.1 and I do not get 0,
 that is not an issue of rounding / underflow (or whatever the correct
 technical term would be for that behaviour)?


No.  Let's start from the beginning.

In binary floating point arithmetic, all numbers are represented as a*2^b,
where a and b have a fixed number of digits, so input conversion from
decimal form to binary form inherently loses some precision -- that is, it
rounds to the nearest binary fraction.

For example, representation(0.3) is 5404319552844595 * 2^-54, about 1e-17
less than exactly 3/10, which is of course not representable in the form
a*2^b.
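This is easy to check in R:

5404319552844595 * 2^-54 == 0.3   # TRUE -- this is the stored value
sprintf("%.20f", 0.3)             # shows the stored value is not exactly 0.3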

The EXACT difference (calculating with rationals -- no roundoff errors etc.)
between representation(0.3) and 3*representation(0.1) is 2^-55 (about
1e-17); the EXACT difference between representation(0.3) and
representation(3*representation(0.1)) is 2^-54.  As it happens, in this
case, there is no rounding error at all -- the floating-point result of 0.3
- 3*0.1 is exactly -2^-54.
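Again, R confirms this directly:

0.3 - 3*0.1               # -5.551115e-17
(0.3 - 3*0.1) == -2^-54   # TRUE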

 I thought that guard digits would mean that 0.3-0.1*3 should be calculated
 in higher precision than the final representation of the result, i.e.,
 avoiding that this is not equal to 0?


Guard digits and sticky bits are techniques for more accurate rounding of
individual arithmetic operations, and do not persist beyond each individual
operation.  They cannot create precise results out of imprecise inputs
(except when they get lucky!).  And even with precise inputs, they cannot
create correctly rounded results with multiple operations.  Consider for
example (1.0 + 1.0e-15) - 1.0.  The correctly rounded result of
(1.0+1.0e-15) is 1.00000000000000111..., and the correctly rounded result
of (1.0+1.0e-15)-1.0 is 1.11e-15, which is 11% different from the
mathematical result.

Perhaps you are thinking about the case where intermediate results are
accumulated in higher-than-normal precision.  This technique only applies in
very specialized circumstances, and is not available to user code in most
programming languages (including R).  I don't know whether R's sum function
uses this technique or some other (e.g. Kahan summation), but it does manage
to give higher precision than summation with individual arithmetic
operators:

sum(c(2^63,1,-2^63)) = 1
but
   Reduce(`+`,c(2^63,1,-2^63)) = 0
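For the curious, here is a sketch of Neumaier's variant of Kahan
summation, which recovers the accurate answer on this example (I am not
claiming this is what R's sum() actually does):

neumaier_sum <- function(x) {
  s <- 0; comp <- 0               # running sum and error compensation
  for (xi in x) {
    t <- s + xi
    # accumulate whichever low-order part was lost in computing s + xi
    comp <- comp + if (abs(s) >= abs(xi)) (s - t) + xi else (xi - t) + s
    s <- t
  }
  s + comp
}
neumaier_sum(c(2^63, 1, -2^63))   # 1, matching sum()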

I am sorry if I am not from the field... If you can suggest an online
 resource to help me use the right vocabulary and better understand the
 fundamental concepts, I am of course grateful.


I would suggest "What Every Computer Scientist Should Know About
Floating-Point Arithmetic", ACM Computing Surveys 23:1 (March 1991), for
the basics.  Anything by Kahan (http://www.cs.berkeley.edu/~wkahan/) is
interesting.  Beyond elementary floating-point arithmetic, there is of
course the vast field of numerical analysis, which underlies many of the
algorithms used by R and other statistical systems.

-s


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Leave One Out Cross Validation

2009-06-20 Thread David Winsemius
You don't seem to be making any corrections or updating your code.  
There remains a syntax error in the last line of cvhfunc because of  
mismatched parens.


On Jun 20, 2009, at 1:04 PM, muddz wrote:



Hi David,

Thanks and I apologize for the lack of clarity.

#n is defined as the length of xdat
n <- length(xdat)


Would be better to include that at the top of  the function body.


#I defined 'k' as the Gaussian kernel function
k <- function(v) {1/sqrt(2*pi)*exp(-v^2/2)} #Gaussian kernel

#I believe ypred in my case was the leave-one-out estimator (I think it's
the variable x, in other words xdat, the independent variable).


It does not make sense to me that ypred or yhat would be an estimator  
of x.


I could be wrong, but from the text of Davidson and MacKinnon, that's
what I got out of it.


I don't have that text. The Google hits suggest it might be
"Econometric Theory and Methods". Amazon has one of those "look inside"
options. Searching on cv(h) suggests you are trying to replicate the
kernel regression on p. 686.


#I totally hear you on the fact that ypred will just equal y[i]; however,
it should not be the case, because for each x[i] I should be running over
all x[j] except when x[i]=x[j]. And I should be multiplying the numerator
of ypred by (y[i]*k((x[j]-x[i])/h)). However, it's not doing that.


Not doing what? You need to decide what is wrong.

(y[i]*k((x[j]-x[i])/h))/k((x[j]-x[i])/h)  = y[i] * 1  # by definition,  
unless x[j] = x[i], in which case it is NaN


... which is what is in the vector that your current function produces
after fixing some syntax. Doing some debugging, it looks as though the
first time through, with i=1 and j=2, the result of the calculation
is NaN, which means that ypred will always be NaN. Just because it is
done over a loop does not fix the error in numerical logic and the
failure to check arguments. So you still need to decide how to trap
situations where x[j] = x[i]. And you need to re-read D&M and
implement the weights that prevent using terms when
k((x[j]-x[i])/h) is very small.


(There is only a single summation index in that CV(h) formula. So  
maybe you could just run predict(lm(y ~ . , data=x[-i]))  instead  of  
that god-awful nested loop.)
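For what it's worth, here is a sketch of how I read the leave-one-out
Nadaraya-Watson CV(h) -- my guess at the intended computation, not checked
against D&M:

cvh <- function(y, x, h) {
  n <- length(x)
  k <- function(v) dnorm(v)           # Gaussian kernel
  cv <- 0
  for (i in 1:n) {
    w <- k((x[-i] - x[i]) / h)        # kernel weights, observation i held out
    yhat <- sum(w * y[-i]) / sum(w)   # Nadaraya-Watson estimate at x[i]
    cv <- cv + (y[i] - yhat)^2 / n
  }
  cv
}
## then, e.g.: optimize(function(h) cvh(ydat, xdat, h), c(0.5, 20))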




#I believe CV should be the following: it is used to determine the optimal
bandwidth. Steps:


If CV is a scalar then the last line could not possibly give you what  
you want:


 (1 - 1:10)   #scalar minus vector gives vector.
 [1]  0 -1 -2 -3 -4 -5 -6 -7 -8 -9

Don't you want a sum?

(1) The process involves running over x, [j] times for each x[i], except
for when x[j]=x[i]. This is similar to drawing a histogram and finding how
large/small the bins should be. ypred should be a vector of n x 1.




(2) The second step is similar to a variance measure, where we take the
difference of y and (1), square it, and then sum over all n.


You are going to need to define "second step".


# I was not sure about this: ypred


That line looks useless to me at the moment. ypred is whatever it is  
at that point, and it won't return a value from the function because  
you later evaluate other expressions. Functions only return the last  
evaluated expression or what is wrapped in return().




# not sure what that will do inside
the function. If
it's there for debugging you may want to print(ypred).

Thanks.
-Mudasir






David Winsemius wrote:



On Jun 19, 2009, at 7:45 PM, muddz wrote:



Hi Uwe,

My apologies.

Please, can I be guided as to what I am doing wrong in the code? I
started my code as such:

# ypred is my leave one out estimator of x


Estimator of x? Really?


cvhfunc <- function(y,x,h){
   ypred <- 0
   for (i in 1:n){   #how did you define n? It's not in your
                     #parameter list
     for (j in 1:n){
       if (j!=i){
         ypred <- ypred + (y[i]*k( (x[j]-x[i])/h )) /
                          k( (x[j]-x[i])/h )

At this point you are also using k as a function. Have you at any
point defined k?

Also multiplication of the ratio of two identical numbers would
generally give a result of y[i] for that second term. Unless, of
course, any x[j] = x[i] in which case you will throw an error and
everything will grind to a halt.

It might help if you now explained what you think CV should be.


       }
     }
   }
   ypred # not sure what that will do inside the function. If
         # it's there for debugging you may want to print(ypred)

#CVh is a


# Yes? CVh is a what ?


cvh <- 0
cvh <- cvh + ((1/n)*(y-ypred)^2  # n again. R will still not know
                                 # what that is.
cvh


# ypred is a scalar, while y is a vector, so cvh will be a vector. Is
that what you want?


}



test2 <- cvhfunc(ydat,xdat,.2); test2

#I was experimenting with the following data:
library(datasets)
data(faithful)
ydat <- faithful$eruptions; ydat; plot(ydat); par(new=TRUE)
xdat <- faithful$waiting; xdat; plot(xdat, col="blue")

# I want to minimize the CV function with respect to h. Thanks.

Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)

2009-06-20 Thread Stavros Macrakis
On Sat, Jun 20, 2009 at 4:10 PM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

 Ah, that's probably where I went wrong. I thought R would take the
 0.1, the 0.3, the 3, convert them to extended precision binary
 representations, do its calculations, and the reduction to normal
 double precision binary floats would only happen when the result was
 stored or printed.


This proposal is problematic in many ways.  For example, it would *still*
not guarantee that 0.3 - 3*0.1 == 0, since extended-precision floats have
the same characteristics as normal-precision floats.  Would you round to
normal precision when passing arguments?  Then sqrt could not produce
extended-precision results. etc. etc.

I suppose R could support an extended-precision floating-point type, but
that would require that the *user* choose which operations were in
extended-precision and which in normal precision. (And of course it would be
a lot of work to add in a complete and consistent way.)

   -s


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] correlation between categorical data

2009-06-20 Thread Marc Schwartz


On Jun 20, 2009, at 2:05 PM, Jason Morgan wrote:


On 2009.06.19 14:04:59, Michael wrote:

Hi all,

In a data-frame, I have two columns of data that are categorical.

How do I form some sort of measure of correlation between these two  
columns?


For numerical data, I just need to regress one to the other, or do
some pairs plot.

But for categorical data, how do I find and/or visualize correlation
between the two columns of data?


As Dylan mentioned, using crosstabs may be the easiest way. Also, a
simple correlation between the two variables may be informative. If
each variable is ordinal, you can use Kendall's tau-b (square table)
or tau-c (rectangular table). The former you can calculate with ?cor
(set method="kendall"); the latter you may have to hack something
together yourself; there is code on the Internet to do this. If the
data are nominal, then a simple chi-squared test (large-n) or Fisher's
exact test (small-n) may be more appropriate. There are rules about
which to use when one variable is ordinal and one is nominal, but I
don't have my notes in front of me. Maybe someone else can provide
more assistance (and correct me if I'm wrong :).
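In R terms, that might look something like this (a sketch; d, f1, f2,
o1 and o2 are hypothetical columns):

tab <- table(d$f1, d$f2)   # crosstab of two nominal variables
chisq.test(tab)            # large-n test of association
fisher.test(tab)           # small-n alternative (but see the caution below)
## for two ordinal variables:
cor(as.numeric(d$o1), as.numeric(d$o2), method = "kendall")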




I would be cautious in recommending the Fisher Exact Test based upon  
small sample sizes, as the FET has been shown to be overly
conservative. This also applies to the use of the continuity  
correction for the chi-square test (which replicates the behavior of  
the FET).


For more information, see:

"Chi-squared and Fisher-Irwin tests of two-by-two tables with small
sample recommendations"
Ian Campbell
Statistics in Medicine 26:3661-3675; 2007
http://www3.interscience.wiley.com/journal/114125487/abstract

and:

"How conservative is Fisher's exact test? A quantitative evaluation of
the two-sample comparative binomial trial"
Gerald G. Crans, Jonathan J. Shuster
Statistics in Medicine 27:3598-3611; 2008
http://www3.interscience.wiley.com/journal/117929459/abstract


Frank also has some comments here (bottom of the page):

http://biostat.mc.vanderbilt.edu/wiki/Main/DataAnalysisDisc#Some_Important_Points_about_Cont


More generally, Agresti's "Categorical Data Analysis" is typically the
first reference in this domain to reach for. There is also a document
written by Laura Thompson which provides a nice R companion to
Agresti. It is available from:


https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)

2009-06-20 Thread Dr. D. P. Kreil
Dear Stavros,

Thank you very much for your helpful email and your patience.
 Perhaps you are thinking about the case where intermediate results are
 accumulated in higher-than-normal precision.  This technique only applies in
 very specialized circumstances, and is not available to user code in most
 programming languages (including R).

Ah, that's probably where I went wrong. I thought R would take the
0.1, the 0.3, the 3, convert them to extended precision binary
representations, do its calculations, and the reduction to normal
double precision binary floats would only happen when the result was
stored or printed.
Having read your explanations now, I suspect it was unreasonable to expect that.

  I don't know whether R's sum function
 uses this technique or some other (e.g. Kahan summation), but it does manage
 to give higher precision than summation with individual arithmetic
 operators:

     sum(c(2^63,1,-2^63)) = 1
 but
    Reduce(`+`,c(2^63,1,-2^63)) = 0

That is very interesting!

 I would suggest "What Every Computer Scientist Should Know About
 Floating-Point Arithmetic", ACM Computing Surveys 23:1 (March 1991), for the
 basics.  Anything by Kahan (http://www.cs.berkeley.edu/~wkahan/) is
 interesting.  Beyond elementary floating-point arithmetic, there is of
 course the vast field of numerical analysis, which underlies many of the
 algorithms used by R and other statistical systems.

Thank you very much for the pointers!

Best regards,
David.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Rd] Floating point precision / guard digits? (PR#13771)

2009-06-20 Thread Dr. D. P. Kreil
Yes, I see the problem now! Thank you for bearing with me and for the
helpful explanations and info.

Best regards,
David.


2009/6/20 Stavros Macrakis macra...@alum.mit.edu:
 On Sat, Jun 20, 2009 at 4:10 PM, Dr. D. P. Kreil dpkr...@gmail.com wrote:

 Ah, that's probably where I went wrong. I thought R would take the
 0.1, the 0.3, the 3, convert them to extended precision binary
 representations, do its calculations, and the reduction to normal
 double precision binary floats would only happen when the result was
 stored or printed.

 This proposal is problematic in many ways.  For example, it would *still*
 not guarantee that 0.3 - 3*0.1 == 0, since extended-precision floats have
 the same characteristics as normal-precision floats.  Would you round to
 normal precision when passing arguments?  Then sqrt could not produce
 extended-precision results. etc. etc.

 I suppose R could support an extended-precision floating-point type, but
 that would require that the *user* choose which operations were in
 extended-precision and which in normal precision. (And of course it would be
 a lot of work to add in a complete and consistent way.)

            -s


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need help installing xts package

2009-06-20 Thread R_help Help
Hi,

I am trying to install the package xts. I am using openSUSE 11, 64-bit.
When I invoke:

install.packages("xts", dependencies=TRUE)

I received a lot of "ERROR: compilation failed for package" errors when R
tries to install the xts dependencies. I do not understand this behavior.

I notice one thing. It complains:

ERROR: compilation failed for package 'Hmisc'
** Removing '/home/chibi/R/x86_64-unknown-linux-gnu-library/2.8/Hmisc'
* Installing *source* package 'Design' ...
** libs
WARNING: R include directory is empty -- perhaps need to install R-devel.rpm
or similar


Is it because I need R-devel.rpm? What is it? I searched Google and
cannot find any explanation. Thank you.

adschai


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Roxygen vs Sweave for S4 documentation

2009-06-20 Thread Ken-JP

Hi,

I have been using R for a while.  Recently, I have begun converting my
package to S4 classes.  I was previously using Rdoc for documentation.
Now I am looking for the best tool for S4 documentation.  It seems that
the best choices for me are Roxygen and Sweave (I am fine with TeX).

Are there any users of Roxygen or Sweave who can comment on the strengths
or weaknesses of one or the other?  Thanks in advance.

- Ken
-- 
View this message in context: 
http://www.nabble.com/Roxygen-vs-Sweave-for-S4-documentation-tp24131590p24131590.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] correlation between categorical data

2009-06-20 Thread J Dougherty
On Saturday 20 June 2009 04:36:55 pm Marc Schwartz wrote:
 On Jun 20, 2009, at 2:05 PM, Jason Morgan wrote:
  On 2009.06.19 14:04:59, Michael wrote:
  Hi all,
 
  In a data-frame, I have two columns of data that are categorical.
 
  How do I form some sort of measure of correlation between these two
  columns?
 
  For numerical data, I just need to regress one to the other, or do
  some pairs plot.
 
  But for categorical data, how do I find and/or visualize correlation
  between the two columns of data?
 
  As Dylan mentioned, using crosstabs may be the easiest way. Also, a
  simple correlation between the two variables may be informative. If
  each variable is ordinal, you can use Kendall's tau-b (square table)
  or tau-c (rectangular table). The former you can calculate with ?cor
  (set method="kendall"); the latter you may have to hack something
  together yourself; there is code on the Internet to do this. If the
  data are nominal, then a simple chi-squared test (large-n) or Fisher's
  exact test (small-n) may be more appropriate. There are rules about
  which to use when one variable is ordinal and one is nominal, but I
  don't have my notes in front of me. Maybe someone else can provide
  more assistance (and correct me if I'm wrong :).

 I would be cautious in recommending the Fisher Exact Test based upon
 small sample sizes, as the FET has been shown to be overly
 conservative. 
 
 . . .
There are other ways of regarding the FET.  Since it is precisely what it
says - an exact test - you can argue that you should avoid carrying over
any conclusions drawn about the small population the test was applied to
and employing them in a broader context.  Insofar as the test is
concerned, the sample data and the contingency table they are arrayed in
are the entire universe.  In that sense, the FET can't be "conservative"
or "liberal".  It isn't actually a hypothesis test and should not be
thought of as one or used in place of one.

JDougherty



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.