Re: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)

2008-10-13 Thread Pedro.Rodriguez
Hi Maithili,

There are two good papers that illustrate how to compare classifiers
using Sensitivity and Specificity and their extensions (e.g., likelihood
ratios, the Youden index, the KL distance, etc.).

See:

1) Biggerstaff, Brad, 2000, "Comparing diagnostic tests: a simple
graphic using likelihood ratios", Statistics in Medicine, 19:649-663.

2) Lee, Wen-Chung, 1999, "Selecting diagnostic tests for ruling out or
ruling in disease: the use of the Kullback-Leibler distance",
International Journal of Epidemiology, 28:521-525.
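
As a quick illustration of those extensions, a minimal R sketch using the
numbers from your post (SENS = 0.8997, SPEC = 0.7438):

sens <- 0.8997
spec <- 0.7438
lr_pos <- sens / (1 - spec)  # positive likelihood ratio, about 3.51
lr_neg <- (1 - sens) / spec  # negative likelihood ratio, about 0.13
youden <- sens + spec - 1    # Youden index, about 0.64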

Please let me know if you have problems finding the aforementioned papers.

Kind Regards,

Pedro


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Maithili Shiva
Sent: Monday, October 13, 2008 3:28 AM
To: r-help@r-project.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [R] Fw: Logistic regression - Interpreting (SENS) and (SPEC)


Dear Mr Peter Dalgaard and Mr Dieter Menne,

I sincerely thank you for helping me out with my problem. The thing is
that I have already calculated SENS = Gg / (Gg + Bg) = 89.97%
and SPEC = Bb / (Bb + Gb) = 74.38%.

Now I have the values of SENS and SPEC, which are absolute in nature. My
question was how do I interpret these absolute values? How do these
values help me find out whether my model is good?

With regards

Ms Maithili Shiva








 Subject: [R] Logistic regression - Interpreting (SENS) and (SPEC)
 To: r-help@r-project.org
 Date: Friday, October 10, 2008, 5:54 AM
 Hi,
 
 I am working on a credit scoring model using logistic
 regression. I have a main sample of 42500 clients and, based on
 their status as regards defaulted / non-defaulted, I have
 generated the probability of default.
 
 I have a hold-out sample of 5000 clients. I have calculated
 (1) the number of correctly classified goods (Gg), (2) the number
 of correctly classified bads (Bb), (3) the number of wrongly
 classified bads (Gb), and (4) the number of wrongly classified
 goods (Bg).
 
 My problem is how to interpret these results? What I have
 arrived at are absolute figures.
 


Re: [R] Dump decision trees of randomForest object

2008-10-09 Thread Pedro.Rodriguez
Hi Chris,

Maybe it is easier if you try the following C++ library:

http://mtv.ece.ucsb.edu/benlee/librf.html
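
On the second part of the question below (saving a learned model and
scoring new examples later), something along these lines should work; a
minimal sketch, assuming the randomForest package:

library(randomForest)
data(iris)
rf <- randomForest(Species ~ ., data = iris)  # train once
save(rf, file = "rf_model.RData")             # persist the fitted object
# ... later, in a fresh R session:
load("rf_model.RData")                        # restores the object 'rf'
pred <- predict(rf, newdata = iris[1:5, ])    # score new examples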


Regards,


Pedro



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Christian Sturz
Sent: Thursday, October 09, 2008 4:30 PM
To: Liaw, Andy; r-help@r-project.org
Subject: Re: [R] Dump decision trees of randomForest object

I've tried the getTree() function and printed a decision tree with
print(). However, it seems to me that it's hard to parse this
representation and translate it into equivalent if-then-else C
constructs. Are there no other ways to dump the trees into a more
hierarchical form?

What exactly do you mean by the prediction in the source package?

Maybe what I wanted to ask goes in the same direction: let's say I've
learned a random forest model from a learning set. Now I would like to
use it in the future as a classifier to predict new examples. How can
this be done? Can I save a learned model and then invoke R with new
examples and apply them to the saved model, without training the random
forest from scratch again? If so, please give me some hints on how to
do that.

Regards,
Chris

 Original Message 
 Date: Thu, 9 Oct 2008 14:38:44 -0400
 From: Liaw, Andy [EMAIL PROTECTED]
 To: Christian Sturz [EMAIL PROTECTED], r-help@r-project.org
 Subject: RE: [R] Dump decision trees of randomForest object

 See the getTree() function in the package.  Also, the source package
 contains C code that does the prediction that you may be able to work
 from.
 
 Andy 
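 
 For example, a minimal sketch of getTree() (assuming a forest fitted
 on iris as in the original message below):
 
 library(randomForest)
 data(iris)
 iris.rf <- randomForest(Species ~ ., iris)
 tree1 <- getTree(iris.rf, k = 1, labelVar = TRUE)  # first tree, as a data frame
 head(tree1)  # left/right daughter, split var, split point, status, prediction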
 
 From: Christian Sturz
  
  Hi,
  
  I'm using the package randomForest to generate a classifier for the
  exemplary iris data set:
  
  library(randomForest)  # needed for randomForest()
  data(iris)
  iris.rf <- randomForest(Species ~ ., iris)
  
  Is it possible to print all decision trees in the generated forest?
  If so, can the trees also be written to disk?
  
  What I actually need is to translate the decision trees in a random
  forest into equivalent C++ if-then-else constructs, to integrate
  them in a C++ project. Has this been done in the past, and are there
  already any implemented approaches/parsers for that?
  
  Cheers,
  Chris
  --
  


Re: [R] How to validate model?

2008-10-07 Thread Pedro.Rodriguez
Hi Frank,

Thanks for your feedback! But I think we are talking about two different
things.

1) Validation: the generalization performance of the classifier. See,
for example, "Studies on the Validation of Internal Rating Systems" by
the BIS.

2) Calibration: correct calibration of a PD rating system means that the
calibrated PD estimates are accurate and conform to the observed default
rates. See, for instance, "An Overview and Framework for PD Backtesting
and Benchmarking" by Castermans et al.

Frank, you are referring to #1 and I am referring to #2.

Nonetheless, I would never create a rating system if my model didn't
discriminate better than a coin toss.

Regards,

Pedro 







-Original Message-
From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 07, 2008 11:02 AM
To: Rodriguez, Pedro
Cc: [EMAIL PROTECTED]; r-help@r-project.org
Subject: Re: [R] How to validate model?

[EMAIL PROTECTED] wrote:
 Usually one validates scorecards with the ROC curve, Pietra Index, KS
 test, etc. You may be interested in the WP 14 from BIS (www.bis.org).
 
 Regards,
 
 Pedro

No, the validation should be done using an absolute reliability
(calibration) curve.  You need to verify that at all levels of predicted
risk there is agreement with the true probability of failure.  An ROC
curve does not do that, and I doubt the others do.  A resampling-corrected
loess calibration curve is a good approach, as implemented in the Design
package's calibrate function.
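
A minimal sketch of that approach (hypothetical data frame d with binary
y and predictors x1, x2; Design has since been superseded by rms):

library(Design)
f <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)  # keep x, y for resampling
cal <- calibrate(f, B = 200)  # bootstrap-corrected calibration curve
plot(cal)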

Frank

 
 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
 On Behalf Of Maithili Shiva
 Sent: Tuesday, October 07, 2008 8:22 AM
 To: r-help@r-project.org
 Subject: [R] How to validate model?
 
 Hi!
 
 I am working on scorecard model and I have arrived at the regression
 equation. I have used logistic regression using R.
 
 My question is how do I validate this model? I do have hold out sample
 of 5000 customers.
 
 Please guide me. Problem is I have never used logistic regression
 before, nor am I used to credit scoring models.
 
 Thanks in advance
 
 Maithili
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
                     Department of Biostatistics   Vanderbilt University



Re: [R] How to validate model?

2008-10-07 Thread Pedro.Rodriguez
Hi,

Yes, in my humble opinion, it doesn't make any sense to use the (2-class)
ROC curve for a rating system. For example, if the classifier predicts
100% for all the defaulted exposures and 0% for the good clients, then
even though we have a perfect classifier we have a bad rating system.

However, if we use the multi-class version of Hand and Till (2001), we may
test how well the model discriminates between classes or ratings.

Hand, David J. and Robert J. Till, "A Simple Generalisation of the Area
Under the ROC Curve for Multiple Class Classification Problems", Machine
Learning, Vol. 45, No. 2 (November 2001), pp. 171-186.

Regards,

Pedro 


-Original Message-
From: Ajay ohri [mailto:[EMAIL PROTECTED]
Sent: Tue 10/7/2008 6:46 PM
To: Frank E Harrell Jr
Cc: Rodriguez, Pedro; r-help@r-project.org
Subject: Re: [R] How to validate model?
 
Re "the purpose of validating indirect measures such as ROC curves":

Biggest purpose: it is useful in marketing/sales meeting contexts ;)

Also, decile-specific performance is easy to explain and monitor, which
makes for faster execution/re-modeling.

Regards,

Ajay


On Wed, Oct 8, 2008 at 4:01 AM, Frank E Harrell Jr [EMAIL PROTECTED] wrote:


Ajay ohri wrote:


This is an approach:

Run the model variables on a hold-out sample.

Check and compare ROC curves between the build and validation datasets.

Check for changes in parameter estimates (coefficients of variables),
p-values and signs.

Check for binning (response versus deciles of individual variables).

Check concordance, and the KS statistic.

A decile-wise performance of the model in terms of predicted versus
actual, with rank ordering of deciles, helps in explaining the model to
a business audience, who generally have some business-specific input
that may require the scoring model to be tweaked.

This assumes multicollinearity, outliers and missing-value treatment
have already been dealt with, and that the holdout sample checks for
overfitting. You can always rebuild the model using a different random
holdout sample.

A stable model would not change too much.

In actual implementation, try to build real-time triggers for deviations
(%) between predicted and actual.

Regards,

Ajay



I wouldn't recommend that approach but legitimate differences of 
opinion exist on the subject.  In particular I fail to see the purpose of 
validating indirect measures such as ROC curves.

Frank




www.decisionstats.com

On Wed, Oct 8, 2008 at 1:33 AM, Frank E Harrell Jr [EMAIL PROTECTED] wrote:


 [EMAIL PROTECTED] wrote:
 
  Hi Frank,
 
  Thanks for your feedback! But I think we are talking about two
  different things.
 
  1) Validation: the generalization performance of the classifier.
  See, for example, "Studies on the Validation of Internal Rating
  Systems" by the BIS.
 
 I didn't think the desire was for a classifier but instead was for a
 risk predictor.  If prediction is the goal, classification methods or
 accuracy indexes based on classifications do not work very well.
 
  2) Calibration: correct calibration of a PD rating system means
  that the calibrated PD estimates are accurate and conform to the
  observed default rates. See, for instance, "An Overview and
  Framework for PD Backtesting and Benchmarking" by Castermans et al.
 
 I'm unclear on what you mean here.  Correct calibration of a
 predictive system means that the UNcalibrated estimates are accurate
 (i.e., they don't need any calibration).  (What is PD?)
 
  Frank, you are referring to #1 and I am referring to #2.
  Nonetheless, I would never create a rating system if my model
  didn't discriminate better than a coin toss.
 
 For sure

Re: [R] random normally distributed values within range

2008-10-06 Thread Pedro.Rodriguez
Hi Achaz,

Maybe you are interested in the generalized beta distribution?

To the best of my knowledge, there is no way to restrict the values of
normal deviates without ending up with a different distribution: what
you get by restricting a normal to a range is a truncated normal.
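
If a truncated normal is what is wanted, one standard trick is
inverse-CDF sampling; a minimal sketch (not from the original thread):

# draw n normal deviates restricted to the interval (a, b)
rnorm_trunc <- function(n, mean = 0, sd = 1, a = -Inf, b = Inf) {
    u <- runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd))
    qnorm(u, mean, sd)
}
x <- rnorm_trunc(100, a = 0, b = 2)  # 100 values in (0, 2)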

Regards,

Pedro 

 



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Achaz von Hardenberg
Sent: Monday, October 06, 2008 6:55 PM
To: r-help@r-project.org
Subject: [R] random normally distributed values within range

Hi all,
I need to create 100 normally distributed random values (X) which cannot
exceed a specific range (i.e. 0 < X < Y).
With rnorm I cannot specify max and min values within which the values
have to stay, as I can with runif, so does some other simple way exist
to do this with normally distributed random values?

thanks a lot in advance,

Dr. Achaz von Hardenberg



Centro Studi Fauna Alpina - Alpine Wildlife Research Centre
Servizio Sanitario e della Ricerca Scientifica
Parco Nazionale Gran Paradiso, Degioz, 11, 11010-Valsavarenche (Ao),  
Italy



Re: [R] Bias in sample - Logistic Regression

2008-10-02 Thread Pedro.Rodriguez
Hi Shiva,

Maybe you are interested in the following paper:

Learning when Training Data are Costly: The Effect of Class Distribution
on Tree Induction. G. Weiss and F. Provost.  Journal of Artificial
Intelligence Research 19 (2003) 315-354.

For validating models in those environments, see:

William Elazmeh, Nathalie Japkowicz, Stan Matwin. (2006). A Framework
for Comparative Evaluation of Classifiers in the Presence of Class
Imbalance. Proceedings of the third Workshop on ROC Analysis in Machine
Learning, Pittsburgh, USA.

Regards,

Pedro

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Wensui Liu
Sent: Wednesday, October 01, 2008 7:20 PM
To: [EMAIL PROTECTED]
Cc: r-help@r-project.org
Subject: Re: [R] Bias in sample - Logistic Regression

Hi, Shiva,

The idea of reject inference is very simple. Let's assume a credit card
environment. There are 100 applicants, out of which 50 will be approved
and booked in. Therefore, we can only observe the adverse behavior, such
as default and delinquency, of the 50 booked accounts. Again, let's
assume that out of the 50 booked cards, 5 are bad (default /
delinquency). A normal thought is to build a model to cherry-pick the
bad guys and then apply the same model to all applicants.

However, we can only observe the behavior of the applicants booked,
which is 50, not of all applicants, which is 100. Therefore, the model
result looks better than it is supposed to. This is the so-called
'sample bias'. The same thing can happen in healthcare or direct
marketing as well.

Luckily enough, many people have done some excellent work on this
problem. Please do some reading by Heckman. Greene at NYU has a paper in
this area as well. And I believe there is also an implementation in R.
If you use SAS (widespread in industry), take a look at proc qlim.
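
To make the two-step mechanics concrete, a rough sketch on simulated
data (illustrative only: plugging the inverse Mills ratio into a
logistic outcome model only approximates Heckman's correction, which was
derived for linear outcomes; I believe the sampleSelection package
offers proper implementations):

set.seed(1)
n <- 5000
z <- rnorm(n); x <- rnorm(n)
approved <- rbinom(n, 1, pnorm(0.5 * z + 0.5 * x))  # selection equation
default  <- rbinom(n, 1, plogis(-2 + x))            # outcome, seen only if approved
sel <- glm(approved ~ z + x, family = binomial(link = "probit"))
xb  <- predict(sel, type = "link")
imr <- dnorm(xb) / pnorm(xb)                        # inverse Mills ratio
fit <- glm(default ~ x + imr, family = binomial, subset = approved == 1)
summary(fit)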

HTH.

-- 
===
WenSui Liu
Acquisition Risk, Chase
Email : [EMAIL PROTECTED]
Blog   : statcompute.spaces.live.com
===



Re: [R] Logistic regression problem

2008-10-01 Thread Pedro.Rodriguez
Hi Bernardo,

Do you have to use logistic regression? If not, try Random Forests...
They have worked for me in past situations when I had to analyze huge
datasets.

Some want to understand the DGP (data-generating process) with a simple
linear equation; others want high generalization power. It is your
call... See, e.g.,
www.cis.upenn.edu/group/datamining/ReadingGroup/papers/breiman2001.pdf.

Maybe you are also interested in AD-HOC, an algorithm for feature
selection:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.99.9130
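
A minimal sketch of that route (hypothetical data frame dat with a
binary column y, plus a hold-out set newdat; randomForest package
assumed):

library(randomForest)
fit <- randomForest(factor(y) ~ ., data = dat, ntree = 500, importance = TRUE)
varImpPlot(fit)  # quick screen for influential variables
prob <- predict(fit, newdata = newdat, type = "prob")[, 2]  # P(class = 1)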


Regards,

Pedro

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
Sent: Wednesday, October 01, 2008 12:01 PM
To: Frank E Harrell Jr; [EMAIL PROTECTED]
Cc: r-help@r-project.org
Subject: Re: [R] Logistic regression problem

From: Frank E Harrell Jr
 
 Bernardo Rangel Tura wrote:
  On Tue, 2008-09-30 at 18:56 -0500, Frank E Harrell Jr wrote:
   Bernardo Rangel Tura wrote:
    On Sat, 2008-09-27 at 10:51 -0700, milicic.marko wrote:
 
     I have a huge data set with thousands of variables and one
     binary variable. I know that most of the variables are
     correlated and are not good predictors... but...
 
     It is very hard to start modeling with such a huge dataset.
     What would be your suggestion? How to make a first cut... how
     to eliminate most of the variables but not ignore potential
     interactions... for example, maybe variable A is not a good
     predictor and variable B is not a good predictor either, but
     maybe A and B together are a good predictor...
 
     Any suggestion is welcomed
 
     milicic.marko
 
    I think you should start with rpart(binary variable ~ .).
    This shows you a set of variables with which to start a model,
    and starting cutoffs for the continuous variables.
 
   I cannot imagine a worse way to formulate a regression model.
   Reasons include:
 
   1. Results of recursive partitioning are not trustworthy unless
   the sample size exceeds 50,000 or the signal-to-noise ratio is
   extremely high.
 
   2. The type I error of tests from the final regression model
   will be extraordinarily inflated.
 
   3. False interactions will appear in the model.
 
   4. The cutoffs so chosen will not replicate and in effect assume
   that covariate effects are discontinuous and piecewise flat. The
   use of cutoffs results in a huge loss of information and power
   and makes the analysis arbitrary and impossible to interpret
   (e.g., a high covariate value : low covariate value odds ratio
   or mean difference is a complex function of all the covariate
   values in the sample).
 
   5. The model will not validate in new data.
 
  Professor Frank,
 
  Thank you for your explanation.
 
  Well, if my first idea is wrong, what is your opinion on the
  following approach?
 
  1- Do PCA on the data, excluding the binary variable
  2- Put the principal components in a logistic model
  3- Afterwards, map the principal components back to the variables
  (only if that is interesting for milicic.marko)
 
  If this approach is wrong too, what is your approach?
 
 Hi Bernardo,
 
 If there is a large number of potential predictors and no previous
 knowledge to guide the modeling, principal components (PC) is often
 an excellent way to proceed.  The first few PCs can be put into the
 model.  The result is not always very interpretable, but you can
 decode the PCs by using stepwise regression or recursive partitioning
 (which are safer in this context because the stepwise methods are not
 exposed to the Y variable).  You can also add PCs in a stepwise
 fashion in the pre-specified order of variance explained.
 
 There are many variations on this theme, including nonlinear
 principal components (e.g., the transcan function in the Hmisc
 package), which may explain more variance of the predictors.
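
A minimal sketch of the PC-then-logistic idea (hypothetical numeric
predictor matrix X and binary y; not code from this thread):

pc <- prcomp(X, scale. = TRUE)  # principal components of the predictors only
k  <- 5                         # keep the first k components
fit <- glm(y ~ pc$x[, 1:k], family = binomial)
summary(fit)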

While I agree with much of what Frank has said, I'd like to add some points.

Variable selection is a treacherous business whether one is interested in
prediction or inference.  If the goal is inference, Frank's book is a
must-read, IMHO.  (It's great for predictive model building, too.)

If interaction is of high interest, principal components are not going
to give you that.

Regarding cutpoint selection:  The machine learners have found that the
`optimal' split points for a continuous predictor in tree algorithms are
extremely variable, and that interpreting them would be risky at best.
Breiman essentially gave up on interpretation of a single tree when he
went to random forests.

Best,
Andy

 
 Frank
 -- 
 Frank E Harrell Jr   Professor and Chair   School of Medicine
                      Department of Biostatistics   Vanderbilt University
 

Re: [R] ROC curve from logistic regression

2008-09-08 Thread Pedro.Rodriguez
Hi

Try the following reference:

"Comparison of Three Methods for Estimating the Standard Error of the
Area under the Curve in ROC Analysis of Quantitative Data" by
Hajian-Tilaki and Hanley, Academic Radiology, Vol. 9, No. 11, November
2002.

Below is a simple implementation that returns both the AUC and its
standard error (the DeLong et al. method).

Hope this helps...

Pedro


# Input: yreal in {-1, 1}; forecasts are the predicted scores
# Returns the AUC and its standard error (DeLong et al.)

auc <- function(yreal, forecasts) {

    sizeT <- length(yreal)  # length() so plain vectors work
    pos <- 0
    for (i in 1:sizeT) {
        if (yreal[i] > 0) { pos <- pos + 1 }
    }
    neg <- sizeT - pos

    # split the forecasts by true class
    forepos <- vector(length = pos)
    foreneg <- vector(length = neg)
    controlpos <- 1
    controlneg <- 1
    for (i in 1:sizeT) {
        if (yreal[i] > 0) {
            forepos[controlpos] <- forecasts[i]
            controlpos <- controlpos + 1
        } else {
            foreneg[controlneg] <- forecasts[i]
            controlneg <- controlneg + 1
        }
    }

    # AUC as the Mann-Whitney statistic: concordant pairs count 1, ties 0.5
    oper <- 0
    for (i in 1:pos) {
        for (j in 1:neg) {
            if (forepos[i] >  foreneg[j]) { oper <- oper + 1 }
            if (forepos[i] == foreneg[j]) { oper <- oper + 0.50 }
        }
    }
    area <- oper / (pos * neg)

    # structural components of the variance (DeLong et al.)
    vpj <- vector(length = pos)
    vqk <- vector(length = neg)
    oper <- 0
    for (i in 1:pos) {
        for (j in 1:neg) {
            if (forepos[i] > foreneg[j]) {
                oper <- oper + 1
            } else if (forepos[i] == foreneg[j]) {
                oper <- oper + 0.50
            }
        }
        vpj[i] <- (oper / neg - area)^2
        oper <- 0
    }
    for (j in 1:neg) {
        for (i in 1:pos) {
            if (forepos[i] > foreneg[j]) {
                oper <- oper + 1
            } else if (forepos[i] == foreneg[j]) {
                oper <- oper + 0.50
            }
        }
        vqk[j] <- (oper / pos - area)^2
        oper <- 0
    }
    vpj <- vpj / (pos * (pos - 1))
    vqk <- vqk / (neg * (neg - 1))
    s <- sqrt(sum(vpj) + sum(vqk))

    return(list(AUC = area, std = s))
}
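
A quick sanity check (not part of the original message): with random
forecasts the AUC should be near 0.5.

set.seed(123)
y <- ifelse(runif(500) > 0.5, 1, -1)
p <- runif(500)
auc(y, p)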

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Frank E Harrell Jr
Sent: Monday, September 08, 2008 8:22 AM
To: gallon li
Cc: r-help
Subject: Re: [R] ROC curve from logistic regression

gallon li wrote:
 I know how to compute the ROC curve and the empirical AUC from the
 logistic regression after fitting the model.
 
 But here is my question: how can I compute the standard error for the
 AUC estimator resulting from logistic regression? The variance should
 be more complicated than AUC based on known test results. Does anybody
 know a reference on this problem?


The rcorr.cens function in the Hmisc package will compute the std. error

of Somers' Dxy rank correlation.  Dxy = 2*(C-.5) where C is the ROC 
area.  This standard error does not include a variance component for the

uncertainty in the model (e.g., it does not penalize for the estimation 
of the regression coefficients if you are estimating the coefficients 
and assessing ROC area on the same sample).

The lrm function in the Design package fits binary and ordinal logistic 
regression models and reports C, Dxy, and other measures.

I haven't seen an example where drawing the ROC curve provides useful 
information that leads to correct actions.

Frank
-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
                     Department of Biostatistics   Vanderbilt University



Re: [R] Interpolation Problems

2008-09-02 Thread Pedro.Rodriguez
Hi Steve,

It could be the case that you are trying to find values that are outside
the range of values you are providing.

For example:

x <- c(1, 2, 3, 4, 5)
y <- c(10, 11, 12, 13, 14)
xout <- c(0.01, 0.02)
approx(x, y, xout, method = "linear")

R's output:
$x
[1] 0.01 0.02

$y
[1] NA NA
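
For completeness, base R's approx can also clamp to the end values
instead of returning NA, via the rule argument:

approx(x, y, xout, method = "linear", rule = 2)$y
# [1] 10 10  (values below min(x) get y[1]; above max(x), y[length(y)])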

If you want values below 1 or above 5 to be extended from the end
segments instead of returning NA, the code below may help.

Regards,

Pedro


interpolation_test <- function(data, cum_prob, xout)
{
    y <- vector(length = length(xout))
    for (i in 1:length(xout))
    {
        ValueToCheck <- xout[i]

        # walk up to the segment that brackets ValueToCheck
        j <- 1
        while (cum_prob[j] < ValueToCheck && j < length(cum_prob) - 2)
        {
            j <- j + 1
        }

        y0 <- data[j]
        x0 <- cum_prob[j]

        y1 <- data[j + 1]
        x1 <- cum_prob[j + 1]

        if (x0 == ValueToCheck)
        {
            y[i] <- y0
        } else {
            # linear interpolation; extends the end segments beyond the range
            y[i] <- y0 + (ValueToCheck - x0) * (y1 - y0) / (x1 - x0)
        }
    }
    return(y)
}




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Steve Murray
Sent: Monday, September 01, 2008 6:17 PM
To: r-help@r-project.org
Subject: [R] Interpolation Problems


Dear all,

I'm trying to interpolate a dataset to give it twice as many values (I'm
giving the dataset a finer resolution by interpolating from 1 degree to
0.5 degrees) to match that of a corresponding dataset.

I have the data in both a data frame format (longitude column header
values along the top with latitude row header values down the side) or
column format (in the format latitude, longitude, value).

I have used Google to determine 'approxfun' to be the most appropriate
command for this purpose - I may well be wrong here though! Nevertheless,
I've tried using it with the default arguments for the data frame (i.e.
interp <- approxfun(dataset) ) but encounter the following errors:

> interp <- approxfun(JanAv)
Error in approxfun(JanAv) : 
  need at least two non-NA values to interpolate
In addition: Warning message:
In approxfun(JanAv) : collapsing to unique 'x' values


However, there are no NA values! And to double-check this, I did the
following:

> JanAv[is.na(JanAv)] <- 0

...to ensure that there really are no NAs, but receive the same error
message each time.

With regard to the latter 'collapsing to unique 'x' values', I'm not
sure what this means exactly, or how to deal with it.


Any words of wisdom on how I should go about this, or whether I should
use an alternative command (I want to perform a simple (e.g. linear)
interpolation), would be much appreciated.


Many thanks for any advice offered,

Steve



[R] Import GAUSS .FMT files

2007-12-18 Thread Pedro.Rodriguez
Dear All,

Is it possible to import GAUSS .FMT files into R?

Thanks for your time.

Kind Regards,

Pedro N. Rodriguez




[R] Simulate an AR(1) process via distributions? (without specifying a model specification)

2007-11-28 Thread Pedro.Rodriguez
Dear All,

Is it possible to simulate an AR(1) process via a distribution?

I have simulated an AR(1) process the usual way (that is, using a model
specification and drawing random deviates for the error term), and used
the generated time series to estimate 3- and 4-parameter distributions
(for instance, the GLD). However, random deviates generated from these
fitted distributions do not follow the specified AR process.
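
(For reference, the "usual way" mentioned above; a minimal sketch with an
assumed AR coefficient of 0.7:)

set.seed(42)
phi <- 0.7
x <- arima.sim(model = list(ar = phi), n = 1000)
# equivalent explicit recursion: x[t] = phi * x[t-1] + e[t], e[t] ~ N(0, 1)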

 

Any comments and feedback will be more than welcome.

Thanks for your time.

Pedro N. Rodriguez




[R] Factorial, L-moments, and overflows

2007-09-16 Thread Pedro.Rodriguez
Hi everyone,

In the package POT, there is a function that computes the L-moments of a
given sample (samlmu). However, to compute those L-moments, one needs the
number of combinations of two numbers, which in turn involves factorials.
See, for example, Hosking (1990, p. 113).

How does the samlmu function in the POT package avoid overflows?

I was trying to build an R function similar to samlmu from scratch and
ran into overflows (just for my own educational purposes :o) ). Is there
a trick that I am missing to avoid overflows in the factorial function?
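
One standard trick is to avoid explicit factorials altogether: compute
binomial coefficients directly with choose(), or on the log scale with
lchoose(). A minimal sketch:

n <- 2000; k <- 3
choose(n, k)        # stable; no explicit factorials formed
exp(lchoose(n, k))  # log-scale version, safe for very large n
# the naive factorial(n) / (factorial(k) * factorial(n - k)) overflows:
# factorial(171) is already Inf in double precision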

Thank you very much for your time. 

Pedro N. Rodriguez
SSRN Homepage: http://ssrn.com/author=412141/ 
Homepage: http://www.pnrodriguez.com/
