[R] S-Plus resample package and associated functions

2007-09-10 Thread Robert A. LaBudde
Are there any packages in R that reproduce the package resample of S-Plus?

The sample() function in R doesn't provide the flexibility of
S-Plus's bootstrap() and bootstrap2().
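For reference, the closest CRAN equivalents appear to be the boot and
bootstrap packages. A minimal sketch with boot, bootstrapping the mean of
a sample (the data x here are illustrative; this is not a drop-in
replacement for bootstrap()):

library(boot)
x <- rnorm(50)                                      # example data
b <- boot(x, statistic = function(d, i) mean(d[i]), R = 999)
boot.ci(b, type = c("perc", "bca"))                 # percentile and BCa intervals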

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS   e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.             URL: http://lcfltd.com/
824 Timberlake Drive                      Tel: 757-467-0954
Virginia Beach, VA 23464-3239             Fax: 757-467-2947

Vere scire est per causas scire



Re: [R] Excel

2007-08-28 Thread Robert A LaBudde
At 01:21 AM 8/28/2007, David wrote:
On Tue, 28 Aug 2007, Robert A LaBudde wrote:

If you format the column as "Text", you won't have this problem. By
leaving the cells as "General", you leave it up to Excel to guess at
the correct interpretation.

Not true actually. I had converted the column to "Text" because I saw
the interpretation as a date in the .xls file. I saved the .csv file
*after* the column had been converted to "Text". Looking at the .csv
file in a text editor, the entry is correct.
snip

You need to convert the column to "Text" before you enter the data.
This tells Excel the presumption to use.

Converting to "Text" after you enter the values has no effect on
previously entered values.

I've tested this using Excel 2000.




Re: [R] Excel

2007-08-27 Thread Robert A LaBudde
If you format the column as "Text", you won't have this problem. By
leaving the cells as "General", you leave it up to Excel to guess at
the correct interpretation.

You will note that the conversion to a date occurs immediately in 
Excel when you enter the value. There are many formats to enter dates.

Either pre-format the column as "Text", or prefix the individual entry
with an apostrophe (') to indicate text.

A similar problem occurs in R's read.table() function when a factor 
has levels that can be interpreted as numbers.
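In R, one way to sidestep such guessing entirely (a sketch; the file name
is hypothetical) is to force every column to be read as character:

d <- read.csv("patients.csv", colClasses = "character")
str(d)   # all columns read as character; IDs such as "AUG2699" survive intact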

At 10:11 PM 8/27/2007, David wrote:

A common process when data is obtained in an Excel spreadsheet is to save
the spreadsheet as a .csv file then read it into R. Experienced users
might have learned to be wary of dates (as I have) but possibly have not
experienced what just happened to me. I thought I might just share it with
r-help as a cautionary tale.

I received an Excel file giving patient details. Each patient had an ID
code in the form of three letters followed by four digits. (Actually a New
Zealand National Health Identification.) I saved the .xls file as .csv.
Then I opened up the .csv (with Excel) to look at it. In the column of ID
codes I saw: "Aug-99". Clicking on that entry, it showed "1/08/2699".

In a column of character data, Excel had interpreted "AUG2699" as a date.

The .csv did not actually have a date in that cell, but if I had saved the
.csv file it would have.

David Scott




Re: [R] chi-square statistics

2007-08-23 Thread Robert A LaBudde
At 12:41 AM 8/24/2007, Karen wrote:
I'm wondering if R has functions to return p-values for given chi-square
statistics and df. I googled for it but couldn't seem to find one. Appreciate
any help here.
An example: df=4, x <- c(33.69, 32.96, 30.90), which are chi-square
statistics; I'd like to get the corresponding p-values for each value in x.

? pchisq

E.g.

pchisq(x, df, lower.tail=FALSE)
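E.g., for the values given (df = 4):

x <- c(33.69, 32.96, 30.90)
pchisq(x, df = 4, lower.tail = FALSE)   # upper-tail p-value for each statistic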




[R] Conditional logistic regression on n:m matched cohort data

2007-07-22 Thread Robert A. LaBudde
I am designing an interlaboratory validation study for a 
presence/absence alternative method test kit vs. a presence/absence 
reference method test kit.

There will be 10 laboratories conducting tests using both methods. In 
each laboratory, there will be 5 specimens tested, each of the 5 
specimens twice by both methods (alternative, standard).

The total number of data values is 10 x 5 x 4 = 200.

The general structure is as follows:

id: sequential result label, 1 to 200.
lab: a factor which has 10 levels, one for each participating lab.
specimen: 1 to 5 in each lab, no connection between labs.
method: A or R.
result: 0 or 1 (absence or presence).

This experiment appears to be equivalent to a matched cohort study. 
The matching is done on specimens, and there are 2 alternative and 2 
reference results for each specimen.

The sketchy description for clogit() of package survival presumes a 
matched case-control study is being analyzed.

I am looking for the correct syntax to use in clogit() for this experiment.

Is

library('survival')
clogit(result ~ lab + strata(specimen), method='exact', data=mydata)

all that is required?

Thanks.



[R] Conditional logistic regression on n:m matched cohort data [Corrected]

2007-07-22 Thread Robert A. LaBudde
[Corrected the model formula to include method.]

I am designing an interlaboratory validation study for a 
presence/absence alternative method test kit vs. a presence/absence 
reference method test kit.

There will be 10 laboratories conducting tests using both methods. In 
each laboratory, there will be 5 specimens tested, each of the 5 
specimens twice by both methods (alternative, standard).

The total number of data values is 10 x 5 x 4 = 200.

The general structure is as follows (a skeleton construction is sketched
after the list):

id: sequential result label, 1 to 200.
lab: a factor which has 10 levels, one for each participating lab.
specimen: 1 to 5 in each lab, no connection between labs.
method: A or R.
result: 0 or 1 (absence or presence).
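A skeleton of this layout can be built with expand.grid() (a sketch; the
replicate index and the 0/1 results below are placeholders):

mydata <- expand.grid(rep = 1:2, method = c('A','R'), specimen = 1:5,
                      lab = factor(1:10))
mydata$id <- 1:nrow(mydata)                      # 200 rows, as above
mydata$result <- rbinom(nrow(mydata), 1, 0.5)    # placeholder 0/1 results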

This experiment appears to be equivalent to a matched cohort study. 
The matching is done on specimens, and there are 2 alternative and 2 
reference results for each specimen.

The sketchy description for clogit() of package survival presumes a 
matched case-control study is being analyzed.

I am looking for the correct syntax to use in clogit() for this experiment.

Is

library('survival')
clogit(result ~ lab + method + strata(specimen), method='exact', data=mydata)

all that is required?

Thanks.



Re: [R] Effect size, interactions, and main effects (Stats Question) [ffmanova]

2007-07-21 Thread Robert A LaBudde
Statistical significance is detectability, and depends upon the
size of the sample as well as the effect. A large enough experiment
will result in statistical detectability of almost every interaction
term allowed.

This is why estimation, not testing, has become the consensus 
recommendation in statistics.

As a practical matter, evaluate the combined effect of your model 
terms with and without the interaction term(s) you are worried about. 
Is the reduction in accuracy of physical importance? If so, the 
interaction terms are required for scientific reasons. If not, 
present both results and indicate the acceptability (for 
interpolation) of the simpler model.
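A sketch of that comparison in R (the data frame and variables are
hypothetical):

fit_main <- lm(y ~ A + B, data = dat)     # without interaction
fit_int  <- lm(y ~ A * B, data = dat)     # with interaction
anova(fit_main, fit_int)                  # detectability of the interaction
c(summary(fit_main)$sigma, summary(fit_int)$sigma)  # change in residual SD: practical importance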

You should also make it your first priority to hypothesize why the
interaction terms are meaningful and expected. If a cause can be 
found, it may suggest an alternate model that will eliminate 
interactions, or satisfy your anxiety. If not, it may support your 
argument to simplify.


At 08:58 AM 7/21/2007, Mark wrote:

Dear List Members,

I would very much appreciate any pointers you could give me on the following
matter:

Main Question:
To what extent does the rule that it is unreasonable to talk about main
effects if there are significant interactions in a model depend upon effect
size [of the significant interaction terms]?  Or is this not an issue?

More practically:  Suppose I were to carry out a so-called Type-II MANOVA
(using ffmanova) and were to find that the interaction term in a 2-way
analysis has borderline significance (say p = 0.045) and a small effect
size, whereas one of the main effects is highly significant (say p = 6.8e-10)
and has a large effect size.

Would it in this case be reasonable for me to ignore the interaction term,
and talk only about main effects?  And, presuming the main question is fair,
are there general guidelines concerning the relationship between level of
significance and effect size for interaction terms?

Thank you in advance for your help,




Re: [R] No convergence using ADAPT

2007-07-07 Thread Robert A LaBudde
What versions of adapt and R are you using? The current package was 
built with R-2.5.1.

I tried your program with R-2.5.0, and got the answer 0.1501053 in 
just a few seconds.

At 03:20 PM 7/7/2007, Philip wrote:
I am trying calculate a probability using numerical integration. The first
program I ran spit out an answer in a very short time. The program is below:

## START PROGRAM

trial <- function(input)
{
    pmvnorm(lower = c(0, 0), upper = c(2, 2), mean = input,
            sigma = matrix(c(.1, 0, 0, .1), nrow = 2, ncol = 2, byrow = FALSE))
}

require(mvtnorm)
require(adapt)

bottomB <- -5*sqrt(.1)
topB <- 2 + 5*sqrt(.1)
areaB <- (topB - bottomB)^2

unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB),
    minpts = 1000, eps = 1e-4, functn = trial)

(1/areaB)*unscaled.Po.in.a$value

## FINISH PROGRAM

I tried to run the program again changing a.) sigma in the trial 
function, b.)
upper in the trial function, and c.) the bounds of integration; that is,
bottomB and topB.  The new program is below:

## START PROGRAM

trial <- function(input)
{
    pmvnorm(lower = c(0, 0), upper = c(10, 10), mean = input,
            sigma = matrix(c(.01, 0, 0, .01), nrow = 2, ncol = 2, byrow = FALSE))
}

require(mvtnorm)
require(adapt)

bottomB <- -5*sqrt(.01)
topB <- 10 + 5*sqrt(.01)
areaB <- (topB - bottomB)^2

unscaled.Po.in.a <- adapt(2, lo = c(bottomB, bottomB), up = c(topB, topB),
    minpts = 1000, eps = 1e-4, functn = trial)

(1/areaB)*unscaled.Po.in.a$value

## FINISH PROGRAM

Now, the program just runs and runs (48 hours at last count!).  By playing
around with the program, I have deduced the program is highly sensitive to
changing the upper option in the trial function.  For example, using a vector
like (4, 4) causes no problems and the program quickly yields an answer.  I
have a couple of other programs where I can easily obtain a simulation-based
answer, but I would ultimately like to know what's up with this 
program before
I give up on it so I can learn a thing or two.  Does anyone have any clues or
tricks to get around this problem?  My guess is that it will simply be very
difficult (impossible?) to obtain this type of relative error (eps = 
1e-4) and
I will have no choice but to pursue the simulation approach.

Thanks for any responses ([EMAIL PROTECTED])!

-- Phil

Philip Turk
Assistant Professor of Statistics
Northern Arizona University
Department of Mathematics and Statistics
PO Box 5717
Flagstaff, AZ 86011
Phone: 928-523-6884
Fax: 928-523-5847
E-mail: [EMAIL PROTECTED]
Web Site: http://jan.ucc.nau.edu/~stapjt-p/





Re: [R] outlying

2007-06-19 Thread Robert A LaBudde
At 05:29 AM 6/19/2007, elyakhlifi wrote:
hello,
are there functions to detect outlying observations in samples?
thanks.

library('car')
? outlier.test

library('outliers')
? grubbs.test
? dixon.test
? cochran.test
? chisq.out.test
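E.g., a quick Grubbs test on a small sample (illustrative data):

library(outliers)
x <- c(2.1, 2.3, 2.2, 2.4, 9.8)
grubbs.test(x)    # tests the value farthest from the mean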



Re: [R] Optimization

2007-06-18 Thread Robert A LaBudde
You don't need optimization for the solution to your problem. You 
just need an understanding of the meaning of qnorm() and some simple algebra.

Try: x <- (0.01-0.0032)/qnorm(0.7,0,1)
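This follows because qnorm(0.7, 0.0032, x) = 0.0032 + x*qnorm(0.7, 0, 1).
A quick check:

x <- (0.01 - 0.0032)/qnorm(0.7, 0, 1)
x                        # about 0.01297
qnorm(0.7, 0.0032, x)    # recovers 0.01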


At 12:01 PM 6/18/2007, you wrote:

Hi, I would like to minimize the value of x1-x2, x2 is a fixed value of 0.01,
x1 is the quantile of normal distribution (0.0032,x) with probability of
0.7, and the changing value should be x. Initial value for x is 0.0207. I am
using the following codes, but it does not work.

fr <- function(x) {
    x1 <- qnorm(0.7, 0.0032, x)
    x2 <- 0.01
    x1 - x2
}
xsd <- optim(0.0207, fr, NULL, method="BFGS")

It is the first time I am trying to use optimization. Could anyone give me
some advice?
--
View this message in context: 
http://www.nabble.com/Optimization-tf3941212.html#a11178663
Sent from the R help mailing list archive at Nabble.com.





Re: [R] Iterative Solver [Converting Matlab's solve()]

2007-06-18 Thread Robert A LaBudde
At 12:24 AM 6/19/2007, Boris wrote:
I can't for the life of me figure out how to get the roots for this simple
(but sovable only iteratively) equation in R:

x = z*(1-(1-2/z)^y

where x and y are known, and z is unknown. In Matlab, this amounts to:

[my.solution] = solve('x = z*(1-(1-2/z)^y')
my.solution.real = solution(y-1,y)

% bottom line displays non-imaginary solution (last element)
snip

1. Your equation is syntactically incorrect. It is missing a ")".
Ditto for your Matlab example.

2. Are you looking for an analytical (symbolic) solution? If so, I
don't believe one exists (if you place a ")" after y).

3. To find numerical roots, see uniroot().

4. If you want more help, you'll have to specify domains or values for x and y.
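For item 3, a sketch with uniroot() (x = 4 and y = 3 are arbitrary
illustrative values; z > 2 keeps (1 - 2/z) positive so the power is
well defined):

f <- function(z, x, y) z*(1 - (1 - 2/z)^y) - x
uniroot(f, interval = c(2.001, 100), x = 4, y = 3)$root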




Re: [R] [Not R question]: Better fit for order probit model

2007-06-17 Thread Robert A LaBudde
At 01:29 PM 6/17/2007, adschai wrote:
Thank you so much Robert. Please find the information below.

The scale 1-10 are subjective physical condition ratings scored by
inspection engineers at the site. 1-5 are very bad conditions
(bridge closed down to seriously deteriorated). The rest, 6-10,
are categorized as good condition. However, the proportions in my
data are as follows. Bridges with scale 1-5 cover about 2-3%
of the total population. The majority of the bridges are at 7-8.
Certainly, I have enough data for model estimation, but the mix of
the proportions is not good. I attached the detail of the condition
rating scale at the end of this message.
snip

My guess is that you really have two distributions here: 1) bridges
that are basically under proper supervision and repair (categories
6-10), and 2) those that are not (categories 1-5). These two classes
would probably have dramatically different relations to the
covariates you are using.

So my recommendation would be to consider splitting your response
into two classes (0/1), each with 5 subcategories.

Keeping the two classes merged together in a single group is 
equivalent to merging two different distributions with a weighting 
factor. This may be causing a bimodal distribution which is giving 
you problems.

You could try your modeling on each of the two classes separately 
before continuing with your full dataset modeling. You may learn 
something useful to help you with your problems.

For the full model, you would need to include a full set of 
interaction terms with class.
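A sketch of the two-class approach (MASS::polr fits an ordered probit;
the data frame bridges and the covariates age and traffic below are
hypothetical):

library(MASS)
bridges$class <- factor(bridges$rating >= 6, labels = c('bad', 'good'))
fit_bad  <- polr(factor(rating) ~ age + traffic, method = 'probit',
                 data = subset(bridges, class == 'bad'))
fit_good <- polr(factor(rating) ~ age + traffic, method = 'probit',
                 data = subset(bridges, class == 'good'))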



Re: [R] [Not R question]: Better fit for order probit model

2007-06-16 Thread Robert A LaBudde
At 03:17 AM 6/16/2007, adschai wrote:
Thank you so much Robert. I haven't thought about the idea of 
clumping categories together. One of the reason is because these 
categories are bridge condition rating scores. They indeed represent 
different meaning and serviceability conditions. They vary from 0-9. 
I have about 300,000 data in which the first 5 labels, i.e. 0-4, are 
bad condition bridge and there are only less than 1000 instances in 
total. The worst case, is for example, score 0 (meaning the bridge 
is not operatable), I have 60 instances. Score 1 I have about 100.

I would appreciate if you could provide some opinion as to how you 
would make the order probit fits better in this case? Thank you so 
much in advance.

You certainly appear to have enough data to populate these
categories. Your problems in getting a good fit may relate to other issues.

You need to supply more information in order to say more.

What are the definitions of each category?

Is the ordering consistent, or are there really two different scales,
one for bridges with essentially no problems, and another for those
with serious damage?

What evidence do you have that your fit is poor?

What model are you fitting?

Etc.




Re: [R] [Not R question]: Better fit for order probit model

2007-06-15 Thread Robert A LaBudde
At 09:31 PM 6/15/2007, adschai wrote:
I have a model which tries to fit a set of data with 10-level
ordered responses. Somehow, in my data, the majority of the
observations are from levels 6-10, leaving only about 1-5% of total
observations at levels 1-5. As a result, my model tends
to perform badly on points that have a level lower than 6.

I would like to ask if there's any way to circumvent this problem or 
not. I was thinking of the followings ideas. But I am opened to any 
suggestions if you could please.

1. Bootstrapping with a small sample size each time. However, in
each sample basket, I intentionally sample in such a way that there
is a good mix between observations from each level. Then I have to
do this many times. But I don't know how to obtain the true standard
error of the estimated parameters after all bootstrapping has been done.
Is it going to be simply the average of all standard errors
estimated each time?

2. Weighting points with level 1-6 more. But it's unclear to me how 
to put this weight back to maximum likelihood when estimating 
parameters. It's unlike OLS where your objective is to minimize 
error or, if you'd like, a penalty function. But MLE is obviously 
not a penalty function.

3. Do a two-step regression. I will segment the data into two
regions: first, points with response less than 6, and the rest,
those of 6 and above. The first step is a binary regression to determine
which of the two groups the point belongs to. Then in the second
step, estimate an ordered probit model for each group separately. The
question here is then: why am I choosing 6 as the cutting point
instead of another?

Any suggestions would be really appreciated. Thank you.

You could do the obvious, and lump categories such as 1-6 or 1-7 
together to make a composite category.
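E.g., lumping with cut() (illustrative 0-9 ratings):

rating  <- sample(0:9, 300, replace = TRUE)
rating2 <- cut(rating, breaks = c(-1, 6, 7, 8, 9),
               labels = c('0-6', '7', '8', '9'), ordered_result = TRUE)
table(rating2)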

You don't mention the size of your dataset. If there are 10,000 data, 
you might live with a 1% category. If you only have 100 data, you 
have too many categories.

Also, next time plan your study and training better so that your
categories are fully utilized. And don't use so many categories.
People have trouble even selecting responses on a 5-level scale.



Re: [R] Appropriate regression model for categorical variables

2007-06-12 Thread Robert A LaBudde
At 01:45 PM 6/12/2007, Tirtha wrote:
Dear users,
In my psychometric test I have applied logistic regression to my data. My
data consist of 50 predictors (22 continuous and 28 categorical) plus a
binary response.

Using glm() and stepAIC() I didn't get a satisfactory result, as the
misclassification rate is too high. I think the categorical variables are
responsible for this debacle. Some of them have more than 6 levels (one
has 10).

Please suggest some better regression model for this situation. If possible
you can suggest some article.

1. If a factor has many levels, ask whether there is a natural order to the
levels. If so, consider fitting the factor as an ordered factor.

2. Break the factor levels into 2 or 3 groups that have some rational
connection. Then fit the factor with a smaller number of levels.
E.g., race might have levels "white", "black", "asian", "pacific",
"Spanish surname", "other". Consider a change to "white", "nonwhite".
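E.g. (hypothetical factor):

race  <- factor(c('white', 'black', 'asian', 'pacific', 'Spanish surname', 'other'))
race2 <- factor(ifelse(race == 'white', 'white', 'nonwhite'))
table(race2)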




Re: [R] pretty report

2007-06-12 Thread Robert A LaBudde
At 09:13 PM 6/12/2007, Don wrote:
At 5:01 PM -0400 6/12/07, Weiwei Shi wrote:
 Dear Listers:
 
 I have a couple of data frames to report and each corresponds to
 different condtions, e.g. conditions=c(10, 15, 20, 25). In this
 examples, four data frames need to be exported in a pretty report.
 
 I knew Perl has some module for exporting data to Excel and after
 googling, I found R does not.

I use write.table(), name the file with .xls as the suffix, then
outside R I double-click on it and it opens in Excel. Granted, it's a
text file, and Excel is opening a text file, but none the less, I
snip

Note that files with a ".csv" extension are also associated with
Excel and can be opened with a double-click. Comma-separated-value files
also can be unambiguously loaded by Excel without parsing ambiguity.
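E.g., one .csv per condition (the data frames d10, d15, d20, d25 are
hypothetical stand-ins for the poster's four data frames):

for (cond in c(10, 15, 20, 25)) {
  d <- get(paste('d', cond, sep = ''))
  write.csv(d, paste('report_', cond, '.csv', sep = ''), row.names = FALSE)
}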




Re: [R] p-value from GEE why factor 2*pnorm?

2007-06-11 Thread Robert A LaBudde
At 11:21 AM 6/11/2007, Carmen wrote:
snip
In this thread there is a hint how to calculate the p-value of a GEE:
  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/74150.html

  Then, get the P values using a normal approximation for the
  distribution of z:

  2 * pnorm(abs(coef(summary(fm1))[,5]), lower.tail = FALSE)
  (Intercept) TPTLD  0. 0.04190831

1. Why is the result multiplied by 2? With the results below, multiplying
by 2 gives a P-value between 1 and 2:

2*pnorm(c(1.8691945,0.5882351,2.4903091,1.9287802,2.3172983,2.2092593,2.2625959,1.6395695),
lower.tail =TRUE)

1. The formula given in the thread mentioned was:

 2 * pnorm(abs(coef(summary(fm1))[,5]), lower.tail = FALSE)

2. The reason for the 2 at the front is to make it an equal-tails 
or 2-sided confidence interval. Pedantically, you should use 1.96 
instead of 2.0 for consistency, but 2.0 = 1.96 rounded to one decimal place.

3. This is what is usually called a Wald type confidence interval, 
as it is simply the normal quantile (+/- 1.96) multiplied by the 
standard error of estimate to get the +/- widths for the interval. 
These would be added to the estimate itself to get the final Wald 
confidence interval, which obviously assumes a normal distribution applies.
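E.g., the familiar two-sided p-value for z = 1.96:

z <- 1.96
2 * pnorm(abs(z), lower.tail = FALSE)   # [1] 0.04999579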





Re: [R] Windows vista's early terminate Rgui execution

2007-06-10 Thread Robert A LaBudde
At 03:28 PM 6/10/2007, [EMAIL PROTECTED] wrote:
Hi, I have a frustrating problem with Vista that I wonder if anyone
has come across. I wrote a script that involves
long computation time (although, during the calculation, it spits
out text on the GUI to notify me of the progress of the calculation
periodically). Windows Vista always stopped my calculation and
claimed that 'Rgui has stopped working. Windows is checking for a
solution.' And when I looked into Task Manager, Windows had already
stopped my Rgui process. I am quite disappointed with this. I would
really appreciate it if anyone finds a way around this
Windows Vista problem. In particular, how do I turn off this feature in
Vista? Any help would be really appreciated. Thank you! - adschai

You probably need to contact Vista periodically so it knows you are awake.

Just include a line that does a call to Vista that doesn't do output, such as

useless <- dir()

placed in some outer loop that satisfies the drop-dead time between calls.

Alternatively, you can attempt to find out how to change the registry 
entry corresponding to the wait time and increase it to a value you 
can live with.




Re: [R] What ECDF function?

2007-06-09 Thread Robert A LaBudde
At 12:57 PM 6/9/2007, Marco wrote:
snip
2. I found various versions of P-P plot where, instead of using the
ecdf function, ((1:n)-0.5)/n is used.
   After investigation I found there are different definitions of the ECDF
(note i is the rank):
   * Kaplan-Meier: i/n
   * modified Kaplan-Meier: (i-0.5)/n
   * Median Rank: (i-0.3)/(n+0.4)
   * Herd-Johnson: i/(n+1)
   * ...
   Furthermore, similar expressions are used by ppoints().
   So,
   2.1 For a P-P plot, what shall I use?
   2.2 In general, why should I prefer one kind of CDF over another?
snip

This is an age-old debate in statistics. There are many different 
formulas, some of which are optimal for particular distributions.

Using i/n (which I would call the Kolmogorov method), (i-1)/n or 
i/(n+1) is to be discouraged for general ECDF modeling. These 
correspond in quality to the rectangular rule method of integration 
of the bins, and assume only that the underlying density function is 
piecewise constant. There is no disadvantage to using these methods, 
however, if the pdf has multiple discontinuities.

I tend to use (i-0.5)/n, which corresponds to integrating with the 
midpoint rule, which is a 1-point Gaussian quadrature, and which is 
exact for linear behavior with derivative continuous. It's simple, 
it's accurate, and it is near optimal for a wide range of continuous 
alternatives.
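A small sketch comparing the formulas on a sample (cf. ppoints()):

x <- sort(rnorm(20))
n <- length(x); i <- seq_len(n)
p_km   <- i/n                     # i/n, as in ecdf()
p_mid  <- (i - 0.5)/n             # midpoint rule
p_blom <- (i - 3/8)/(n + 1/4)     # near-optimal for the normal
plot(x, p_mid)                    # plotting positions against the order statistics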

The formula (i - 3/8)/(n + 1/4) is optimal for the normal
distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so
there is no real benefit to using it. Similarly, there is a formula
(i - 0.44)/(n + 0.12) for a Gumbel distribution. If you do know for sure
(don't need to test) the form of the distribution, you're better off
fitting that distribution function directly and not worrying about the edf.

Also remember that edfs are not very accurate, so the differences 
between these formulae are difficult to justify in practice.




Re: [R] What ECDF function?

2007-06-09 Thread Robert A LaBudde
At 06:36 PM 6/9/2007, Marco wrote:
On 6/9/07, Robert A LaBudde [EMAIL PROTECTED] wrote:
  At 12:57 PM 6/9/2007, Marco wrote:
  snip
snip

Hmmm I'm a bit confused, but very interested!
So you don't use the R ecdf, do you?

Only when an i/n edf is needed (some tests, such as ks.test(), are
based on this). Also, I frequently do modeling in Excel as well,
where you need to enter your own formulas.

snip
  Also remember that edfs are not very accurate, so the differences
  between these formulae are difficult to justify in practice.
 

I will bear it in mind! My first interpretation was that using something
different from i/n (e.g. i/(n+1)) might better bring out tail
differences (maybe...)

The chief advantage to i/(n+1) is that you don't generate 1.0 as an 
abscissa, as you do with i/n. But the same is true of (i-0.5)/n, and 
it's more accurate.

Unless you need to do otherwise, just use ecdf(), because it matches 
the theory for most uses, and it almost always doesn't matter that 
it's slightly less accurate than other choices.
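A quick illustration that ecdf() uses the i/n convention:

x <- c(2.3, 1.1, 3.7)
F <- ecdf(x)
F(sort(x))   # 0.3333 0.6667 1.0000, i.e. i/n at the order statistics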




Re: [R] pnorm how to decide lower-tail true or false

2007-06-08 Thread Robert A LaBudde
At 01:31 PM 6/8/2007, Carmen wrote:
Hi to all,
maybe the last question was not clear enough.
I did not find any hints on how to decide whether to use lower.tail
or not.
As it is an extra R feature (written up in
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66250.html )
I do not find anything about it in any of my statistics books.
Regards Carmen

pnorm(z, lower.tail=TRUE) (the R default) gives the probability of a 
normal variate being at or below z. This is the value commonly called 
the cumulative distribution function at the point z, or the integral 
from -Inf to z of the gaussian density.

pnorm(z, lower.tail=FALSE) gives the complement of the above, or 1 - 
cdf(z), and is the integral from z to Inf of the gaussian density.

E.g.,

> pnorm(1.96, lower.tail=TRUE)
[1] 0.9750021
> pnorm(1.96, lower.tail=FALSE)
[1] 0.02499790

Use lower.tail=TRUE if you are, e.g., finding the probability at the
lower tail of a confidence interval, or if you want the probability
of values no larger than z.

Use lower.tail=FALSE if you are, e.g., trying to calculate the
significance of a test value or the upper confidence limit, or you
want the probability of values z or larger.

You should use pnorm(z, lower.tail=FALSE) instead of 1-pnorm(z) 
because the former returns a more accurate answer for large z.

This is a really simple issue, with no inherent complexity
associated with it.



[R] glm() for log link and Weibull family

2007-06-08 Thread Robert A. LaBudde
I need to be able to run a generalized linear model with a log() link 
and a Weibull family, or something similar to deal with an extreme 
value distribution.

I actually have a large dataset where this is apparently necessary. 
It has to do with recovery of forensic samples from surfaces, where 
as much powder as possible is collected. This apparently causes the 
results to conform to some type of extreme value distribution, so 
Weibull is a reasonable starting point for exploration.

I have tried ('surface' and 'team' are factors)

glm(surfcount ~ surface*team, data=powderd, family=Gamma(link='log'))

but this doesn't quite do the trick. The standardized deviance 
residuals are still curved away from normal at the tails.

Thanks for any info you can give on this nonstandard model.
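For reference, one hedged possibility (a sketch, not the glm() form asked
about): survreg() in the survival package fits a Weibull accelerated-
failure-time model, i.e. a log-linear model with extreme-value errors,
here treating surfcount as fully observed:

library(survival)
fit <- survreg(Surv(surfcount) ~ surface*team, data = powderd, dist = 'weibull')
summary(fit)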



Re: [R] Finding density curve peak

2007-06-02 Thread Robert A LaBudde
At 07:34 PM 6/2/2007, Davendra wrote:
I have a dataset where I get a density curve of a continuous variable with
two peaks.
How can I get the peaks?
Any simple solutions?

Plot the curve, and then use identify() and the mouse to click on the peaks.
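A programmatic alternative (a sketch; x stands for the continuous variable):

d <- density(x)
peaks <- which(diff(sign(diff(d$y))) == -2) + 1
d$x[peaks]    # approximate locations of the local maxima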




[R] How to reference or sort rownames in a data frame

2007-05-27 Thread Robert A. LaBudde
As I was working through elementary examples, I was using dataset 
plasma of package HSAUR.

In performing a logistic regression of the data, and making the 
diagnostic plots (R-2.5.0)

data(plasma,package='HSAUR')
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
layout(matrix(1:4,nrow=2))
plot(plasma_1)

I find that data points corresponding to rownames 17 and 23 are 
outliers and high leverage.

I would then like to perform a fit without these two rows.

In principle this should be easy, using an update() with subset=-c(17,23).

The problem is that the rownames in this dataset are not ordered, 
and, in fact, the relevant rows are 30 and 31, not 17 and 23.

This brings up the following (elementary?) questions:

1. How do you reference rows in subset= for which you know the 
rownames, but not the row numbers?

2. How do you discover the rows corresponding to particular 
rownames? (Using plasma[rownames(plasma)==17,] shows the data, but 
NOT the row number!) (Probably the same answer as in Q. 1 above.)

3. How do you sort (order) the rows of an existing data frame so that 
the rownames are in order?

I don't seem to know the magic words to find the answers to these 
questions in the help systems.

Obviously this can be done by writing new, brute force, functions 
scanning the subscripts, but there must be an (obvious?) direct way 
of doing this more elegantly.

Thanks for any pointers.



Re: [R] How to reference or sort rownames in a data frame

2007-05-27 Thread Robert A LaBudde
Thanks, Gabor.

I have to say I wouldn't have figured this out easily.

I'd summarize your comments by:

1. Remember to use arrays of logicals as indices.
2. Remember %in% for combination matches.
3. Remember which() to get indices.

It is the small tasks which appear most difficult to figure out in R.
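Combining these for the original problem (a sketch):

drop <- which(rownames(plasma) %in% c('17', '23'))   # rows 30 and 31
plasma_2 <- update(plasma_1, subset = -drop)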

At 10:29 PM 5/27/2007, Gabor wrote:
On 5/27/07, Robert A. LaBudde [EMAIL PROTECTED] wrote:
As I was working through elementary examples, I was using dataset
plasma of package HSAUR.

In performing a logistic regression of the data, and making the
diagnostic plots (R-2.5.0)

data(plasma,package='HSAUR')
plasma_1 <- glm(ESR ~ fibrinogen * globulin, data=plasma, family=binomial())
layout(matrix(1:4,nrow=2))
plot(plasma_1)

I find that data points corresponding to rownames 17 and 23 are
outliers and high leverage.

I would then like to perform a fit without these two rows.

In principle this should be easy, using an update() with subset=-c(17,23).

The problem is that the rownames in this dataset are not ordered,
and, in fact, the relevant rows are 30 and 31, not 17 and 23.

This brings up the following (elementary?) questions:

1. How do you reference rows in subset= for which you know the
rownames, but not the row numbers?

Use a logical vector:

   rownames(plasma) %in% c(17, 23)


2. How do you discovery the rows corresponding to particular
rownames? (Using plasma[rownames(plasma)==17,] shows the data, but
NOT the row number!) (Probably the same answer as in Q. 1 above.)

  which(rownames(plasma) %in% c(17, 23)) # 30, 31


3. How do you sort (order) the rows of an existing data frame so that
the rownames are in order?


  plasma[order(as.numeric(rownames(plasma))), ]




[R] Curve crosses back to origin in plot

2007-05-27 Thread Robert A. LaBudde
Another sample problem: In the Windows version of R-2.5.0,

data(GHQ, package='HSAUR')
layout(1)
GHQ_glm_1 <- glm(cbind(cases,non.cases) ~ GHQ, data=GHQ, family=binomial())
summary(GHQ_glm_1)
yfit_glm_1 <- predict(GHQ_glm_1, type='response')
layout(1)
plot(probs ~ GHQ, pch=1, col=1, ylab='Probability(Case)', data=GHQ)
lines(yfit_glm_1 ~ GHQ, pch=3, col=3, data=GHQ)
legend("topleft", c("linear", "logistic"), pch=c(2,3), col=c(2,3))

Everything is fine, but the predicted values curve in the lines() 
statement becomes closed by a straight line segment connecting the 
last point to the first.

How can this be avoided? It appears to be wrapping due to the change
from GHQ=10 for female to GHQ=0 again for male, bringing the curve back
to the beginning value.

One way to avoid this is to plot each sex's data in a separate lines() call.

Is there a way of doing this in a single lines() statement? If not, 
I'd like to know. If two lines() are needed, I'd like to know an 
efficient syntax. My attempt would be

lines(yfit_glm_1[0:10] ~ GHQ[0:10], pch=3,col=3, data=GHQ)
lines(yfit_glm_1[11:22] ~ GHQ[11:22], pch=3,col=3, data=GHQ)

which seems inelegant, as it involves numerical ranges that have to 
be determined by inspection.
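One way to avoid the hard-coded ranges (a sketch, assuming the wrap comes
from the sex column of GHQ as described):

for (s in unique(GHQ$sex)) {
  idx <- GHQ$sex == s
  lines(GHQ$GHQ[idx], yfit_glm_1[idx], col = 3)
}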

Thanks again for answering these simple questions that seem to be the 
hardest to find answers for.



Re: [R] Log-likelihood function

2007-05-02 Thread Robert A LaBudde
At 07:30 AM 5/2/2007, Doxastic wrote:
Thanks.  I used this and it gave me the same result as the logLik function.
The reason I ask is the SAS output gives me a loglik = 1089.  R gives me
-298.09583.  Both for my reduced model.  For the saturated (or complex)
model, SAS gives me an loglik = 1143.  R gives me -298.1993.  The problem is
these give two very different pictures about whether I can drop the
interaction.  However, I think the residual deviance in the R output is
equal to G^2.  So, I can just take the difference between those two.  If I
do this, I get a difference with an interpretation similar to that of what
comes from SAS.  So I think I'll just go with that.  But who knows if I'm
right (not me)?

Some comments:

1. Use summary() on your glm() object to get a fuller display of 
post-fit statistics, including the starting (null) and residual deviances.

2. The deviance is -2L, where L = ln(likelihood).

3. To test two nested models for the difference in covariates,
subtract the two residual deviances and the two d.f. and perform a
chi-square test. This can be done nicely by anova() on the two glm()
objects (see the sketch after this list).

4. Check the coefficients in your SAS and R models and make sure you 
are performing the same fit in both cases.
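A sketch of item 3 (the two fitted glm() objects are hypothetical):

anova(fit_reduced, fit_full, test = 'Chisq')  # chi-square test on the deviance difference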




Re: [R] upgrade to 2.5

2007-05-02 Thread Robert A LaBudde
At 01:41 PM 5/2/2007, you wrote:
On 5/2/07, Sundar Dorai-Raj [EMAIL PROTECTED] wrote:

  Iasonas Lamprianou said the following on 5/2/2007 8:25 AM:
   Hi, I am using R version 2.4.1. How can I upgrade to version 2.5
   without having to install all the packages again?
   Thanks
   Jason

  You may find the following link relevant.

  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/75359.html

if you use Windows XP.

This link was useful to me, as I am new to R. (Win2000, R-2.5.0)

What I have been doing is using a file-compare utility (Beyond
Compare in my case) to move files in the old library directory to
the new one, if the files are missing in the new one. Then I run
update.packages().

This procedure appears to work without problem.

It would seem much preferable to have all packages saved in an 
installation-independent directory, instead of a library directory 
under R's installation directory. Then, of course, no update would be 
necessary.

I can't find how this option is settable in R, other than as a direct
argument to library() or install.packages().

How does one shift the R default libraries location to a particular directory?
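For reference, one approach (a sketch; the directory name is hypothetical):
keep packages in a version-independent library, named either via the
R_LIBS_USER environment variable (e.g. in .Renviron) or at run time:

.libPaths('C:/Rlibs')                        # prepend a personal library
install.packages('boot', lib = 'C:/Rlibs')   # install into it
library(boot, lib.loc = 'C:/Rlibs')          # load from it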

Thanks.




Re: [R] Is R's fast fourier transform function different from fft2 in Matlab?

2007-05-02 Thread Robert A LaBudde
Discrete Fourier transforms can be normalized in different ways.

Some apply the whole normalization to the forward transform, some to 
the reverse transform, some apply the square root to each, and some 
don't normalize at all (in which case the reverse of the forward 
transform will need scaling).

The latter is apparently the case with R, according to your values.

Note that the R and the Matlab answers agree to within an overall scale
factor: ifft2() divides by the number of elements (2 x 2 = 4 here), while
R's unnormalized inverse does not.
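A quick check of the scaling in R, using your matrix:

temp <- matrix(c(1, 4, 2, 20), nrow = 2)
fft(temp, inverse = TRUE)/length(temp)   # 6.75 -4.25 / -5.25 3.75, matching ifft2(A)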

At 10:53 PM 5/2/2007, Li-Li wrote:
Thanks for both replies.
Then I found that ifft2 in Matlab gives a different result from
fft( , inverse=T) in R.
An example:
In R:

> temp <- matrix(c(1, 4, 2, 20), nrow=2)
> fft(temp)
        [,1]   [,2]
[1,]  27+0i -17+0i
[2,] -21+0i  15+0i
> fft(temp, inverse=T)
        [,1]   [,2]
[1,]  27+0i -17+0i
[2,] -21+0i  15+0i

In Matlab:

>> A = [1,2;4,20];
>> fft2(A)
ans =
    27   -17
   -21    15
>> ifft2(A)
ans =
    6.7500   -4.2500
   -5.2500    3.7500

I also tried mvfft with inverse but can't get the same result as ifft2.
Does any function work?




Re: [R] Survival statistics--displaying multiple plots

2007-05-02 Thread Robert A LaBudde
? layout()
? par()

E.g.,

layout(matrix(c(1,2,3),1,3,byrow=TRUE))  # 3 plots side-by-side

Then use plot() three times to generate each of your graphs.

At 11:14 PM 5/2/2007, Greg wrote:
I should clarify. I can generate plots for each category individually but
not for all three on the same chart.

Greg

-Original Message-
From: Gregory Pierce [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 10:21 PM
To: 'r-help@stat.math.ethz.ch'
Subject: Survival statistics--displaying multiple plots

Hello all!

I am once again analyzing patient survival data with chronic liver disease.

The severity of the liver disease is given by a number which is continuously
variable. I have referred to this number as meld--model for end stage
liver disease--which is the result of a mathematical calculation on
underlying laboratory values. So, for example, I can generate a Kaplan-Meier
plot of patients undergoing a TIPS procedure with the following:

 plot(survfit(Surv(days, status==1), subset(tips, meld<10)))

where "tips" is my data set, "days" is the number of days alive, and "meld"
is the meld score.

What I would like to do is display the survival graphs of patients with
meld<10, 10<meld<20, and meld>20. I am unsure about how to go about this.

Any suggestions would be appreciated.




Re: [R] Survival statistics--displaying multiple plots

2007-05-02 Thread Robert A LaBudde
The layout() function below states there are to be 3 graphs on the 
first row, side by side.

Each time you plot, R applies the graph to the next free position on 
the layout. After 3 plots with your different boolean subsets, you 
end up with 3 graphs side by side.

If this is not what you want (you asked for this in your first 
request), and instead you want 3 curves on the same, single graph, 
use plot() for the first curve(s) and then use lines() to add the 
additional curves.

Check the survfit() object to see how to pick off individual times 
and survival curves by subscripting.
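Alternatively, a single survfit() call on a grouping factor draws all the
curves at once (a sketch; cut points as in your question):

tips$meldgrp <- cut(tips$meld, breaks = c(-Inf, 10, 20, Inf),
                    labels = c('meld<10', '10-20', 'meld>20'))
fit <- survfit(Surv(days, status==1) ~ meldgrp, data = tips)
plot(fit, col = 1:3, lty = 1:3)
legend('topright', levels(tips$meldgrp), col = 1:3, lty = 1:3)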

At 12:09 AM 5/3/2007, Greg wrote:
Thanks for replying Robert. Forgive me, it might be the hour or my
limitations, but I am a little unclear on how to implement your suggestion.

In my original example,

 plot(survfit(Surv(days, status==1), subset(tips, meld<10)))

A plot of the fraction of patients surviving following the procedure against
the number of days since the procedure would be generated for patients with
meld scores of less than 10.

Similarly, if I wanted to generate a survival curve of patients with scores
of between 10 and 20, I can with the following:

  plot(survfit(Surv(days, status==1), subset(tips, meld>10 & meld<20)))


And for patients with meld>20,

 plot(survfit(Surv(days, status==1), subset(tips, meld>20)))


But how do I display the curves in each cadre (meld<10, 10<meld<20, and
meld>20) on the same chart?


-Original Message-
From: Robert A LaBudde [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 11:48 PM
To: Gregory Pierce
Subject: Re: [R] Survival statistics--displaying multiple plots

? layout()
? par()

E.g.,

layout(matrix(c(1,2,3),1,3,byrow=TRUE))  # 3 plots side-by-side

Then use plot() three times to generate each of your graphs.

At 11:14 PM 5/2/2007, Greg wrote:
 I should clarify. I can generate plots for each category individually but
 not for all three on the same chart.
 
 Greg
 
 -Original Message-
 From: Gregory Pierce [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, May 02, 2007 10:21 PM
 To: 'r-help@stat.math.ethz.ch'
 Subject: Survival statistics--displaying multiple plots
 
 Hello all!
 
 I am once again analyzing patient survival data with chronic liver disease.
 
 The severity of the liver disease is given by a number which is
continuously
 variable. I have referred to this number as meld--model for end stage
 liver disease--which is the result of a mathematical calculation on
 underlying laboratory values. So, for example, I can generate a
Kaplan-Meier
 plot of patients undergoing a TIPS procedure with the following:
 
  plot(survfit(Surv(days, status==1), subset(tips, meld<10)))
 
 where "tips" is my data set, "days" is the number of days alive, and
 "meld" is the meld score.
 
 What I would like to do is display the survival graphs of patients with
 meld<10, 10<meld<20, and meld>20. I am unsure about how to go about this.
 
 Any suggestions would be appreciated.




Re: [R] bivariate normal distribution in likelihood

2007-04-30 Thread Robert A LaBudde
At 11:32 PM 4/30/2007, Deepankar wrote:
Hi all,

I am trying to do a maximum likelihood estimation. In my likelihood 
function, I have to evaluate a double integral of the bivariate 
normal density over a subset of the points of the plane. For 
instance, if I denote by y the y co-ordinate and by x, the x 
co-ordinate then the area over which I have to integrate is the 
following set of points, A:
A = {x >= 0 & y < 3x + 10}

I have used the following code to calculate this double integral:

x <- rmvnorm(10, mean=me, sigma=sig)
c <- nrow(x)
x1 <- x[, 1]
x2 <- x[, 2]
e1 <- as.numeric(x2 < 3*x1 + 10 & x1 > 0)
p1 <- sum(e1)/c

In this code, I provide the mean and covariance while drawing the 
bivariate random normal variables and get p1 as the required 
answer. The problem is that I have to draw at least 100,000 
bivariate random normals to get a reasonable answer; even then it is 
not very accurate.

Is there some other way to do the same thing more accurately and 
more efficiently? For instance, can this be done using the bivariate 
normal distribution function pmvnorm? Also feel free to point out
errors if you see any.

Simple random sampling is a poor way to evaluate an integral 
(expectation). It converges on the order of 1/sqrt(N).

Stratified random sampling would be better, as it converges on the 
order of 1/N.

Even better is product Gauss-Hermite quadrature which will give a 
very accurate answer with a few dozen points.
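A rough sketch of the product Gauss-Hermite idea for a standard bivariate
normal (the statmod package supplies the nodes and weights; a change of
variables would handle a general mean and covariance, and note that for a
discontinuous indicator the convergence is slower than for a smooth
integrand):

library(statmod)
gh <- gauss.quad(32, kind = 'hermite')    # nodes and weights for exp(-t^2)
g  <- function(z1, z2) as.numeric(z1 > 0 & z2 < 3*z1 + 10)   # indicator of A
s  <- 0
for (i in seq_along(gh$nodes))
  for (j in seq_along(gh$nodes))
    s <- s + gh$weights[i]*gh$weights[j]*g(sqrt(2)*gh$nodes[i], sqrt(2)*gh$nodes[j])
s/pi    # approximates P(A) under the standard bivariate normal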




Re: [R] How to solve difficult equations?

2007-04-25 Thread Robert A LaBudde
At 03:15 AM 4/25/2007, francogrex wrote:

This below is not solvable with uniroot to find a:
fn <- function(a) {
  b <- (0.7/a) - a
  (1/(a + b + 1)) - 0.0025
}

uniroot(fn, c(-500, 500)) gives

Error in uniroot(fn, c(-500, 500)) : f() values at end points not of
opposite sign

I read R-help posts and someone wrote a function:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/92407.html
but it is not very precise. Is there any 'standard' function in R that can
solve this? Thanks.

Actually, if you're solving fn(a)==0, then some trivial algebra leads
to a linear equation with a = 0.7/399 = 0.001754.
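Specifically, b = 0.7/a - a gives a + b + 1 = 0.7/a + 1, so
1/(0.7/a + 1) = 0.0025 implies 0.7/a = 399. A quick check:

a <- 0.7/399
fn <- function(a) { b <- (0.7/a) - a; (1/(a + b + 1)) - 0.0025 }
fn(a)   # essentially zero, confirming the root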

Why are you trying to solve this numerically? Is it a single instance 
of a larger, more general problem?




[R] Use of Lexis function to convert survival data to counting format

2007-04-24 Thread Robert A. LaBudde
I'm trying to convert a dataset from the time-independent analysis form

> head(addicts)
  id clinic status survtime prison meth clinic01
1  1      1      1      428      0   50        1
2  2      1      1      275      1   55        1
3  3      1      1      262      0   55        1

into the counting data format necessary to perform extended Cox regression.

I got survSplit() to work, but it appears to have a problem with end 
times on the boundaries of the intervals. (My intervals are [0,183], 
[184,365], [366,548], [549,730], [730,Inf]). I looked in the 
searchable archives and found that others had also discovered 
problems with survSplit() and suggested Lexis() in the Epi package.

When I try to get Lexis to work, I get an error message that I cannot 
interpret:

  
> addictscdf <- Lexis(entry=start, exit=stop, fail='status', breaks=c(184,366,549,730),
+   include=c('id','clinic','prison','meth','clinic01'), data=addicts)

 The following object(s) are masked from addicts :

  clinic id meth prison status survtime

Error: object is not subsettable


I'm sure I'm making some type of trivial error here, but I don't know 
how to find it. I would guess that it has something to do with 
'start' and 'stop', arguments about whose meaning I am apparently
clueless.



Re: [R] Using R to create pdf's from each file in a directory

2007-04-20 Thread Robert A LaBudde
At 09:40 PM 4/20/2007, gecko951 wrote:
snip
list <- dir("/tmp/data")
for (x in list) {
d <- read.table(x, sep="\t", header=TRUE) # read data
pdf("/tmp/graph/x.pdf")  # file for graph
snip

I'm a tyro at R, but it's obvious here that the line

pdf("/tmp/graph/x.pdf")

has the intended file name variable x embedded inside the quoted
literal string. Obviously he needs the "/tmp/graph/" string
concatenated with x and then ".pdf".

I would suggest a fix, but I am unable to find in the documentation
how to concatenate two strings into a single string.

So I will amplify gecko951's question to include "How do you
concatenate two strings in R?". I.e., given

x <- "abc"
y <- "def"

what operator or function returns "abcdef" as a result?

Thanks.




Re: [R] Using R to create pdf's from each file in a directory

2007-04-20 Thread Robert A LaBudde
At 11:06 PM 4/20/2007, I wrote:
At 09:40 PM 4/20/2007, gecko951 wrote:
 snip
 list <- dir("/tmp/data")
 for (x in list) {
 d <- read.table(x, sep="\t", header=TRUE) # read data
 pdf("/tmp/graph/x.pdf")  # file for graph
 snip

I'm a tyro at R, but it's obvious here that the line

pdf("/tmp/graph/x.pdf")

has the intended file name variable x embedded inside the quoted
literal string. Obviously he needs the "/tmp/graph/" string
concatenated with x and then ".pdf".

I would suggest a fix, but I am unable to find in the documentation
how to concatenate two strings into a single string.

So I will amplify gecko951's question to include "How do you
concatenate two strings in R?". I.e., given

x <- "abc"
y <- "def"

what operator or function returns "abcdef" as a result?
snip

I found that paste() can be used for string concatenation, so I can 
now suggest a fix to gecko951's problem. Replace

pdf("/tmp/graph/x.pdf")

with

pdf(paste(paste('/tmp/graph/',x,sep=''),'.pdf',sep=''))

and the program should now work for the different file names.
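Equivalently, a single paste() call suffices, since paste() concatenates
any number of strings:

pdf(paste('/tmp/graph/', x, '.pdf', sep = ''))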

