On 12 Aug 2003, Elliot Coups wrote in part:

> My goal is to determine the percentage (and 95% CI) of individuals who
> are current vs. former vs. never smokers within each of the four time
> since diagnosis groups.

In effect, it sounds as though you wish to estimate:
 <%-current-smokers> as a function of <time-since-diagnosis> and <age>,
 <%-former-smokers> as a function of <time-since-diagnosis> and <age>,
 <%-never-smokers> as a function of <time-since-diagnosis> and <age>;
 and in your current model <time-since-diagnosis> is represented in four
categories (or, as some would say, is polychotomized).

Thinking in these terms, you might wish to change from a two-way
frequency-table model to a logistic regression model, carried out on
three dependent variables (either separately or as a single multivariate
model).  In that case, your four <time-since-diagnosis> groups can be
treated as the levels of a one-way ANOVA (represented in a regression
model by three predictors, which might be organized as orthogonal
linear, quadratic, and cubic components of <time-since-diagnosis>), and
<age> as a linear predictor.  (NB:  you may also wish to consider
modelling quadratic and cubic components of <age>, or some other
(possibly more pertinent) nonlinear function of <age>.)
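
As a rough illustration of the coding described above, here is a minimal
sketch in Python of the three orthogonal polynomial predictors for the
four groups, plus <age> as a linear term.  The contrast weights are the
standard ones for four equally spaced levels; treating the four
categories as equally spaced is an assumption (their widths are not
equal), and the names CONTRASTS, GROUPS, dot and design_row are invented
for the example.

```python
# Orthogonal polynomial contrast weights for four ordered levels.
# ASSUMPTION: the four levels are treated as equally spaced, although the
# actual categories (1-5, 6-10, 11-20, 21+ years) have unequal widths.
CONTRASTS = {
    "linear":    [-3, -1,  1,  3],
    "quadratic": [ 1, -1, -1,  1],
    "cubic":     [-1,  3, -3,  1],
}

# Category labels as given in the original question.
GROUPS = ["1-5", "6-10", "11-20", "21+"]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def design_row(group, age):
    """Return [intercept, linear, quadratic, cubic, age] for one person."""
    k = GROUPS.index(group)
    trend = [CONTRASTS[name][k] for name in ("linear", "quadratic", "cubic")]
    return [1] + trend + [age]
```

Rows built this way would form the predictor matrix for whichever
logistic regression routine you use; the sketch stops short of fitting
anything.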

OTOH, if you also have <time-since-diagnosis> in its original form (not
categorized, but as numbers from 1 (not 0?) to, say, 25), then you could
use that as a linear predictor (and still include quadratic and cubic
orthogonal components if you wish), along with <age>.
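
For the uncategorized version, one way to obtain orthogonal linear,
quadratic and cubic components is to orthogonalize the powers of
<time-since-diagnosis> against the constant and against each other.
The following is only a sketch using Gram-Schmidt in plain Python; the
function name orthogonal_poly is invented here, and the input needs more
distinct values than the requested degree.

```python
def orthogonal_poly(x, degree=3):
    """Return columns for x, x**2, ..., x**degree, each made orthogonal
    to the constant column and to all lower-order columns.

    ASSUMPTION: x holds more than `degree` distinct values; otherwise a
    column collapses to zero and the projection below divides by zero.
    """
    n = len(x)
    basis = [[1.0] * n]  # start with the constant (intercept) column
    components = []
    for d in range(1, degree + 1):
        col = [float(v) ** d for v in x]
        # Remove from this power whatever the earlier columns explain.
        for b in basis:
            coef = sum(c * e for c, e in zip(col, b)) / sum(e * e for e in b)
            col = [c - coef * e for c, e in zip(col, b)]
        basis.append(col)
        components.append(col)
    return components
```

With x = [1, 2, 3, 4] the first two columns come out as the centered
values and a multiple of the familiar quadratic contrast.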

If I were conducting analyses of either sort, I'd want to start with
three scatterplots of <time-since-diagnosis> vs. <age>, one for each of
the three smoker-groups [current, former, never].  These would offer some
hints as to the probable usefulness of including nonlinear terms (and
what kind of nonlinear terms, if useful) in the model.

However, I think both of the above approaches might produce misleading
results unless (at least in preliminary analyses) you also include
predictors representing the interaction between <time-since-diagnosis>
and <age>.  There must surely be some such interaction, at least in a
population of interest in which age is not artificially restricted,
since you cannot have any cases for which, say, <age> = 19 and
<time-since-diagnosis> = 21+.
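
One common way to represent such an interaction in a regression model is
as the product of the two predictors, each centered at its mean first.
Here is a minimal sketch in Python, assuming both variables are
available as plain numeric lists; the function names are invented for
the example, and centering is a convention I am assuming here, not
something the data require.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def interaction_column(time_since_dx, age):
    """Product of the centered predictors, one entry per person.

    Centering first keeps the product term from being nearly collinear
    with the two main-effect columns.
    """
    t = centered(time_since_dx)
    a = centered(age)
    return [ti * ai for ti, ai in zip(t, a)]
```

The resulting column would simply be added to the model alongside
<time-since-diagnosis> and <age>.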

I should add that it is rather unclear to me what the utility of
estimating these %s might be.  (What do you intend to do with your
results?)

And I should think the three smoking levels to be a rather coarse and
insensitive measure;  with respect to the relationship between smoking
behavior and <cancer-diagnosis>, I'd be interested in (e.g.) when
current smokers began smoking and when former smokers stopped smoking,
and possibly in more detailed smoking histories.  (Of course, since you
speak of "analyses on a dataset", more detailed information like this
may simply be unavailable.  In which case your report should mention
this situation as a possibly severe defect in dataset design.)

On 12 Aug 2003, Elliot Coups wrote:

> I'm doing some analyses on a dataset of individuals who have/had
> cancer, looking at the association between smoking status and the
> number of years since the cancer diagnosis. I have three levels of
> smoking status (current, former, never) and four levels of time since
> cancer diagnosis (1-5, 6-10, 11-20, 21+ years ago). My goal is to
> determine the percentage (and 95% CI) of individuals who are current
> vs. former vs. never smokers within each of the four time since
> diagnosis groups. That's simple enough (I can do it by looking at
> frequency crosstabs), but I want to run the analysis while holding age
> constant (since it is related to the time since diagnosis). I don't
> have a large enough sample size to run the analysis stratified by age
> group, so I want to partial out age. What is the best way to do that?
>
> Thank you in advance.
>
> Elliot
>
>
> Elliot Coups, Ph.D.
> Research Fellow
> Department of Psychiatry and Behavioral Sciences
> Memorial Sloan-Kettering Cancer Center
> .
> .
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> .                  http://jse.stat.ncsu.edu/                    .
> =================================================================
>

 -----------------------------------------------------------------------
 Donald F. Burrill                                         [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
