Pietro Pugni created MADLIB-1040:
------------------------------------
Summary: Survival Analysis - Cox regression model for
time-dependent covariates
Key: MADLIB-1040
URL: https://issues.apache.org/jira/browse/MADLIB-1040
Project: Apache MADlib
Issue Type: Wish
Components: Module: Cox Proportional Hazards
Reporter: Pietro Pugni
Fix For: v2.0
This JIRA follows a discussion opened on the user mailing list (
http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201611.mbox/browser
).
The current Cox model implemented in MADlib (
https://madlib.incubator.apache.org/docs/latest/group__grp__cox__prop__hazards.html
) only supports time-independent covariates and doesn't provide any structure
for time-dependent covariates, where a subject has one or more rows, one for
each period over which its covariates stay constant. This version of the CPH
model is much more useful in survival analysis because it accounts for changes
in covariate values over time.
To provide some input, here are some good reference links:
- "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox
Model", by T Therneau:
https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf
- "Time-dependent Covariates in Cox Regression":
http://www.math.ucsd.edu/~rxu/math284/slect7.pdf
- "Time-dependent covariates in the Cox Proportional-Hazards Regression
Model", by LD Fisher:
https://pdfs.semanticscholar.org/f970/7f0dd6ff04899d7a3323668ee9ed1b9ad28e.pdf
This is the article used by Therneau to implement the counting process algorithm
in the R survival package:
- "Cox's regression model for counting processes: a large sample study", by
Andersen and Gill:
https://projecteuclid.org/download/pdf_1/euclid.aos/1176345976
As far as I know, the counting process algorithm is the fastest one used for CPH
models. The drawback is that the user has to provide a long-format
(verticalized) dataset with one row per time interval for each subject, where
a new interval starts whenever a covariate changes. The formula used with the
coxph() function provided by the survival package is the following:
coxph(data = df, formula = Surv(start, stop, event) ~ cluster(subject.id) +
covariate.1 + covariate.2 + ... + covariate.n)
where covariates can be factors (categorical variables) or numeric.
In the linked documentation you can find some examples of counting process
datasets.
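To make the expected input shape concrete, here is a minimal sketch in plain
Python of how one subject's follow-up could be split into (start, stop, event)
rows. It is only an illustration: the names Interval and to_counting_process
are hypothetical and exist neither in MADlib nor in the survival package, and
it assumes a single numeric covariate whose value is known at time 0 and whose
changes take effect at the start of the next interval.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    subject_id: int
    start: float
    stop: float
    event: int        # 1 only on the interval in which the event occurs
    covariate: float  # value held constant over (start, stop]


def to_counting_process(
    subject_id: int,
    follow_up_end: float,
    had_event: bool,
    changes: List[Tuple[float, float]],
) -> List[Interval]:
    """Split one subject's follow-up into (start, stop] rows.

    `changes` holds (time, new_value) pairs; a pair at time 0 is required
    (an assumption of this sketch, not a rule of any library).
    """
    changes = sorted(changes)
    if not changes or changes[0][0] != 0:
        raise ValueError("covariate value at time 0 is required")
    rows = []
    for i, (start, value) in enumerate(changes):
        if start >= follow_up_end:
            break  # changes after the end of follow-up are ignored
        next_change = changes[i + 1][0] if i + 1 < len(changes) else follow_up_end
        stop = min(next_change, follow_up_end)
        if stop <= start:
            continue  # skip zero-length intervals from duplicate change times
        is_last = stop == follow_up_end
        rows.append(Interval(subject_id, start, stop, int(had_event and is_last), value))
    return rows


# Subject 1: covariate switches from 0 to 1 at time 4, event at time 10.
for row in to_counting_process(1, 10.0, True, [(0.0, 0.0), (4.0, 1.0)]):
    print(row)
```

For the subject above this yields two rows, (0, 4] with event 0 and (4, 10]
with event 1, which is the layout that Surv(start, stop, event) expects.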
Thank you everyone
Pietro Pugni
PS: this is my first JIRA. I hope I opened it correctly.