[ https://issues.apache.org/jira/browse/MADLIB-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pietro Pugni updated MADLIB-1040:
---------------------------------
    Description: 
This JIRA follows a discussion opened on the user mailing list ( 
http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201611.mbox/browser
 ).

The current Cox model implemented in MADlib ( 
https://madlib.incubator.apache.org/docs/latest/group__grp__cox__prop__hazards.html
 ) only supports time-independent covariates and doesn't provide any structure 
for time-dependent covariates, where a subject has one or more rows, each 
covering a period over which its covariates stay constant. A CPH model with 
time-dependent covariates is much more useful in survival analysis because it 
accounts for covariates that change over time.
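
For reference, the partial likelihood maximized by a Cox model with 
time-dependent covariates can be sketched as below. This is the standard 
textbook form without tied event times; the symbols are notation chosen for 
this note and are not taken from the MADlib documentation.

```latex
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\{\beta^\top x_i(t_i)\}}
       {\sum_{j \in R(t_i)} \exp\{\beta^\top x_j(t_i)\}}
```

Here t_i is the event time of subject i, delta_i is 1 if the event was 
observed and 0 if the subject was censored, x_j(t) is the covariate vector of 
subject j at time t, and R(t) is the set of subjects still at risk just before 
t. With time-independent covariates, x_j(t) reduces to a constant x_j.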

To provide some input, here are some good reference links:
 - "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox 
Model", by T Therneau: 
https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf
 - "Time-dependent Covariates in Cox Regression": 
http://www.math.ucsd.edu/~rxu/math284/slect7.pdf
 - "Time-dependent covariates in the Cox Proportional-Hazards Regression 
Model", by LD Fisher: 
https://pdfs.semanticscholar.org/f970/7f0dd6ff04899d7a3323668ee9ed1b9ad28e.pdf
 
This is the article used by Therneau to implement the counting process algorithm 
in the R survival package:
 -  "Cox's regression model for counting processes: a large sample study", by 
Andersen and Gill: 
https://projecteuclid.org/download/pdf_1/euclid.aos/1176345976

As far as I know, the counting process algorithm is the fastest one used for 
CPH models. The drawback is that the user has to provide a verticalized dataset 
with one row per covariate change within each subject. The formula used in the 
coxph() function provided with the survival package is the following:

coxph(data = df, formula = Surv(start, stop, event) ~ cluster(subject.id) + 
covariate.1 + covariate.2 + ... + covariate.n)

where covariates can be factors (categorical variables) or numeric. In the 
linked documentation you can find some examples of counting process datasets.
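
To make the counting process layout concrete, here is a minimal Python sketch 
(Python rather than R, purely for illustration) that splits one subject's 
follow-up into rows with start, stop and event columns. The function name, the 
column names and the sample values are hypothetical; nothing here is part of 
MADlib or of the survival package.

```python
from typing import Dict, List, Tuple


def to_counting_process(
    subject_id: str,
    changes: List[Tuple[float, Dict[str, float]]],
    end_time: float,
    event: int,
) -> List[dict]:
    """Split one subject's follow-up into (start, stop] intervals.

    changes:  (time, covariate values) pairs; the earliest entry gives the
              values at the start of follow-up.
    end_time: time of the event or of censoring.
    event:    1 if the event occurred at end_time, 0 if censored.
    """
    if not changes:
        raise ValueError("at least one covariate record is required")
    changes = sorted(changes, key=lambda c: c[0])

    rows = []
    for i, (start, covariates) in enumerate(changes):
        if start >= end_time:
            break  # changes recorded at or after the end of follow-up are ignored
        # The interval ends at the next covariate change or at end of follow-up.
        next_change = changes[i + 1][0] if i + 1 < len(changes) else end_time
        stop = min(next_change, end_time)
        rows.append({
            "subject_id": subject_id,
            "start": start,
            "stop": stop,
            # Only the interval that closes the follow-up can carry the event.
            "event": event if stop == end_time else 0,
            **covariates,
        })
    return rows


# Hypothetical subject: the dose changes at t=30 and t=75, event at t=100.
rows = to_counting_process(
    subject_id="s1",
    changes=[(0, {"dose": 10}), (30, {"dose": 20}), (75, {"dose": 15})],
    end_time=100,
    event=1,
)
for row in rows:
    print(row)
```

The subject above yields three rows covering (0, 30], (30, 75] and (75, 100], 
and only the last one has event = 1. These are the columns that 
Surv(start, stop, event) expects in the coxph() formula.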

The counting process format is also the only dataset format supported by R 
survival analysis packages. SAS supports both the counting process and the 
longitudinal format. The longitudinal format is far slower, but requires less 
user development time and effort to create the dataset. Here are some 
references:
 - "Survival Analysis Using SAS: A Practical Guide, Second Edition", by Paul D. 
Allison, SAS Publishing, ISBN 978-1-59994-640-5, in particular Chapter 5 
starting from page 153.
 - "Your Survival Guide to Using Time-Dependent Covariates", by Powell and 
Bagnell: 
http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201611.mbox/browser


Thank you everyone
 Pietro Pugni

PS: this is my first JIRA. I hope I have done it the right way.


  was:
This JIRA follows a discussion opened on the user mailing list ( 
http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201611.mbox/browser
 ).

The actual Cox model implented in MADlib ( 
https://madlib.incubator.apache.org/docs/latest/group__grp__cox__prop__hazards.html
 ) only supports time-independent covariates and doesn't provide any structure 
for time-dependent covariates, where a subject has one or more rows for 
different time-varying periods. This version of the CPH model is much more 
useful in survival analysis because it accounts for changes of covariates 
effect over time.

To provide some input, here are some good reference links:
 - "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox 
Model", by T Thernau: 
https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf
 - "Time-dependent Covariates in Cox Regression": 
http://www.math.ucsd.edu/~rxu/math284/slect7.pdf
 - "Time-dependent covariates in the Cox Proportoinal-Hazards Regression 
Model", by LD Fisher: 
https://pdfs.semanticscholar.org/f970/7f0dd6ff04899d7a3323668ee9ed1b9ad28e.pdf
 
This is the article used by Thernau to implement the counting process algorithm 
in the R survival package:
 -  "Cox's regression model for counting processes: a large sample study", by 
Andersen and Gill: 
https://projecteuclid.org/download/pdf_1/euclid.aos/1176345976

As far as I know, the counting process algorithm is the fastest used in CPH 
models. The counter parts is that user has to provide a verticalized dataset 
with a row per time changes within each subject. The formula used in the 
coxph() function provided with the survival package is the following:

coxph(data = df, formula = Surv(start, stop, event) ~ cluster(subject.id) + 
covariate.1 + covariate.2 + ... + covariate.n)

where covariates can be factors (categorical variables) or numeric.

In the linked documentation you can find some examples of counting process 
datasets.

Thank you everyone
 Pietro Pugni

PS: this is my first JIRA. I hope to opened it correctly.



> Survival Analysis - Cox regression model for time-dependent covariates
> ----------------------------------------------------------------------
>
>                 Key: MADLIB-1040
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1040
>             Project: Apache MADlib
>          Issue Type: Wish
>          Components: Module: Cox Proportional Hazards
>            Reporter: Pietro Pugni
>             Fix For: v2.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
