[
https://issues.apache.org/jira/browse/SPARK-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119054#comment-15119054
]
Yanbo Liang edited comment on SPARK-13010 at 1/27/16 11:09 AM:
---------------------------------------------------------------
There are two issues that we should discuss:
* 1. Should AFTSurvivalRegression be supported under the SparkR::glm interface?
I vote no. We can add a new function named “survreg” (R has a function of the
same name). Like SparkR::glm, “survreg” would return a PipelineModel that can be
used for prediction with SparkR::predict.
We should first reorganize SparkRWrappers so that it supports more models,
although that is a simple change.
* 2. For survival analysis, the response variable of the R formula should be a
pair.
Take these R survival analysis calls as examples:
{code}
survreg(Surv(futime, fustat) ~ ecog.ps + rx, ovarian, dist="exponential")
survfit(coxph(Surv(time,censor)~1), type="aalen")
{code}
Surv() wraps the pair of “labelCol” and “censorCol” as the response variable of
the R formula.
So the first step is to make RFormula support a pair as the label.
One possible way is to support “cbind” in SparkR: it would return a Scala
Tuple2/Vector column, and the label of RFormula would then accept the
Tuple2/Vector type.
GLM with the binomial family can also benefit from this feature. But we should
also consider whether “cbind” conflicts with other SparkR functions, and we
need to keep its semantics consistent.
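To make the "pair as label" idea concrete, here is a minimal sketch in plain Python of how a formula's left-hand side could be split into a wrapper name and its columns. It is only an illustration: `parse_formula` and `ParsedFormula` are hypothetical names, not part of RFormula or SparkR, and real formula parsing (interactions, `.`, `-` terms) is not handled.

```python
import re
from typing import List, NamedTuple, Optional


class ParsedFormula(NamedTuple):
    wrapper: Optional[str]     # e.g. "Surv" or "cbind"; None for a plain label
    response_cols: List[str]   # one column for a plain label, two for a pair
    terms: List[str]           # right-hand-side terms, split on "+"


# Matches a response written as name(arg1, arg2, ...)
_RESPONSE = re.compile(r"^(\w+)\((.*)\)$")


def parse_formula(formula: str) -> ParsedFormula:
    """Split 'response ~ terms' and unpack a wrapped response (hypothetical helper)."""
    lhs, sep, rhs = formula.partition("~")
    if not sep:
        raise ValueError("formula must contain '~'")
    lhs = lhs.strip()
    match = _RESPONSE.match(lhs)
    if match:
        wrapper = match.group(1)
        cols = [c.strip() for c in match.group(2).split(",")]
    else:
        wrapper, cols = None, [lhs]
    terms = [t.strip() for t in rhs.split("+") if t.strip()]
    return ParsedFormula(wrapper, cols, terms)


parsed = parse_formula("Surv(futime, fustat) ~ ecog.ps + rx")
print(parsed)
```

Under this sketch, `Surv(futime, fustat)` and a binomial-style `cbind(successes, failures)` response both come back as a two-column pair, so the same label-handling path could serve both cases.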
Looking forward to hearing your thoughts. [~mengxr]
> Survival analysis in SparkR
> ---------------------------
>
> Key: SPARK-13010
> URL: https://issues.apache.org/jira/browse/SPARK-13010
> Project: Spark
> Issue Type: New Feature
> Components: ML, SparkR
> Reporter: Xiangrui Meng
> Assignee: Yanbo Liang
>
> Implement a simple wrapper of AFTSurvivalRegression in SparkR to support
> survival analysis.