[ 
https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew MacKinlay updated SPARK-19234:
-------------------------------------
    Description: 
If you try to use AFTSurvivalRegression and any label in your input data is 
0.0, the returned coefficients are all 0.0, and in many cases you get errors like this:

{{17/01/16 15:10:50 ERROR StrongWolfeLineSearch: Encountered bad values in 
function evaluation. Decreasing step size to NaN}}

Zero should, I think, be an allowed value for survival analysis. I don't know 
whether this is a pathological case for AFT specifically, as I don't know enough 
about it, but this behaviour is clearly undesirable. If you have any labels of 
0.0, you get either (a) obscure error messages with no indication of the cause, 
plus coefficients which are all zero, or (b) no error messages at all and 
coefficients of zero (arguably worse, since there isn't even console output 
to tell you something has gone awry). If AFT doesn't work with zero-valued 
labels, Spark should fail fast and tell the developer why. If it does work, we 
should get results here.
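A plausible explanation, not confirmed anywhere in this ticket: AFT is a regression on the *log* of the survival time, so a label of exactly 0.0 would contribute log(0) = -inf to the objective, which can then surface as NaN inside the optimizer's line search (consistent with the StrongWolfeLineSearch error above). A minimal stdlib-Python sketch of that arithmetic, with the hypothetical helper name {{log_time_term}}:

```python
import math

# Hedged sketch: if the AFT likelihood contains a log(t) term per label t,
# then t == 0.0 injects -inf into the objective. math.log raises ValueError
# for 0.0 in plain Python, but in floating-point linear-algebra code the
# same quantity evaluates to -inf and then poisons later arithmetic as NaN.
def log_time_term(t):
    return float("-inf") if t == 0.0 else math.log(t)

print(log_time_term(2.5))             # finite contribution for a positive label
print(log_time_term(0.0))             # -inf: the degenerate zero-label case
print(float("-inf") - float("-inf"))  # nan: how an -inf objective becomes NaN
```

This would also explain why a fail-fast check (reject labels <= 0 with a clear message) is the natural fix if zero labels are genuinely unsupported.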


  was:
If you try and use AFTSurvivalRegression and any label in your input data is 
0.0, you get coefficients of 0.0 returned, and in many cases, errors like this:

{{17/01/16 15:10:50 ERROR StrongWolfeLineSearch: Encountered bad values in 
function evaluation. Decreasing step size to NaN}}




> AFTSurvivalRegression chokes silently or with confusing errors when any 
> labels are zero
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-19234
>                 URL: https://issues.apache.org/jira/browse/SPARK-19234
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>         Environment: spark-shell or pyspark
>            Reporter: Andrew MacKinlay
>         Attachments: spark-aft-failure.txt
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
