[ 
https://issues.apache.org/jira/browse/SPARK-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Pentreath resolved SPARK-14891.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

> ALS in ML never validates input schema
> --------------------------------------
>
>                 Key: SPARK-14891
>                 URL: https://issues.apache.org/jira/browse/SPARK-14891
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Nick Pentreath
>            Assignee: Nick Pentreath
>             Fix For: 2.0.0
>
>
> Currently, {{ALS.fit}} never validates the input schema. There is a 
> {{transformSchema}} impl that calls {{validateAndTransformSchema}}, but it is 
> never called in either {{ALS.fit}} or {{ALSModel.transform}}.
> This was highlighted in SPARK-13857 (and failing PySpark tests 
> [here|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56849/consoleFull]) 
> when adding a call to {{transformSchema}} in {{ALSModel.transform}} that 
> actually validates the input schema. The PySpark docstring tests produce Long 
> inputs by default, which fail validation because Int is required.
> Currently, the inputs for user and item ids are cast to Int, with no input 
> type validation (or warning message). So users could pass in Long, Float, 
> Double, etc. The docs also do not make clear anywhere that only Int types 
> are supported for user and item ids.
> Enforcing validation seems the best option, but it might break user code 
> that previously "just worked", especially in PySpark.
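
To illustrate the concern in the description, here is a minimal sketch in plain Python (no Spark required) contrasting a silent narrowing cast with an explicit check. The helper names `jvm_long_to_int` and `checked_id` are hypothetical and exist only for this illustration; the check shown is one possible validation policy, not necessarily what the fix for this issue implements.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def jvm_long_to_int(value: int) -> int:
    """Mimic a JVM narrowing cast from Long to Int: keep only the low 32 bits."""
    low = value & 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed two's-complement integer.
    return low - 2**32 if low > INT_MAX else low


def checked_id(value):
    """Hypothetical validation: accept whole numbers that fit in an Int."""
    if isinstance(value, float):
        # Reject fractional values (and NaN/inf) instead of truncating them.
        if not value.is_integer():
            raise ValueError(f"id {value!r} is not a whole number")
        value = int(value)
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"id {value!r} is outside the Int range")
    return value


# A silent cast maps two different Long ids onto the same Int id.
big_id = 2**32 + 5
collides = jvm_long_to_int(big_id) == jvm_long_to_int(5)

# The checked version fails loudly for the out-of-range id instead.
try:
    checked_id(big_id)
    rejected = False
except ValueError:
    rejected = True
```

With an unchecked cast, distinct ids can collide without any warning, whereas a check of this kind surfaces the problem at fit or transform time.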



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

