Nick Pentreath created SPARK-14891:
--------------------------------------

             Summary: ALS in ML never validates input schema
                 Key: SPARK-14891
                 URL: https://issues.apache.org/jira/browse/SPARK-14891
             Project: Spark
          Issue Type: Bug
          Components: ML
            Reporter: Nick Pentreath


Currently, {{ALS.fit}} never validates the input schema. There is a 
{{transformSchema}} impl that calls {{validateAndTransformSchema}}, but it is 
never called in either {{ALS.fit}} or {{ALSModel.transform}}.
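
As a rough illustration of the gap described above, here is a minimal plain-Python sketch of an estimator whose fit step validates the schema before doing any work. {{ToyEstimator}}, {{SchemaError}}, the dict-based schema and the column names are all hypothetical stand-ins chosen for this example; this is not Spark's implementation or API.

```python
# Hypothetical stand-ins for illustration only; these are not Spark classes.
class SchemaError(TypeError):
    """Raised when an input column is missing or has an unsupported type."""


class ToyEstimator:
    def __init__(self, user_col="user", item_col="item"):
        self.user_col = user_col
        self.item_col = item_col

    def transform_schema(self, schema):
        # schema is a dict mapping column name -> type name, e.g. {"user": "int"}.
        for col in (self.user_col, self.item_col):
            if col not in schema:
                raise SchemaError(f"missing column {col!r}")
            if schema[col] != "int":
                raise SchemaError(
                    f"column {col!r} must be int, got {schema[col]!r}"
                )
        # Return the output schema with the added prediction column.
        return {**schema, "prediction": "float"}

    def fit(self, schema, rows):
        # Validate first, so a bad schema fails fast instead of being cast silently.
        self.transform_schema(schema)
        return len(rows)  # placeholder for the actual training step
```

With this shape, a Long id column is rejected when fit is called, rather than being accepted and cast later.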

This was highlighted in SPARK-13857 (and by failing PySpark tests 
[here|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56849/consoleFull]) 
when a call to {{transformSchema}}, which actually validates the input schema, 
was added to {{ALSModel.transform}}. The PySpark docstring tests produce Long 
inputs by default, which fail validation because Int is required.

Currently, the user and item id columns are cast to Int with no input type 
validation and no warning message, so users can pass in Long, Float, Double, 
etc. The docs also do not make clear anywhere that only Int types are supported 
for user and item ids.
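
To show why an unchecked cast to Int can be risky, the sketch below narrows ids to a signed 32-bit integer in plain Python. It assumes the cast keeps only the low 32 bits of an integral value, as a JVM long-to-int narrowing conversion does, and that fractional values are truncated toward zero; whether Spark's cast behaves exactly this way may depend on version and settings. The helper name {{to_int32}} is hypothetical.

```python
def to_int32(value):
    """Narrow an integer to signed 32 bits by keeping only its low 32 bits.

    Assumption: this mirrors a JVM-style long-to-int narrowing conversion.
    """
    bits = int(value) & 0xFFFFFFFF
    # Reinterpret the top bit as the sign bit (two's complement).
    return bits - 0x100000000 if bits >= 0x80000000 else bits


# Two distinct Long ids collapse onto the same Int id after narrowing.
big_id = 2**32 + 7
small_id = 7
collided = to_int32(big_id) == to_int32(small_id)

# An id just past the Int range wraps around to a negative value.
wrapped = to_int32(2**31)

# Fractional ids lose their fractional part, so 1.2 and 1.9 become the same id.
truncated_equal = int(1.2) == int(1.9)
```

Under these assumptions, distinct users or items could be merged into one id without any error being raised.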

Enforcing validation seems the best option, but it might break user code that 
previously "just worked", especially in PySpark.



