[
https://issues.apache.org/jira/browse/SPARK-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256084#comment-15256084
]
Nick Pentreath commented on SPARK-14891:
----------------------------------------
[~srowen] [~mengxr] [~josephkb] thoughts? Also, while this has come up, I know
that {{ALS.train}} is DeveloperApi and can accept arbitrary input types for
IDs. How bad is the performance hit for allowing at least {{Long}} and
{{String}} types for user/item ids? We could open that up in the ML API (with
appropriate warnings about performance if it's really a major issue).
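A rough sketch of the constraint being discussed (plain Python, not Spark's actual implementation; the function name and error message are assumptions): {{Long}} ids could only be accepted safely if they round-trip through the 32-bit Int range without loss, along the lines of:

```python
# Hypothetical sketch of a "checked cast" for ALS ids. ALS in ML stores
# user/item ids as 32-bit Ints, so a wider numeric id is only safe if it
# fits that range exactly. This is NOT Spark code, just an illustration.

INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1

def checked_cast_to_int(raw_id):
    """Cast a numeric id to int, failing loudly if precision would be lost."""
    as_int = int(raw_id)
    if as_int != raw_id or not (INT_MIN <= as_int <= INT_MAX):
        raise ValueError(
            f"ALS only supports ids within the Int range; got {raw_id!r}")
    return as_int
```

Under a scheme like this, {{Long}} (or even {{Double}}) ids that happen to be small integers would pass, while anything outside the Int range would fail with a clear message instead of silently truncating.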
> ALS in ML never validates input schema
> --------------------------------------
>
> Key: SPARK-14891
> URL: https://issues.apache.org/jira/browse/SPARK-14891
> Project: Spark
> Issue Type: Bug
> Components: ML
> Reporter: Nick Pentreath
>
> Currently, {{ALS.fit}} never validates the input schema. There is a
> {{transformSchema}} impl that calls {{validateAndTransformSchema}}, but it is
> never called in either {{ALS.fit}} or {{ALSModel.transform}}.
> This was highlighted in SPARK-13857 (and failing PySpark tests
> [here|https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56849/consoleFull]) when
> adding a call to {{transformSchema}} in {{ALSModel.transform}} that actually
> validates the input schema. The PySpark docstring tests result in Long inputs
> by default, which fail validation as Int is required.
> Currently, the inputs for user and item ids are cast to Int, with no input
> type validation (or warning message). So users could pass in Long, Float,
> Double, etc. It's also not made clear anywhere in the docs that only Int
> types for user and item are supported.
> Enforcing validation seems the best option, but it might break user code that
> previously "just worked", especially in PySpark.
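To make the missing check concrete, here is a minimal sketch (plain Python, with the schema modeled as a dict; column names and the numeric-type set are assumptions, not Spark's API) of the kind of validation that {{validateAndTransformSchema}} performs and that {{ALS.fit}} currently never invokes:

```python
# Illustrative only: a fail-fast schema check of the sort ALS.fit skips.
# A real Spark schema is a StructType; here it is modeled as a dict mapping
# column name -> type name for the sake of a self-contained example.

NUMERIC = {"int", "long", "float", "double"}

def validate_als_schema(schema, user_col="user", item_col="item",
                        rating_col="rating"):
    """Raise if the id/rating columns are missing or non-numeric."""
    for col in (user_col, item_col, rating_col):
        if col not in schema:
            raise ValueError(f"Column {col!r} not found in input schema")
        if schema[col] not in NUMERIC:
            raise ValueError(
                f"Column {col!r} must be numeric, got {schema[col]}")
```

Calling a check like this at the top of {{fit}} and {{transform}} would surface type mismatches (e.g. String ids) immediately, rather than via a downstream cast failure.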
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]