[
https://issues.apache.org/jira/browse/SPARK-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983644#comment-14983644
]
Bryan Cutler edited comment on SPARK-10158 at 10/31/15 7:05 AM:
----------------------------------------------------------------
I think the best way to handle this from the PySpark side is to add something
like the following to {{ALS._prepare}}
([link|https://github.com/apache/spark/blob/master/python/pyspark/mllib/recommendation.py#L215])
which is called before training:
{noformat}
MAX_ID_VALUE = ratings.ctx._gateway.jvm.Integer.MAX_VALUE
if ratings.filter(lambda x: x.user > MAX_ID_VALUE or
                  x.product > MAX_ID_VALUE).count() > 0:
    raise ValueError("Rating IDs must be less than max Java int %s." %
                     str(MAX_ID_VALUE))
{noformat}
However, any extra pass over the data is probably not worth the performance
hit for this issue.
Edit: I meant the above as an alternative to explicitly checking values
against 2^31, which could be done in the {{Rating}} constructor but seems like
too much of a hack to me.
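For comparison, the explicit range check could look roughly like the sketch below. This is only an illustration, not what PySpark does: {{checked_rating}} is a hypothetical helper, the {{Rating}} namedtuple is a local stand-in so the snippet runs without Spark, the bounds are hard-coded instead of being read from the JVM, and the lower-bound check is an extra assumption beyond the upper-bound check shown earlier.

```python
from collections import namedtuple

# Bounds of a signed 32-bit Java int, hard-coded so that no JVM gateway
# lookup is needed (assumption: this matches Integer.MIN_VALUE/MAX_VALUE).
JAVA_INT_MAX = 2**31 - 1
JAVA_INT_MIN = -2**31

# Local stand-in for pyspark.mllib.recommendation.Rating.
Rating = namedtuple("Rating", ["user", "product", "rating"])


def checked_rating(user, product, rating):
    """Hypothetical helper: build a Rating, rejecting IDs outside Java int range."""
    for name, value in (("user", user), ("product", product)):
        if not JAVA_INT_MIN <= value <= JAVA_INT_MAX:
            raise ValueError(
                "Rating %s ID %s does not fit in a Java int (max %s)."
                % (name, value, JAVA_INT_MAX))
    return Rating(int(user), int(product), float(rating))
```

This validates each record as it is constructed, so it avoids an extra job over the RDD, but it only helps when ratings are actually built through such a helper rather than passed in as plain tuples.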
was (Author: bryanc):
The only way I can see handling this from the PySpark side is to add something
like the following to {{ALS._prepare}}
([link|https://github.com/apache/spark/blob/master/python/pyspark/mllib/recommendation.py#L215])
which is called before training:
{noformat}
MAX_ID_VALUE = ratings.ctx._gateway.jvm.Integer.MAX_VALUE
if ratings.filter(lambda x: x.user > MAX_ID_VALUE or
                  x.product > MAX_ID_VALUE).count() > 0:
    raise ValueError("Rating IDs must be less than max Java int %s." %
                     str(MAX_ID_VALUE))
{noformat}
But any operations on the data are probably not worth the hit for this issue.
> ALS should print better errors when given Long IDs
> --------------------------------------------------
>
> Key: SPARK-10158
> URL: https://issues.apache.org/jira/browse/SPARK-10158
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> See [SPARK-10115] for the very confusing messages you get when you try to use
> ALS with Long IDs. We should catch and identify these errors and print
> meaningful error messages.