Joseph K. Bradley created SPARK-14283:
-----------------------------------------
Summary: Avoid sort in randomSplit when possible
Key: SPARK-14283
URL: https://issues.apache.org/jira/browse/SPARK-14283
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Joseph K. Bradley
Dataset.randomSplit sorts each partition to guarantee a row ordering, which
makes the split deterministic for a given seed. Since randomSplit is used a
fair amount in ML, it would be great to avoid the sort when possible.
Are there cases where it could be avoided?
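
To illustrate why the ordering matters, here is a minimal sketch in plain Python. It is a simplified model, not Spark's implementation: it assumes each split re-reads the partition and keeps the rows whose seeded random draw falls in that split's range. The names split_partition and two_way_split are hypothetical and exist only for this example.

```python
import random

def split_partition(rows, lower, upper, seed, sort_first):
    """Keep the rows whose random draw falls in [lower, upper).

    Draws are assigned by position, so the result depends on row order.
    """
    if sort_first:
        rows = sorted(rows)  # pin the row order before drawing
    rng = random.Random(seed)  # every split reuses the same seed
    return [row for row in rows if lower <= rng.random() < upper]

# The same partition, materialized twice with different row orders
# (assumption: the source does not guarantee a stable order).
rows_first_pass = list(range(20))
rows_second_pass = list(reversed(rows_first_pass))

def two_way_split(sort_first):
    train = split_partition(rows_first_pass, 0.0, 0.7, 42, sort_first)
    test = split_partition(rows_second_pass, 0.7, 1.0, 42, sort_first)
    return train, test

# With the sort, both passes see the same order, so every row gets the
# same draw in both passes and lands in exactly one split.
train, test = two_way_split(sort_first=True)

# Without the sort, the draws line up with different rows on each pass,
# so the two splits may overlap or drop rows.
unsorted_train, unsorted_test = two_way_split(sort_first=False)
```

In this model the sort is unnecessary whenever the row order within a partition is already stable across evaluations; whether Spark can detect such cases is the open question of this issue.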