Davies Liu created SPARK-4860:
---------------------------------
Summary: Improve performance of sample() and takeSample() on SchemaRDD
Key: SPARK-4860
URL: https://issues.apache.org/jira/browse/SPARK-4860
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Reporter: Davies Liu
In a SchemaRDD, all the rows are already serialized into Java objects, so the
Python sample() and takeSample() methods can call the corresponding methods of
the underlying JavaSchemaRDD. This will be much faster than the current
approach, which is implemented in pure Python.
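
A minimal sketch of the delegation idea, not the actual patch. It runs without
Spark: FakeJavaRDD is a hypothetical stand-in for the JVM-side object, and
SchemaRDDSketch and its _jrdd attribute are illustrative names assumed here.
The point is only that the Python wrapper forwards sample() to the Java-side
handle and wraps the result, rather than sampling rows in Python.

```python
import random


class FakeJavaRDD:
    """Hypothetical stand-in for the JVM-side RDD handle (not a Spark class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def sample(self, with_replacement, fraction, seed):
        # Stand-in for sampling done on the Java side. Only Bernoulli
        # sampling without replacement is covered, to keep the sketch short.
        if with_replacement:
            raise NotImplementedError("sketch covers sampling without replacement")
        rng = random.Random(seed)
        return FakeJavaRDD(r for r in self.rows if rng.random() < fraction)

    def collect(self):
        return list(self.rows)


class SchemaRDDSketch:
    """Hypothetical Python wrapper; `_jrdd` is an assumed attribute name."""

    def __init__(self, jrdd):
        self._jrdd = jrdd

    def sample(self, with_replacement, fraction, seed=None):
        # Delegate to the Java-side object and wrap what it returns, instead
        # of deserializing every row into Python and sampling there.
        if seed is None:
            seed = random.randint(0, 2**31 - 1)
        return SchemaRDDSketch(self._jrdd.sample(with_replacement, fraction, seed))

    def collect(self):
        return self._jrdd.collect()


rdd = SchemaRDDSketch(FakeJavaRDD(range(100)))
picked = rdd.sample(False, 0.1, seed=42).collect()
```

In a real implementation the stand-in would be the Py4J handle to the Java
object, so the rows would never need to cross into the Python process during
sampling.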
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)