Sandy Ryza created SPARK-5112:
---------------------------------
Summary: Expose SizeEstimator as a developer API
Key: SPARK-5112
URL: https://issues.apache.org/jira/browse/SPARK-5112
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Sandy Ryza
Assignee: Sandy Ryza
"The best way to size the amount of memory consumption your dataset will
require is to create an RDD, put it into cache, and look at the SparkContext
logs on your driver program. The logs will tell you how much memory each
partition is consuming, which you can aggregate to get the total size of the
RDD."
-the Tuning Spark page
This is a pain. It would be much nicer to expose simple functionality for
understanding the memory footprint of a Java object.
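As a rough sketch of what this could look like for users, assuming the existing
org.apache.spark.util.SizeEstimator.estimate(obj: AnyRef): Long method is simply
made public as a developer API (the Record class and the extrapolation below are
illustrative only, not part of the proposal):

{code:scala}
import org.apache.spark.util.SizeEstimator

// Hypothetical record type standing in for one element of a user's dataset.
case class Record(id: Long, name: String, scores: Array[Double])

val sample = Record(1L, "example", Array.fill(100)(0.0))

// estimate() walks the object graph and returns the approximate number of
// bytes the object occupies on the JVM heap.
val bytesPerRecord: Long = SizeEstimator.estimate(sample)

// Extrapolate to a full dataset to get a ballpark figure for cache sizing,
// without having to cache the RDD and dig through driver logs.
val estimatedTotal = bytesPerRecord * 10000000L
println(s"~$bytesPerRecord bytes/record, ~${estimatedTotal / (1 << 20)} MB for 10M records")
{code}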