Joseph K. Bradley created SPARK-7148:
----------------------------------------
Summary: Configure Parquet block size (row group size) for ML model import/export
Key: SPARK-7148
URL: https://issues.apache.org/jira/browse/SPARK-7148
Project: Spark
Issue Type: Improvement
Components: MLlib, SQL
Affects Versions: 1.3.1, 1.3.0, 1.4.0
Reporter: Joseph K. Bradley
Priority: Minor
It would be nice if we could configure the Parquet block size (row group size) when
using Parquet format for ML model import/export. Currently, for some models (trees
and ensembles), the schema has 13+ columns. With a default block size of 128MB (I
think), the write buffers allocated across those columns far exceed the default
memory made available by run-example. Because of this, users have to run some ML
examples via spark-submit and explicitly request a larger amount of memory.
Is there a simple way to specify {{parquet.block.size}}? I'm not familiar with
this part of SparkSQL.
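For what it's worth, here is a minimal sketch of one possible workaround. It
assumes (unverified against the SparkSQL export path) that the Parquet writer
picks up {{parquet.block.size}} from the Hadoop configuration attached to the
SparkContext passed to {{model.save}}:
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.model.DecisionTreeModel

val sc = new SparkContext(new SparkConf().setAppName("ModelExport"))

// Assumption: Parquet's output format reads parquet.block.size from the
// Hadoop configuration when Spark SQL writes the model files. Shrink the
// row group size from the ~128MB default to 1MB so the per-column write
// buffers fit in run-example's default heap.
sc.hadoopConfiguration.setInt("parquet.block.size", 1024 * 1024)

// ... train a model as usual, then export/import it:
// model.save(sc, "target/tmp/myDecisionTreeModel")
// val sameModel = DecisionTreeModel.load(sc, "target/tmp/myDecisionTreeModel")
{code}
Even if that works, it is app-global (it affects every Parquet write in the
application); exposing the setting as a per-call option in the import/export
API would be cleaner.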