Joseph K. Bradley created SPARK-7148:
----------------------------------------
Summary: Configure Parquet block size (row group size) for ML model import/export
Key: SPARK-7148
URL: https://issues.apache.org/jira/browse/SPARK-7148
Project: Spark
Issue Type: Improvement
Components: MLlib, SQL
Affects Versions: 1.3.1, 1.3.0, 1.4.0
Reporter: Joseph K. Bradley
Priority: Minor
It would be nice if we could configure the Parquet block size (row group size) when
using Parquet format for ML model import/export. Currently, for some models (trees
and ensembles), the schema has 13+ columns. With a default block size of 128MB (I
think), the write buffers allocated across those columns far exceed the default
memory made available by run-example. Because of this, users have to run some ML
examples via spark-submit and explicitly request a larger amount of memory.
Is there a simple way to specify {{parquet.block.size}}? I'm not familiar with
this part of SparkSQL.
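For what it's worth, here is a minimal sketch of one possible workaround. It
assumes (unverified against the SparkSQL export path) that the Parquet writer
picks up {{parquet.block.size}} from the Hadoop configuration attached to the
SparkContext passed to {{model.save}}:
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.model.DecisionTreeModel

val sc = new SparkContext(new SparkConf().setAppName("ModelExport"))

// Assumption: Parquet's output format reads parquet.block.size from the
// Hadoop configuration when Spark SQL writes the model files. Shrink the
// row group size from the ~128MB default to 1MB so the per-column write
// buffers fit in run-example's default heap.
sc.hadoopConfiguration.setInt("parquet.block.size", 1024 * 1024)

// ... train a model as usual, then export/import it:
// model.save(sc, "target/tmp/myDecisionTreeModel")
// val sameModel = DecisionTreeModel.load(sc, "target/tmp/myDecisionTreeModel")
{code}
Even if that works, it is app-global (it affects every Parquet write in the
application); exposing the setting as a per-call option in the import/export
API would be cleaner.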