Cheng Lian created SPARK-15719:
----------------------------------

             Summary: Disable writing Parquet summary files by default
                 Key: SPARK-15719
                 URL: https://issues.apache.org/jira/browse/SPARK-15719
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Cheng Lian
            Assignee: Cheng Lian


Parquet summary files are not particularly useful nowadays, since:

# When schema merging is disabled, we assume the schemas of all Parquet 
part-files are identical, so we can read the footer from any single part-file.
# When schema merging is enabled, we need to read the footers of all files 
anyway to do the merge.
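
The two cases above can be sketched as a tiny model. This is only an
illustration of the reasoning, not Spark's actual code; the function name
`footers_to_read` and its parameters are hypothetical.

```python
def footers_to_read(num_part_files: int, merge_schema: bool) -> int:
    """Count the part-file footers a reader must open to determine the
    schema when no summary file exists (hypothetical model)."""
    if num_part_files == 0:
        return 0
    # With schema merging enabled, every footer contributes to the merged
    # schema. Otherwise any single footer is assumed to be representative.
    return num_part_files if merge_schema else 1
```

In neither case does a summary file spare the reader any work it could not
avoid otherwise.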

On the other hand, writing summary files can be expensive because the footers 
of all part-files must be read and merged. This is particularly costly when 
appending a small dataset to a large existing Parquet dataset.
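
To make the append cost concrete, here is a rough sketch that counts footer
reads only. It assumes each append rebuilds the summary from every part-file
then present; the function names and the cost model are hypothetical, not
taken from Spark or Parquet.

```python
def summary_footer_reads(existing_parts: int, appended_parts: int) -> int:
    """Footers read to rebuild the summary after a single append."""
    # Rebuilding the summary touches every part-file now in the dataset,
    # not just the newly written ones.
    return existing_parts + appended_parts


def total_reads_over_appends(initial_parts: int, parts_per_append: int,
                             num_appends: int) -> int:
    """Cumulative footer reads across a series of equally sized appends."""
    total = 0
    parts = initial_parts
    for _ in range(num_appends):
        total += summary_footer_reads(parts, parts_per_append)
        parts += parts_per_append
    return total
```

Under this model, appending 2 part-files to a dataset of 10000 part-files
reads 10002 footers, so the cost is driven by the size of the existing
dataset rather than by the size of the append.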



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

