[jira] [Created] (SPARK-3091) Add support for caching metadata on Parquet files

Matei Zaharia (JIRA) Sun, 17 Aug 2014 16:42:37 -0700

Matei Zaharia created SPARK-3091:
------------------------------------

             Summary: Add support for caching metadata on Parquet files
                 Key: SPARK-3091
                 URL: https://issues.apache.org/jira/browse/SPARK-3091
             Project: Spark
          Issue Type: New Feature
          Components: SQL
            Reporter: Matei Zaharia
            Assignee: Matei Zaharia



For larger Parquet files, reading the file footers (done in parallel on up 
to 5 threads) and fetching the HDFS block locations (done serially) can take 
multiple seconds. Unfortunately, ParquetInputFormat caches footers only 
within each instance of ParquetInputFormat, not across instances. We can add 
an option to cache this metadata within FilteringParquetInputFormat.
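
As a rough illustration of what caching across instances could look like, 
here is a minimal sketch. It is not the actual Spark or Parquet code: 
FileKey, CachingFormatSketch and FooterCacheDemo are hypothetical names, the 
footer is modeled as a plain String, and keying the cache on path plus 
modification time is an assumption made so that a rewritten file is read 
again.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: none of these classes exist in Spark or Parquet.

// Cache key: path plus modification time, so a rewritten file misses the cache.
final class FileKey {
    final String path;
    final long modificationTime;

    FileKey(String path, long modificationTime) {
        this.path = path;
        this.modificationTime = modificationTime;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileKey)) {
            return false;
        }
        FileKey other = (FileKey) o;
        return path.equals(other.path) && modificationTime == other.modificationTime;
    }

    @Override
    public int hashCode() {
        return Objects.hash(path, modificationTime);
    }
}

// Stand-in for an input format. Every instance shares one static cache.
final class CachingFormatSketch {
    // Static, so entries outlive any single instance (the cross-instance part).
    private static final ConcurrentHashMap<FileKey, String> FOOTERS =
            new ConcurrentHashMap<>();

    // Assumed to be the expensive operation, e.g. reading a footer from HDFS.
    private final Function<String, String> footerReader;

    CachingFormatSketch(Function<String, String> footerReader) {
        this.footerReader = footerReader;
    }

    // Returns the cached footer, invoking the reader only on a cache miss.
    String footerFor(String path, long modificationTime) {
        return FOOTERS.computeIfAbsent(
                new FileKey(path, modificationTime),
                key -> footerReader.apply(key.path));
    }

    static void clearCache() {
        FOOTERS.clear();
    }
}

final class FooterCacheDemo {
    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        Function<String, String> slowReader = path -> {
            reads.incrementAndGet();
            return "footer-of-" + path;
        };

        CachingFormatSketch first = new CachingFormatSketch(slowReader);
        CachingFormatSketch second = new CachingFormatSketch(slowReader);

        first.footerFor("/data/part-0.parquet", 100L);  // miss: reads the footer
        second.footerFor("/data/part-0.parquet", 100L); // hit: shared cache
        second.footerFor("/data/part-0.parquet", 200L); // file changed: reads again

        System.out.println("footer reads: " + reads.get());
    }
}
```

The same idea would presumably extend to block locations. The sketch never 
evicts entries, so a real option would likely also need a size bound or an 
expiry policy.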



--
This message was sent by Atlassian JIRA
(v6.2#6252)

