[jira] [Created] (SPARK-4413) Parquet support through datasource API

Michael Armbrust (JIRA) Fri, 14 Nov 2014 12:45:07 -0800

Michael Armbrust created SPARK-4413:
---------------------------------------


             Summary: Parquet support through datasource API
                 Key: SPARK-4413
                 URL: https://issues.apache.org/jira/browse/SPARK-4413
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Michael Armbrust
            Assignee: Michael Armbrust
            Priority: Critical


Right now there are several issues with our Parquet support.  Specifically, the 
only way to access Parquet files through pure SQL is by including Hive, which 
has the following issues:
 - fairly verbose syntax
 - requires you to explicitly add partitions
 - does not support decimal types
 - querying tables with many partitions results in metadata operations 
dominating the query time (even worse when reading from S3)

It would be great to have better native support here through the new datasources 
API.  Ideally, once that is in place, we can deprecate the existing 
ParquetRelation.
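
To illustrate the point about explicitly adding partitions, here is a minimal 
sketch of how partitions could be inferred from the directory layout instead. 
It assumes a Hive-style layout (root/key=value/.../file.parquet). The function 
name discover_partitions is hypothetical and is not part of Spark or of the 
datasources API; this is not how any existing implementation works, only one 
possible approach.

```python
import os
import tempfile


def discover_partitions(root):
    """Return (partition_values, file_path) pairs for each .parquet file under root.

    Assumption: partitions are encoded as key=value directory names,
    e.g. root/year=2014/month=11/part-0.parquet.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = {}
        if rel != os.curdir:
            for segment in rel.split(os.sep):
                key, sep, value = segment.partition("=")
                if sep:  # ignore directories that are not key=value
                    parts[key] = value
        for name in filenames:
            if name.endswith(".parquet"):
                found.append((parts, os.path.join(dirpath, name)))
    # Sort by path so the result does not depend on filesystem ordering.
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    # Build a tiny throwaway layout with empty placeholder files.
    demo_root = tempfile.mkdtemp()
    demo_leaf = os.path.join(demo_root, "year=2014", "month=11")
    os.makedirs(demo_leaf)
    open(os.path.join(demo_leaf, "part-0.parquet"), "w").close()
    for values, path in discover_partitions(demo_root):
        print(values, path)
```

A single directory walk like this replaces one explicit add-partition statement 
per partition. Partition values come back as strings here; inferring their 
types is left out of the sketch.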



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

