Michael Armbrust created SPARK-4413:
---------------------------------------
Summary: Parquet support through datasource API
Key: SPARK-4413
URL: https://issues.apache.org/jira/browse/SPARK-4413
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Critical
Right now there are several issues with our Parquet support. Specifically, the
only way to access Parquet files through pure SQL is by including Hive, which
has the following issues:
- fairly verbose syntax
- requires you to explicitly add partitions
- does not support decimal types.
- querying tables with many partitions results in metadata operations
dominating the query time (even worse when reading from S3).
It would be great to have better native support here through the new data
sources API. Ideally, once that is in place, we can deprecate the existing
ParquetRelation.
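
To illustrate the "explicitly add partitions" point above: one way a native
data source could avoid manual partition registration is to infer partitions
from the directory layout. The following is only a minimal sketch, assuming
Hive-style key=value directory names. The function names and example paths are
hypothetical and are not taken from Spark.

```python
from pathlib import PurePosixPath


def parse_partition_values(file_path, base_path):
    """Return {column: value} parsed from key=value directories under base_path.

    Assumes a Hive-style layout such as <base>/year=2014/month=11/file.parquet.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base_path))
    partitions = {}
    # Every component except the last one (the file name) may encode a partition.
    for part in relative.parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


def discover_partitions(file_paths, base_path):
    """Group data files by their partition values (hypothetical helper)."""
    grouped = {}
    for path in file_paths:
        values = parse_partition_values(path, base_path)
        # Sort the items so the grouping key does not depend on directory order.
        key = tuple(sorted(values.items()))
        grouped.setdefault(key, []).append(path)
    return grouped
```

With this approach a single listing of the table directory yields every
partition, so no per-partition registration step is needed. It does not by
itself address the cost of listing many directories on S3.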