[GitHub] [arrow-datafusion] snoe925 commented on issue #133: Add support for reading partitioned Parquet files

GitBox Thu, 01 Jul 2021 09:14:08 -0700


snoe925 commented on issue #133:
URL: 
https://github.com/apache/arrow-datafusion/issues/133#issuecomment-872375783



   The Presto/Athena syntax is nice for declaring partitions without dynamic 
discovery on the filesystem.
   I would like dynamic discovery to be the default, but Athena/Presto SQL 
also provides a means to declare explicit mappings.
   This is perhaps a companion to the feature requested in this issue.  The 
benefit is perhaps faster operation as you don't have to scan the filesystem to 
discover partitions.  A secondary benefit is using this scheme for version 
snapshot support.  This is how delta-io works with Athena/Presto/Trino.
   
   Here is an example of the syntax.  It definitely needs a Google Doc 
treatment to outline the details.
   
   I just wanted to comment to show how one can separate filesystem / storage 
discovery from the idea of partitions.  This is also convenient syntax for 
test cases, since the interaction is 100% SQL-based.
   
   CREATE EXTERNAL TABLE users (
   first string,
   last string,
   username string
   )
   PARTITIONED BY (id string, id2 string)  -- same as the create table column syntax
   STORED AS PARQUET
   -- omit LOCATION because we are going to explicitly partition with ALTER TABLE
   
   ALTER TABLE users
       ADD PARTITION (id='a', id2='02') LOCATION '/id=a/id2=02/data.parquet'
       ADD PARTITION (id='a', id2='03') LOCATION '/id=a/id2=03/data.parquet'
   
   This is perhaps a UNION ALL of hidden tables for each partition.
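   
   A rough sketch of that UNION ALL idea, not a proposal for how DataFusion 
should implement it.  SQLite (via Python's sqlite3 module) stands in for the 
query engine, and the hidden table names (users_p_a_02, users_p_a_03) and the 
sample rows are made up for illustration.  Each partition is a hidden table, 
and the visible table is a view that adds the partition columns as constants.

```python
import sqlite3

# Assumption: SQLite stands in for the query engine; the hidden table names
# and the sample rows are hypothetical and exist only for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one hidden table per explicitly declared partition
    CREATE TABLE users_p_a_02 (first TEXT, last TEXT, username TEXT);
    CREATE TABLE users_p_a_03 (first TEXT, last TEXT, username TEXT);
    INSERT INTO users_p_a_02 VALUES ('Ada', 'Lovelace', 'ada');
    INSERT INTO users_p_a_03 VALUES ('Alan', 'Turing', 'alan');

    -- the visible table is a UNION ALL over the hidden tables, with the
    -- partition columns (id, id2) supplied as constants per branch
    CREATE VIEW users AS
        SELECT first, last, username, 'a' AS id, '02' AS id2 FROM users_p_a_02
        UNION ALL
        SELECT first, last, username, 'a' AS id, '03' AS id2 FROM users_p_a_03;
""")

# Filtering on a partition column only returns rows from the matching branch.
rows = conn.execute(
    "SELECT username, id, id2 FROM users WHERE id2 = '03'"
).fetchall()
print(rows)
```

   In a real engine the filter on id2 could be used to skip the non-matching 
branches entirely, which is where the benefit of not scanning the filesystem 
would come from.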


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
