snoe925 commented on issue #133:
URL:
https://github.com/apache/arrow-datafusion/issues/133#issuecomment-872375783
The Presto/Athena syntax is nice for declaring partitions without dynamic
discovery on the filesystem.
I would like to have dynamic discovery as the default, but Athena/Presto SQL
also provides a means to declare explicit mappings.
This is perhaps a companion to the feature requested in this issue. The
benefit is perhaps faster operation as you don't have to scan the filesystem to
discover partitions. A secondary benefit is using this scheme for version
snapshot support. This is how delta-io works with Athena/Presto/Trino.
Here is an example of the syntax. It definitely needs a Google Doc treatment to
outline the details.
I just wanted to comment to show how one can split filesystem / storage
discovery from the idea of partitions. It is also convenient syntax for test
cases, since the interaction is 100% SQL-based.
CREATE EXTERNAL TABLE users (
  first string,
  last string,
  username string
)
PARTITIONED BY (id string, id2 string) -- same as the create table column syntax
STORED AS PARQUET;
-- omit LOCATION because we are going to explicitly partition with ALTER TABLE
ALTER TABLE users
  ADD PARTITION (id='a', id2='02') LOCATION '/id=a/id=02/data.parquet'
  ADD PARTITION (id='a', id2='03') LOCATION '/id=a/id=03/data.parquet';
This is perhaps a UNION ALL of hidden tables for each partition.
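To make the UNION ALL idea concrete, here is a minimal sketch using SQLite from Python as a stand-in engine. It is not how DataFusion, Athena, or Presto implement partitions. The hidden table names (users__p0, users__p1) and the sample rows are hypothetical, and the column list is trimmed to username for brevity. Each explicitly added partition becomes one hidden table, and the partition values are attached as constant columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One hidden table per explicitly registered partition (hypothetical names).
cur.execute("CREATE TABLE users__p0 (username TEXT)")  # id='a', id2='02'
cur.execute("CREATE TABLE users__p1 (username TEXT)")  # id='a', id2='03'
cur.execute("INSERT INTO users__p0 VALUES ('ada')")
cur.execute("INSERT INTO users__p1 VALUES ('alan')")

# The partitioned table is modeled as a view: each branch reads one hidden
# table and adds that partition's values as constant columns.
cur.execute("""
    CREATE VIEW users AS
    SELECT username, 'a' AS id, '02' AS id2 FROM users__p0
    UNION ALL
    SELECT username, 'a' AS id, '03' AS id2 FROM users__p1
""")

# A filter on a partition column selects rows from the matching branch only.
rows = cur.execute(
    "SELECT username, id, id2 FROM users WHERE id2 = '03'"
).fetchall()
print(rows)  # [('alan', 'a', '03')]
```

Because the partition values are constants in each branch, a planner could in principle drop non-matching branches without ever listing the filesystem. Whether a given engine actually prunes this way is an assumption here, not something this sketch demonstrates.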