[GitHub] [arrow-datafusion] alamb commented on issue #7036: Allowing to expose sort order of parquet file to datafusion does not work

via GitHub Sun, 13 Aug 2023 04:17:15 -0700


alamb commented on issue #7036:
URL: https://github.com/apache/arrow-datafusion/issues/7036#issuecomment-1676324706


   FWIW I think that after the work from @devinjdangelo in https://github.com/apache/arrow-datafusion/pull/7244, this feature should now be a matter of hooking up the code (or maybe even just removing an error) and writing a test. So I am marking it as a good first issue.
   
   Here is a reproducer:
   
   [cpu.zip](https://github.com/apache/arrow-datafusion/files/12329390/cpu.zip)
   
   ```
   ❯ create external table cpu stored as parquet location '/tmp/foo/cpu.parquet';
   0 rows in set. Query took 0.002 seconds.
   
   ❯ select * from cpu limit 10;
   +------+-----------------------------------+---------------------+
   | cpu  | host1                             | time                |
   +------+-----------------------------------+---------------------+
   | cpu2 | MacBook-Pro-8.hsd1.ma.comcast.net | 2022-09-30T12:55:00 |
   +------+-----------------------------------+---------------------+
   1 row in set. Query took 0.007 seconds.
   ```
   
   The goal is to make this work:
   ```
   ❯ create external table cpu stored as parquet location '/tmp/foo/cpu.parquet' with order (time);
   Error during planning: Provide a schema before specifying the order while creating a table.
   ```
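   To make the planning error concrete, here is a hypothetical, simplified model of the check (not DataFusion's actual planner code) showing what "hooking up the code" could mean. The function name `validate_order` and its plain string inputs are assumptions for illustration only. The idea is that for Parquet the column list could come from the schema inferred from the file footer rather than from the DDL, so `WITH ORDER (time)` can be validated against those columns instead of being rejected outright.

   ```rust
   use std::collections::HashSet;

   // Hypothetical sketch, NOT DataFusion's real code: checks a WITH ORDER
   // clause against the table's column names.
   fn validate_order(table_columns: &[&str], order_by: &[&str]) -> Result<(), String> {
       if table_columns.is_empty() {
           // Models the current behaviour: with no columns known from the DDL,
           // any ordering is rejected.
           return Err(
               "Provide a schema before specifying the order while creating a table.".to_string(),
           );
       }
       let known: HashSet<&str> = table_columns.iter().copied().collect();
       for col in order_by {
           // Every ordering column must exist in the (possibly inferred) schema.
           if !known.contains(col) {
               return Err(format!("WITH ORDER references unknown column '{col}'"));
           }
       }
       Ok(())
   }

   fn main() {
       // Column names taken from the `select *` output of cpu.parquet above,
       // standing in for a schema inferred from the Parquet file itself.
       let inferred = ["cpu", "host1", "time"];
       println!("{:?}", validate_order(&inferred, &["time"]));
       // With no schema available, the order is still rejected.
       println!("{:?}", validate_order(&[], &["time"]));
   }
   ```

   Under this model, the fix amounts to passing the inferred columns into the check rather than an empty DDL schema.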
   
   And possibly also this form, which supplies an explicit schema:
   ```
   ❯ create external table cpu(cpu varchar, host varchar, time timestamp) stored as parquet location '/tmp/foo/cpu.parquet' with order (time);
   Error during planning: Column definitions can not be specified for PARQUET files.
   ```
   

