[GitHub] [arrow-datafusion] edmondop commented on issue #7036: Allowing to expose sort order of parquet file to datafusion does not work

via GitHub Tue, 15 Aug 2023 08:53:20 -0700


edmondop commented on issue #7036:
URL: https://github.com/apache/arrow-datafusion/issues/7036#issuecomment-1679178283


   > FWIW I think after the work from @devinjdangelo in #7244 this feature should now be a matter of hooking up the code (or maybe even removing an error) and writing a test. So marking it as a good first issue.
   > 
   > Here is a reproducer:
   > 
   > [cpu.zip](https://github.com/apache/arrow-datafusion/files/12329390/cpu.zip)
   > 
   > ```
   > ❯ create external table cpu stored as parquet location '/tmp/foo/cpu.parquet';
   > 0 rows in set. Query took 0.002 seconds.
   > 
   > ❯ select * from cpu limit 10;
   > +------+-----------------------------------+---------------------+
   > | cpu  | host1                             | time                |
   > +------+-----------------------------------+---------------------+
   > | cpu2 | MacBook-Pro-8.hsd1.ma.comcast.net | 2022-09-30T12:55:00 |
   > +------+-----------------------------------+---------------------+
   > 1 row in set. Query took 0.007 seconds.
   > ```
   > 
   > The goal is to make this work:
   > 
   > ```
   > ❯ create external table cpu stored as parquet location '/tmp/foo/cpu.parquet' with order (time);
   > Error during planning: Provide a schema before specifying the order while creating a table.
   > ```
   > 
   > Possibly:
   > 
   > ```
   > ❯ create external table cpu(cpu varchar, host varchar, time timestamp) stored as parquet location '/tmp/foo/cpu.parquet' with order (time);
   > Error during planning: Column definitions can not be specified for PARQUET files.
   > ```
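
For illustration only, the two planning errors quoted above can be modeled as a pair of checks that leave no valid way to combine `with order` with a parquet file. The sketch below is a toy Python model, not DataFusion code: `CreateExternalTable`, `plan_current`, `plan_proposed`, and the `infer_schema` callback are all hypothetical names. It shows one possible reading of "hooking up the code (or maybe even removing an error)": take the schema from the file when no columns are given, then validate the order columns against it.

```python
from dataclasses import dataclass, field


class PlanningError(Exception):
    """Stand-in for the 'Error during planning' messages quoted above."""


@dataclass
class CreateExternalTable:
    # Hypothetical, simplified view of a CREATE EXTERNAL TABLE statement.
    file_type: str
    location: str
    columns: list = field(default_factory=list)   # explicit column definitions
    order_by: list = field(default_factory=list)  # WITH ORDER (...) columns


def plan_current(stmt):
    """Models the two checks that produce the quoted errors."""
    if stmt.file_type == "PARQUET" and stmt.columns:
        raise PlanningError(
            "Column definitions can not be specified for PARQUET files.")
    if stmt.order_by and not stmt.columns:
        raise PlanningError(
            "Provide a schema before specifying the order while creating a table.")
    return stmt.columns


def plan_proposed(stmt, infer_schema):
    """Assumed fix: infer the schema, then validate the order columns."""
    if stmt.file_type == "PARQUET" and stmt.columns:
        raise PlanningError(
            "Column definitions can not be specified for PARQUET files.")
    # infer_schema is a hypothetical callback returning the file's column names.
    schema = stmt.columns or infer_schema(stmt.location)
    missing = [c for c in stmt.order_by if c not in schema]
    if missing:
        raise PlanningError(f"Unknown order column(s): {missing}")
    return schema
```

With `plan_current`, a parquet statement carrying `order_by=["time"]` fails whether or not columns are supplied, which matches both transcripts. With `plan_proposed` and a callback that returns the file's columns, the same statement without column definitions is accepted.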
   
   @alamb should we use this issue to track the fix, or do we want to open a sub-issue? I can start looking into this soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

