alamb commented on issue #7036: URL: https://github.com/apache/arrow-datafusion/issues/7036#issuecomment-1676324706
FWIW, I think that after the work by @devinjdangelo in https://github.com/apache/arrow-datafusion/pull/7244, this feature should now just be a matter of hooking up the code (or maybe even removing an error check) and writing a test, so I am marking it as a good first issue.

Here is a reproducer: [cpu.zip](https://github.com/apache/arrow-datafusion/files/12329390/cpu.zip)

```
❯ create external table cpu stored as parquet location '/tmp/foo/cpu.parquet';
0 rows in set. Query took 0.002 seconds.

❯ select * from cpu limit 10;
+------+-----------------------------------+---------------------+
| cpu  | host1                             | time                |
+------+-----------------------------------+---------------------+
| cpu2 | MacBook-Pro-8.hsd1.ma.comcast.net | 2022-09-30T12:55:00 |
+------+-----------------------------------+---------------------+
1 row in set. Query took 0.007 seconds.
```

The goal is to make this work:

```
❯ create external table cpu stored as parquet location '/tmp/foo/cpu.parquet' with order (time);
Error during planning: Provide a schema before specifying the order while creating a table.
```

Possibly also this form, with an explicit column list:

```
❯ create external table cpu(cpu varchar, host varchar, time timestamp) stored as parquet location '/tmp/foo/cpu.parquet' with order (time);
Error during planning: Column definitions can not be specified for PARQUET files.
```
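The change described above ("hooking up the code or removing an error") can be sketched roughly as follows. This is a hypothetical, self-contained Rust illustration, not DataFusion's actual planner code: `Schema`, `PlanError`, `resolve_schema`, and `validate_order` are made-up names. It assumes the schema inferred from the Parquet file is available at planning time, and validates the `WITH ORDER` columns against that schema instead of rejecting the statement with "Provide a schema before specifying the order while creating a table."

```rust
// Hypothetical sketch only: these types and functions are NOT DataFusion APIs.

/// Simplified stand-in for a table schema: just the column names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

/// Hypothetical planning error for an ORDER column missing from the schema.
#[derive(Debug, PartialEq)]
enum PlanError {
    UnknownOrderColumn(String),
}

/// Use the declared columns if the user gave any; otherwise fall back to
/// the schema inferred from the file (e.g. from Parquet metadata).
fn resolve_schema(declared: &Schema, inferred: &Schema) -> Schema {
    if declared.columns.is_empty() {
        inferred.clone()
    } else {
        declared.clone()
    }
}

/// Check that every WITH ORDER column exists in the resolved schema,
/// rather than erroring out whenever no schema was declared.
fn validate_order(order: &[&str], schema: &Schema) -> Result<(), PlanError> {
    for col in order {
        if !schema.columns.iter().any(|c| c.as_str() == *col) {
            return Err(PlanError::UnknownOrderColumn((*col).to_string()));
        }
    }
    Ok(())
}

fn main() {
    // Column names taken from the cpu.parquet reproducer output.
    let inferred = Schema {
        columns: vec!["cpu".to_string(), "host1".to_string(), "time".to_string()],
    };
    // No column definitions in the CREATE EXTERNAL TABLE statement.
    let declared = Schema { columns: vec![] };

    let schema = resolve_schema(&declared, &inferred);
    assert!(validate_order(&["time"], &schema).is_ok());
    assert_eq!(
        validate_order(&["timestamp"], &schema),
        Err(PlanError::UnknownOrderColumn("timestamp".to_string()))
    );
    println!("ok");
}
```

The design choice here is to defer the ORDER check until after schema inference, so an unknown column still produces a planning error while a valid one like `time` is accepted.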
