alamb commented on issue #10069:
URL: https://github.com/apache/arrow-datafusion/issues/10069#issuecomment-2053645227

   I think we could add this as a user-defined table function in datafusion-cli quite easily.
   
   For example, we could follow the model of `parquet_metadata` (which also follows DuckDB):
   ```sql
   SELECT path_in_schema, row_group_id, row_group_num_rows, stats_min, stats_max, total_compressed_size
   FROM parquet_metadata('hits.parquet')
   WHERE path_in_schema = '"WatchID"'
   LIMIT 3;
   
   
   +----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
   | path_in_schema | row_group_id | row_group_num_rows | stats_min           | stats_max           | total_compressed_size |
   +----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
   | "WatchID"      | 0            | 450560             | 4611687214012840539 | 9223369186199968220 | 3883759               |
   | "WatchID"      | 1            | 612174             | 4611689135232456464 | 9223371478009085789 | 5176803               |
   | "WatchID"      | 2            | 344064             | 4611692774829951781 | 9223363791697310021 | 3031680               |
   +----------------+--------------+--------------------+---------------------+---------------------+-----------------------+
   3 rows in set. Query took 0.053 seconds.
   ```
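   
   Because the function returns an ordinary table, its output can be filtered and aggregated like any other relation. A minimal sketch, assuming the same `hits.parquet` file and using only columns shown above (the `compressed_bytes` alias is just an illustrative name), that sums the compressed size of each column across row groups:
   ```sql
   -- Sketch only: total compressed bytes per column, largest first.
   -- Assumes 'hits.parquet' is in the current working directory.
   SELECT path_in_schema, SUM(total_compressed_size) AS compressed_bytes
   FROM parquet_metadata('hits.parquet')
   GROUP BY path_in_schema
   ORDER BY compressed_bytes DESC
   LIMIT 3;
   ```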
   
   It is documented here: https://arrow.apache.org/datafusion/user-guide/cli.html
   
   The code for it is here: https://github.com/apache/arrow-datafusion/blob/637293580db0634a4efbd3f52e4700992ee3080d/datafusion-cli/src/functions.rs#L215-L442


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
