JosephWagner opened a new issue, #1490:
URL: https://github.com/apache/datafusion-python/issues/1490

   **Describe the bug**
   Selecting from a partitioned parquet file with EXPLAIN ANALYZE raises this error:
   
   Exception: DataFusion error: Internal error: Unsupported logical plan: Analyze must be root of the plan.
   This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues
   
   **To Reproduce**
   
   ```
   import os
   import shutil
   import pyarrow as pa
   import pyarrow.parquet as pq
   import datafusion
   
   BASE_DIR = "repro_analyze_bug"
   
   if os.path.exists(BASE_DIR):
       shutil.rmtree(BASE_DIR)
   
   partition_dir = f"{BASE_DIR}/a=1"
   os.makedirs(partition_dir)
   
   data_schema = pa.schema([
       ('b', pa.int32())
   ])
   
   data_table = pa.Table.from_arrays(
       [[10, 20, 30, 40, 50, 60]],
       schema=data_schema
   )
   pq.write_table(data_table, f"{partition_dir}/data.parquet")
   
   ctx = datafusion.SessionContext()
   
   ctx.sql(f"""
       CREATE EXTERNAL TABLE my_table (
           b INT
       )
       STORED AS PARQUET 
       LOCATION '{BASE_DIR}/a=*/*.parquet'
       PARTITIONED BY (a INT)
   """)
   
   
   result = ctx.sql("EXPLAIN ANALYZE SELECT * FROM my_table")
   
   # show() triggers the exception; collect() does not
   result.show()
   
   # result output in ipython:
   #[ins] In [2]: result
   #Out[2]:
   #    DataFrame()
   #    +-------------------+-----------------------+
   #    | plan_type         | plan                  |
   #    +-------------------+-----------------------+
   #    | Plan with Metrics | EmptyExec, metrics=[] |
   #    |                   |                       |
   #    +-------------------+-----------------------+
   
   ```
   
   **Expected behavior**
   I would expect no exception and a query plan to be displayed.
   
   **Additional context**
   The table can be queried but appears to have 0 rows, even though the parquet file written in the reproduction contains 6 values. I'm not sure why.
   
   ```
   [ins] In [7]: ctx.sql("select count(*) from my_table")
   Out[7]:
   DataFrame()
   +----------+
   | count(*) |
   +----------+
   | 0        |
   +----------+
   ```
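
One way to narrow down the empty result is to confirm that the glob pattern used in `LOCATION` matches the file on disk at all. The sketch below uses only the Python standard library and checks shell-style glob semantics. Whether DataFusion resolves `LOCATION` patterns the same way is an assumption that this check does not confirm. The helper name `matching_files` is hypothetical, and an empty placeholder file stands in for `data.parquet`.

```python
import glob
import os
import tempfile


def matching_files(base_dir, pattern="a=*/*.parquet"):
    """Return paths under base_dir matched by a shell-style glob pattern.

    Hypothetical helper for illustration; it is not part of datafusion.
    """
    # Escape base_dir so glob metacharacters in the directory name are literal.
    return sorted(glob.glob(os.path.join(glob.escape(base_dir), pattern)))


with tempfile.TemporaryDirectory() as base_dir:
    # Recreate the hive-style layout from the reproduction: <base>/a=1/data.parquet
    os.makedirs(os.path.join(base_dir, "a=1"))
    # Empty placeholder file; only the path matters for this check.
    open(os.path.join(base_dir, "a=1", "data.parquet"), "wb").close()

    matches = matching_files(base_dir)
    relative = [os.path.relpath(m, base_dir) for m in matches]
    print(relative)
```

If the pattern matches here but the table still reports 0 rows, the way `LOCATION` is interpreted becomes a more likely suspect than the files themselves.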


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

