zhangxffff opened a new issue, #8524: URL: https://github.com/apache/arrow-datafusion/issues/8524
### Describe the bug

I found that `select count(*) from '*.parquet'` scans not only the Parquet files in the current directory but also, recursively, all Parquet files in subdirectories. Is this behavior by design, or is it a bug?

### To Reproduce

I tested with three Parquet files, two of which are in a subdirectory.

```
$ tree
.
├── subdir
│   ├── file1.parquet
│   └── file2.parquet
└── users.parquet

2 directories, 3 files
```

`users.parquet` has 2 records, `file1.parquet` has 1 record, and `file2.parquet` has 1 record. `select count(*) from '*.parquet'` returns 4:

```
$ datafusion-cli -c "select count(*) from '*.parquet'"
DataFusion CLI v33.0.0
+----------+
| COUNT(*) |
+----------+
| 4        |
+----------+
1 row in set. Query took 0.058 seconds.

$ datafusion-cli -c "select count(*) from 'users.parquet'"
DataFusion CLI v33.0.0
+----------+
| COUNT(*) |
+----------+
| 2        |
+----------+
1 row in set. Query took 0.002 seconds.

$ datafusion-cli -c "select count(*) from 'subdir/*.parquet'"
DataFusion CLI v33.0.0
+----------+
| COUNT(*) |
+----------+
| 2        |
+----------+
1 row in set. Query took 0.002 seconds.
```

### Expected behavior

I ran the same query in DuckDB, and DuckDB scans only the Parquet files in the current directory:

```
$ duckdb -c "select count(*) from '*.parquet'"
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│            2 │
└──────────────┘
```

### Additional context

_No response_
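To make the two possible readings of `'*.parquet'` concrete, here is a minimal sketch using Python's `pathlib`. It is not DataFusion's or DuckDB's actual path-listing code. The helper names (`build_layout`, `match_flat`, `match_recursive`) are hypothetical, and the files are empty placeholders rather than real Parquet files. It only recreates the directory layout from the report and lists which paths each reading would select.

```python
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Recreate the directory tree from the report using empty placeholder files."""
    (root / "subdir").mkdir()
    for rel in ("users.parquet", "subdir/file1.parquet", "subdir/file2.parquet"):
        (root / rel).touch()


def match_flat(root: Path) -> list[str]:
    """Shell-style reading: '*' does not cross directory separators."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*.parquet"))


def match_recursive(root: Path) -> list[str]:
    """Recursive reading: the pattern is also applied inside every subdirectory."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.parquet"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    flat = match_flat(root)
    recursive = match_recursive(root)
    print("flat:     ", flat)
    print("recursive:", recursive)
```

With the record counts given above, the flat reading selects only `users.parquet` (2 records, matching the DuckDB result), while the recursive reading selects all three files (2 + 1 + 1 = 4 records, matching the count returned by `datafusion-cli`).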
