adriangb opened a new issue, #14406: URL: https://github.com/apache/datafusion/issues/14406
### Describe the bug

Outer limits seem to be able to affect the inner limits of a subquery.

### To Reproduce

Run the following Python script to create test data:

```python
import os
from datetime import datetime, timedelta

import polars as pl

# Start date
base_date = datetime(1970, 1, 1)

# Create directory structure if it doesn't exist
os.makedirs('parquet_files', exist_ok=True)

# Generate 100 files
for i in range(100):
    # Calculate the date for this partition
    current_date = base_date + timedelta(days=i)
    partition_path = f'parquet_files/day={current_date.strftime("%Y-%m-%d")}'

    # Create partition directory
    os.makedirs(partition_path, exist_ok=True)

    # Create DataFrame with single row
    df = pl.DataFrame({'duration': [1.0]})

    # Write to parquet file
    df.write_parquet(f'{partition_path}/file_{i}.parquet')
```

Now in datafusion-cli (`datafusion-cli 43.0.0` for me) run:

```sql
with selection as (
    select * from 'parquet_files/*'
    limit 1
)
select 1 as foo
from selection
order by duration
limit 1000;
```

I get:

```
+-----+
| foo |
+-----+
| 1   |
| 1   |
+-----+
2 row(s) fetched.
```

This is wrong! It should only ever return 1 row.

This is an MRE of a problem I found in our production stack. In real-world tests the result is not always 2x the rows. The number of extra rows varies and seems to depend on the number of partitions chosen for execution. With `SET datafusion.execution.target_partitions = 1;` the problem goes away. It also goes away without the outer `limit 1000`.

[parquet_files.zip](https://github.com/user-attachments/files/18630234/parquet_files.zip)

### Expected behavior

_No response_

### Additional context

_No response_
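The report says the number of extra rows tracks the number of partitions and disappears with `target_partitions = 1`. One hypothesis fits those symptoms, though it is an assumption and not a confirmed diagnosis of DataFusion's optimizer. Under that hypothesis, the inner `limit 1` ends up enforced only per partition, and the final step that re-applies it across all partitions is lost.

The pure-Python model below only illustrates that hypothesis. Some simplifications and assumptions apply:
- `split_into_partitions`, `local_limit` and `run_limit` are hypothetical helpers, not DataFusion APIs.
- Round-robin assignment is a simplification of how files are mapped to partitions.
- The model does not try to explain why the outer `limit 1000` is needed to trigger the behavior.

```python
from itertools import islice


def split_into_partitions(rows, target_partitions):
    """Hypothetical helper: distribute rows round-robin across
    target_partitions lists (a simplified stand-in for a parallel scan)."""
    partitions = [[] for _ in range(target_partitions)]
    for i, row in enumerate(rows):
        partitions[i % target_partitions].append(row)
    return partitions


def local_limit(partition, n):
    """Hypothetical helper: keep at most n rows from a single partition."""
    return list(islice(partition, n))


def run_limit(partitions, n, keep_global_limit):
    """Hypothetical model of LIMIT n over partitioned input.

    Each partition is limited locally first. The global step that merges
    the partitions and enforces n across all of them is optional, so the
    effect of (hypothetically) dropping it can be observed.
    """
    locally_limited = [local_limit(p, n) for p in partitions]
    merged = [row for part in locally_limited for row in part]
    if keep_global_limit:
        # Correct plan: enforce the limit once more across all partitions.
        return merged[:n]
    # Assumed faulty plan: every non-empty partition contributes up to n rows.
    return merged


# 100 single-row "files", mirroring the reproduction data above.
rows = [{"duration": 1.0} for _ in range(100)]

for target_partitions in (1, 2, 8):
    parts = split_into_partitions(rows, target_partitions)
    correct = run_limit(parts, 1, keep_global_limit=True)
    dropped = run_limit(parts, 1, keep_global_limit=False)
    print(target_partitions, len(correct), len(dropped))
```

In this model, the correct path always yields 1 row. Without the global step, the count equals the number of non-empty partitions: 1 row for one partition, 2 for two, 8 for eight. This matches the reported pattern of the problem vanishing at `target_partitions = 1`.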