pmcgleenon commented on issue #6983:
URL: https://github.com/apache/arrow-datafusion/issues/6983#issuecomment-1953000796

   I ran the reproducer from https://github.com/apache/arrow-datafusion/issues/6983#issuecomment-1662556865 and didn't see this issue.
   
   1. generate benchmark data
   ```
   cd benchmarks
   ./bench.sh data tpch10
   ```
   
   2. run the CLI with a filter in the query (3.2 seconds) and without one (3.5 seconds)
   
   
   ```
   DataFusion CLI v36.0.0
   ❯ create external table test stored as parquet location '/Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet';
   0 rows in set. Query took 0.115 seconds.
   
   ❯ create table t as select * from test;
   0 rows in set. Query took 3.527 seconds.
   ```
   
   ```
   DataFusion CLI v36.0.0
   ❯ create external table test stored as parquet location '/Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet';
   0 rows in set. Query took 0.006 seconds.
   
   ❯ create table t as (select * from test where l_linenumber > 0);
   0 rows in set. Query took 3.216 seconds.
   ```
   
   3. ran the Rust program with the filter (3.1 seconds) and without it (3 seconds)
   
   ```
       let _df = _ctx
           .read_parquet(FILENAME, _read_options)
           .await
           .unwrap();
       // Filtered variant: remove the `;` above and chain the next two lines.
       // .filter(col("l_orderkey").gt(lit(0)))
       // .unwrap();
   ```
   
   4. checked the physical plan in the explain output for `file_groups`, which should show whether the scan is split up to run in parallel.
   
   
   
   ```
   ❯ explain select * from test;
   | plan_type     | plan |
   | logical_plan  | TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment] |
   | physical_plan | ParquetExec: file_groups={4 groups: [[Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:0..10165445], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:10165445..20330890], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:20330890..30496335], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:30496335..40661778]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment] |
   2 rows in set. Query took 0.010 seconds.
   ```
   
   ```
   ❯ explain select * from test where l_orderkey > 0;
   | plan_type     | plan |
   | logical_plan  | Filter: test.l_orderkey > Int64(0) |
   |               |   TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], partial_filters=[test.l_orderkey > Int64(0)] |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
   |               |   FilterExec: l_orderkey@0 > 0 |
   |               |     ParquetExec: file_groups={4 groups: [[Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:0..10165445], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:10165445..20330890], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:20330890..30496335], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:30496335..40661778]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], predicate=l_orderkey@0 > 0, pruning_predicate=l_orderkey_max@0 > 0, required_guarantees=[] |
   2 rows in set. Query took 0.012 seconds.
   ```
   
   @alamb this looks ok to me (unless I've missed something). Does `file_groups = 4` mean the file is loaded in parallel, with one group on each of the 4 CPUs available?
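
For reference, the four byte ranges in the `ParquetExec` lines above are what you get from cutting the 40661778-byte file into four equal-sized chunks (ceiling division, last chunk clamped to the file size). The sketch below only reproduces that arithmetic; `split_byte_ranges` is a hypothetical helper written for illustration, and whether DataFusion computes its ranges exactly this way is an assumption, not something confirmed here.

```rust
use std::ops::Range;

/// Split `file_size` bytes into at most `partitions` contiguous ranges,
/// using ceiling division for the chunk length.
/// Hypothetical helper for illustration only; not DataFusion's code.
fn split_byte_ranges(file_size: u64, partitions: u64) -> Vec<Range<u64>> {
    if file_size == 0 || partitions == 0 {
        return Vec::new();
    }
    // Ceiling division so that `partitions` chunks always cover the file.
    let chunk = (file_size + partitions - 1) / partitions;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_size {
        // Clamp the final chunk to the end of the file.
        let end = (start + chunk).min(file_size);
        ranges.push(start..end);
        start = end;
    }
    ranges
}

fn main() {
    // File size and group count taken from the explain output above.
    for r in split_byte_ranges(40_661_778, 4) {
        println!("part-0.parquet:{}..{}", r.start, r.end);
    }
}
```

Running it with the size and group count from the plan prints the same four `start..end` offsets that appear in `file_groups`, which fits the reading that a single Parquet file is being scanned as four independent byte ranges.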


