alamb commented on issue #5942:
URL: 
https://github.com/apache/arrow-datafusion/issues/5942#issuecomment-1507271239

   The explain plan clearly shows DataFusion trying to parallelize the scan:
   
   ```
   
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Sort: lineitem.parquet.l_returnflag ASC NULLS LAST, 
lineitem.parquet.l_linestatus ASC NULLS LAST                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |   Projection: lineitem.parquet.l_returnflag, 
lineitem.parquet.l_linestatus, SUM(lineitem.parquet.l_quantity) AS sum_qty, 
SUM(lineitem.parquet.l_extendedprice) AS sum_base_price, 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - lineitem.parquet.l_discount) 
AS sum_disc_price, SUM(lineitem.parquet.l_extendedprice * Int64(1) - 
lineitem.parquet.l_discount * Int64(1) + lineitem.parquet.l_tax) AS sum_charge, 
AVG(lineitem.parquet.l_quantity) AS avg_qty, 
AVG(lineitem.parquet.l_extendedprice) AS avg_price, 
AVG(lineitem.parquet.l_discount) AS avg_disc, COUNT(UInt8(1)) AS count_order    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |     Aggregate: groupBy=[[lineitem.parquet.l_returnflag, 
lineitem.parquet.l_linestatus]], aggr=[[SUM(lineitem.parquet.l_quantity), 
SUM(lineitem.parquet.l_extendedprice), 
SUM(CAST(lineitem.parquet.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2)) AS Decimal128(38, 4))CAST(Decimal128(Some(100),23,2) - 
CAST(lineitem.parquet.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 
4))Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2))CAST(lineitem.parquet.l_discount AS Decimal128(23, 
2))lineitem.parquet.l_discountDecimal128(Some(100),23,2)CAST(lineitem.parquet.l_extendedprice
 AS Decimal128(38, 4))lineitem.parquet.l_extendedprice AS 
lineitem.parquet.l_extendedprice * Decimal128(Some(100),23,2) - 
lineitem.parquet.l_discount) AS SUM(lineitem.parquet.l_extendedprice * Int64(1) 
- lineitem.parquet.l_discount), SUM(CAST(CAST(lineitem.parquet.l_extendedprice 
AS Decimal128(
 38, 4)) * CAST(Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount 
AS Decimal128(23, 2)) AS Decimal128(38, 4))CAST(Decimal128(Some(100),23,2) - 
CAST(lineitem.parquet.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 
4))Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2))CAST(lineitem.parquet.l_discount AS Decimal128(23, 
2))lineitem.parquet.l_discountDecimal128(Some(100),23,2)CAST(lineitem.parquet.l_extendedprice
 AS Decimal128(38, 4))lineitem.parquet.l_extendedprice AS 
lineitem.parquet.l_extendedprice * Decimal128(Some(100),23,2) - 
lineitem.parquet.l_discount AS Decimal128(38, 6)) * 
CAST(Decimal128(Some(100),23,2) + CAST(lineitem.parquet.l_tax AS Decimal128(23, 
2)) AS Decimal128(38, 6))) AS SUM(lineitem.parquet.l_extendedprice * Int64(1) - 
lineitem.parquet.l_discount * Int64(1) + lineitem.parquet.l_tax), 
AVG(lineitem.parquet.l_quantity), AVG(lineitem.parquet.l_extendedprice), 
AVG(lineitem.parquet.l_discount), COUNT(UInt8(1))]] |
   |               |       Projection: CAST(lineitem.parquet.l_extendedprice AS 
Decimal128(38, 4)) * CAST(Decimal128(Some(100),23,2) - 
CAST(lineitem.parquet.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 4)) AS 
CAST(lineitem.parquet.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2)) AS Decimal128(38, 4))CAST(Decimal128(Some(100),23,2) - 
CAST(lineitem.parquet.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 
4))Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2))CAST(lineitem.parquet.l_discount AS Decimal128(23, 
2))lineitem.parquet.l_discountDecimal128(Some(100),23,2)CAST(lineitem.parquet.l_extendedprice
 AS Decimal128(38, 4))lineitem.parquet.l_extendedprice, 
lineitem.parquet.l_quantity, lineitem.parquet.l_extendedprice, 
lineitem.parquet.l_discount, lineitem.parquet.l_tax, 
lineitem.parquet.l_returnflag, lineitem.parquet.l_linestatus                    
                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |         Filter: lineitem.parquet.l_shipdate <= 
Date32("10471")                                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |           TableScan: lineitem.parquet 
projection=[l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, 
l_linestatus, l_shipdate], partial_filters=[lineitem.parquet.l_shipdate <= 
Date32("10471")]                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   | physical_plan | SortPreservingMergeExec: [l_returnflag@0 ASC NULLS 
LAST,l_linestatus@1 ASC NULLS LAST]                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |   SortExec: expr=[l_returnflag@0 ASC NULLS 
LAST,l_linestatus@1 ASC NULLS LAST]                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |     ProjectionExec: expr=[l_returnflag@0 as l_returnflag, 
l_linestatus@1 as l_linestatus, SUM(lineitem.parquet.l_quantity)@2 as sum_qty, 
SUM(lineitem.parquet.l_extendedprice)@3 as sum_base_price, 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - 
lineitem.parquet.l_discount)@4 as sum_disc_price, 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - lineitem.parquet.l_discount * 
Int64(1) + lineitem.parquet.l_tax)@5 as sum_charge, 
AVG(lineitem.parquet.l_quantity)@6 as avg_qty, 
AVG(lineitem.parquet.l_extendedprice)@7 as avg_price, 
AVG(lineitem.parquet.l_discount)@8 as avg_disc, COUNT(UInt8(1))@9 as 
count_order]                                                                    
                                                                                
                                                                                
                                                                                
                                                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |       AggregateExec: mode=FinalPartitioned, 
gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], 
aggr=[SUM(lineitem.parquet.l_quantity), SUM(lineitem.parquet.l_extendedprice), 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - lineitem.parquet.l_discount), 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - lineitem.parquet.l_discount * 
Int64(1) + lineitem.parquet.l_tax), AVG(lineitem.parquet.l_quantity), 
AVG(lineitem.parquet.l_extendedprice), AVG(lineitem.parquet.l_discount), 
COUNT(UInt8(1))]                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |         CoalesceBatchesExec: target_batch_size=8192        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |           RepartitionExec: partitioning=Hash([Column { 
name: "l_returnflag", index: 0 }, Column { name: "l_linestatus", index: 1 }], 
8), input_partitions=8                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |             AggregateExec: mode=Partial, 
gby=[l_returnflag@5 as l_returnflag, l_linestatus@6 as l_linestatus], 
aggr=[SUM(lineitem.parquet.l_quantity), SUM(lineitem.parquet.l_extendedprice), 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - lineitem.parquet.l_discount), 
SUM(lineitem.parquet.l_extendedprice * Int64(1) - lineitem.parquet.l_discount * 
Int64(1) + lineitem.parquet.l_tax), AVG(lineitem.parquet.l_quantity), 
AVG(lineitem.parquet.l_extendedprice), AVG(lineitem.parquet.l_discount), 
COUNT(UInt8(1))]                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |               ProjectionExec: expr=[CAST(l_extendedprice@1 
AS Decimal128(38, 4)) * CAST(Some(100),23,2 - CAST(l_discount@2 AS 
Decimal128(23, 2)) AS Decimal128(38, 4)) as 
CAST(lineitem.parquet.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2)) AS Decimal128(38, 4))CAST(Decimal128(Some(100),23,2) - 
CAST(lineitem.parquet.l_discount AS Decimal128(23, 2)) AS Decimal128(38, 
4))Decimal128(Some(100),23,2) - CAST(lineitem.parquet.l_discount AS 
Decimal128(23, 2))CAST(lineitem.parquet.l_discount AS Decimal128(23, 
2))lineitem.parquet.l_discountDecimal128(Some(100),23,2)CAST(lineitem.parquet.l_extendedprice
 AS Decimal128(38, 4))lineitem.parquet.l_extendedprice, l_quantity@0 as 
l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2 as l_discount, 
l_tax@3 as l_tax, l_returnflag@4 as l_returnflag, l_linestatus@5 as 
l_linestatus]                                                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |                 CoalesceBatchesExec: 
target_batch_size=8192                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |                   FilterExec: l_shipdate@6 <= 10471        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |                     ParquetExec: limit=None, partitions={8 
groups: [[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:0..20797508], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:20797508..41595016], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:41595016..62392524], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:62392524..83190032], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:83190032..103987540], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:103987540..124785048], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:124785048..145582556], 
[home/alamb/tpch_data/parquet_data_SF1/lineitem.parquet:145582556..166380062]]},
 predicate=l_shipdate@10 <= 10471, pruning_predicate=l_shipdate_min@0 <= 10471, 
projection=[l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, 
l_linestatus, l_shipdate]                                                       
                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   |               |                                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   Here is the duckdb generated schema:
   
   ```
   ❯  create external table lineitem stored as parquet location 
'lineitem.parquet';
   0 rows in set. Query took 0.003 seconds.
   ❯ describe lineitem;
   +-----------------+-------------------+-------------+
   | column_name     | data_type         | is_nullable |
   +-----------------+-------------------+-------------+
   | l_orderkey      | Int32             | YES         |
   | l_partkey       | Int32             | YES         |
   | l_suppkey       | Int32             | YES         |
   | l_linenumber    | Int32             | YES         |
   | l_quantity      | Int32             | YES         |
   | l_extendedprice | Decimal128(15, 2) | YES         |
   | l_discount      | Decimal128(15, 2) | YES         |
   | l_tax           | Decimal128(15, 2) | YES         |
   | l_returnflag    | Utf8              | YES         |
   | l_linestatus    | Utf8              | YES         |
   | l_shipdate      | Date32            | YES         |
   | l_commitdate    | Date32            | YES         |
   | l_receiptdate   | Date32            | YES         |
   | l_shipinstruct  | Utf8              | YES         |
   | l_shipmode      | Utf8              | YES         |
   | l_comment       | Utf8              | YES         |
   +-----------------+-------------------+-------------+
   16 rows in set. Query took 0.001 seconds.
   ```
   Here is the datafusion generated schema:
   
   
   
   ```
   ❯ describe lineitem;
   +-----------------+-------------------+-------------+
   | column_name     | data_type         | is_nullable |
   +-----------------+-------------------+-------------+
   | l_orderkey      | Int64             | NO          |
   | l_partkey       | Int64             | NO          |
   | l_suppkey       | Int64             | NO          |
   | l_linenumber    | Int32             | NO          |
   | l_quantity      | Decimal128(15, 2) | NO          |
   | l_extendedprice | Decimal128(15, 2) | NO          |
   | l_discount      | Decimal128(15, 2) | NO          |
   | l_tax           | Decimal128(15, 2) | NO          |
   | l_returnflag    | Utf8              | NO          |
   | l_linestatus    | Utf8              | NO          |
   | l_shipdate      | Date32            | NO          |
   | l_commitdate    | Date32            | NO          |
   | l_receiptdate   | Date32            | NO          |
   | l_shipinstruct  | Utf8              | NO          |
   | l_shipmode      | Utf8              | NO          |
   | l_comment       | Utf8              | NO          |
   +-----------------+-------------------+-------------+
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to