xudong963 edited a comment on pull request #1067:
URL: https://github.com/apache/arrow-datafusion/pull/1067#issuecomment-932890819


   Though the PR fix the bug mentioned by the issue, there are still some bugs. 
I am confused and need help.
   When I ran the `cargo test`, I found there were four tests that couldn't 
pass. For example `tests::ballista_round_trip::q7`
   The following is the SQL executed in `tests::ballista_round_trip::q7`
   ```sql
   select
       supp_nation,
       cust_nation,
       l_year,
       sum(volume) as revenue
   from
       (
           select
               n1.n_name as supp_nation,
               n2.n_name as cust_nation,
               extract(year from l_shipdate) as l_year,
               l_extendedprice * (1 - l_discount) as volume
           from
               supplier,
               lineitem,
               orders,
               customer,
               nation n1,
               nation n2
           where
                   s_suppkey = l_suppkey
             and o_orderkey = l_orderkey
             and c_custkey = o_custkey
             and s_nationkey = n1.n_nationkey
             and c_nationkey = n2.n_nationkey
             and (
                   (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                   or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
               )
             and l_shipdate between date '1995-01-01' and date '1996-12-31'
       ) as shipping
   group by
       supp_nation,
       cust_nation,
       l_year
   order by
       supp_nation,
       cust_nation,
       l_year;
   ```
   Then I printed some logs to find the potential problems.
   First of all, I printed the plan at 
https://github.com/apache/arrow-datafusion/blob/master/benchmarks/src/bin/tpch.rs#L1092
   ```
   Sort: #shipping.supp_nation ASC NULLS FIRST, #shipping.cust_nation ASC NULLS 
FIRST, #shipping.l_year ASC NULLS FIRST
     Projection: #shipping.supp_nation, #shipping.cust_nation, 
#shipping.l_year, #shipping.volume
       Projection: #n1.n_name AS supp_nation, #n2.n_name AS cust_nation, 
datepart(Utf8("YEAR"), #lineitem.l_shipdate) AS l_year, 
#lineitem.l_extendedprice * Int64(1) - #lineitem.l_discount AS volume
         Filter: #n1.n_name = Utf8("FRANCE") AND #n2.n_name = Utf8("GERMANY") 
OR #n1.n_name = Utf8("GERMANY") AND #n2.n_name = Utf8("FRANCE") AND 
#lineitem.l_shipdate BETWEEN CAST(Utf8("1995-01-01") AS Date32) AND 
CAST(Utf8("1996-12-31") AS Date32)
           Join: #customer.c_nationkey = #n2.n_nationkey
             Join: #supplier.s_nationkey = #n1.n_nationkey
               Join: #orders.o_custkey = #customer.c_custkey
                 Join: #lineitem.l_orderkey = #orders.o_orderkey
                   Join: #supplier.s_suppkey = #lineitem.l_suppkey
                     TableScan: supplier projection=None
                     TableScan: lineitem projection=None
                   TableScan: orders projection=None
                 TableScan: customer projection=None
               TableScan: n1 projection=None
             TableScan: n2 projection=None
   ```
   It's OK?
   
   Then I tried to find problems in 
https://github.com/apache/arrow-datafusion/blob/master/benchmarks/src/bin/tpch.rs#L1094
 because the test panicked at here(`Logical Plan Proto` to `Logical Plan`)
   Finally, I found the panic at building projection logical plan 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/logical_plan/builder.rs#L240
   
   The projected_expr in 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/logical_plan/builder.rs#L255
 is the following
   ```
   [#shipping.supp_nation, #shipping.cust_nation, #shipping.l_year, 
#shipping.volume]
   ```
   But the plan and the plan schema are the followings
   ```
    Projection: #n1.n_name AS supp_nation, #n2.n_name AS cust_nation, 
datepart(Utf8("YEAR"), #lineitem.l_shipdate) AS l_year, 
#lineitem.l_extendedprice * Int64(1) - #lineitem.l_discount AS volume
         Filter: #n1.n_name = Utf8("FRANCE") AND #n2.n_name = Utf8("GERMANY") 
OR #n1.n_name = Utf8("GERMANY") AND #n2.n_name = Utf8("FRANCE") AND 
#lineitem.l_shipdate BETWEEN CAST(Utf8("1995-01-01") AS Date32) AND 
CAST(Utf8("1996-12-31") AS Date32)
           Join: #customer.c_nationkey = #n2.n_nationkey
             Join: #supplier.s_nationkey = #n1.n_nationkey
               Join: #orders.o_custkey = #customer.c_custkey
                 Join: #lineitem.l_orderkey = #orders.o_orderkey
                   Join: #supplier.s_suppkey = #lineitem.l_suppkey
                     TableScan: supplier projection=None
                     TableScan: lineitem projection=None
                   TableScan: orders projection=None
                 TableScan: customer projection=None
               TableScan: n1 projection=None
             TableScan: n2 projection=None
   ```
   ```
   DFSchema { fields: [DFField { qualifier: Some("n1"), field: Field { name: 
"supp_nation", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: 
false, metadata: None } }, DFField { qualifier: Some("n2"), field: Field { 
name: "cust_nation", data_type: Utf8, nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: 
Field { name: "l_year", data_type: Int32, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: None } }, DFField { qualifier: None, field: 
Field { name: "volume", data_type: Float64, nullable: false, dict_id: 0, 
dict_is_ordered: false, metadata: None } }] }
   ```
   So there will be conflict in 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/logical_plan/dfschema.rs#L150
   
   I have two questions
   1. 
https://github.com/apache/arrow-datafusion/blob/master/benchmarks/src/bin/tpch.rs#L1092
 is ok and get a plan, the plan is correct?
   2. Which part makes  `Logical Plan Proto` to `Logical Plan` get errors, now 
I don't know how to fix it. 
   
   PTAL and give me some help @alamb @houqp @Dandandan. Thanks very much!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to