onursatici commented on issue #4763:
URL: https://github.com/apache/datafusion/issues/4763#issuecomment-2435286033

   this problem with q35, taken from the description above:
   ```
   - `Projections require unique expression names but the expression 
"MAX(customer_demographics.cd_dep_count)" at position 6 and 
"MAX(customer_demographics.cd_dep_count)" at position 7 have the same name. 
Consider aliasing ("AS") one of them.` (q35)
   ```
   seems to be some sort of bug in query generation for tpc-ds. For me the 
provided `dsqgen` binary creates the query 35 as something like:
   ```
   select   
     ca_state,
     cd_gender,
     cd_marital_status,
     cd_dep_count,
     count(*) cnt1,
     max(cd_dep_count),
     stddev_samp(cd_dep_count),
     stddev_samp(cd_dep_count),
     cd_dep_employed_count,
     count(*) cnt2,
     max(cd_dep_employed_count),
     stddev_samp(cd_dep_employed_count),
     stddev_samp(cd_dep_employed_count),
     cd_dep_college_count,
     count(*) cnt3,
     max(cd_dep_college_count),
     stddev_samp(cd_dep_college_count),
     stddev_samp(cd_dep_college_count)
    from
     customer c,customer_address ca,customer_demographics
     ...
   ```
   This has multiple columns that would resolve to the same name, like 
`stddev_samp(cd_dep_count`, violating the unique column name constraint 
enforced in datafusion logical plan builder.
   
   I believe this is not intentional, as the query template goes like this:
   ```
    [_LIMITA] select [_LIMITB]  
     ca_state,
     cd_gender,
     cd_marital_status,
     cd_dep_count,
     count(*) cnt1,
     [AGGONE](cd_dep_count),
     [AGGTWO](cd_dep_count),
     [AGGTHREE](cd_dep_count),
     cd_dep_employed_count,
     count(*) cnt2,
     [AGGONE](cd_dep_employed_count),
     [AGGTWO](cd_dep_employed_count),
     [AGGTHREE](cd_dep_employed_count),
     cd_dep_college_count,
     count(*) cnt3,
     [AGGONE](cd_dep_college_count),
     [AGGTWO](cd_dep_college_count),
     [AGGTHREE](cd_dep_college_count)
    from
     customer c,customer_address ca,customer_demographics
   ```
   latest spec mentions this about q35:
   ```
   Qualification Substitution Parameters:
   • YEAR.01 = 2002
   • AGGONE = min
   • AGGTWO = max
   • AGGTHREE = avg
   ```
   so all aggregates are different than each other, if the template was used 
with the substitutions listed in the spec, no duplicates should occur, and this 
query can be executed successfully with the main branch of datafusion as of now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to