[ 
https://issues.apache.org/jira/browse/DRILL-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8182:
--------------------------------
    Description: 
Two file scans that differ only by format config specified using table 
functions may be genuinely different in terms of the data they should return. 
The format config may affect the behaviour of a parser, or even direct the

When a query includes multiple SELECTs against a workbook by using TABLE 
functions to access different sheets, and those sheets contain a column with 
the same name, then the values for that column come from a single sheet for 
both SELECTs. To reproduce, run the following query against the attachment and 
note that the `Name` values returned from the Products sheet are `Name` values 
from the Customers sheet.

 
{code:java}
with
prod as (
    select Id, Name
    from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
        (type => 'excel', sheetName => 'Products'))
)
, cust as (
    select Id, Name
    from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
        (type => 'excel', sheetName => 'Customers'))
)
select * from cust join prod on cust.Id = prod.Id;
{code}

  was:
When a query includes multiple SELECTs against a workbook by using TABLE 
functions to access different sheets, and those sheets contain a column with 
the same name, then the values for that column come from a single sheet for 
both SELECTs. To reproduce, run the following query against the attachment and 
note that the `Name` values returned from the Products sheet are `Name` values 
from the Customers sheet.

 
{code:java}
with
prod as (
    select Id, Name
    from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
        (type => 'excel', sheetName => 'Products'))
)
, cust as (
    select Id, Name
    from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
        (type => 'excel', sheetName => 'Customers'))
)
select * from cust join prod on cust.Id = prod.Id;
{code}


> Scan nodes not differentiated by format config
> ----------------------------------------------
>
>                 Key: DRILL-8182
>                 URL: https://issues.apache.org/jira/browse/DRILL-8182
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 1.20.0
>            Reporter: James Turton
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.20.2
>
>         Attachments: Products_Customers_Orders.xlsx
>
>
> Two file scans that differ only by format config specified using table 
> functions may be genuinely different in terms of the data they should return. 
> The format config may affect the behaviour of a parser, or even direct the
>
> When a query includes multiple SELECTs against a workbook by using TABLE 
> functions to access different sheets, and those sheets contain a column with 
> the same name, then the values for that column come from a single sheet for 
> both SELECTs. To reproduce, run the following query against the attachment 
> and note that the `Name` values returned from the Products sheet are `Name` 
> values from the Customers sheet.
>  
> {code:java}
> with
> prod as (
>     select Id, Name
>     from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
>         (type => 'excel', sheetName => 'Products'))
> )
> , cust as (
>     select Id, Name
>     from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx`
>         (type => 'excel', sheetName => 'Customers'))
> )
> select * from cust join prod on cust.Id = prod.Id;
> {code}
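
The symptom described above is what one would expect if two scans were compared by file path alone. The following is a minimal sketch of that idea, not Drill's actual code: `ScanKey`, `ScanKeyDemo`, `distinctScans` and the `compareFormatConfig` flag are hypothetical names introduced only for illustration, and the assumption that scans get deduplicated through equality is mine rather than something stated in the issue.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a file scan; these are NOT Drill's actual classes.
final class ScanKey {
    private final String path;
    private final Map<String, String> formatConfig;
    // Assumed switch: false models the reported behaviour, true the desired one.
    private final boolean compareFormatConfig;

    ScanKey(String path, Map<String, String> formatConfig, boolean compareFormatConfig) {
        this.path = path;
        this.formatConfig = formatConfig;
        this.compareFormatConfig = compareFormatConfig;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ScanKey)) {
            return false;
        }
        ScanKey other = (ScanKey) o;
        if (compareFormatConfig != other.compareFormatConfig || !path.equals(other.path)) {
            return false;
        }
        // Without this comparison, scans of different sheets look identical.
        return !compareFormatConfig || formatConfig.equals(other.formatConfig);
    }

    @Override
    public int hashCode() {
        return compareFormatConfig ? Objects.hash(path, formatConfig) : path.hashCode();
    }
}

final class ScanKeyDemo {
    // Counts the distinct scans left after deduplicating the two scans from the query.
    static int distinctScans(boolean compareFormatConfig) {
        String path = "/Products_Customers_Orders.xlsx";
        Set<ScanKey> scans = new HashSet<>();
        scans.add(new ScanKey(path,
                Map.of("type", "excel", "sheetName", "Products"), compareFormatConfig));
        scans.add(new ScanKey(path,
                Map.of("type", "excel", "sheetName", "Customers"), compareFormatConfig));
        return scans.size();
    }
}
```

In this sketch `ScanKeyDemo.distinctScans(false)` returns 1, because the Products and Customers scans collapse into one, while `ScanKeyDemo.distinctScans(true)` returns 2 once the format config takes part in `equals` and `hashCode`.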



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
