[ https://issues.apache.org/jira/browse/DRILL-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Turton updated DRILL-8182:
--------------------------------
Description:
When a query includes multiple SELECTs against a workbook, using TABLE
functions to access different sheets, and those sheets contain a column with
the same name, the values for that column come from a single sheet for both
SELECTs. To reproduce, run the following query against the attachment and note
that the `Name` values returned from the Products sheet are actually `Name`
values from the Customers sheet.
{code:sql}
with
prod as (
  select Id, Name
  from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Products'))
)
, cust as (
  select Id, Name
  from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Customers'))
)
select * from cust join prod on cust.Id = prod.Id;
{code}
was:
When a query creates multiple scans against a workbook, using TABLE functions
to target different sheets, the resulting datasets appear to get mixed, with
one overwriting the other. To reproduce, run the following query against the
attachment and note that the value returned from the Products sheet is a name
from the Customers sheet.
{code:sql}
with cust as (
  select *
  from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Customers'))
),
prod as (
  select *
  from TABLE(dfs.tmp.`/Products_Customers_Orders.xlsx` (type => 'excel', sheetName => 'Products'))
)
select * from cust join prod on cust.Id = prod.Id;
{code}
> Excel format plugin sheet scan overwriting bug
> ----------------------------------------------
>
> Key: DRILL-8182
> URL: https://issues.apache.org/jira/browse/DRILL-8182
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Other
> Affects Versions: 1.20.0
> Reporter: James Turton
> Assignee: Charles Givre
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: Products_Customers_Orders.xlsx
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)