[jira] [Updated] (DRILL-6605) TPCDS-84 Query does not return any rows

Robert Hou (JIRA) Mon, 16 Jul 2018 14:48:11 -0700


     [ 
https://issues.apache.org/jira/browse/DRILL-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Hou updated DRILL-6605:
------------------------------
    Description: 
Query is:
Advanced/tpcds/tpcds_sf100/hive/parquet/query84.sql

This uses the hive parquet reader.
{code:sql}
SELECT c_customer_id   AS customer_id,
c_last_name
|| ', '
|| c_first_name AS customername
FROM   customer,
customer_address,
customer_demographics,
household_demographics,
income_band,
store_returns
WHERE  ca_city = 'Green Acres'
AND c_current_addr_sk = ca_address_sk
AND ib_lower_bound >= 54986
AND ib_upper_bound <= 54986 + 50000
AND ib_income_band_sk = hd_income_band_sk
AND cd_demo_sk = c_current_cdemo_sk
AND hd_demo_sk = c_current_hdemo_sk
AND sr_cdemo_sk = cd_demo_sk
ORDER  BY c_customer_id
LIMIT 100
{code}

This query should return 100 rows

Here is the explain plan:
{noformat}
| 00-00    Screen
00-01      Project(customer_id=[$0], customername=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[100])
00-04            SingleMergeExchange(sort0=[0])
01-01              OrderedMuxExchange(sort0=[0])
02-01                SelectionVectorRemover
02-02                  TopN(limit=[100])
02-03                    HashToRandomExchange(dist0=[[$0]])
03-01                      Project(customer_id=[$0], customername=[||(||($5, ', 
'), $4)])
03-02                        Project(c_customer_id=[$1], 
c_current_cdemo_sk=[$2], c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], 
c_first_name=[$5], c_last_name=[$6], ca_address_sk=[$8], ca_city=[$9], 
cd_demo_sk=[$7], hd_demo_sk=[$10], hd_income_band_sk=[$11], 
ib_income_band_sk=[$12], ib_lower_bound=[$13], ib_upper_bound=[$14], 
sr_cdemo_sk=[$0])
03-03                          HashJoin(condition=[=($7, $0)], joinType=[inner])
03-05                            HashToRandomExchange(dist0=[[$0]])
04-01                              Scan(groupscan=[HiveScan 
[table=Table(dbName:tpcds100_parquet, tableName:store_returns), 
columns=[`sr_cdemo_sk`], numPartitions=0, partitions= null, 
inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/web_returns], 
confProperties={}]])
03-04                            HashToRandomExchange(dist0=[[$6]])
05-01                              HashJoin(condition=[=($2, $9)], 
joinType=[inner])
05-03                                HashJoin(condition=[=($3, $7)], 
joinType=[inner])
05-05                                  HashJoin(condition=[=($1, $6)], 
joinType=[inner])
05-07                                    Scan(groupscan=[HiveScan 
[table=Table(dbName:tpcds100_parquet, tableName:customer), 
columns=[`c_customer_id`, `c_current_cdemo_sk`, `c_current_hdemo_sk`, 
`c_current_addr_sk`, `c_first_name`, `c_last_name`], numPartitions=0, 
partitions= null, 
inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer], 
confProperties={}]])
05-06                                    BroadcastExchange
06-01                                      Scan(groupscan=[HiveScan 
[table=Table(dbName:tpcds100_parquet, tableName:customer_demographics), 
columns=[`cd_demo_sk`], numPartitions=0, partitions= null, 
inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_demographics],
 confProperties={}]])
05-04                                  BroadcastExchange
07-01                                    SelectionVectorRemover
07-02                                      Filter(condition=[=($1, 'Green 
Acres')])
07-03                                        Scan(groupscan=[HiveScan 
[table=Table(dbName:tpcds100_parquet, tableName:customer_address), 
columns=[`ca_address_sk`, `ca_city`], numPartitions=0, partitions= null, 
inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_address], 
confProperties={}]])
05-02                                BroadcastExchange
08-01                                  HashJoin(condition=[=($1, $2)], 
joinType=[inner])
08-03                                    Scan(groupscan=[HiveScan 
[table=Table(dbName:tpcds100_parquet, tableName:household_demographics), 
columns=[`hd_demo_sk`, `hd_income_band_sk`], numPartitions=0, partitions= null, 
inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/household_demographics],
 confProperties={}]])
08-02                                    SelectionVectorRemover
08-04                                      Filter(condition=[AND(>=($1, 54986), 
<=($2, 104986))])
08-05                                        Scan(groupscan=[HiveScan 
[table=Table(dbName:tpcds100_parquet, tableName:income_band), 
columns=[`ib_income_band_sk`, `ib_lower_bound`, `ib_upper_bound`], 
numPartitions=0, partitions= null, 
inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/income_band], 
confProperties={}]])
{noformat}

commit id is:
1.14.0-SNAPSHOT a77fd142d86dd5648cda8866b8ff3af39c7b6b11        DRILL-6516: 
EMIT support in streaming agg       11.07.2018 @ 18:40:03 PDT       Unknown 
12.07.2018 @ 01:50:37 PDT



  was:
Query is:
Advanced/tpcds/tpcds_sf100/hive/parquet/query84.sql

This uses the hive parquet reader.
{code:sql}
SELECT c_customer_id   AS customer_id,
c_last_name
|| ', '
|| c_first_name AS customername
FROM   customer,
customer_address,
customer_demographics,
household_demographics,
income_band,
store_returns
WHERE  ca_city = 'Green Acres'
AND c_current_addr_sk = ca_address_sk
AND ib_lower_bound >= 54986
AND ib_upper_bound <= 54986 + 50000
AND ib_income_band_sk = hd_income_band_sk
AND cd_demo_sk = c_current_cdemo_sk
AND hd_demo_sk = c_current_hdemo_sk
AND sr_cdemo_sk = cd_demo_sk
ORDER  BY c_customer_id
LIMIT 100
{code}

This query should return 100 rows

commit id is:
1.14.0-SNAPSHOT a77fd142d86dd5648cda8866b8ff3af39c7b6b11        DRILL-6516: 
EMIT support in streaming agg       11.07.2018 @ 18:40:03 PDT       Unknown 
12.07.2018 @ 01:50:37 PDT




> TPCDS-84 Query does not return any rows
> ---------------------------------------
>
>                 Key: DRILL-6605
>                 URL: https://issues.apache.org/jira/browse/DRILL-6605
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Robert Hou
>            Assignee: Vitalii Diravka
>            Priority: Blocker
>             Fix For: 1.14.0
>
>
> Query is:
> Advanced/tpcds/tpcds_sf100/hive/parquet/query84.sql
> This uses the hive parquet reader.
> {code:sql}
> SELECT c_customer_id   AS customer_id,
> c_last_name
> || ', '
> || c_first_name AS customername
> FROM   customer,
> customer_address,
> customer_demographics,
> household_demographics,
> income_band,
> store_returns
> WHERE  ca_city = 'Green Acres'
> AND c_current_addr_sk = ca_address_sk
> AND ib_lower_bound >= 54986
> AND ib_upper_bound <= 54986 + 50000
> AND ib_income_band_sk = hd_income_band_sk
> AND cd_demo_sk = c_current_cdemo_sk
> AND hd_demo_sk = c_current_hdemo_sk
> AND sr_cdemo_sk = cd_demo_sk
> ORDER  BY c_customer_id
> LIMIT 100
> {code}
> This query should return 100 rows
> Here is the explain plan:
> {noformat}
> | 00-00    Screen
> 00-01      Project(customer_id=[$0], customername=[$1])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[100])
> 00-04            SingleMergeExchange(sort0=[0])
> 01-01              OrderedMuxExchange(sort0=[0])
> 02-01                SelectionVectorRemover
> 02-02                  TopN(limit=[100])
> 02-03                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Project(customer_id=[$0], customername=[||(||($5, 
> ', '), $4)])
> 03-02                        Project(c_customer_id=[$1], 
> c_current_cdemo_sk=[$2], c_current_hdemo_sk=[$3], c_current_addr_sk=[$4], 
> c_first_name=[$5], c_last_name=[$6], ca_address_sk=[$8], ca_city=[$9], 
> cd_demo_sk=[$7], hd_demo_sk=[$10], hd_income_band_sk=[$11], 
> ib_income_band_sk=[$12], ib_lower_bound=[$13], ib_upper_bound=[$14], 
> sr_cdemo_sk=[$0])
> 03-03                          HashJoin(condition=[=($7, $0)], 
> joinType=[inner])
> 03-05                            HashToRandomExchange(dist0=[[$0]])
> 04-01                              Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:store_returns), 
> columns=[`sr_cdemo_sk`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/web_returns], 
> confProperties={}]])
> 03-04                            HashToRandomExchange(dist0=[[$6]])
> 05-01                              HashJoin(condition=[=($2, $9)], 
> joinType=[inner])
> 05-03                                HashJoin(condition=[=($3, $7)], 
> joinType=[inner])
> 05-05                                  HashJoin(condition=[=($1, $6)], 
> joinType=[inner])
> 05-07                                    Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer), 
> columns=[`c_customer_id`, `c_current_cdemo_sk`, `c_current_hdemo_sk`, 
> `c_current_addr_sk`, `c_first_name`, `c_last_name`], numPartitions=0, 
> partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer], 
> confProperties={}]])
> 05-06                                    BroadcastExchange
> 06-01                                      Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer_demographics), 
> columns=[`cd_demo_sk`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_demographics],
>  confProperties={}]])
> 05-04                                  BroadcastExchange
> 07-01                                    SelectionVectorRemover
> 07-02                                      Filter(condition=[=($1, 'Green 
> Acres')])
> 07-03                                        Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:customer_address), 
> columns=[`ca_address_sk`, `ca_city`], numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/customer_address],
>  confProperties={}]])
> 05-02                                BroadcastExchange
> 08-01                                  HashJoin(condition=[=($1, $2)], 
> joinType=[inner])
> 08-03                                    Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:household_demographics), 
> columns=[`hd_demo_sk`, `hd_income_band_sk`], numPartitions=0, partitions= 
> null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/household_demographics],
>  confProperties={}]])
> 08-02                                    SelectionVectorRemover
> 08-04                                      Filter(condition=[AND(>=($1, 
> 54986), <=($2, 104986))])
> 08-05                                        Scan(groupscan=[HiveScan 
> [table=Table(dbName:tpcds100_parquet, tableName:income_band), 
> columns=[`ib_income_band_sk`, `ib_lower_bound`, `ib_upper_bound`], 
> numPartitions=0, partitions= null, 
> inputDirectories=[maprfs:/drill/testdata/tpcds_sf100/parquet/income_band], 
> confProperties={}]])
> {noformat}
> commit id is:
> 1.14.0-SNAPSHOT       a77fd142d86dd5648cda8866b8ff3af39c7b6b11        
> DRILL-6516: EMIT support in streaming agg       11.07.2018 @ 18:40:03 PDT     
>   Unknown 12.07.2018 @ 01:50:37 PDT



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (DRILL-6605) TPCDS-84 Query does not return any rows

Reply via email to