[
https://issues.apache.org/jira/browse/DRILL-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chun Chang closed DRILL-3062.
-----------------------------
Assignee: Chun Chang (was: Jacques Nadeau)
verified fix. All automated advanced tests passed which included mondrian suite.
> regression: Mondrian query447.q - lots of rows missing in result set
> --------------------------------------------------------------------
>
> Key: DRILL-3062
> URL: https://issues.apache.org/jira/browse/DRILL-3062
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Data Types
> Affects Versions: 1.0.0
> Reporter: Chun Chang
> Assignee: Chun Chang
> Priority: Blocker
> Fix For: 1.0.0
>
> Attachments:
> 0001-DRILL-3062-Rtrim-the-fixed-char-strings-in-the-IN-li.patch
>
>
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select * from sys.version;
> +------------+----------------+-------------+-------------+------------+
> | commit_id | commit_message | commit_time | build_email | build_time |
> +------------+----------------+-------------+-------------+------------+
> | d1526f9462f6817a76631464ff332bb99b3bdf28 | DRILL-2750: Running 1 or more
> queries against Drillbits having insufficient DirectMem renders the Drillbits
> in an unusable state | 13.05.2015 @ 08:47:20 EDT | Unknown | 13.05.2015 @
> 10:44:43 EDT |
> +------------+----------------+-------------+-------------+------------+
> {code}
> Many (total of 42) mondrian queries regressed. All of them missing rows in
> the returned result set.
> Here is an example, query447.q
> {code}
> SELECT time_by_day.the_year AS c0,
> product_class.product_family AS c1,
> customer.state_province AS c2,
> customer.city AS c3,
> Sum(sales_fact_1997.unit_sales) AS m0
> FROM time_by_day AS time_by_day,
> sales_fact_1997 AS sales_fact_1997,
> product_class AS product_class,
> product AS product,
> customer AS customer
> WHERE sales_fact_1997.time_id = time_by_day.time_id
> AND time_by_day.the_year = 1997
> AND sales_fact_1997.product_id = product.product_id
> AND product.product_class_id = product_class.product_class_id
> AND product_class.product_family = 'Drink'
> AND sales_fact_1997.customer_id = customer.customer_id
> AND customer.state_province = 'WA'
> AND customer.city IN ( 'Anacortes', 'Ballard', 'Bellingham',
> 'Bremerton',
> 'Burien', 'Edmonds', 'Everett', 'Issaquah',
> 'Kirkland', 'Lynnwood', 'Marysville',
> 'Olympia',
> 'Port Orchard', 'Puyallup', 'Redmond',
> 'Renton',
> 'Seattle', 'Sedro Woolley', 'Spokane',
> 'Tacoma',
> 'Walla Walla', 'Yakima' )
> GROUP BY time_by_day.the_year,
> product_class.product_family,
> customer.state_province,
> customer.city;
> {code}
> This query should return the following result:
> {code}
> [root@qa-node120 mondrian]# cat query447.e
> 1997 Drink WA Walla Walla 191.0000
> 1997 Drink WA Issaquah 203.0000
> 1997 Drink WA Everett 208.0000
> 1997 Drink WA Olympia 1066.0000
> 1997 Drink WA Edmonds 166.0000
> 1997 Drink WA Bremerton 1160.0000
> 1997 Drink WA Renton 225.0000
> 1997 Drink WA Bellingham 68.0000
> 1997 Drink WA Ballard 214.0000
> 1997 Drink WA Burien 251.0000
> 1997 Drink WA Seattle 168.0000
> 1997 Drink WA Redmond 137.0000
> 1997 Drink WA Lynnwood 201.0000
> 1997 Drink WA Puyallup 1040.0000
> 1997 Drink WA Tacoma 986.0000
> 1997 Drink WA Kirkland 247.0000
> 1997 Drink WA Sedro Woolley 58.0000
> 1997 Drink WA Yakima 1159.0000
> 1997 Drink WA Port Orchard 1128.0000
> 1997 Drink WA Spokane 2238.0000
> 1997 Drink WA Anacortes 82.0000
> 1997 Drink WA Marysville 193.0000
> {code}
> But drill now returns:
> {code}
> 1997 Drink WA Sedro Woolley 58.0000
> {code}
> Here is the plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select
> time_by_day.the_year as c0, product_class.product_family as c1,
> customer.state_province as c2, customer.city as c3,
> sum(sales_fact_1997.unit_sales) as m0 from time_by_day as time_by_day,
> sales_fact_1997 as sales_fact_1997, product_class as product_class, product
> as product, customer as customer where sales_fact_1997.time_id =
> time_by_day.time_id and time_by_day.the_year = 1997 and
> sales_fact_1997.product_id = product.product_id and product.product_class_id
> = product_class.product_class_id and product_class.product_family = 'Drink'
> and sales_fact_1997.customer_id = customer.customer_id and
> customer.state_province = 'WA' and customer.city in ('Anacortes', 'Ballard',
> 'Bellingham', 'Bremerton', 'Burien', 'Edmonds', 'Everett', 'Issaquah',
> 'Kirkland', 'Lynnwood', 'Marysville', 'Olympia', 'Port Orchard', 'Puyallup',
> 'Redmond', 'Renton', 'Seattle', 'Sedro Woolley', 'Spokane', 'Tacoma', 'Walla
> Walla', 'Yakima') group by time_by_day.the_year,
> product_class.product_family, customer.state_province, customer.city;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(c0=[$0], c1=[$1], c2=[$2], c3=[$3], m0=[$4])
> 00-02 HashAgg(group=[{0, 1, 2, 3}], m0=[SUM($4)])
> 00-03 Project(c0=[$0], c1=[$2], c2=[$3], c3=[$4], unit_sales=[$1])
> 00-04 HashJoin(condition=[=($5, $6)], joinType=[inner])
> 00-06 Project(the_year=[$0], unit_sales=[$5],
> product_family=[$6], state_province=[$10], city=[$11], f17=[$11])
> 00-08 Project(the_year=[$4], time_id=[$5], time_id0=[$0],
> product_id=[$1], customer_id=[$2], unit_sales=[$3], product_family=[$8],
> product_class_id=[$9], product_id0=[$6], product_class_id0=[$7],
> state_province=[$10], city=[$11], customer_id0=[$12])
> 00-09 HashJoin(condition=[=($2, $12)], joinType=[inner])
> 00-11 HashJoin(condition=[=($1, $6)], joinType=[inner])
> 00-14 HashJoin(condition=[=($0, $5)], joinType=[inner])
> 00-18 Project(time_id=[$2], product_id=[$0],
> customer_id=[$1], unit_sales=[$3])
> 00-23 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/mondrian/sales_fact_1997]],
> selectionRoot=/drill/testdata/mondrian/sales_fact_1997, numFiles=1,
> columns=[`time_id`, `product_id`, `customer_id`, `unit_sales`]]])
> 00-17 Project(the_year=[$0], time_id0=[$1])
> 00-22 SelectionVectorRemover
> 00-26 Filter(condition=[=($0, 1997)])
> 00-28 Project(the_year=[$1], time_id=[$0])
> 00-30 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/mondrian/time_by_day]],
> selectionRoot=/drill/testdata/mondrian/time_by_day, numFiles=1,
> columns=[`the_year`, `time_id`]]])
> 00-13 Project(product_id0=[$0], product_class_id=[$1],
> product_family=[$2], product_class_id0=[$3])
> 00-16 HashJoin(condition=[=($1, $3)], joinType=[inner])
> 00-21 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/mondrian/product]],
> selectionRoot=/drill/testdata/mondrian/product, numFiles=1,
> columns=[`product_id`, `product_class_id`]]])
> 00-20 Project(product_family=[$0],
> product_class_id0=[$1])
> 00-25 SelectionVectorRemover
> 00-27 Filter(condition=[=($0, 'Drink')])
> 00-29 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath
> [path=maprfs:/drill/testdata/mondrian/product_class]],
> selectionRoot=/drill/testdata/mondrian/product_class, numFiles=1,
> columns=[`product_family`, `product_class_id`]]])
> 00-10 Project(state_province=[$0], city=[$1],
> customer_id0=[$2])
> 00-12 SelectionVectorRemover
> 00-15 Filter(condition=[=($0, 'WA')])
> 00-19 Project(state_province=[$1], city=[$2],
> customer_id=[$0])
> 00-24 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/mondrian/customer]],
> selectionRoot=/drill/testdata/mondrian/customer, numFiles=1,
> columns=[`state_province`, `city`, `customer_id`]]])
> 00-05 HashAgg(group=[{0}])
> 00-07 Values
> | {
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)