[ 
https://issues.apache.org/jira/browse/DRILL-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang closed DRILL-3062.
-----------------------------
    Assignee: Chun Chang  (was: Jacques Nadeau)

verified fix. All automated advanced tests passed which included mondrian suite.

> regression: Mondrian query447.q - lots of rows missing in result set
> --------------------------------------------------------------------
>
>                 Key: DRILL-3062
>                 URL: https://issues.apache.org/jira/browse/DRILL-3062
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.0.0
>            Reporter: Chun Chang
>            Assignee: Chun Chang
>            Priority: Blocker
>             Fix For: 1.0.0
>
>         Attachments: 
> 0001-DRILL-3062-Rtrim-the-fixed-char-strings-in-the-IN-li.patch
>
>
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select * from sys.version;
> +------------+----------------+-------------+-------------+------------+
> | commit_id  | commit_message | commit_time | build_email | build_time |
> +------------+----------------+-------------+-------------+------------+
> | d1526f9462f6817a76631464ff332bb99b3bdf28 | DRILL-2750: Running 1 or more 
> queries against Drillbits having insufficient DirectMem renders the Drillbits 
> in an unusable state | 13.05.2015 @ 08:47:20 EDT | Unknown     | 13.05.2015 @ 
> 10:44:43 EDT |
> +------------+----------------+-------------+-------------+------------+
> {code}
> Many (total of 42) mondrian queries regressed. All of them missing rows in 
> the returned result set.
> Here is an example, query447.q
> {code}
> SELECT time_by_day.the_year            AS c0, 
>        product_class.product_family    AS c1, 
>        customer.state_province         AS c2, 
>        customer.city                   AS c3, 
>        Sum(sales_fact_1997.unit_sales) AS m0 
> FROM   time_by_day AS time_by_day, 
>        sales_fact_1997 AS sales_fact_1997, 
>        product_class AS product_class, 
>        product AS product, 
>        customer AS customer 
> WHERE  sales_fact_1997.time_id = time_by_day.time_id 
>        AND time_by_day.the_year = 1997 
>        AND sales_fact_1997.product_id = product.product_id 
>        AND product.product_class_id = product_class.product_class_id 
>        AND product_class.product_family = 'Drink' 
>        AND sales_fact_1997.customer_id = customer.customer_id 
>        AND customer.state_province = 'WA' 
>        AND customer.city IN ( 'Anacortes', 'Ballard', 'Bellingham', 
> 'Bremerton', 
>                               'Burien', 'Edmonds', 'Everett', 'Issaquah', 
>                               'Kirkland', 'Lynnwood', 'Marysville', 
> 'Olympia', 
>                               'Port Orchard', 'Puyallup', 'Redmond', 
> 'Renton', 
>                               'Seattle', 'Sedro Woolley', 'Spokane', 
> 'Tacoma', 
>                               'Walla Walla', 'Yakima' ) 
> GROUP  BY time_by_day.the_year, 
>           product_class.product_family, 
>           customer.state_province, 
>           customer.city; 
> {code}
> This query should return the following result:
> {code}
> [root@qa-node120 mondrian]# cat query447.e
> 1997  Drink   WA      Walla Walla     191.0000
> 1997  Drink   WA      Issaquah        203.0000
> 1997  Drink   WA      Everett 208.0000
> 1997  Drink   WA      Olympia 1066.0000
> 1997  Drink   WA      Edmonds 166.0000
> 1997  Drink   WA      Bremerton       1160.0000
> 1997  Drink   WA      Renton  225.0000
> 1997  Drink   WA      Bellingham      68.0000
> 1997  Drink   WA      Ballard 214.0000
> 1997  Drink   WA      Burien  251.0000
> 1997  Drink   WA      Seattle 168.0000
> 1997  Drink   WA      Redmond 137.0000
> 1997  Drink   WA      Lynnwood        201.0000
> 1997  Drink   WA      Puyallup        1040.0000
> 1997  Drink   WA      Tacoma  986.0000
> 1997  Drink   WA      Kirkland        247.0000
> 1997  Drink   WA      Sedro Woolley   58.0000
> 1997  Drink   WA      Yakima  1159.0000
> 1997  Drink   WA      Port Orchard    1128.0000
> 1997  Drink   WA      Spokane 2238.0000
> 1997  Drink   WA      Anacortes       82.0000
> 1997  Drink   WA      Marysville      193.0000
> {code}
> But drill now returns:
> {code}
> 1997  Drink   WA      Sedro Woolley   58.0000
> {code}
> Here is the plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select 
> time_by_day.the_year as c0, product_class.product_family as c1, 
> customer.state_province as c2, customer.city as c3, 
> sum(sales_fact_1997.unit_sales) as m0 from time_by_day as time_by_day, 
> sales_fact_1997 as sales_fact_1997, product_class as product_class, product 
> as product, customer as customer where sales_fact_1997.time_id = 
> time_by_day.time_id and time_by_day.the_year = 1997 and 
> sales_fact_1997.product_id = product.product_id and product.product_class_id 
> = product_class.product_class_id and product_class.product_family = 'Drink' 
> and sales_fact_1997.customer_id = customer.customer_id and 
> customer.state_province = 'WA' and customer.city in ('Anacortes', 'Ballard', 
> 'Bellingham', 'Bremerton', 'Burien', 'Edmonds', 'Everett', 'Issaquah', 
> 'Kirkland', 'Lynnwood', 'Marysville', 'Olympia', 'Port Orchard', 'Puyallup', 
> 'Redmond', 'Renton', 'Seattle', 'Sedro Woolley', 'Spokane', 'Tacoma', 'Walla 
> Walla', 'Yakima') group by time_by_day.the_year, 
> product_class.product_family, customer.state_province, customer.city;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(c0=[$0], c1=[$1], c2=[$2], c3=[$3], m0=[$4])
> 00-02        HashAgg(group=[{0, 1, 2, 3}], m0=[SUM($4)])
> 00-03          Project(c0=[$0], c1=[$2], c2=[$3], c3=[$4], unit_sales=[$1])
> 00-04            HashJoin(condition=[=($5, $6)], joinType=[inner])
> 00-06              Project(the_year=[$0], unit_sales=[$5], 
> product_family=[$6], state_province=[$10], city=[$11], f17=[$11])
> 00-08                Project(the_year=[$4], time_id=[$5], time_id0=[$0], 
> product_id=[$1], customer_id=[$2], unit_sales=[$3], product_family=[$8], 
> product_class_id=[$9], product_id0=[$6], product_class_id0=[$7], 
> state_province=[$10], city=[$11], customer_id0=[$12])
> 00-09                  HashJoin(condition=[=($2, $12)], joinType=[inner])
> 00-11                    HashJoin(condition=[=($1, $6)], joinType=[inner])
> 00-14                      HashJoin(condition=[=($0, $5)], joinType=[inner])
> 00-18                        Project(time_id=[$2], product_id=[$0], 
> customer_id=[$1], unit_sales=[$3])
> 00-23                          Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/mondrian/sales_fact_1997]], 
> selectionRoot=/drill/testdata/mondrian/sales_fact_1997, numFiles=1, 
> columns=[`time_id`, `product_id`, `customer_id`, `unit_sales`]]])
> 00-17                        Project(the_year=[$0], time_id0=[$1])
> 00-22                          SelectionVectorRemover
> 00-26                            Filter(condition=[=($0, 1997)])
> 00-28                              Project(the_year=[$1], time_id=[$0])
> 00-30                                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/mondrian/time_by_day]], 
> selectionRoot=/drill/testdata/mondrian/time_by_day, numFiles=1, 
> columns=[`the_year`, `time_id`]]])
> 00-13                      Project(product_id0=[$0], product_class_id=[$1], 
> product_family=[$2], product_class_id0=[$3])
> 00-16                        HashJoin(condition=[=($1, $3)], joinType=[inner])
> 00-21                          Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/mondrian/product]], 
> selectionRoot=/drill/testdata/mondrian/product, numFiles=1, 
> columns=[`product_id`, `product_class_id`]]])
> 00-20                          Project(product_family=[$0], 
> product_class_id0=[$1])
> 00-25                            SelectionVectorRemover
> 00-27                              Filter(condition=[=($0, 'Drink')])
> 00-29                                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:/drill/testdata/mondrian/product_class]], 
> selectionRoot=/drill/testdata/mondrian/product_class, numFiles=1, 
> columns=[`product_family`, `product_class_id`]]])
> 00-10                    Project(state_province=[$0], city=[$1], 
> customer_id0=[$2])
> 00-12                      SelectionVectorRemover
> 00-15                        Filter(condition=[=($0, 'WA')])
> 00-19                          Project(state_province=[$1], city=[$2], 
> customer_id=[$0])
> 00-24                            Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/mondrian/customer]], 
> selectionRoot=/drill/testdata/mondrian/customer, numFiles=1, 
> columns=[`state_province`, `city`, `customer_id`]]])
> 00-05              HashAgg(group=[{0}])
> 00-07                Values
>  | {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to