[ 
https://issues.apache.org/jira/browse/HIVE-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941922#comment-15941922
 ] 

Pengcheng Xiong commented on HIVE-15640:
----------------------------------------

I am deferring this to Hive 3.0 as we are going to cut the first RC and it is 
not marked as blocker. Please feel free to commit to the branch if this can be 
resolved before the release.

> Hive/Druid integration: null handling for metrics
> -------------------------------------------------
>
>                 Key: HIVE-15640
>                 URL: https://issues.apache.org/jira/browse/HIVE-15640
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> Null values for metrics in Druid and Hive are not handled the same way (_0.0_ 
> vs _NULL_).
> In Druid:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
> sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 
> 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006       1999-11-03 00:00:00     0.0     0.0
> 9011009       1999-11-05 00:00:00     0.0     0.0
> 8003009       1999-11-03 00:00:00     11.0    1.0299999713897705
> 10005014      1999-11-05 00:00:00     86.0    1.100000023841858
> 6008007       1999-11-09 00:00:00     81.0    1.3700000047683716
> 6003003       1999-11-08 00:00:00     45.0    1.600000023841858
> 8008009       1999-11-08 00:00:00     98.0    1.7100000381469727
> 8015003       1999-11-02 00:00:00     10.0    1.7400000095367432
> 8004008       1999-11-10 00:00:00     45.0    1.7599999904632568
> 8009009       1999-11-07 00:00:00     81.0    1.7699999809265137
> {code}
> In Hive:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
> sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset_hive
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 
> 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006       1999-11-03 00:00:00     NULL    NULL
> 9011009       1999-11-05 00:00:00     NULL    NULL
> 8003009       1999-11-03 00:00:00     11      1.03
> 10005014      1999-11-05 00:00:00     86      1.1
> 6008007       1999-11-09 00:00:00     81      1.37
> 6003003       1999-11-08 00:00:00     45      1.6
> 8008009       1999-11-08 00:00:00     98      1.71
> 8015003       1999-11-02 00:00:00     10      1.74
> 8004008       1999-11-10 00:00:00     45      1.76
> 8009009       1999-11-07 00:00:00     81      1.77
> {code}
> However, for Druid dimensions, NULL values seem to be handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to