Jesus Camacho Rodriguez created HIVE-15640:
----------------------------------------------

             Summary: Hive/Druid integration: null handling for metrics
                 Key: HIVE-15640
                 URL: https://issues.apache.org/jira/browse/HIVE-15640
             Project: Hive
          Issue Type: Bug
          Components: Druid integration
    Affects Versions: 2.2.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
            Priority: Critical


Null values for metrics in Druid and Hive are not handled the same way (_0.0_ 
vs _NULL_).

In Druid:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 
00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
6015006 1999-11-03 00:00:00     0.0     0.0
9011009 1999-11-05 00:00:00     0.0     0.0
8003009 1999-11-03 00:00:00     11.0    1.0299999713897705
10005014        1999-11-05 00:00:00     86.0    1.100000023841858
6008007 1999-11-09 00:00:00     81.0    1.3700000047683716
6003003 1999-11-08 00:00:00     45.0    1.600000023841858
8008009 1999-11-08 00:00:00     98.0    1.7100000381469727
8015003 1999-11-02 00:00:00     10.0    1.7400000095367432
8004008 1999-11-10 00:00:00     45.0    1.7599999904632568
8009009 1999-11-07 00:00:00     81.0    1.7699999809265137
{code}

In Hive:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset_hive
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 
00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
6015006 1999-11-03 00:00:00     NULL    NULL
9011009 1999-11-05 00:00:00     NULL    NULL
8003009 1999-11-03 00:00:00     11      1.03
10005014        1999-11-05 00:00:00     86      1.1
6008007 1999-11-09 00:00:00     81      1.37
6003003 1999-11-08 00:00:00     45      1.6
8008009 1999-11-08 00:00:00     98      1.71
8015003 1999-11-02 00:00:00     10      1.74
8004008 1999-11-10 00:00:00     45      1.76
8009009 1999-11-07 00:00:00     81      1.77
{code}

However, for Druid dimensions, NULL values seem to be handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to