[ 
https://issues.apache.org/jira/browse/HIVE-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15635:
-----------------------------------
    Target Version/s: 3.0.0  (was: 2.2.0)

> Hive/Druid integration: timeseries query shows all days, even if no data
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15635
>                 URL: https://issues.apache.org/jira/browse/HIVE-15635
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> We should have consistent results on Druid vs Hive. However, following query 
> is transformed into timeseries Druid query which yields different results in 
> Druid, since it will show all values for the given time granularity, even if 
> there is no data for the given _i\_brand\_id_.
> In Druid:
> {code:sql}
> SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), 
> sum(ss_wholesale_cost)
> FROM store_sales_sold_time_subset
> WHERE i_brand_id = 10001009
> GROUP BY floor_day(`__time`)
> ORDER BY `granularity`;
> OK
> 1999-11-01 00:00:00   45      37.47
> 1999-11-02 00:00:00   -9223372036854775808    0.0
> 1999-11-03 00:00:00   -9223372036854775808    0.0
> 1999-11-04 00:00:00   39      61.52
> 1999-11-05 00:00:00   74      145.84
> 1999-11-06 00:00:00   62      14.5
> 1999-11-07 00:00:00   -9223372036854775808    0.0
> 1999-11-08 00:00:00   5       34.08
> 1999-11-09 00:00:00   -9223372036854775808    0.0
> 1999-11-10 00:00:00   -9223372036854775808    0.0
> 1999-11-11 00:00:00   -9223372036854775808    0.0
> 1999-11-12 00:00:00   66      67.22
> 1999-11-13 00:00:00   -9223372036854775808    0.0
> 1999-11-14 00:00:00   -9223372036854775808    0.0
> 1999-11-15 00:00:00   -9223372036854775808    0.0
> 1999-11-16 00:00:00   60      96.37
> 1999-11-17 00:00:00   50      79.11
> 1999-11-18 00:00:00   -9223372036854775808    0.0
> 1999-11-19 00:00:00   -9223372036854775808    0.0
> 1999-11-20 00:00:00   -9223372036854775808    0.0
> 1999-11-21 00:00:00   -9223372036854775808    0.0
> 1999-11-22 00:00:00   -9223372036854775808    0.0
> 1999-11-23 00:00:00   57      17.69
> 1999-11-24 00:00:00   -9223372036854775808    0.0
> 1999-11-25 00:00:00   -9223372036854775808    0.0
> 1999-11-26 00:00:00   -9223372036854775808    0.0
> 1999-11-27 00:00:00   86      91.59
> 1999-11-28 00:00:00   -9223372036854775808    0.0
> 1999-11-29 00:00:00   93      136.48
> 1999-11-30 00:00:00   -9223372036854775808    0.0
> {code}
> In Hive:
> {code:sql}
> SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), 
> sum(ss_wholesale_cost)
> FROM store_sales_sold_time_subset_hive
> WHERE i_brand_id = 10001009
> GROUP BY floor_day(`__time`)
> ORDER BY `granularity`;
> OK
> 1999-11-01 00:00:00   45      37.47
> 1999-11-04 00:00:00   39      61.52
> 1999-11-05 00:00:00   74      145.84
> 1999-11-06 00:00:00   62      14.5
> 1999-11-08 00:00:00   5       34.08
> 1999-11-12 00:00:00   66      67.22
> 1999-11-16 00:00:00   60      96.36999999999999
> 1999-11-17 00:00:00   50      79.11
> 1999-11-23 00:00:00   57      17.689999999999998
> 1999-11-27 00:00:00   86      91.59
> 1999-11-29 00:00:00   93      136.48
> {code}
> Probably we should handle this in the _timeseries_ record reader.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to