[ 
https://issues.apache.org/jira/browse/HIVE-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941918#comment-15941918
 ] 

Pengcheng Xiong commented on HIVE-15635:
----------------------------------------

I am deferring this to Hive 3.0 as we are going to cut the first RC and it is 
not marked as blocker. Please feel free to commit to the branch if this can be 
resolved before the release.

> Hive/Druid integration: timeseries query shows all days, even if no data
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15635
>                 URL: https://issues.apache.org/jira/browse/HIVE-15635
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> We should have consistent results on Druid vs Hive. However, following query 
> is transformed into timeseries Druid query which yields different results in 
> Druid, since it will show all values for the given time granularity, even if 
> there is no data for the given _i\_brand\_id_.
> In Druid:
> {code:sql}
> SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), 
> sum(ss_wholesale_cost)
> FROM store_sales_sold_time_subset
> WHERE i_brand_id = 10001009
> GROUP BY floor_day(`__time`)
> ORDER BY `granularity`;
> OK
> 1999-11-01 00:00:00   45      37.47
> 1999-11-02 00:00:00   -9223372036854775808    0.0
> 1999-11-03 00:00:00   -9223372036854775808    0.0
> 1999-11-04 00:00:00   39      61.52
> 1999-11-05 00:00:00   74      145.84
> 1999-11-06 00:00:00   62      14.5
> 1999-11-07 00:00:00   -9223372036854775808    0.0
> 1999-11-08 00:00:00   5       34.08
> 1999-11-09 00:00:00   -9223372036854775808    0.0
> 1999-11-10 00:00:00   -9223372036854775808    0.0
> 1999-11-11 00:00:00   -9223372036854775808    0.0
> 1999-11-12 00:00:00   66      67.22
> 1999-11-13 00:00:00   -9223372036854775808    0.0
> 1999-11-14 00:00:00   -9223372036854775808    0.0
> 1999-11-15 00:00:00   -9223372036854775808    0.0
> 1999-11-16 00:00:00   60      96.37
> 1999-11-17 00:00:00   50      79.11
> 1999-11-18 00:00:00   -9223372036854775808    0.0
> 1999-11-19 00:00:00   -9223372036854775808    0.0
> 1999-11-20 00:00:00   -9223372036854775808    0.0
> 1999-11-21 00:00:00   -9223372036854775808    0.0
> 1999-11-22 00:00:00   -9223372036854775808    0.0
> 1999-11-23 00:00:00   57      17.69
> 1999-11-24 00:00:00   -9223372036854775808    0.0
> 1999-11-25 00:00:00   -9223372036854775808    0.0
> 1999-11-26 00:00:00   -9223372036854775808    0.0
> 1999-11-27 00:00:00   86      91.59
> 1999-11-28 00:00:00   -9223372036854775808    0.0
> 1999-11-29 00:00:00   93      136.48
> 1999-11-30 00:00:00   -9223372036854775808    0.0
> {code}
> In Hive:
> {code:sql}
> SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), 
> sum(ss_wholesale_cost)
> FROM store_sales_sold_time_subset_hive
> WHERE i_brand_id = 10001009
> GROUP BY floor_day(`__time`)
> ORDER BY `granularity`;
> OK
> 1999-11-01 00:00:00   45      37.47
> 1999-11-04 00:00:00   39      61.52
> 1999-11-05 00:00:00   74      145.84
> 1999-11-06 00:00:00   62      14.5
> 1999-11-08 00:00:00   5       34.08
> 1999-11-12 00:00:00   66      67.22
> 1999-11-16 00:00:00   60      96.36999999999999
> 1999-11-17 00:00:00   50      79.11
> 1999-11-23 00:00:00   57      17.689999999999998
> 1999-11-27 00:00:00   86      91.59
> 1999-11-29 00:00:00   93      136.48
> {code}
> Probably we should handle this in the _timeseries_ record reader.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to