[
https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988247#comment-15988247
]
Sahil Takiar commented on HIVE-16513:
-------------------------------------
Updated patch, addressed comments on the RB, some significant changes to the
approach:
* Using {{FunctionRegistry.getCommonClassForComparison}} to find a common
{{TypeInfo}} for {{minValue}}, {{maxValue}}, {{expr}} so that they can all be
compared properly
** Once the proper {{TypeInfo}} is determined,
{{ObjectInspectorConverters.getConverter}} is used to convert each argument to
the correct type
** A {{switch...case}} statement is then used based on the common {{TypeInfo}}
* I changed the approach of the core algorithm for calculating the correct
bucket. The old algorithm relied on the modulus operator, which had issues when
run against doubles; the new algorithm is based on the approach taken by
Postgres
* Added a lot more tests to the qfile
> width_bucket issues
> -------------------
>
> Key: HIVE-16513
> URL: https://issues.apache.org/jira/browse/HIVE-16513
> Project: Hive
> Issue Type: Bug
> Reporter: Carter Shanklin
> Assignee: Sahil Takiar
> Attachments: HIVE-16513.1.patch, HIVE-16513.2.patch
>
>
> width_bucket was recently added with HIVE-15982. This ticket notes a few
> issues.
> Usability issue:
> Currently only accepts integral numeric types. Decimals, floats and doubles
> are not supported.
> Runtime failures: This query will cause a runtime divide-by-zero in the
> reduce stage.
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another
> example (that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping
> sets. These queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10)
> from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10)
> from e011_01;
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as
> integer)) from e011_02 group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived but the case ones seem
> straightforward, valid, and it's strange that they don't work. Similar
> queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe
> [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket, example:
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing
> functions like this don't need to think about value expression support but we
> don't seem to be there yet.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)