[ https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067507#comment-15067507 ]
Johan Gustavsson commented on HIVE-12664:
-----------------------------------------
Thanks for the advice; I didn't know about Review Board before. I've
submitted [https://reviews.apache.org/r/41628/]
> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
> Key: HIVE-12664
> URL: https://issues.apache.org/jira/browse/HIVE-12664
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 1.2.1
> Reporter: Johan Gustavsson
> Assignee: Johan Gustavsson
> Attachments: HIVE-12664-1.patch, HIVE-12664.1.patch, HIVE-12664.patch
>
>
> The optimisation check for reduce deduplication only checks the first child
> node of a join -and the check itself also contains a major bug-, causing an
> ArrayIndexOutOfBoundsException in every case.
> Sample table schema ({{www_access}}), column names and types:
> ||time||user||host||path||referer||code||agent||size||method||
> |int|string|string|string|string|bigint|string|bigint|string|
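> To reproduce, the schema above could be created with DDL along these lines.
> This is a sketch, not part of the original report: the backtick quoting of
> {{time}} and {{user}} (to avoid clashes with reserved keywords) and the
> default storage format are assumptions.
> {code:sql}
> -- Hypothetical DDL reconstructed from the column/type table above;
> -- no ROW FORMAT or STORED AS clause, so Hive's defaults apply.
> CREATE TABLE www_access (
>   `time`   INT,
>   `user`   STRING,
>   host     STRING,
>   path     STRING,
>   referer  STRING,
>   code     BIGINT,
>   agent    STRING,
>   size     BIGINT,
>   method   STRING
> );
> {code}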
> Sample query:
> {code:sql}
> SELECT
>   t1.host,
>   COUNT(DISTINCT t1.`date`) AS login_count,
>   MAX(t2.code) AS code,
>   unix_timestamp() AS time
> FROM (
>   SELECT
>     host,
>     MIN(time) AS `date`
>   FROM
>     www_access
>   WHERE
>     host IS NOT NULL
>   GROUP BY
>     host
> ) t1
> JOIN (
>   SELECT
>     host,
>     MIN(time) AS code
>   FROM
>     www_access
>   WHERE
>     host IS NOT NULL
>   GROUP BY
>     host
> ) t2
> ON t1.host = t2.host
> GROUP BY
>   t1.host
> {code}
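> Since the failure is attributed to the reduce deduplication optimisation,
> one way to confirm that (an assumption, not something verified in this
> report) is to disable the optimisation for the current session with Hive's
> {{hive.optimize.reducededuplication}} property, which defaults to true, and
> rerun the sample query:
> {code:sql}
> -- Disable reduce deduplication for this session only.
> SET hive.optimize.reducededuplication=false;
> -- Then rerun the sample query above; if it now succeeds, the
> -- deduplication path is the likely source of the exception.
> {code}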
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)