[ https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan Gustavsson updated HIVE-12664:
------------------------------------
Description:
The optimisation check for reduce deduplication only checks the first child
node of a join, -and the check itself also contains a major bug- causing an
ArrayIndexOutOfBoundsException no matter what.
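The two failure modes described above (inspecting only one join input, and indexing past the end of a key array) can be illustrated with a small, self-contained Java sketch. This is a hypothetical illustration, not Hive's actual optimizer code: each join input is modelled as a plain String[] of key column names, the method names firstInputOnly and allInputs are invented, and it assumes a merge is acceptable when the child's keys agree with every join input's keys on their common prefix.

{code:java}
/**
 * Hypothetical sketch only; not Hive's reduce deduplication implementation.
 * Each join input is represented by the key columns its reduce sink emits.
 */
public class JoinDedupSketch {

    // Unsafe pattern: looks only at the first join input and walks that
    // input's key positions while indexing into childKeys.
    static boolean firstInputOnly(String[][] joinInputKeys, String[] childKeys) {
        String[] first = joinInputKeys[0];
        for (int i = 0; i < first.length; i++) {
            // Throws ArrayIndexOutOfBoundsException when childKeys is
            // shorter than the first input's key list.
            if (!first[i].equals(childKeys[i])) {
                return false;
            }
        }
        return true;
    }

    // Safer pattern: checks every join input and compares keys only over
    // the common prefix, so no index can run past either array.
    static boolean allInputs(String[][] joinInputKeys, String[] childKeys) {
        for (String[] parentKeys : joinInputKeys) {
            int n = Math.min(parentKeys.length, childKeys.length);
            for (int i = 0; i < n; i++) {
                if (!parentKeys[i].equals(childKeys[i])) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] childKeys = {"host"};

        // Inputs that agree on "host": the all-inputs check accepts them,
        // while the unsafe check indexes past the end of childKeys.
        String[][] agreeing = { {"host", "date"}, {"host"} };
        System.out.println(allInputs(agreeing, childKeys)); // true
        try {
            firstInputOnly(agreeing, childKeys);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unsafe check threw " + e.getClass().getSimpleName());
        }

        // The second input is keyed on a different column: looking only at
        // the first input wrongly accepts the merge.
        String[][] mismatched = { {"host"}, {"path"} };
        System.out.println(firstInputOnly(mismatched, childKeys)); // true (wrong)
        System.out.println(allInputs(mismatched, childKeys));      // false
    }
}
{code}

Bounding the comparison by the shorter array removes the out-of-bounds access, and iterating over every join input keeps a mismatch in a later input from being missed. Whether the attached patches take exactly this approach is not stated in this issue.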
Sample data table schema:
||time||user||host||path||referer||code||agent||size||method||
|int|string|string|string|string|bigint|string|bigint|string|
Sample query:
{code:sql}
SELECT
  t1.host,
  COUNT(DISTINCT t1.`date`) AS login_count,
  MAX(t2.code) AS code,
  unix_timestamp() AS time
FROM (
  SELECT
    HOST,
    MIN(time) AS DATE
  FROM
    www_access
  WHERE
    HOST IS NOT NULL
  GROUP BY
    HOST
) t1
JOIN (
  SELECT
    HOST,
    MIN(time) AS code
  FROM
    www_access
  WHERE
    HOST IS NOT NULL
  GROUP BY
    HOST
) t2
ON t1.host = t2.host
GROUP BY
  t1.host
{code}
was:The optimisation check for reduce deduplication only checks the first
child node for join -and the check itself also contains a major bug- causing
ArrayOutOfBoundException no matter what.
> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
> Key: HIVE-12664
> URL: https://issues.apache.org/jira/browse/HIVE-12664
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 1.2.1
> Reporter: Johan Gustavsson
> Assignee: Johan Gustavsson
> Attachments: HIVE-12664-1.patch, HIVE-12664.1.patch, HIVE-12664.patch
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)