[ https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863337#comment-15863337 ]
Hive QA commented on HIVE-15872:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852309/HIVE-15872.4.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10223 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_percentile_approx_23] (batchId=31)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=125)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3515/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3515/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3515/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12852309 - PreCommit-HIVE-Build
> The PERCENTILE_APPROX UDAF does not work with empty set
> -------------------------------------------------------
>
> Key: HIVE-15872
> URL: https://issues.apache.org/jira/browse/HIVE-15872
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Reporter: Chaozhong Yang
> Assignee: Chaozhong Yang
> Fix For: 2.1.2
>
> Attachments: HIVE-15872.2.patch, HIVE-15872.3.patch,
> HIVE-15872.4.patch, HIVE-15872.patch
>
>
> 1. Original SQL:
> {code}
> select
>   percentile_approx(
>     column0,
>     array(0.50, 0.70, 0.90, 0.95, 0.99)
>   )
> from
>   my_table
> where
>   date = '20170207'
>   and column1 = 'value1'
>   and column2 = 'value2'
>   and column3 = 'value3'
>   and column4 = 'value4'
>   and column5 = 'value5'
> {code}
> 2. Exception stack trace:
> {code}
> Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":[0.0,10000.0]}}
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":[0.0,10000.0]}}
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
>     ... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
>     ... 7 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>     at java.util.ArrayList.get(ArrayList.java:429)
>     at org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
>     at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
>     ... 8 more
> {code}
> 3. Review the data:
> {code}
> select
>   column0
> from
>   my_table
> where
>   date = '20170207'
>   and column1 = 'value1'
>   and column2 = 'value2'
>   and column3 = 'value3'
>   and column4 = 'value4'
>   and column5 = 'value5'
> {code}
> After running this SQL, we found that the result is NULL, i.e. the aggregation runs over an empty set.
> 4. What does [0.0, 10000.0] in the stack trace mean?
> In GenericUDAFPercentileApproxEvaluator, the `merge` method processes an ArrayList named
> partialHistogram. Normally, the structure of partialHistogram is [npercentiles, percentile0,
> percentile1, ..., nbins, bin0.x, bin0.y, bin1.x, bin1.y, ...]. However, when the column values
> form an empty set (all NULL), the partialHistogram contains only [npercentiles(0), nbins(10000)].
> That is why the stack trace shows the strange row data:
> {"key":{},"value":{"_col0":[0.0,10000.0]}}
> Before histogram#merge is called (the on-line histogram algorithm from the paper
> http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the elements that store the
> percentiles must be stripped from partialHistogram via
> `partialHistogram.subList(0, nquantiles+1).clear();`. In the empty-set case,
> GenericUDAFPercentileApproxEvaluator does not remove the percentiles. Consequently,
> NumericHistogram merges a list that contains only 2 elements ([0.0, 10000.0]) and throws an
> IndexOutOfBoundsException.
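> To make the failure concrete, here is a small standalone Java sketch (plain java.util
> collections, not the actual Hive evaluator or NumericHistogram APIs) of the partial-histogram
> layout described above; the early-return guard is only one plausible shape for a fix, on the
> assumption that an empty-set partial carries nothing worth merging:
> {code}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> public class PercentileApproxMergeSketch {
>
>     // Layout of a normal partial, as described above:
>     //   [npercentiles, percentile0, ..., nbins, bin0.x, bin0.y, ...]
>     // Layout of an empty-set partial: only [npercentiles = 0, nbins = 10000].
>     static void merge(List<Double> partialHistogram) {
>         int nquantiles = partialHistogram.get(0).intValue();
>
>         // Hypothetical guard: an empty-set partial has no percentiles and no
>         // histogram bins, so skip it instead of handing a 2-element list to
>         // the bin-merging code, which would index past the end.
>         if (nquantiles == 0 && partialHistogram.size() <= 2) {
>             return;
>         }
>
>         // Strip the percentile prefix, mirroring
>         // partialHistogram.subList(0, nquantiles + 1).clear();
>         partialHistogram.subList(0, nquantiles + 1).clear();
>
>         // ... the NumericHistogram bin merge would run here on
>         //     [nbins, bin0.x, bin0.y, ...] ...
>     }
>
>     public static void main(String[] args) {
>         // Empty-set partial, as seen in the row {"_col0":[0.0,10000.0]}.
>         merge(new ArrayList<>(Arrays.asList(0.0, 10000.0)));
>
>         // Normal partial: 2 percentiles (0.5, 0.9), nbins = 10000, one bin (42.0, 3.0).
>         merge(new ArrayList<>(Arrays.asList(2.0, 0.5, 0.9, 10000.0, 42.0, 3.0)));
>
>         System.out.println("merge sketch handled both partials");
>     }
> }
> {code}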
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)