[
https://issues.apache.org/jira/browse/HIVE-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jesus Camacho Rodriguez reassigned HIVE-21496:
----------------------------------------------
Assignee: Jesus Camacho Rodriguez
> Automatic sizing of unordered buffer can overflow
> -------------------------------------------------
>
> Key: HIVE-21496
> URL: https://issues.apache.org/jira/browse/HIVE-21496
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: hive.log
>
>
> HIVE-21329 added automatic sizing of the Tez unordered partitioned KV buffer
> based on group-by statistics. However, in some corner cases the group-by
> statistics set the data size to Long.MAX_VALUE, which in turn sets the
> unordered KV buffer size to Integer.MAX_VALUE. This buffer size is expected
> to be in MB, so converting Integer.MAX_VALUE from MB to bytes overflows and
> the following exception is thrown:
> {code:java}
> 2019-03-23T01:35:17,760 INFO [Dispatcher thread {Central}]
> HistoryEventHandler.criticalEvents:
> [HISTORY][DAG:dag_1553330105749_0001_1][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=Map 1, taskAttemptId=attempt_1553330105749_0001_1_00_000000_0,
> creationTime=1553330117468, allocationTime=1553330117524,
> startTime=1553330117562, finishTime=1553330117755, timeTaken=193,
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR,
> diagnostics=Error: Error while running task ( failure ) :
> attempt_1553330105749_0001_1_00_000000_0:java.lang.IllegalArgumentException
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
> at
> org.apache.tez.runtime.common.resources.MemoryDistributor.registerRequest(MemoryDistributor.java:177)
> at
> org.apache.tez.runtime.common.resources.MemoryDistributor.requestMemory(MemoryDistributor.java:110)
> at
> org.apache.tez.runtime.api.impl.TezTaskContextImpl.requestInitialMemory(TezTaskContextImpl.java:214)
> at
> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput.initialize(UnorderedPartitionedKVOutput.java:76)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:537)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:520)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:505)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
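
The arithmetic behind the overflow described above can be sketched as follows. This is an illustration only, not the Hive or Tez code: the class and method names are hypothetical, the use of 32-bit int arithmetic for the MB-to-bytes conversion is an assumption (the report does not say where the conversion happens), and the 1024 MB cap is an arbitrary example value.

```java
public class BufferSizeOverflow {
    // Hypothetical helper: converts MB to bytes using 32-bit int arithmetic.
    // For large inputs the product wraps around and can become negative.
    public static int naiveMbToBytes(int mb) {
        return mb * 1024 * 1024;
    }

    // Hypothetical helper: turns a data-size estimate in bytes into a buffer
    // size in MB, clamped to the range [1, maxMb] so that a saturated
    // estimate such as Long.MAX_VALUE cannot produce an oversized request.
    public static int clampToMb(long dataSizeBytes, int maxMb) {
        long mb = dataSizeBytes / (1024L * 1024L);
        return (int) Math.max(1L, Math.min(mb, (long) maxMb));
    }

    // Converts MB to bytes in 64-bit arithmetic, which cannot overflow
    // for any int input.
    public static long mbToBytes(int mb) {
        return (long) mb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE MB converted with int arithmetic wraps to a negative value.
        System.out.println(naiveMbToBytes(Integer.MAX_VALUE)); // prints -1048576

        // Clamping the saturated estimate first keeps the request within bounds.
        int mb = clampToMb(Long.MAX_VALUE, 1024);
        System.out.println(mb + " MB = " + mbToBytes(mb) + " bytes"); // prints 1024 MB = 1073741824 bytes
    }
}
```

A negative or otherwise out-of-range byte count is the kind of value a precondition check would reject, which is consistent with the IllegalArgumentException raised from Preconditions.checkArgument in the trace above.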
>
> The data size in the GBY operator stats is set to Long.MAX_VALUE, as seen below:
> {code:java}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [0] STATS-TS[0] (logs): numRows: 1795
> dataSize: 4443078 basicStatsState: PARTIAL colStatsState: NONE colStats:
> {severity= colName: severity colType: string countDistincts: 359 numNulls: 89
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> true}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: Estimating row count for
> GenericUDFOPEqual(Column[severity], Const string ERROR) Original num rows:
> 1795 New num rows: 5
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [1] STATS-FIL[8]: numRows: 5 dataSize:
> 12376 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity=
> colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen:
> 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> exec.FilterOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats:
> PARTIAL Column stats: NONE) on: FIL[8]
> 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> exec.SelectOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats:
> PARTIAL Column stats: NONE) on: SEL[2]
> 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [1] STATS-SEL[2]: numRows: 5 dataSize:
> 12376 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity=
> colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen:
> 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: STATS-GBY[3]: inputSize: 4443078
> maxSplitSize: 256000000 parallelism: 1 containsGroupingSet: false
> sizeOfGroupingSet: 1
> 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [Case 1] STATS-GBY[3]: cardinality: 5
> 2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> exec.GroupByOperator: Setting stats (Num rows: 1 Data size:
> 9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: GBY[3]
> 2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [0] STATS-GBY[3]: numRows: 1 dataSize:
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 18
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> false}
> 2019-03-23T01:35:16,473 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> exec.ReduceSinkOperator: Setting stats (Num rows: 1 Data size:
> 9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: RS[4]
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [0] STATS-RS[4]: numRows: 1 dataSize:
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 18
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> false}
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: STATS-GBY[5]: inputSize: 1 maxSplitSize:
> 256000000 parallelism: 1 containsGroupingSet: false sizeOfGroupingSet: 1
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [Case 7] STATS-GBY[5]: cardinality: 0
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> stats.StatsUtils: STATS-GBY[5]: Equals 0 in number of rows. 0 rows will be
> set to 1
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> exec.GroupByOperator: Setting stats (Num rows: 1 Data size:
> 9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: GBY[5]
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [0] STATS-GBY[5]: numRows: 1 dataSize:
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 18
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> false}
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main]
> annotation.StatsRulesProcFactory: [0] STATS-FS[7]: numRows: 1 dataSize:
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats:
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 36
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated:
> false}{code}