[ 
https://issues.apache.org/jira/browse/HIVE-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-21496:
----------------------------------------------

    Assignee: Jesus Camacho Rodriguez

> Automatic sizing of unordered buffer can overflow
> -------------------------------------------------
>
>                 Key: HIVE-21496
>                 URL: https://issues.apache.org/jira/browse/HIVE-21496
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: hive.log
>
>
> HIVE-21329 added automatic sizing of tez unordered partitioned KV buffer 
> based on group by statistics. However, some corner cases for group by 
> statistics sets Long.MAX for data size. This ends up setting Integer.MAX for 
> unordered KV buffer size. This buffer size is expected to be in MB. 
> Converting Integer.MAX value from MB to bytes will overflow and following 
> exception is thrown.
> {code:java}
> 2019-03-23T01:35:17,760 INFO [Dispatcher thread {Central}] 
> HistoryEventHandler.criticalEvents: 
> [HISTORY][DAG:dag_1553330105749_0001_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1553330105749_0001_1_00_000000_0, 
> creationTime=1553330117468, allocationTime=1553330117524, 
> startTime=1553330117562, finishTime=1553330117755, timeTaken=193, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, 
> diagnostics=Error: Error while running task ( failure ) : 
> attempt_1553330105749_0001_1_00_000000_0:java.lang.IllegalArgumentException
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
> at 
> org.apache.tez.runtime.common.resources.MemoryDistributor.registerRequest(MemoryDistributor.java:177)
> at 
> org.apache.tez.runtime.common.resources.MemoryDistributor.requestMemory(MemoryDistributor.java:110)
> at 
> org.apache.tez.runtime.api.impl.TezTaskContextImpl.requestInitialMemory(TezTaskContextImpl.java:214)
> at 
> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput.initialize(UnorderedPartitionedKVOutput.java:76)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:537)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:520)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:505)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> Stats for GBY operator is getting Long.MAX_VALUE as seen below
> {code:java}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [0] STATS-TS[0] (logs): numRows: 1795 
> dataSize: 4443078 basicStatsState: PARTIAL colStatsState: NONE colStats: 
> {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> true}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: Estimating row count for 
> GenericUDFOPEqual(Column[severity], Const string ERROR) Original num rows: 
> 1795 New num rows: 5
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [1] STATS-FIL[8]: numRows: 5 dataSize: 
> 12376 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= 
> colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 
> 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> exec.FilterOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats: 
> PARTIAL Column stats: NONE) on: FIL[8]
> 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> exec.SelectOperator: Setting stats (Num rows: 5 Data size: 12376 Basic stats: 
> PARTIAL Column stats: NONE) on: SEL[2]
> 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [1] STATS-SEL[2]: numRows: 5 dataSize: 
> 12376 basicStatsState: PARTIAL colStatsState: NONE colStats: {severity= 
> colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen: 
> 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: STATS-GBY[3]: inputSize: 4443078 
> maxSplitSize: 256000000 parallelism: 1 containsGroupingSet: false 
> sizeOfGroupingSet: 1
> 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [Case 1] STATS-GBY[3]: cardinality: 5
> 2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> exec.GroupByOperator: Setting stats (Num rows: 1 Data size: 
> 9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: GBY[3]
> 2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [0] STATS-GBY[3]: numRows: 1 dataSize: 
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats: 
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 18 
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0 
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> false}
> 2019-03-23T01:35:16,473 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> exec.ReduceSinkOperator: Setting stats (Num rows: 1 Data size: 
> 9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: RS[4]
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [0] STATS-RS[4]: numRows: 1 dataSize: 
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats: 
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 18 
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0 
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> false}
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: STATS-GBY[5]: inputSize: 1 maxSplitSize: 
> 256000000 parallelism: 1 containsGroupingSet: false sizeOfGroupingSet: 1
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [Case 7] STATS-GBY[5]: cardinality: 0
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> stats.StatsUtils: STATS-GBY[5]: Equals 0 in number of rows. 0 rows will be 
> set to 1
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> exec.GroupByOperator: Setting stats (Num rows: 1 Data size: 
> 9223372036854775807 Basic stats: PARTIAL Column stats: NONE) on: GBY[5]
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [0] STATS-GBY[5]: numRows: 1 dataSize: 
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats: 
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 18 
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0 
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> false}
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [0] STATS-FS[7]: numRows: 1 dataSize: 
> 9223372036854775807 basicStatsState: PARTIAL colStatsState: NONE colStats: 
> {severity= colName: severity colType: string countDistincts: 1 numNulls: 36 
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> true, _col0= colName: _col0 colType: bigint countDistincts: 1 numNulls: 0 
> avgColLen: 8.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> false}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to