[ https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar resolved TEZ-4271.
--------------------------------
    Resolution: Won't Fix

Instead of limiting it on the TEZ side we'll increase the range in Hive as part of HIVE-24715.

> Add config to limit desiredNumSplits
> ------------------------------------
>
>                 Key: TEZ-4271
>                 URL: https://issues.apache.org/jira/browse/TEZ-4271
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>
> There are multiple config parameters (like tez.grouping.min/max-size,
> tez.grouping.by-length, tez.grouping.by-count,
> tez.grouping.node.local.only) that impact the number of grouped input splits,
> but there is no single property for setting an exact upper limit on the
> desired count.
> In Hive the max number of buckets is 4095. During an insert overwrite each
> task writes its own bucket, and when TEZ runs more than 4095 tasks Hive fails
> with a bucketId out of range exception.
>
> When "tez.grouping.by-count" is used, clamping the desiredNumSplits would
> be easy. However, when "tez.grouping.by-length" is enabled (which is the
> default), clamping desiredNumSplits is not enough, since TEZ might generate a
> few more splits than desired.
> For example:
> * originalSplits: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], where the first 5
>   are on node0 and the other 5 are on node1.
> * desiredNumSplits: 4
> * Total size: 100
> * lengthPerGroup: 100 / 4 = 25
> * group0: [node0=>10, node0=>10]
> * group1: [node1=>10, node1=>10]
> * group2: [node0=>10, node0=>10]
> * group3: [node1=>10, node1=>10]
> * group4: default-rack=>[node0=>10, node1=>10]
>
> The lengthPerGroup prevents adding more than 2 splits to a group,
> resulting in 5 groups instead of the 4 desired.
>
> If 25 were rounded up to 30 (lengthPerGroup = ceil(25 / 10) * 10) it would
> generate 3. But we can't assume all splits have the same size (?)
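The grouping example above can be reproduced with a small standalone sketch. This is a simplified illustration, not the actual Tez grouping code: the class `SplitGroupingSketch`, its methods, and the rule that a node whose leftover is under half of lengthPerGroup waits for a cross-node fallback pass are all assumptions made to mirror the numbers in the description. With desiredNumSplits = 4 it yields 5 groups, matching the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; this is NOT the actual Tez grouping code.
final class SplitGroupingSketch {

    // Takes splits from the head of the queue while the group stays within
    // lengthPerGroup; a group always receives at least one split.
    static List<Long> takeGroup(Deque<Long> queue, long lengthPerGroup) {
        List<Long> group = new ArrayList<>();
        long groupLength = 0;
        while (!queue.isEmpty()) {
            long next = queue.peekFirst();
            if (!group.isEmpty() && groupLength + next > lengthPerGroup) {
                break;
            }
            group.add(queue.pollFirst());
            groupLength += next;
        }
        return group;
    }

    static long sum(Iterable<Long> lengths) {
        long total = 0;
        for (long length : lengths) {
            total += length;
        }
        return total;
    }

    // Groups split lengths (keyed by node) by length, aiming for desiredNumSplits groups.
    static List<List<Long>> group(Map<String, List<Long>> splitsByNode, int desiredNumSplits) {
        Map<String, Deque<Long>> pending = new LinkedHashMap<>();
        long totalLength = 0;
        for (Map.Entry<String, List<Long>> entry : splitsByNode.entrySet()) {
            pending.put(entry.getKey(), new ArrayDeque<>(entry.getValue()));
            totalLength += sum(entry.getValue());
        }
        long lengthPerGroup = totalLength / desiredNumSplits;

        List<List<Long>> groups = new ArrayList<>();
        // Node-local pass: visit the nodes in turn and build one group per visit.
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Deque<Long> queue : pending.values()) {
                // Assumed rule: a leftover under half a group waits for the fallback pass.
                if (queue.isEmpty() || sum(queue) * 2 < lengthPerGroup) {
                    continue;
                }
                groups.add(takeGroup(queue, lengthPerGroup));
                progressed = true;
            }
        }
        // Fallback pass: combine the leftovers across nodes (the default-rack group).
        Deque<Long> leftovers = new ArrayDeque<>();
        for (Deque<Long> queue : pending.values()) {
            leftovers.addAll(queue);
        }
        while (!leftovers.isEmpty()) {
            groups.add(takeGroup(leftovers, lengthPerGroup));
        }
        return groups;
    }

    // The example from the description: ten splits of length 10, five per node.
    static Map<String, List<Long>> exampleSplits() {
        Map<String, List<Long>> splits = new LinkedHashMap<>();
        splits.put("node0", Collections.nCopies(5, 10L));
        splits.put("node1", Collections.nCopies(5, 10L));
        return splits;
    }

    public static void main(String[] args) {
        List<List<Long>> groups = group(exampleSplits(), 4);
        System.out.println(groups.size() + " groups: " + groups);
    }
}
```

Because each group is capped by length rather than by count, clamping desiredNumSplits only changes lengthPerGroup; the number of groups actually produced can still exceed it.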
> We might need to detect if groupedSplits.size() is greater than desired in
> the loop, and redistribute the remaining splits across the existing groups
> (either in a round robin fashion or by selecting the smallest), instead of
> creating new groups. This might cause existing groups to be converted to
> rackLocal groups if the node locality of the remaining splits is different
> from the locality of the existing groups.
> Alternatively, we could do a second pass after groupedSplits is fully
> calculated and try to merge existing groups. Either way this complicates the
> logic even further. At this point I'm not sure what would be best.
> [~rajesh.balamohan], [~t3rmin4t0r] do you have any suggestions?
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1610498854304_0004_1_00, diagnostics=[Task failed, taskId=task_1610498854304_0004_1_00_004098, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>     ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>     ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>     ... 19 more
> Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
>     at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>     ... 26 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_1:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>     ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>     ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>     ... 19 more
> Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
>     at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>     ... 26 more
> ], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_2:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>     ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>     ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>     ... 19 more
> Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
>     at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>     ... 26 more
> ], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_3:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>     ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
>     ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>     ... 19 more
> Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
>     at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
>     at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
>     ... 26 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:3645, Vertex vertex_1610498854304_0004_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1610498854304_0004_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1610498854304_0004_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
> {code}
>
> cc: [~abstractdog], [~ashutoshc]
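The "bucketId out of range: 4098" failure in the trace can be illustrated with a minimal range check. This sketch assumes the bucket ID is stored in a 12-bit field, which would explain the 4095 limit quoted in the description; the class `BucketIdRangeSketch` and both of its methods are hypothetical and are not the Hive or Tez implementation.

```java
// Hypothetical illustration; not the Hive BucketCodec or any Tez API.
final class BucketIdRangeSketch {

    // Assumption: 12 bits for the bucket ID, matching the 4095 limit quoted in the ticket.
    static final int BUCKET_ID_BITS = 12;
    static final int MAX_BUCKET_ID = (1 << BUCKET_ID_BITS) - 1;

    // Rejects bucket IDs that do not fit into the assumed 12-bit field.
    static int checkBucketId(int bucketId) {
        if (bucketId < 0 || bucketId > MAX_BUCKET_ID) {
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        return bucketId;
    }

    // Hypothetical clamp of the kind the ticket title asks for.
    static int clampDesiredNumSplits(int desiredNumSplits, int limit) {
        return Math.max(1, Math.min(desiredNumSplits, limit));
    }

    public static void main(String[] args) {
        System.out.println(checkBucketId(MAX_BUCKET_ID));
        System.out.println(clampDesiredNumSplits(5000, MAX_BUCKET_ID));
        try {
            checkBucketId(4098); // the value reported in the stack trace
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

As the description notes, such a clamp would only be sufficient with "tez.grouping.by-count"; with "tez.grouping.by-length" the grouper may still produce a few more groups than the clamped value.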