[ https://issues.apache.org/jira/browse/HIVE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795423#comment-16795423 ]
Hive QA commented on HIVE-21466:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12962854/HIVE-21466.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 15833 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[udaf_invalid_place] (batchId=99)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query28] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query44] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query8] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[mv_query44] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query28] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query44] (batchId=275)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query8] (batchId=275)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16561/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16561/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16561/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12962854 - PreCommit-HIVE-Build

> Increase Default Size of SPLIT_MAXSIZE
> --------------------------------------
>
>                 Key: HIVE-21466
>                 URL: https://issues.apache.org/jira/browse/HIVE-21466
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: HIVE-21466.1.patch
>
> {code:java}
> MAPREDMAXSPLITSIZE(FileInputFormat.SPLIT_MAXSIZE, 256000000L, "", true),
> {code}
> [https://github.com/apache/hive/blob/8d4300a02691777fc96f33861ed27e64fed72f2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L682]
>
> This field specifies the maximum size of each MR (and possibly other) split.
> This number should be a multiple of the HDFS block size. The maximum is
> implemented as follows: each block is added to the split, and if the split
> then grows larger than the maximum allowed, the split is submitted to the
> cluster and a new split is opened.
> So, imagine the following scenario:
> * HDFS block size of 16 bytes
> * Maximum split size of 40 bytes
> This will produce a split with 3 blocks: (2x16) = 32 bytes, so another block
> is added, giving (3x16) = 48 bytes in the split. So, while many operators
> would assume a split of 2 blocks, the actual split is 3 blocks. Setting the
> maximum split size to a multiple of the HDFS block size will make this
> behavior less confusing.
> The current setting is ~256MB, and when it was introduced, the default HDFS
> block size was 64MB: a factor of 4x. However, HDFS block sizes are now 128MB
> by default, so I propose setting this to 4x128MB. The larger splits (fewer
> tasks) should give a nice performance boost on modern hardware.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
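The block-accumulation behavior described in the issue can be sketched as follows. This is a simplified model of the behavior as described above, not the actual Hadoop/Hive split-generation code; the class name `SplitSketch`, the method `computeSplits`, and the exact size comparison (`>=` here) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Simplified model of the described behavior: blocks are appended to the
    // current split one at a time, and once the split reaches or exceeds the
    // maximum size, it is closed and a new split is started. (The comparison
    // used by the real implementation may differ.)
    static List<Long> computeSplits(long blockSize, long maxSplitSize, int numBlocks) {
        List<Long> splitSizes = new ArrayList<>();
        long current = 0;
        for (int i = 0; i < numBlocks; i++) {
            current += blockSize;            // the block is added first...
            if (current >= maxSplitSize) {   // ...then the limit is checked
                splitSizes.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            splitSizes.add(current);         // trailing partial split
        }
        return splitSizes;
    }

    public static void main(String[] args) {
        // Scenario from the description: 16-byte blocks, 40-byte maximum.
        // Each closed split is 48 bytes (3 blocks), not the 32 bytes
        // (2 blocks) one might expect.
        System.out.println(SplitSketch.computeSplits(16L, 40L, 6)); // [48, 48]

        // With the maximum set to a multiple of the block size (48 = 3x16),
        // splits land exactly on the limit, which is the point of the issue.
        System.out.println(SplitSketch.computeSplits(16L, 48L, 6)); // [48, 48]
    }
}
```

In this model, a maximum that is a multiple of the block size closes each split at exactly the configured limit, which mirrors the argument for choosing 4x128MB over the current ~256MB value.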