[
https://issues.apache.org/jira/browse/TEZ-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408235#comment-15408235
]
Zhiyuan Yang commented on TEZ-3402:
-----------------------------------
[~ozawa] Thanks for your work on this! However, LinkedList cannot really fix
this issue, because the Java List interface is designed with a maximum size of
Integer.MAX_VALUE. Its size() method returns an int, and LinkedList follows
that signature (although not the full contract: List.size() is required to
return Integer.MAX_VALUE if the list contains more than Integer.MAX_VALUE
elements, while LinkedList.size() returns an overflowed int). If a LinkedList
with more than Integer.MAX_VALUE elements is returned,
TezMapredSplitsGrouper.getGroupedSplits() will have a problem generating an
array from the returned list.
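To make the size() point concrete, here is a minimal sketch. It does not build a real list (that would need more than 2^31 nodes); it only models the element count as a long. The class and method names are hypothetical, and the wrapping case stands in for an int counter that keeps being incremented past Integer.MAX_VALUE.

```java
public class SizeContract {
    // Models an unguarded int counter: only the low 32 bits of the true
    // element count survive, so the value wraps around.
    public static int wrappingSize(long elementCount) {
        return (int) elementCount;
    }

    // Models what the Collection.size() documentation asks for: report
    // Integer.MAX_VALUE once the true count exceeds it.
    public static int saturatingSize(long elementCount) {
        return (int) Math.min(elementCount, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long count = Integer.MAX_VALUE + 1L; // one element too many
        System.out.println(wrappingSize(count));   // -2147483648
        System.out.println(saturatingSize(count)); // 2147483647
    }
}
```

Either way, the caller cannot learn the true count from size(), so code that sizes an array from it cannot be correct for such a list.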
The underlying problem is that TezMapredSplitsGrouper.getGroupedSplits() always
returns an array, which cannot have more than Integer.MAX_VALUE elements.
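For illustration, a rough sketch of how the narrowing cast in the reported expression goes negative and then fails in the ArrayList constructor. The input values are made up to force the overflow (they are not the ones from the failing query), and clampedDesiredSplits is a hypothetical guard, not what the patch or Tez does.

```java
import java.util.ArrayList;

public class SplitCountOverflow {
    // Same shape as the expression quoted in the issue: the quotient is
    // narrowed to int before 1 is added.
    public static int naiveDesiredSplits(long totalLength, long minLengthPerGroup) {
        return (int) (totalLength / minLengthPerGroup) + 1;
    }

    // Hypothetical guard: do the arithmetic in long, then clamp to the
    // largest value an int (and therefore an array length) can hold.
    public static int clampedDesiredSplits(long totalLength, long minLengthPerGroup) {
        long desired = totalLength / minLengthPerGroup + 1;
        return (int) Math.min(desired, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long totalLength = 3_000_000_000_000L; // illustrative value only
        long minLengthPerGroup = 1_000L;       // illustrative value only

        int naive = naiveDesiredSplits(totalLength, minLengthPerGroup);
        System.out.println(naive); // -1294967295

        try {
            new ArrayList<Object>(naive); // negative initial capacity
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        System.out.println(clampedDesiredSplits(totalLength, minLengthPerGroup)); // 2147483647
    }
}
```

Clamping only avoids the negative capacity. A group count anywhere near Integer.MAX_VALUE is still not practical to allocate, so rejecting the configuration or raising the minimum group size is probably the more useful behaviour.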
> SplitGrouper: Integer overflow
> ------------------------------
>
> Key: TEZ-3402
> URL: https://issues.apache.org/jira/browse/TEZ-3402
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.4
> Reporter: Gopal V
> Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-3402.001.patch
>
>
> Bad configs trigger an integer overflow. This is a 5 TB query which tries to
> group by a max-size of 4096.
> {code}
> // splits too small to work. Need to override with size.
> int newDesiredNumSplits = (int)(totalLength/minLengthPerGroup) + 1;
> {code}
> {code}
> diagnostics=[Vertex vertex_1470081722620_0072_3_00 [Map 2] killed/failed due
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: srvc_fee initializer failed,
> vertex=vertex_1470081722620_0072_3_00 [Map 2],
> java.lang.IllegalArgumentException: Illegal Capacity: -1401168103
> at java.util.ArrayList.<init>(ArrayList.java:156)
> at org.apache.hadoop.mapred.split.TezMapredSplitsGrouper.getGroupedSplits(TezMapredSplitsGrouper.java:230)
> at org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.group(SplitGrouper.java:89)
> at org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:168)
> at org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:138)
> at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
> {code}