[ https://issues.apache.org/jira/browse/SPARK-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684659#comment-15684659 ]
Cheng Lian edited comment on SPARK-18403 at 11/22/16 6:54 AM:
--------------------------------------------------------------

Here is a minimal test case (add it to {{ObjectHashAggregateSuite}}) that reproduces this issue reliably:

{code}
test("oom") {
  withSQLConf(
    SQLConf.USE_OBJECT_HASH_AGG.key -> "true",
    SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key -> "1"
  ) {
    Seq(Tuple1(Seq.empty[Int]))
      .toDF("c0")
      .groupBy(lit(1))
      .agg(typed_count($"c0"), max($"c0"))
      .show()
  }
}
{code}

What I observed is that the partial aggregation phase produces a malformed {{UnsafeRow}} after applying the {{resultProjection}} [here|https://github.com/apache/spark/blob/07beb5d21c6803e80733149f1560c71cd3cacc86/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala#L254]. When printed, the malformed {{UnsafeRow}} is always:

{noformat}
[0,0,2000000008,2800000008,100000000000000,5a5a5a5a5a5a5a5a]
{noformat}

The {{5a5a5a5a5a5a5a5a}} is interpreted as the length of an {{ArrayData}}. Therefore, the JVM blows up when trying to allocate a huge array to deep copy this {{ArrayData}} at a later phase. [~sameer] and [~davies], would you mind taking a look at this issue? Thanks!
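As a side note on why this particular bit pattern is fatal: {{0x5a}} is the byte value Spark's {{MemoryAllocator}} uses to fill freed memory when debug fill is enabled, so the word looks like a dangling pointer into freed memory rather than a real length. A minimal sketch in plain Scala (independent of Spark; the freed-memory-fill interpretation is my assumption) of the magnitudes involved:

```scala
// Sketch: what happens when the 5a5a5a5a5a5a5a5a word is read back as
// an array length. 0x5a is (assumed here to be) Spark's
// MemoryAllocator.MEMORY_DEBUG_FILL_FREED_VALUE, i.e. the fill byte
// for freed memory in debug mode.
object GarbageLength {
  def main(args: Array[String]): Unit = {
    val word = 0x5a5a5a5a5a5a5a5aL
    println(word)          // 6510615555426900570

    // Even truncated to a 4-byte numElements field, the value is far
    // beyond any feasible allocation:
    val numElements = 0x5a5a5a5a
    println(numElements)   // 1515870810
    // new Array[Long](numElements) would need roughly 12 GB of heap,
    // which is why the deep copy OOMs the JVM.
  }
}
```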
> ObjectHashAggregateSuite is being flaky (occasional OOM errors)
> ---------------------------------------------------------------
>
>             Key: SPARK-18403
>             URL: https://issues.apache.org/jira/browse/SPARK-18403
>         Project: Spark
>      Issue Type: Bug
>      Components: SQL
> Affects Versions: 2.2.0
>        Reporter: Cheng Lian
>        Assignee: Cheng Lian
>         Fix For: 2.2.0
>
> This test suite fails occasionally on Jenkins due to OOM errors. I've already reproduced it locally but haven't figured out the root cause.
> We should probably disable it temporarily before getting it fixed so that it doesn't break the PR build too often.