[ https://issues.apache.org/jira/browse/SPARK-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684659#comment-15684659 ]
Cheng Lian edited comment on SPARK-18403 at 11/22/16 6:54 AM:
--------------------------------------------------------------

Here is a minimal test case (add it to {{ObjectHashAggregateSuite}}) that reproduces this issue reliably:

{code}
test("oom") {
  withSQLConf(
    SQLConf.USE_OBJECT_HASH_AGG.key -> "true",
    SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key -> "1"
  ) {
    Seq(Tuple1(Seq.empty[Int]))
      .toDF("c0")
      .groupBy(lit(1))
      .agg(typed_count($"c0"), max($"c0"))
      .show()
  }
}
{code}

What I observed is that the partial aggregation phase produces a malformed {{UnsafeRow}} after applying the {{resultProjection}} [here|https://github.com/apache/spark/blob/07beb5d21c6803e80733149f1560c71cd3cacc86/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala#L254]. When printed, the malformed {{UnsafeRow}} is always:

{noformat}
[0,0,2000000008,2800000008,100000000000000,5a5a5a5a5a5a5a5a]
{noformat}

The {{5a5a5a5a5a5a5a5a}} is interpreted as the length of an {{ArrayData}}. Therefore, the JVM blows up when trying to allocate a huge array to deep copy this {{ArrayData}} at a later phase. [~sameer] and [~davies], would you mind taking a look at this issue? Thanks!
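As a side note on why this particular bit pattern is fatal: {{0x5a}} is the byte value Spark's {{MemoryAllocator}} uses to fill freed memory when debug fill is enabled, so the word looks like a dangling pointer into freed memory rather than a real length. A minimal sketch in plain Scala (independent of Spark; the freed-memory-fill interpretation is my assumption) of the magnitudes involved:

```scala
// Sketch: what happens when the 5a5a5a5a5a5a5a5a word is read back as
// an array length. 0x5a is (assumed here to be) Spark's
// MemoryAllocator.MEMORY_DEBUG_FILL_FREED_VALUE, i.e. the fill byte
// for freed memory in debug mode.
object GarbageLength {
  def main(args: Array[String]): Unit = {
    val word = 0x5a5a5a5a5a5a5a5aL
    println(word)          // 6510615555426900570

    // Even truncated to a 4-byte numElements field, the value is far
    // beyond any feasible allocation:
    val numElements = 0x5a5a5a5a
    println(numElements)   // 1515870810
    // new Array[Long](numElements) would need roughly 12 GB of heap,
    // which is why the deep copy OOMs the JVM.
  }
}
```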
> ObjectHashAggregateSuite is being flaky (occasional OOM errors)
> ---------------------------------------------------------------
>
>             Key: SPARK-18403
>             URL: https://issues.apache.org/jira/browse/SPARK-18403
>         Project: Spark
>      Issue Type: Bug
>      Components: SQL
> Affects Versions: 2.2.0
>        Reporter: Cheng Lian
>        Assignee: Cheng Lian
>         Fix For: 2.2.0
>
> This test suite fails occasionally on Jenkins due to OOM errors. I've already reproduced it locally but haven't figured out the root cause.
> We should probably disable it temporarily before getting it fixed so that it doesn't break the PR build too often.