Repository: spark
Updated Branches:
  refs/heads/master 78d5d4dd5 -> 722afbb2b
[SPARK-17405] RowBasedKeyValueBatch should use default page size to prevent OOMs

## What changes were proposed in this pull request?

Before this change, we would always allocate 64MB per aggregation task for the first-level hash map storage, even when running in low-memory situations such as local mode. This changes it to use the memory manager default page size, which is automatically reduced from 64MB in these situations.

cc ooq JoshRosen

## How was this patch tested?

Tested manually with `bin/spark-shell --master=local[32]` and verifying that `(1 to math.pow(10, 3).toInt).toDF("n").withColumn("m", 'n % 2).groupBy('m).agg(sum('n)).show` does not crash.

Author: Eric Liang <e...@databricks.com>

Closes #15016 from ericl/sc-4483.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/722afbb2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/722afbb2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/722afbb2

Branch: refs/heads/master
Commit: 722afbb2b33037a30d385a15725f2db5365bd375
Parents: 78d5d4d
Author: Eric Liang <e...@databricks.com>
Authored: Thu Sep 8 16:47:18 2016 -0700
Committer: Josh Rosen <joshro...@databricks.com>
Committed: Thu Sep 8 16:47:18 2016 -0700

----------------------------------------------------------------------
 .../catalyst/expressions/RowBasedKeyValueBatch.java | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/722afbb2/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java
index 4899f85..551443a 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java
@@ -37,19 +37,18 @@ import org.slf4j.LoggerFactory;
  * We use `FixedLengthRowBasedKeyValueBatch` if all fields in the key and the value are fixed-length
  * data types. Otherwise we use `VariableLengthRowBasedKeyValueBatch`.
  *
- * RowBasedKeyValueBatch is backed by a single page / MemoryBlock (defaults to 64MB). If the page
- * is full, the aggregate logic should fallback to a second level, larger hash map. We intentionally
- * use the single-page design because it simplifies memory address encoding & decoding for each
- * key-value pair. Because the maximum capacity for RowBasedKeyValueBatch is only 2^16, it is
- * unlikely we need a second page anyway. Filling the page requires an average size for key value
- * pairs to be larger than 1024 bytes.
+ * RowBasedKeyValueBatch is backed by a single page / MemoryBlock (ranges from 1 to 64MB depending
+ * on the system configuration). If the page is full, the aggregate logic should fallback to a
+ * second level, larger hash map. We intentionally use the single-page design because it simplifies
+ * memory address encoding & decoding for each key-value pair. Because the maximum capacity for
+ * RowBasedKeyValueBatch is only 2^16, it is unlikely we need a second page anyway. Filling the
+ * page requires an average size for key value pairs to be larger than 1024 bytes.
  *
  */
 public abstract class RowBasedKeyValueBatch extends MemoryConsumer {
   protected final Logger logger = LoggerFactory.getLogger(RowBasedKeyValueBatch.class);

   private static final int DEFAULT_CAPACITY = 1 << 16;
-  private static final long DEFAULT_PAGE_SIZE = 64 * 1024 * 1024;

   protected final StructType keySchema;
   protected final StructType valueSchema;
@@ -105,7 +104,7 @@ public abstract class RowBasedKeyValueBatch extends MemoryConsumer {
     this.keyRow = new UnsafeRow(keySchema.length());
     this.valueRow = new UnsafeRow(valueSchema.length());

-    if (!acquirePage(DEFAULT_PAGE_SIZE)) {
+    if (!acquirePage(manager.pageSizeBytes())) {
       page = null;
       recordStartOffset = 0;
     } else {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
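The essence of the patch is a one-line substitution: rather than allocating a hard-coded 64MB page, the batch asks the task's memory manager for its default page size, which the manager scales down in low-memory configurations. The following standalone sketch illustrates that sizing idea; `PageSizeSketch`, its heuristic, and the 1MB/64MB bounds are simplified stand-ins loosely modeled on Spark's memory manager, not the actual Spark implementation.

```java
// Sketch of adaptive page sizing, assuming a simplified heuristic:
// divide the memory budget across cores with a safety factor, round down
// to a power of two, and clamp to [1MB, 64MB]. Hypothetical class; the
// real logic lives in Spark's memory manager (pageSizeBytes()).
public class PageSizeSketch {

    static final long MIN_PAGE = 1L << 20;   // 1MB floor
    static final long MAX_PAGE = 64L << 20;  // 64MB ceiling
    static final long SAFETY_FACTOR = 16;    // leave headroom per core

    // Stand-in for the memory manager's default page size computation.
    static long pageSizeBytes(long maxMemory, int cores) {
        long perCore = maxMemory / (cores * SAFETY_FACTOR);
        // Round down to the nearest power of two (at least 1).
        long size = Long.highestOneBit(Math.max(1L, perCore));
        return Math.min(MAX_PAGE, Math.max(MIN_PAGE, size));
    }

    public static void main(String[] args) {
        // Before the patch: every aggregation task grabbed a fixed 64MB page.
        long fixed = 64L * 1024 * 1024;
        // After the patch: page size adapts, so a low-memory setup like the
        // repro (local[32], small heap) gets a much smaller first-level page.
        long adaptive = pageSizeBytes(1L << 30, 32);  // 1GB budget, 32 cores
        System.out.println("fixed page size    = " + fixed);
        System.out.println("adaptive page size = " + adaptive);
    }
}
```

With 32 cores sharing a small budget, the adaptive size lands well below 64MB, which is why the `local[32]` repro stops OOMing once the constant is replaced.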