Repository: spark
Updated Branches:
  refs/heads/master ad853c567 -> f6255d7b7
[MINOR][SQL] Add disable bucketedRead workaround when throw RuntimeException

## What changes were proposed in this pull request?

Reading from a bucketed table with large bucket files (about 1.7 GB per bucket file) can throw a `RuntimeException` when the vectorized reader cannot reserve enough contiguous bytes. (Screenshots comparing the default behavior, with bucketed reads enabled, against bucketed reads disabled were attached to the PR.)

The root cause is that each bucket file is too big. A workaround is to disable bucketed reads, and this PR adds that workaround to the error message.

## How was this patch tested?

Manual tests.

Closes #23014 from wangyum/anotherWorkaround.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f6255d7b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f6255d7b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f6255d7b

Branch: refs/heads/master
Commit: f6255d7b7cc4cc5d1f4fe0e5e493a1efee22f38f
Parents: ad853c5
Author: Yuming Wang <[email protected]>
Authored: Thu Nov 15 08:33:06 2018 +0800
Committer: hyukjinkwon <[email protected]>
Committed: Thu Nov 15 08:33:06 2018 +0800

----------------------------------------------------------------------
 .../spark/sql/execution/vectorized/WritableColumnVector.java | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/f6255d7b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
----------------------------------------------------------------------
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
index b0e119d..4f5e72c 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
@@ 
-101,10 +101,11 @@ public abstract class WritableColumnVector extends ColumnVector {
     String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" +
         (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") +
         "). As a workaround, you can reduce the vectorized reader batch size, or disable the " +
-        "vectorized reader. For parquet file format, refer to " +
+        "vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+        "from bucket table. For Parquet file format, refer to " +
         SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().key() + " (default " +
         SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().defaultValueString() +
-        ") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for orc file format, " +
+        ") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for ORC file format, " +
         "refer to " + SQLConf.ORC_VECTORIZED_READER_BATCH_SIZE().key() + " (default " +
         SQLConf.ORC_VECTORIZED_READER_BATCH_SIZE().defaultValueString() + ") and " +
         SQLConf.ORC_VECTORIZED_READER_ENABLED().key() + ".";
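For reference, the workarounds listed in the improved error message can be applied from Spark SQL directly. A minimal sketch, assuming Spark 2.4-era config keys (these are the keys behind `SQLConf.BUCKETING_ENABLED`, `SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE`, and `SQLConf.PARQUET_VECTORIZED_READER_ENABLED`; the table name is hypothetical):

```sql
-- Option 1: disable bucketed reads (the workaround this PR adds to the message).
SET spark.sql.sources.bucketing.enabled=false;

-- Option 2: shrink the vectorized reader batch size (Parquet shown; default 4096).
SET spark.sql.parquet.columnarReaderBatchSize=1024;

-- Option 3: disable the vectorized Parquet reader entirely.
SET spark.sql.parquet.enableVectorizedReader=false;

SELECT * FROM bucketed_table;  -- hypothetical bucketed table
```

For ORC sources the analogous keys are `spark.sql.orc.columnarReaderBatchSize` and `spark.sql.orc.enableVectorizedReader`, as the error message notes. Any one of the three options avoids the oversized contiguous allocation; disabling bucketing also gives up the shuffle-avoidance benefit of bucketed joins, so the batch-size knob is usually the gentler first try.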
