[ https://issues.apache.org/jira/browse/SPARK-28707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jobit mathew updated SPARK-28707:
---------------------------------
Description:
A Spark SQL "select * from table;" query fails the spark.driver.maxResultSize validation,
while "select * from table limit 1000;" passes against the same table data.
*Test steps*
spark.driver.maxResultSize=5120 (5 KB) is set in spark-defaults.conf.
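For reference, a minimal sketch of the corresponding entry (whitespace-separated key and value, as spark-defaults.conf expects); a bare number is read as bytes, which matches the "5.0 KB" in the error below. The same value can also be passed per session with --conf spark.driver.maxResultSize=5120.
{code}
# conf/spark-defaults.conf
spark.driver.maxResultSize   5120
{code}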
1. Create a table with more than 5 KB of data. In this example a 23 KB text file named
consecutive2.txt is used, at local path /opt/jobit/consecutive2.txt:
AUS,30.33,CRICKET,1000
AUS,30.33,CRICKET,1001
--
AUS,30.33,CRICKET,1999
2. Launch spark-sql --master yarn.
3. create table cons5(country String, avg float, sports String, year int) row format delimited fields terminated by ',';
4. load data local inpath '/opt/jobit/consecutive2.txt' into table cons5;
5. select count(*) from cons5; returns 1000.
6. select * from cons5 *limit 1000*; runs and displays all 1000 rows. *No error; the query executes successfully.*
7. select * from cons5; fails with the error below.
*ERROR*
select * from cons5;
*org.apache.spark.SparkException: Job aborted due to stage failure: Total size
of serialized results of 2 tasks (7.5 KB) is bigger than
spark.driver.maxResultSize (5.0 KB)*
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
*In my observation, the limit query should also be subject to the maxResultSize validation if
select * is, so that both forms behave consistently.*
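To make the comparison easy to rerun, here is a minimal reproduction sketch from spark-shell (not part of the original steps; file, table, and column names are taken from above, and the 1000 generated rows are assumed to be the years 1000-1999, matching the count in step 5):
{code:scala}
// Launch with the same limit as in spark-defaults.conf:
//   spark-shell --master yarn --conf spark.driver.maxResultSize=5120

// (Assumption) regenerate the ~23 KB test file with 1000 comma-separated rows.
import java.io.PrintWriter
val pw = new PrintWriter("/opt/jobit/consecutive2.txt")
(1000 to 1999).foreach(year => pw.println(s"AUS,30.33,CRICKET,$year"))
pw.close()

// Same DDL/DML as steps 3 and 4 (requires a Hive-enabled build, as the spark-sql CLI does).
spark.sql("create table cons5(country string, avg float, sports string, year int) " +
  "row format delimited fields terminated by ','")
spark.sql("load data local inpath '/opt/jobit/consecutive2.txt' into table cons5")

// Step 6: succeeds and returns 1000 rows.
spark.sql("select * from cons5 limit 1000").collect()

// Step 7: fails with "Total size of serialized results ... is bigger than
// spark.driver.maxResultSize (5.0 KB)".
spark.sql("select * from cons5").collect()
{code}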
> Spark SQL select query result size issue
> ----------------------------------------
>
> Key: SPARK-28707
> URL: https://issues.apache.org/jira/browse/SPARK-28707
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: jobit mathew
> Priority: Major
>