[ https://issues.apache.org/jira/browse/SPARK-28707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905867#comment-16905867 ]
Sujith Chacko commented on SPARK-28707:
---------------------------------------

[~jobitmathew] Thanks for reporting the issue; I will check it and raise a PR if required.

> Spark SQL select query result size issue
> ----------------------------------------
>
>                 Key: SPARK-28707
>                 URL: https://issues.apache.org/jira/browse/SPARK-28707
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: jobit mathew
>            Priority: Major
>
> A Spark SQL select * from table; query fails the spark.driver.maxResultSize validation, but select * from table limit 1000; succeeds on the same table data.
>
> *Test steps*
> Set spark.driver.maxResultSize=5120 in spark-defaults.conf.
> 1. Create a table larger than 5 KB; in this example, a 23 KB text file named consecutive2.txt at local path /opt/jobit/consecutive2.txt:
> AUS,30.33,CRICKET,1000
> AUS,30.33,CRICKET,1001
> --
> AUS,30.33,CRICKET,1999
> 2. Launch spark-sql --master yarn.
> 3. create table cons5(country String, avg float, sports String, year int) row format delimited fields terminated by ',';
> 4. load data local inpath '/opt/jobit/consecutive2.txt' into table cons5;
> 5. select count(*) from cons5; returns 1000.
> 6. select * from cons5; fails with the error mentioned below.
> 7. select * from cons5 limit 1000; displays all 1000 rows; the query executes successfully with no error.
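The failed check here is the driver's accounting of serialized task-result sizes against spark.driver.maxResultSize: each finished task's result size is added to a running total, and the job is aborted once that total exceeds the limit, which is why a 2-task, 7.5 KB result trips a 5 KB limit even though each task is individually under it. A minimal Python sketch of that accounting (hypothetical class and method names; in Spark the real logic lives in the Scala scheduler):

```python
class ResultSizeTracker:
    """Sketch of driver-side result-size accounting (hypothetical names)."""

    def __init__(self, max_result_size_bytes):
        self.max_result_size = max_result_size_bytes  # spark.driver.maxResultSize
        self.total_result_size = 0
        self.calculated_tasks = 0

    def can_fetch_more_results(self, serialized_size):
        """Account for one task's serialized result; abort (raise) if the
        running total across all tasks exceeds the configured limit."""
        self.total_result_size += serialized_size
        self.calculated_tasks += 1
        if self.max_result_size > 0 and self.total_result_size > self.max_result_size:
            raise RuntimeError(
                "Total size of serialized results of %d tasks (%d bytes) is "
                "bigger than spark.driver.maxResultSize (%d bytes)"
                % (self.calculated_tasks, self.total_result_size,
                   self.max_result_size))
        return True


# With maxResultSize = 5120 bytes (5 KB), two tasks totalling ~7.5 KB fail
# even though each one is under the limit on its own:
tracker = ResultSizeTracker(5120)
tracker.can_fetch_more_results(3800)      # first task: running total 3800, OK
try:
    tracker.can_fetch_more_results(3880)  # running total 7680 > 5120: abort
except RuntimeError as e:
    print(e)
```

This per-total (rather than per-task) check is also why the issue title mentions result size: a limit 1000 query that collects fewer or smaller task results can slip under the threshold that the full select * exceeds.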
> *ERROR*
> select * from cons5;
> org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 2 tasks (7.5 KB) is bigger than spark.driver.maxResultSize (5.0 KB)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> As per my observation, the limit query should also be validated against the maxResultSize check if select * is.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)