[ https://issues.apache.org/jira/browse/SPARK-21105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21105.
----------------------------------
    Resolution: Duplicate

This is a duplicate of SPARK-10216.

> Useless empty files in hive table
> ---------------------------------
>
>                 Key: SPARK-21105
>                 URL: https://issues.apache.org/jira/browse/SPARK-21105
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: pin_zhang
>            Priority: Minor
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.{SaveMode, SparkSession}
>
> case class Base(v: Option[Double])
> object EmptyFiles {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("scala").setMaster("local[12]")
>     val ctx = new SparkContext(conf)
>     val spark = SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
>     val seq = Seq(Base(Some(1D)), Base(Some(1D)))
>     val rdd = ctx.makeRDD[Base](seq)
>     import spark.implicits._
>     rdd.toDS().write.format("json").mode(SaveMode.Append).saveAsTable("EmptyFiles")
>   }
> }
> // Writing the Dataset creates a useless empty file for every empty partition.
> // Inserting a small RDD into the table many times therefore produces many
> // empty files, which slows down queries.
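
For readers wondering where the empty partitions in the report come from, here is a rough sketch in plain Scala (no Spark dependency). It assumes that makeRDD defaults to one slice per local core (12 under local[12]) and splits the sequence into even index ranges; the object and the helper partitionSizes are hypothetical names used only for illustration, not Spark APIs.

```scala
object EmptyPartitionSketch {
  // Hypothetical helper: number of elements that land in each slice when
  // `length` elements are split into `numSlices` even index ranges.
  // The even-range split is an assumption about how makeRDD partitions a Seq.
  def partitionSizes(length: Int, numSlices: Int): Seq[Int] =
    (0 until numSlices).map { i =>
      val start = ((i.toLong * length) / numSlices).toInt
      val end = (((i + 1).toLong * length) / numSlices).toInt
      end - start
    }

  def main(args: Array[String]): Unit = {
    // Two rows spread over 12 slices, as in the reported example.
    val sizes = partitionSizes(length = 2, numSlices = 12)
    println(s"partition sizes: ${sizes.mkString(",")}")
    println(s"empty partitions: ${sizes.count(_ == 0)}")
  }
}
```

Under these assumptions, 10 of the 12 partitions hold no rows, and each append would leave a matching number of empty files. Reducing the partition count before writing, for example with Dataset.coalesce, is one common way to limit this, though whether it suits a given job depends on the data volume.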