[ https://issues.apache.org/jira/browse/SPARK-21177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prashant Sharma updated SPARK-21177:
------------------------------------
    Description:
In short, please use the following shell transcript for the reproducer.

{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> def printTimeTaken(str: String, f: () => Unit) {
    val start = System.nanoTime()
    f()
    val end = System.nanoTime()
    val timetaken = end - start
    import scala.concurrent.duration._
    println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
  }
| | | | | | | printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 100000) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284
Time taken for time to append to hive: is 211
...
...
Time taken for time to append to hive: is 2615
Time taken for time to append to hive: is 3055
Time taken for time to append to hive: is 22425
....
{code}

Why does it matter?
In a streaming job it is not possible to append to hive using this dataframe operation.

  was:
In short, please use the following shell transcript for the reproducer.

{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
scala> def printTimeTaken(str: String, f: () => Unit) {
    val start = System.nanoTime()
    f()
    val end = System.nanoTime()
    val timetaken = end - start
    import scala.concurrent.duration._
    println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
  }
| | | | | | | printTimeTaken: (str: String, f: () => Unit)Unit

scala> for(i <- 1 to 10000) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
Time taken for time to append to hive: is 284
Time taken for time to append to hive: is 211
...
...
Time taken for time to append to hive: is 2615
Time taken for time to append to hive: is 3055
Time taken for time to append to hive: is 22425
....
{code}

Why does it matter?
In a streaming job it is not possible to append to hive using this dataframe operation.


> df.saveAsTable slows down linearly, with number of appends
> ----------------------------------------------------------
>
>                 Key: SPARK-21177
>                 URL: https://issues.apache.org/jira/browse/SPARK-21177
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Prashant Sharma
>
> In short, please use the following shell transcript for the reproducer.
> {code:java}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> def printTimeTaken(str: String, f: () => Unit) {
>     val start = System.nanoTime()
>     f()
>     val end = System.nanoTime()
>     val timetaken = end - start
>     import scala.concurrent.duration._
>     println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
>   }
> | | | | | | | printTimeTaken: (str: String, f: () => Unit)Unit
>
> scala> for(i <- 1 to 100000) {printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1"); })}
> Time taken for time to append to hive: is 284
> Time taken for time to append to hive: is 211
> ...
> ...
> Time taken for time to append to hive: is 2615
> Time taken for time to append to hive: is 3055
> Time taken for time to append to hive: is 22425
> ....
> {code}
> Why does it matter?
> In a streaming job it is not possible to append to hive using this dataframe operation.