Hi,

I just noticed that the input DataFrame is collect'ed and then parallelize'd simply to show it on the console [1]. Why are so many fairly expensive operations needed just for show?
I'd appreciate some help understanding this code. Thanks.

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L51-L53

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski