I think the difference lies somewhere in here:
- RDD writes are done by SparkHadoopMapReduceWriter.executeTask, which calls outputMetrics.setRecordsWritten.
- DF writes are done by InsertIntoHadoopFsRelationCommand.run, which I'm not entirely sure how it works.
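For context, here is a minimal sketch of the two user-facing entry points that exercise these code paths. The output paths are hypothetical, and the mapping from API call to internal writer is my reading of the code discussed above, so the exact internals may differ by Spark version:

```scala
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.sql.SparkSession

object WritePathsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("write-paths-sketch")
      .getOrCreate()

    // RDD path: writes through the new Hadoop output-format API, which
    // (per the discussion above) end up in
    // SparkHadoopMapReduceWriter.executeTask on each worker, updating
    // outputMetrics.setRecordsWritten per partition.
    val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))
    rdd.map(s => (NullWritable.get(), new Text(s)))
      .saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, Text]]("/tmp/rdd-out")

    // DataFrame path: df.write is planned as an
    // InsertIntoHadoopFsRelationCommand, whose run() drives the file write
    // on the driver side of query execution.
    val df = spark.range(10).toDF("id")
    df.write.parquet("/tmp/df-out")

    spark.stop()
  }
}
```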
executeTask appears to run on the worker, writing a single RDD partition out in a single Spark task; I can grok how that works. I'm not entirely sure where the rubber hits the road for InsertIntoHadoopFsRelationCommand.