Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/636#discussion_r9983025

    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -686,6 +649,47 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)])
       }
     
       /**
    +   * Output the RDD to any Hadoop-supported storage system with new Hadoop API, using a Hadoop
    +   * Job object for that storage system. The Job should set an OutputFormat and any output paths
    +   * required (e.g. a table name to write to) in the same way as it would be configured for a Hadoop
    +   * MapReduce job.
    +   */
    +  def saveAsNewAPIHadoopDataset(job: NewAPIHadoopJob) {
    --- End diff --

    In the new Hadoop API, does this really require a Job or just a Configuration? In the old API we only needed a configuration.
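For comparison, a minimal caller-side sketch of the two shapes the question contrasts. The Configuration-based call, the output path, and the object name are assumptions added for illustration; only the Job-based signature appears in the diff under review. A Job is essentially a wrapper around a Configuration plus submission state, which is what makes a Configuration-only signature (analogous to the old-API saveAsHadoopDataset(conf: JobConf)) conceivable:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object NewApiSaveSketch {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "NewApiSaveSketch")
        val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))
          .map { case (k, v) => (new Text(k), new IntWritable(v)) }

        // Shape proposed in the diff: the caller configures a full Job object.
        val job = new Job(new Configuration())
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
        FileOutputFormat.setOutputPath(job, new Path("/tmp/new-api-out"))

        // Job-based call as defined in the diff under review:
        // pairs.saveAsNewAPIHadoopDataset(job)

        // Everything the Job carries for output purposes lives in its Configuration,
        // so a Configuration-only signature would be called like this
        // (hypothetical alternative, shown only for comparison):
        // pairs.saveAsNewAPIHadoopDataset(job.getConfiguration)

        sc.stop()
      }
    }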