viirya commented on code in PR #493: URL: https://github.com/apache/spark-website/pull/493#discussion_r1426318473
########## examples.md: ##########

```diff
@@ -36,36 +42,38 @@ In this page, we will show examples using RDD API as well as examples using high
 <div class="tab-pane tab-pane-python active">
 <div class="code code-tab">
 {% highlight python %}
-text_file = sc.textFile("hdfs://...")
-counts = text_file.flatMap(lambda line: line.split(" ")) \
-    .map(lambda word: (word, 1)) \
-    .reduceByKey(lambda a, b: a + b)
-counts.saveAsTextFile("hdfs://...")
+df = spark.read.text("hdfs://...").toDF("text")
+
+df.select(explode(split(col("text"), " ")).alias("word")) \
+    .groupBy("word") \
+    .agg(count(lit(0)).alias("count")) \
+    .write.parquet("hdfs://...")
 {% endhighlight %}
 </div>
 </div>
 <div class="tab-pane tab-pane-scala">
 <div class="code code-tab">
 {% highlight scala %}
-val textFile = sc.textFile("hdfs://...")
-val counts = textFile.flatMap(line => line.split(" "))
-    .map(word => (word, 1))
-    .reduceByKey(_ + _)
-counts.saveAsTextFile("hdfs://...")
+val df = spark.read.text("hdfs://...").toDF("text")
+
+df.select(explode(split(col("text"), " ")).alias("word"))
+  .groupBy("word")
+  .agg(count(lit(0)).alias("count"))
+  .write.parquet("hdfs://...")
 {% endhighlight %}
 </div>
 </div>
 <div class="tab-pane tab-pane-java">
 <div class="code code-tab">
 {% highlight java %}
-JavaRDD<String> textFile = sc.textFile("hdfs://...");
-JavaPairRDD<String, Integer> counts = textFile
-    .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
-    .mapToPair(word -> new Tuple2<>(word, 1))
-    .reduceByKey((a, b) -> a + b);
-counts.saveAsTextFile("hdfs://...");
+DataFrame df = spark.read.text("hdfs://...").toDF("text");
```

Review Comment:

Hmm, for Java, isn't it `read()`?

```suggestion
DataFrame df = spark.read().text("hdfs://...").toDF("text");
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
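For readers comparing the old RDD snippet with the new DataFrame one in the diff above, both compute the same per-word counts. A plain-Python sanity check of that semantics (no Spark required; `word_counts` is an illustrative helper, not part of either API):

```python
from collections import Counter

def word_counts(lines):
    """Emulates the RDD pipeline flatMap(split) -> map((word, 1)) -> reduceByKey(+).
    The DataFrame pipeline (explode(split(...)) -> groupBy("word") -> count)
    produces the same (word, count) table."""
    counts = Counter()
    for line in lines:
        for word in line.split(" "):
            counts[word] += 1
    return dict(counts)

print(word_counts(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Note that `split(" ")` (both in the RDD version and in the DataFrame version's `split(col("text"), " ")`) splits on a literal single space, so runs of whitespace produce empty-string "words"; this sketch mirrors that behavior rather than fixing it.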