viirya commented on code in PR #493: URL: https://github.com/apache/spark-website/pull/493#discussion_r1426318473
########## examples.md: ##########

```diff
@@ -36,36 +42,38 @@ In this page, we will show examples using RDD API as well as examples using high
 <div class="tab-pane tab-pane-python active">
 <div class="code code-tab">
 {% highlight python %}
-text_file = sc.textFile("hdfs://...")
-counts = text_file.flatMap(lambda line: line.split(" ")) \
-    .map(lambda word: (word, 1)) \
-    .reduceByKey(lambda a, b: a + b)
-counts.saveAsTextFile("hdfs://...")
+df = spark.read.text("hdfs://...").toDF("text")
+
+df.select(explode(split(col("text"), " ")).alias("word")) \
+    .groupBy("word") \
+    .agg(count(lit(0)).alias("count")) \
+    .write.parquet("hdfs://...")
 {% endhighlight %}
 </div>
 </div>
 <div class="tab-pane tab-pane-scala">
 <div class="code code-tab">
 {% highlight scala %}
-val textFile = sc.textFile("hdfs://...")
-val counts = textFile.flatMap(line => line.split(" "))
-    .map(word => (word, 1))
-    .reduceByKey(_ + _)
-counts.saveAsTextFile("hdfs://...")
+val df = spark.read.text("hdfs://...").toDF("text")
+
+df.select(explode(split(col("text"), " ")).alias("word"))
+  .groupBy("word")
+  .agg(count(lit(0)).alias("count"))
+  .write.parquet("hdfs://...")
 {% endhighlight %}
 </div>
 </div>
 <div class="tab-pane tab-pane-java">
 <div class="code code-tab">
 {% highlight java %}
-JavaRDD<String> textFile = sc.textFile("hdfs://...");
-JavaPairRDD<String, Integer> counts = textFile
-    .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
-    .mapToPair(word -> new Tuple2<>(word, 1))
-    .reduceByKey((a, b) -> a + b);
-counts.saveAsTextFile("hdfs://...");
+DataFrame df = spark.read.text("hdfs://...").toDF("text");
```

Review Comment:

Hmm, for Java, isn't it `read()`?

```suggestion
DataFrame df = spark.read().text("hdfs://...").toDF("text");
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
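For readers comparing the old RDD snippet with the new DataFrame one in the diff above, both compute the same per-word counts. A plain-Python sanity check of that semantics (no Spark required; `word_counts` is an illustrative helper, not part of either API):

```python
from collections import Counter

def word_counts(lines):
    """Emulates the RDD pipeline flatMap(split) -> map((word, 1)) -> reduceByKey(+).
    The DataFrame pipeline (explode(split(...)) -> groupBy("word") -> count)
    produces the same (word, count) table."""
    counts = Counter()
    for line in lines:
        for word in line.split(" "):
            counts[word] += 1
    return dict(counts)

print(word_counts(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Note that `split(" ")` (both in the RDD version and in the DataFrame version's `split(col("text"), " ")`) splits on a literal single space, so runs of whitespace produce empty-string "words"; this sketch mirrors that behavior rather than fixing it.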