rosetn commented on a change in pull request #12963:
URL: https://github.com/apache/beam/pull/12963#discussion_r516785479
##########
File path: website/www/site/layouts/partials/section-menu/en/get-started.html
##########
@@ -22,12 +22,13 @@
<li><a href="/get-started/quickstart-go/">Quickstart - Go</a></li>
</ul>
</li>
+<li><a href="/get-started/from-spark/">From Apache Spark</a></li>
Review comment:
The actual Overview page has a Getting Started section you can include
this new doc in.
##########
File path: website/www/site/content/en/get-started/from-spark.md
##########
@@ -0,0 +1,261 @@
+---
+title: "Getting started from Apache Spark"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Getting started from Apache Spark
+
+{{< localstorage language language-py >}}
+
+If you already know [_Apache Spark_](http://spark.apache.org/),
+learning _Apache Beam_ is easy.
Review comment:
WDYT about replacing "is easy" with "is familiar" or removing this
sentence, since you explain the connection in the next sentence?
##########
File path: website/www/site/content/en/get-started/from-spark.md
##########
@@ -0,0 +1,261 @@
+---
+title: "Getting started from Apache Spark"
+---
+
+# Getting started from Apache Spark
+
+{{< localstorage language language-py >}}
+
+If you already know [_Apache Spark_](http://spark.apache.org/),
+learning _Apache Beam_ is easy.
+The Beam and Spark APIs are similar, so you already know the basic concepts.
+
+Spark stores data in _Spark DataFrames_ for structured data,
+and in _Resilient Distributed Datasets_ (RDD) for unstructured data.
+We are using RDDs for this guide.
+
+A _Spark RDD_ represents a collection of elements,
+while in Beam it's called a _Parallel Collection_ (PCollection).
+A PCollection in Beam does _not_ have any ordering guarantees.
+
+Likewise, a transform in Beam is called a _Parallel Transform_ (PTransform).
+
+Here are some examples of common operations and their equivalent between PySpark and Beam.
+
+## Overview
+
+Here's a simple example of a PySpark pipeline that takes the numbers from one to four,
+multiplies them by two, adds all the values together, and prints the result.
+
+{{< highlight py >}}
+import pyspark
+
+sc = pyspark.SparkContext()
+result = (
+    sc.parallelize([1, 2, 3, 4])
+    .map(lambda x: x * 2)
+    .reduce(lambda x, y: x + y)
+)
+print(result)
+{{< /highlight >}}
+
+In Beam you _pipe_ your data through the pipeline using the
+_pipe operator_ `|` like `data | beam.Map(...)` instead of chaining
+methods like `data.map(...)`, but they're doing the same thing.
+
+Here's how an equivalent pipeline looks like in Beam.
Review comment:
Replace "how" with "what", so the sentence reads: "Here's what an
equivalent pipeline looks like in Beam."
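Side note on the paragraph just above this line: the `data | transform`
idea may be easier to follow with a tiny illustration. The sketch below is
plain Python, not Beam. `ToyMap`, `ToyReduce`, and `double_and_sum` are
hypothetical names made up for this comment; they only show how a class can
overload `|` so that piping data into a transform gives the same result as
chaining `.map(...).reduce(...)` in the PySpark example.

```python
from functools import reduce


class ToyMap:
    """Hypothetical stand-in for a map-style transform (not Beam's API)."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, data):
        # Python calls this for `data | ToyMap(fn)` because a plain list
        # does not define `|` itself.
        return [self.fn(x) for x in data]


class ToyReduce:
    """Hypothetical stand-in for a reduce-style transform (not Beam's API)."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, data):
        # Fold all elements into a single value.
        return reduce(self.fn, data)


def double_and_sum(values):
    # Same steps as the PySpark example: multiply by two, then add up.
    return values | ToyMap(lambda x: x * 2) | ToyReduce(lambda x, y: x + y)


print(double_and_sum([1, 2, 3, 4]))  # prints 20
```

Each `|` hands the result of the left-hand side to the transform on the
right, which is the same data flow as the method chain, just written
differently.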
##########
File path: website/www/site/content/en/get-started/from-spark.md
##########
@@ -0,0 +1,261 @@
+---
+title: "Getting started from Apache Spark"
+---
+
+# Getting started from Apache Spark
Review comment:
I'd consider only italicizing the new Spark terms. The font emphasis can
create accessibility issues if it's too frequent.