This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new cc662875cee [SPARK-42797][CONNECT][DOCS] Grammatical improvements for
Spark Connect content
cc662875cee is described below
commit cc662875cee3cccc67c3ee2a30f0d44d5b618ac8
Author: Allan Folting <[email protected]>
AuthorDate: Wed Mar 15 12:33:45 2023 +0900
[SPARK-42797][CONNECT][DOCS] Grammatical improvements for Spark Connect
content
### What changes were proposed in this pull request?
Grammatical improvements to the Spark Connect content as a follow-up on
https://github.com/apache/spark/pull/40324/
### Why are the changes needed?
To improve readability of the pages.
### Does this PR introduce _any_ user-facing change?
Yes, user-facing documentation is updated.
### How was this patch tested?
Built the doc website locally and checked the updates.
PRODUCTION=1 SKIP_RDOC=1 bundle exec jekyll build
Closes #40428 from allanf-db/connect_overview_doc.
Authored-by: Allan Folting <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 88d5c752829722b0b42f2c91fd57fb3e8fa17339)
Signed-off-by: Hyukjin Kwon <[email protected]>
---
docs/index.md | 14 +++++++-------
docs/spark-connect-overview.md | 28 ++++++++++++++--------------
2 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/docs/index.md b/docs/index.md
index 4f24ad4edce..37b1311c306 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -21,7 +21,7 @@ license: |
---
Apache Spark is a unified analytics engine for large-scale data processing.
-It provides high-level APIs in Java, Scala, Python and R,
+It provides high-level APIs in Java, Scala, Python, and R,
and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools including [Spark
SQL](sql-programming-guide.html) for SQL and structured data processing,
[pandas API on Spark](api/python/getting_started/quickstart_ps.html) for pandas
workloads, [MLlib](ml-guide.html) for machine learning,
[GraphX](graphx-programming-guide.html) for graph processing, and [Structured
Streaming](structured-streaming-programming-guide.html) for incremental
computation and stream processing.
@@ -39,17 +39,17 @@ source, visit [Building Spark](building-spark.html).
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it
should run on any platform that runs a supported version of Java. This should
include JVMs on x86_64 and ARM64. It's easy to run locally on one machine ---
all you need is to have `java` installed on your system `PATH`, or the
`JAVA_HOME` environment variable pointing to a Java installation.
-Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
+Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+, and R 3.5+.
Python 3.7 support is deprecated as of Spark 3.4.0.
Java 8 prior to version 8u362 support is deprecated as of Spark 3.4.0.
When using the Scala API, it is necessary for applications to use the same
version of Scala that Spark was compiled for.
For example, when using Scala 2.13, use Spark compiled for 2.13, and compile
code/applications for Scala 2.13 as well.
-For Java 11, `-Dio.netty.tryReflectionSetAccessible=true` is required
additionally for Apache Arrow library. This prevents
`java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.(long, int) not available` when Apache Arrow uses
Netty internally.
+For Java 11, setting `-Dio.netty.tryReflectionSetAccessible=true` is required
for the Apache Arrow library. This prevents the
`java.lang.UnsupportedOperationException: sun.misc.Unsafe or
java.nio.DirectByteBuffer.(long, int) not available` error when Apache Arrow
uses Netty internally.
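As an illustration of one way to pass this flag (a sketch only; the application JAR
below is a placeholder, and `--driver-java-options` plus
`spark.executor.extraJavaOptions` are the standard Spark settings for extra JVM
options):

./bin/spark-submit \
  --driver-java-options "-Dio.netty.tryReflectionSetAccessible=true" \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  path/to/your-app.jar   # placeholder for your application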
# Running the Examples and Shell
-Spark comes with several sample programs. Python, Scala, Java and R examples
are in the
+Spark comes with several sample programs. Python, Scala, Java, and R examples
are in the
`examples/src/main` directory.
To run Spark interactively in a Python interpreter, use
@@ -77,14 +77,14 @@ great way to learn the framework.
The `--master` option specifies the
[master URL for a distributed
cluster](submitting-applications.html#master-urls), or `local` to run
locally with one thread, or `local[N]` to run locally with N threads. You
should start by using
-`local` for testing. For a full list of options, run Spark shell with the
`--help` option.
+`local` for testing. For a full list of options, run the Spark shell with the
`--help` option.
-Spark also provides an [R API](sparkr.html) since 1.4 (only DataFrame APIs are
included).
+Since version 1.4, Spark has provided an [R API](sparkr.html) (only the
DataFrame APIs are included).
To run Spark interactively in an R interpreter, use `bin/sparkR`:
./bin/sparkR --master "local[2]"
-Example applications are also provided in R. For example,
+Example applications are also provided in R. For example:
./bin/spark-submit examples/src/main/r/dataframe.R
diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md
index e46fb9ad913..f942a884873 100644
--- a/docs/spark-connect-overview.md
+++ b/docs/spark-connect-overview.md
@@ -44,13 +44,13 @@ The Spark Connect client translates DataFrame operations
into unresolved
logical query plans which are encoded using protocol buffers. These are sent
to the server using the gRPC framework.
-The Spark Connect endpoint embedded on the Spark Server, receives and
+The Spark Connect endpoint embedded on the Spark Server receives and
translates unresolved logical plans into Spark's logical plan operators.
This is similar to parsing a SQL query, where attributes and relations are
parsed and an initial parse plan is built. From there, the standard Spark
execution process kicks in, ensuring that Spark Connect leverages all of
Spark's optimizations and enhancements. Results are streamed back to the
-client via gRPC as Apache Arrow-encoded row batches.
+client through gRPC as Apache Arrow-encoded row batches.
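A minimal client-side sketch of this flow, assuming a Spark Connect server is
already listening on localhost and using the remote() method on the PySpark
session builder:

from pyspark.sql import SparkSession

# Connect to the Spark Connect endpoint; no Spark JVM runs on the client.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# The DataFrame operations below are encoded as an unresolved logical plan,
# sent to the server over gRPC, and only the Arrow-encoded results come back.
spark.range(100).filter("id % 2 == 0").count()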
<p style="text-align: center;">
<img src="img/spark-connect-communication.png" title="Spark Connect
communication" alt="Spark Connect communication" />
@@ -67,11 +67,11 @@ own dependencies on the client and don't need to worry
about potential conflicts
with the Spark driver.
**Upgradability**: The Spark driver can now seamlessly be upgraded
independently
-of applications, e.g. to benefit from performance improvements and security
fixes.
+of applications, for example to benefit from performance improvements and
security fixes.
This means applications can be forward-compatible, as long as the server-side
RPC
definitions are designed to be backwards compatible.
-**Debuggability and Observability**: Spark Connect enables interactive
debugging
+**Debuggability and observability**: Spark Connect enables interactive
debugging
during development directly from your favorite IDE. Similarly, applications can
be monitored using the application's framework native metrics and logging
libraries.
@@ -106,8 +106,8 @@ Spark Connect, like in this example:
Note that we include a Spark Connect package (`spark-connect_2.12:3.4.0`),
when starting
Spark server. This is required to use Spark Connect. Make sure to use the same
version
-of the package as the Spark version you downloaded above. In the example here,
Spark 3.4.0
-with Scala 2.12.
+of the package as the Spark version you downloaded previously. In this example,
+Spark 3.4.0 with Scala 2.12.
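As a rough sketch (assuming the server is started with the bundled
start-connect-server.sh script and that you downloaded Spark 3.4.0 built for
Scala 2.12):

# The package version (3.4.0, Scala 2.12) must match the Spark release you run.
./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0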
Now Spark server is running and ready to accept Spark Connect sessions from
client
applications. In the next section we will walk through how to use Spark Connect
@@ -116,7 +116,7 @@ when writing client applications.
## Use Spark Connect in client applications
When creating a Spark session, you can specify that you want to use Spark
Connect
-and there are a few ways to do that as outlined below.
+and there are a few ways to do that outlined as follows.
If you do not use one of the mechanisms outlined here, your Spark session will
work just like before, without leveraging Spark Connect, and your application
code
@@ -125,12 +125,12 @@ will run on the Spark driver node.
### Set SPARK_REMOTE environment variable
If you set the `SPARK_REMOTE` environment variable on the client machine where
your
-Spark client application is running and create a new Spark Session as
illustrated
-below, the session will be a Spark Connect session. With this approach, there
is
-no code change needed to start using Spark Connect.
+Spark client application is running and create a new Spark Session as in the
following
+example, the session will be a Spark Connect session. With this approach,
there is no
+code change needed to start using Spark Connect.
In a terminal window, set the `SPARK_REMOTE` environment variable to point to
the
-local Spark server you started on your computer above:
+local Spark server you started previously on your computer:
{% highlight bash %}
export SPARK_REMOTE="sc://localhost"
@@ -164,8 +164,8 @@ spark = SparkSession.builder.getOrCreate()
</div>
-Which will create a Spark Connect session from your application by reading the
-`SPARK_REMOTE` environment variable we set above.
+This will create a Spark Connect session from your application by reading the
+`SPARK_REMOTE` environment variable we set previously.
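A small sketch of how to confirm this (the query here is just an arbitrary
example):

from pyspark.sql import SparkSession

# With SPARK_REMOTE set, no further configuration is needed.
spark = SparkSession.builder.getOrCreate()

# This runs on the Spark server; results are streamed back to the client.
spark.range(10).show()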
### Specify Spark Connect when creating Spark session
@@ -180,7 +180,7 @@ illustrated here.
<div data-lang="python" markdown="1">
To launch the PySpark shell with Spark Connect, simply include the `remote`
parameter and specify the location of your Spark server. We are using
`localhost`
-in this example to connect to the local Spark server we started above.
+in this example to connect to the local Spark server we started previously.
{% highlight bash %}
./bin/pyspark --remote "sc://localhost"