Repository: spark Updated Branches: refs/heads/master 31c4fab3f -> de4228152
[MINOR][DOCS][WIP] Fix Typos ## What changes were proposed in this pull request? Fix Typos. ## How was this patch tested? NA Closes #23145 from kjmrknsn/docUpdate. Authored-by: Keiji Yoshida <kjmrk...@gmail.com> Signed-off-by: Sean Owen <sean.o...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/de422815 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/de422815 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/de422815 Branch: refs/heads/master Commit: de4228152771390b0c5ba15254e9c5b832095366 Parents: 31c4fab Author: Keiji Yoshida <kjmrk...@gmail.com> Authored: Thu Nov 29 10:39:00 2018 -0600 Committer: Sean Owen <sean.o...@databricks.com> Committed: Thu Nov 29 10:39:00 2018 -0600 ---------------------------------------------------------------------- docs/index.md | 4 +-- docs/rdd-programming-guide.md | 8 +++--- docs/running-on-mesos.md | 2 +- docs/sql-data-sources-avro.md | 6 ++--- docs/sql-data-sources-hive-tables.md | 2 +- docs/sql-data-sources-jdbc.md | 2 +- docs/sql-data-sources-load-save-functions.md | 2 +- docs/sql-getting-started.md | 2 +- docs/sql-programming-guide.md | 2 +- docs/sql-pyspark-pandas-with-arrow.md | 2 +- docs/sql-reference.md | 6 ++--- docs/streaming-programming-guide.md | 2 +- docs/structured-streaming-programming-guide.md | 28 ++++++++++----------- 13 files changed, 34 insertions(+), 34 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/index.md ---------------------------------------------------------------------- diff --git a/docs/index.md b/docs/index.md index bd287e3..8864239 100644 --- a/docs/index.md +++ b/docs/index.md @@ -66,8 +66,8 @@ Example applications are also provided in Python. For example, ./bin/spark-submit examples/src/main/python/pi.py 10 -Spark also provides an experimental [R API](sparkr.html) since 1.4 (only DataFrames APIs included). -To run Spark interactively in a R interpreter, use `bin/sparkR`: +Spark also provides an [R API](sparkr.html) since 1.4 (only DataFrames APIs included). +To run Spark interactively in an R interpreter, use `bin/sparkR`: ./bin/sparkR --master local[2] http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/rdd-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md index 9a07d6c..2d1ddae 100644 --- a/docs/rdd-programming-guide.md +++ b/docs/rdd-programming-guide.md @@ -332,7 +332,7 @@ One important parameter for parallel collections is the number of *partitions* t Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html). -Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: +Text file RDDs can be created using `SparkContext`'s `textFile` method. 
This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: {% highlight scala %} scala> val distFile = sc.textFile("data.txt") @@ -365,7 +365,7 @@ Apart from text files, Spark's Scala API also supports several other data format Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html). -Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: +Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: {% highlight java %} JavaRDD<String> distFile = sc.textFile("data.txt"); @@ -397,7 +397,7 @@ Apart from text files, Spark's Java API also supports several other data formats PySpark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html). -Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: +Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: {% highlight python %} >>> distFile = sc.textFile("data.txt") @@ -1122,7 +1122,7 @@ costly operation. #### Background -To understand what happens during the shuffle we can consider the example of the +To understand what happens during the shuffle, we can consider the example of the [`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all values for a single key are combined into a tuple - the key and the result of executing a reduce function against all values associated with that key. The challenge is that not all values for a http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/running-on-mesos.md ---------------------------------------------------------------------- diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md index 2502cd4..b3ba4b2 100644 --- a/docs/running-on-mesos.md +++ b/docs/running-on-mesos.md @@ -687,7 +687,7 @@ See the [configuration page](configuration.html) for information on Spark config <td><code>0</code></td> <td> Set the maximum number GPU resources to acquire for this job. 
Note that executors will still launch when no GPU resources are found - since this configuration is just a upper limit and not a guaranteed amount. + since this configuration is just an upper limit and not a guaranteed amount. </td> </tr> <tr> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-avro.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index bfe641d..b403a66 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -66,9 +66,9 @@ write.df(select(df, "name", "favorite_color"), "namesAndFavColors.avro", "avro") ## to_avro() and from_avro() The Avro package provides function `to_avro` to encode a column as binary in Avro format, and `from_avro()` to decode Avro binary data into a column. Both functions transform one column to -another column, and the input/output SQL data type can be complex type or primitive type. +another column, and the input/output SQL data type can be a complex type or a primitive type. -Using Avro record as columns are useful when reading from or writing to a streaming source like Kafka. Each +Using Avro record as columns is useful when reading from or writing to a streaming source like Kafka. Each Kafka key-value record will be augmented with some metadata, such as the ingestion timestamp into Kafka, the offset in Kafka, etc. * If the "value" field that contains your data is in Avro, you could use `from_avro()` to extract your data, enrich it, clean it, and then push it downstream to Kafka again or write it out to a file. * `to_avro()` can be used to turn structs into Avro records. This method is particularly useful when you would like to re-encode multiple columns into a single one when writing data out to Kafka. @@ -151,7 +151,7 @@ Data source options of Avro can be set via: <tr> <td><code>avroSchema</code></td> <td>None</td> - <td>Optional Avro schema provided by an user in JSON format. The date type and naming of record fields + <td>Optional Avro schema provided by a user in JSON format. The date type and naming of record fields should match the input Avro data or Catalyst data, otherwise the read/write action will fail.</td> <td>read and write</td> </tr> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-hive-tables.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md index 28e1a39..3b39a32 100644 --- a/docs/sql-data-sources-hive-tables.md +++ b/docs/sql-data-sources-hive-tables.md @@ -74,7 +74,7 @@ creating table, you can create a table using storage handler at Hive side, and u <td><code>inputFormat, outputFormat</code></td> <td> These 2 options specify the name of a corresponding `InputFormat` and `OutputFormat` class as a string literal, - e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in pair, and you can not + e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in a pair, and you can not specify them if you already specified the `fileFormat` option. 
</td> </tr> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-jdbc.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index 057e821..9a5d0fc 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -55,7 +55,7 @@ the following case-insensitive options: as a subquery in the <code>FROM</code> clause. Spark will also assign an alias to the subquery clause. As an example, spark will issue a query of the following form to the JDBC Source.<br><br> <code> SELECT <columns> FROM (<user_specified_query>) spark_gen_alias</code><br><br> - Below are couple of restrictions while using this option.<br> + Below are a couple of restrictions while using this option.<br> <ol> <li> It is not allowed to specify `dbtable` and `query` options at the same time. </li> <li> It is not allowed to specify `query` and `partitionColumn` options at the same time. When specifying http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-load-save-functions.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-load-save-functions.md b/docs/sql-data-sources-load-save-functions.md index e4c7b17..4386cae 100644 --- a/docs/sql-data-sources-load-save-functions.md +++ b/docs/sql-data-sources-load-save-functions.md @@ -324,4 +324,4 @@ CLUSTERED BY(name) SORTED BY (favorite_numbers) INTO 42 BUCKETS; `partitionBy` creates a directory structure as described in the [Partition Discovery](sql-data-sources-parquet.html#partition-discovery) section. Thus, it has limited applicability to columns with high cardinality. In contrast `bucketBy` distributes -data across a fixed number of buckets and can be used when a number of unique values is unbounded. +data across a fixed number of buckets and can be used when the number of unique values is unbounded. http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-getting-started.md ---------------------------------------------------------------------- diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md index 8851220..0c3f0fb 100644 --- a/docs/sql-getting-started.md +++ b/docs/sql-getting-started.md @@ -99,7 +99,7 @@ Here we include some basic examples of structured data processing using Datasets <div data-lang="scala" markdown="1"> {% include_example untyped_ops scala/org/apache/spark/examples/sql/SparkSQLExample.scala %} -For a complete list of the types of operations that can be performed on a Dataset refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset). +For a complete list of the types of operations that can be performed on a Dataset, refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset). In addition to simple column references and expressions, Datasets also have a rich library of functions including string manipulation, date arithmetic, common math operations and more. The complete list is available in the [DataFrame Function Reference](api/scala/index.html#org.apache.spark.sql.functions$). 
</div> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index eca8915..9c85a15 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -7,7 +7,7 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL and the Dataset API. When computing a result +interact with Spark SQL including SQL and the Dataset API. When computing a result, the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation. http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-pyspark-pandas-with-arrow.md ---------------------------------------------------------------------- diff --git a/docs/sql-pyspark-pandas-with-arrow.md b/docs/sql-pyspark-pandas-with-arrow.md index d04b955..d18ca0b 100644 --- a/docs/sql-pyspark-pandas-with-arrow.md +++ b/docs/sql-pyspark-pandas-with-arrow.md @@ -129,7 +129,7 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`, `ArrayType` of `TimestampType`, and nested `StructType`. `BinaryType` is supported only when -installed PyArrow is equal to or higher then 0.10.0. +installed PyArrow is equal to or higher than 0.10.0. ### Setting Arrow Batch Size http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-reference.md ---------------------------------------------------------------------- diff --git a/docs/sql-reference.md b/docs/sql-reference.md index 9e4239b..88d0596 100644 --- a/docs/sql-reference.md +++ b/docs/sql-reference.md @@ -38,15 +38,15 @@ Spark SQL and DataFrames support the following data types: elements with the type of `elementType`. `containsNull` is used to indicate if elements in a `ArrayType` value can have `null` values. - `MapType(keyType, valueType, valueContainsNull)`: - Represents values comprising a set of key-value pairs. The data type of keys are - described by `keyType` and the data type of values are described by `valueType`. + Represents values comprising a set of key-value pairs. The data type of keys is + described by `keyType` and the data type of values is described by `valueType`. For a `MapType` value, keys are not allowed to have `null` values. `valueContainsNull` is used to indicate if values of a `MapType` value can have `null` values. - `StructType(fields)`: Represents values with the structure described by a sequence of `StructField`s (`fields`). * `StructField(name, dataType, nullable)`: Represents a field in a `StructType`. The name of a field is indicated by `name`. The data type of a field is indicated - by `dataType`. `nullable` is used to indicate if values of this fields can have + by `dataType`. `nullable` is used to indicate if values of these fields can have `null` values. 
<div class="codetabs"> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/streaming-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 70bee50..94c6120 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -733,7 +733,7 @@ for Java, and [StreamingContext](api/python/pyspark.streaming.html#pyspark.strea <span class="badge" style="background-color: grey">Python API</span> As of Spark {{site.SPARK_VERSION_SHORT}}, out of these sources, Kafka and Kinesis are available in the Python API. -This category of sources require interfacing with external non-Spark libraries, some of them with +This category of sources requires interfacing with external non-Spark libraries, some of them with complex dependencies (e.g., Kafka). Hence, to minimize issues related to version conflicts of dependencies, the functionality to create DStreams from these sources has been moved to separate libraries that can be [linked](#linking) to explicitly when necessary. http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/structured-streaming-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 8cea98c..32d61dc 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1493,7 +1493,7 @@ Additional details on supported joins: ### Streaming Deduplication You can deduplicate records in data streams using a unique identifier in the events. This is exactly same as deduplication on static using a unique identifier column. The query will store the necessary amount of data from previous records such that it can filter duplicate records. Similar to aggregations, you can use deduplication with or without watermarking. -- *With watermark* - If there is a upper bound on how late a duplicate record may arrive, then you can define a watermark on a event time column and deduplicate using both the guid and the event time columns. The query will use the watermark to remove old state data from past records that are not expected to get any duplicates any more. This bounds the amount of the state the query has to maintain. +- *With watermark* - If there is an upper bound on how late a duplicate record may arrive, then you can define a watermark on an event time column and deduplicate using both the guid and the event time columns. The query will use the watermark to remove old state data from past records that are not expected to get any duplicates any more. This bounds the amount of the state the query has to maintain. - *Without watermark* - Since there are no bounds on when a duplicate record may arrive, the query stores the data from all the past records as state. @@ -1577,7 +1577,7 @@ event time seen in each input stream, calculates watermarks based on the corresp and chooses a single global watermark with them to be used for stateful operations. By default, the minimum is chosen as the global watermark because it ensures that no data is accidentally dropped as too late if one of the streams falls behind the others -(for example, one of the streams stop receiving data due to upstream failures). In other words, +(for example, one of the streams stops receiving data due to upstream failures). 
In other words, the global watermark will safely move at the pace of the slowest stream and the query output will be delayed accordingly. @@ -1598,7 +1598,7 @@ Some of them are as follows. - Multiple streaming aggregations (i.e. a chain of aggregations on a streaming DF) are not yet supported on streaming Datasets. -- Limit and take first N rows are not supported on streaming Datasets. +- Limit and take the first N rows are not supported on streaming Datasets. - Distinct operations on streaming Datasets are not supported. @@ -1634,7 +1634,7 @@ returned through `Dataset.writeStream()`. You will have to specify one or more o - *Query name:* Optionally, specify a unique name of the query for identification. -- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has completed. If a trigger time is missed because the previous processing has not completed, then the system will trigger processing immediately. +- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has been completed. If a trigger time is missed because the previous processing has not been completed, then the system will trigger processing immediately. - *Checkpoint location:* For some output sinks where the end-to-end fault-tolerance can be guaranteed, specify the location where the system will write all the checkpoint information. This should be a directory in an HDFS-compatible fault-tolerant file system. The semantics of checkpointing is discussed in more detail in the next section. @@ -2106,7 +2106,7 @@ With `foreachBatch`, you can do the following. ###### Foreach If `foreachBatch` is not an option (for example, corresponding batch data writer does not exist, or -continuous processing mode), then you can express you custom writer logic using `foreach`. +continuous processing mode), then you can express your custom writer logic using `foreach`. Specifically, you can express the data writing logic by dividing it into three methods: `open`, `process`, and `close`. Since Spark 2.4, `foreach` is available in Scala, Java and Python. @@ -2236,8 +2236,8 @@ When the streaming query is started, Spark calls the function or the object’s in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. #### Triggers -The trigger settings of a streaming query defines the timing of streaming data processing, whether -the query is going to executed as micro-batch query with a fixed batch interval or as a continuous processing query. +The trigger settings of a streaming query define the timing of streaming data processing, whether +the query is going to be executed as micro-batch query with a fixed batch interval or as a continuous processing query. Here are the different kinds of triggers that are supported. <table class="table"> @@ -2960,7 +2960,7 @@ the effect of the change is not well-defined.
For all of them: - Addition/deletion/modification of rate limits is allowed: `spark.readStream.format("kafka").option("subscribe", "topic")` to `spark.readStream.format("kafka").option("subscribe", "topic").option("maxOffsetsPerTrigger", ...)` - - Changes to subscribed topics/files is generally not allowed as the results are unpredictable: `spark.readStream.format("kafka").option("subscribe", "topic")` to `spark.readStream.format("kafka").option("subscribe", "newTopic")` + - Changes to subscribed topics/files are generally not allowed as the results are unpredictable: `spark.readStream.format("kafka").option("subscribe", "topic")` to `spark.readStream.format("kafka").option("subscribe", "newTopic")` - *Changes in the type of output sink*: Changes between a few specific combinations of sinks are allowed. This needs to be verified on a case-by-case basis. Here are a few examples. @@ -2974,17 +2974,17 @@ the effect of the change is not well-defined. For all of them: - *Changes in the parameters of output sink*: Whether this is allowed and whether the semantics of the change are well-defined depends on the sink and the query. Here are a few examples. - - Changes to output directory of a file sink is not allowed: `sdf.writeStream.format("parquet").option("path", "/somePath")` to `sdf.writeStream.format("parquet").option("path", "/anotherPath")` + - Changes to output directory of a file sink are not allowed: `sdf.writeStream.format("parquet").option("path", "/somePath")` to `sdf.writeStream.format("parquet").option("path", "/anotherPath")` - - Changes to output topic is allowed: `sdf.writeStream.format("kafka").option("topic", "someTopic")` to `sdf.writeStream.format("kafka").option("topic", "anotherTopic")` + - Changes to output topic are allowed: `sdf.writeStream.format("kafka").option("topic", "someTopic")` to `sdf.writeStream.format("kafka").option("topic", "anotherTopic")` - - Changes to the user-defined foreach sink (that is, the `ForeachWriter` code) is allowed, but the semantics of the change depends on the code. + - Changes to the user-defined foreach sink (that is, the `ForeachWriter` code) are allowed, but the semantics of the change depends on the code. - *Changes in projection / filter / map-like operations**: Some cases are allowed. For example: - Addition / deletion of filters is allowed: `sdf.selectExpr("a")` to `sdf.where(...).selectExpr("a").filter(...)`. - - Changes in projections with same output schema is allowed: `sdf.selectExpr("stringColumn AS json").writeStream` to `sdf.selectExpr("anotherStringColumn AS json").writeStream` + - Changes in projections with same output schema are allowed: `sdf.selectExpr("stringColumn AS json").writeStream` to `sdf.selectExpr("anotherStringColumn AS json").writeStream` - Changes in projections with different output schema are conditionally allowed: `sdf.selectExpr("a").writeStream` to `sdf.selectExpr("b").writeStream` is allowed only if the output sink allows the schema change from `"a"` to `"b"`. @@ -3000,7 +3000,7 @@ the effect of the change is not well-defined. For all of them: - *Streaming deduplication*: For example, `sdf.dropDuplicates("a")`. Any change in number or type of grouping keys or aggregates is not allowed. - *Stream-stream join*: For example, `sdf1.join(sdf2, ...)` (i.e. both inputs are generated with `sparkSession.readStream`). Changes - in the schema or equi-joining columns are not allowed. Changes in join type (outer or inner) not allowed. Other changes in the join condition are ill-defined. 
+ in the schema or equi-joining columns are not allowed. Changes in join type (outer or inner) are not allowed. Other changes in the join condition are ill-defined. - *Arbitrary stateful operation*: For example, `sdf.groupByKey(...).mapGroupsWithState(...)` or `sdf.groupByKey(...).flatMapGroupsWithState(...)`. Any change to the schema of the user-defined state and the type of timeout is not allowed. @@ -3083,7 +3083,7 @@ spark \ </div> </div> -A checkpoint interval of 1 second means that the continuous processing engine will records the progress of the query every second. The resulting checkpoints are in a format compatible with the micro-batch engine, hence any query can be restarted with any trigger. For example, a supported query started with the micro-batch mode can be restarted in continuous mode, and vice versa. Note that any time you switch to continuous mode, you will get at-least-once fault-tolerance guarantees. +A checkpoint interval of 1 second means that the continuous processing engine will record the progress of the query every second. The resulting checkpoints are in a format compatible with the micro-batch engine, hence any query can be restarted with any trigger. For example, a supported query started with the micro-batch mode can be restarted in continuous mode, and vice versa. Note that any time you switch to continuous mode, you will get at-least-once fault-tolerance guarantees. ## Supported Queries {:.no_toc}
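As context for the trigger-related passages amended above, here is a minimal Scala sketch of the two trigger modes the guide describes. It is an illustration rather than part of the patch; the rate source, application name, and 1-second intervals are arbitrary placeholder choices.

{% highlight scala %}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object TriggerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("TriggerSketch").getOrCreate()

    // The built-in rate source (columns `timestamp`, `value`) stands in for a
    // real input such as Kafka.
    val df = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()

    // Micro-batch execution: a new batch is triggered at a fixed 1-second interval.
    val query = df.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("1 second"))
      // For continuous execution, the argument is instead the checkpoint
      // interval, i.e. how often progress is recorded:
      // .trigger(Trigger.Continuous("1 second"))
      .start()

    query.awaitTermination()
  }
}
{% endhighlight %}

Because checkpoints written by either mode share a compatible format, the same query can in principle be restarted with the other trigger, as the amended paragraph above notes.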