Repository: spark Updated Branches: refs/heads/master 31c4fab3f -> de4228152
[MINOR][DOCS][WIP] Fix Typos ## What changes were proposed in this pull request? Fix Typos. ## How was this patch tested? NA Closes #23145 from kjmrknsn/docUpdate. Authored-by: Keiji Yoshida <kjmrk...@gmail.com> Signed-off-by: Sean Owen <sean.o...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/de422815 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/de422815 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/de422815 Branch: refs/heads/master Commit: de4228152771390b0c5ba15254e9c5b832095366 Parents: 31c4fab Author: Keiji Yoshida <kjmrk...@gmail.com> Authored: Thu Nov 29 10:39:00 2018 -0600 Committer: Sean Owen <sean.o...@databricks.com> Committed: Thu Nov 29 10:39:00 2018 -0600 ---------------------------------------------------------------------- docs/index.md | 4 +-- docs/rdd-programming-guide.md | 8 +++--- docs/running-on-mesos.md | 2 +- docs/sql-data-sources-avro.md | 6 ++--- docs/sql-data-sources-hive-tables.md | 2 +- docs/sql-data-sources-jdbc.md | 2 +- docs/sql-data-sources-load-save-functions.md | 2 +- docs/sql-getting-started.md | 2 +- docs/sql-programming-guide.md | 2 +- docs/sql-pyspark-pandas-with-arrow.md | 2 +- docs/sql-reference.md | 6 ++--- docs/streaming-programming-guide.md | 2 +- docs/structured-streaming-programming-guide.md | 28 ++++++++++----------- 13 files changed, 34 insertions(+), 34 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/index.md ---------------------------------------------------------------------- diff --git a/docs/index.md b/docs/index.md index bd287e3..8864239 100644 --- a/docs/index.md +++ b/docs/index.md @@ -66,8 +66,8 @@ Example applications are also provided in Python. For example, ./bin/spark-submit examples/src/main/python/pi.py 10 -Spark also provides an experimental [R API](sparkr.html) since 1.4 (only DataFrames APIs included). -To run Spark interactively in a R interpreter, use `bin/sparkR`: +Spark also provides an [R API](sparkr.html) since 1.4 (only DataFrames APIs included). +To run Spark interactively in an R interpreter, use `bin/sparkR`: ./bin/sparkR --master local[2] http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/rdd-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md index 9a07d6c..2d1ddae 100644 --- a/docs/rdd-programming-guide.md +++ b/docs/rdd-programming-guide.md @@ -332,7 +332,7 @@ One important parameter for parallel collections is the number of *partitions* t Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html). -Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: +Text file RDDs can be created using `SparkContext`'s `textFile` method. 
This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: {% highlight scala %} scala> val distFile = sc.textFile("data.txt") @@ -365,7 +365,7 @@ Apart from text files, Spark's Scala API also supports several other data format Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html). -Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: +Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: {% highlight java %} JavaRDD<String> distFile = sc.textFile("data.txt"); @@ -397,7 +397,7 @@ Apart from text files, Spark's Java API also supports several other data formats PySpark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html). -Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes an URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: +Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation: {% highlight python %} >>> distFile = sc.textFile("data.txt") @@ -1122,7 +1122,7 @@ costly operation. #### Background -To understand what happens during the shuffle we can consider the example of the +To understand what happens during the shuffle, we can consider the example of the [`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all values for a single key are combined into a tuple - the key and the result of executing a reduce function against all values associated with that key. The challenge is that not all values for a http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/running-on-mesos.md ---------------------------------------------------------------------- diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md index 2502cd4..b3ba4b2 100644 --- a/docs/running-on-mesos.md +++ b/docs/running-on-mesos.md @@ -687,7 +687,7 @@ See the [configuration page](configuration.html) for information on Spark config <td><code>0</code></td> <td> Set the maximum number GPU resources to acquire for this job. 
Note that executors will still launch when no GPU resources are found - since this configuration is just a upper limit and not a guaranteed amount. + since this configuration is just an upper limit and not a guaranteed amount. </td> </tr> <tr> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-avro.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index bfe641d..b403a66 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -66,9 +66,9 @@ write.df(select(df, "name", "favorite_color"), "namesAndFavColors.avro", "avro") ## to_avro() and from_avro() The Avro package provides function `to_avro` to encode a column as binary in Avro format, and `from_avro()` to decode Avro binary data into a column. Both functions transform one column to -another column, and the input/output SQL data type can be complex type or primitive type. +another column, and the input/output SQL data type can be a complex type or a primitive type. -Using Avro record as columns are useful when reading from or writing to a streaming source like Kafka. Each +Using Avro record as columns is useful when reading from or writing to a streaming source like Kafka. Each Kafka key-value record will be augmented with some metadata, such as the ingestion timestamp into Kafka, the offset in Kafka, etc. * If the "value" field that contains your data is in Avro, you could use `from_avro()` to extract your data, enrich it, clean it, and then push it downstream to Kafka again or write it out to a file. * `to_avro()` can be used to turn structs into Avro records. This method is particularly useful when you would like to re-encode multiple columns into a single one when writing data out to Kafka. @@ -151,7 +151,7 @@ Data source options of Avro can be set via: <tr> <td><code>avroSchema</code></td> <td>None</td> - <td>Optional Avro schema provided by an user in JSON format. The date type and naming of record fields + <td>Optional Avro schema provided by a user in JSON format. The date type and naming of record fields should match the input Avro data or Catalyst data, otherwise the read/write action will fail.</td> <td>read and write</td> </tr> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-hive-tables.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md index 28e1a39..3b39a32 100644 --- a/docs/sql-data-sources-hive-tables.md +++ b/docs/sql-data-sources-hive-tables.md @@ -74,7 +74,7 @@ creating table, you can create a table using storage handler at Hive side, and u <td><code>inputFormat, outputFormat</code></td> <td> These 2 options specify the name of a corresponding `InputFormat` and `OutputFormat` class as a string literal, - e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in pair, and you can not + e.g. `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat`. These 2 options must be appeared in a pair, and you can not specify them if you already specified the `fileFormat` option. 
</td> </tr> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-jdbc.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index 057e821..9a5d0fc 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -55,7 +55,7 @@ the following case-insensitive options: as a subquery in the <code>FROM</code> clause. Spark will also assign an alias to the subquery clause. As an example, spark will issue a query of the following form to the JDBC Source.<br><br> <code> SELECT <columns> FROM (<user_specified_query>) spark_gen_alias</code><br><br> - Below are couple of restrictions while using this option.<br> + Below are a couple of restrictions while using this option.<br> <ol> <li> It is not allowed to specify `dbtable` and `query` options at the same time. </li> <li> It is not allowed to specify `query` and `partitionColumn` options at the same time. When specifying http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-data-sources-load-save-functions.md ---------------------------------------------------------------------- diff --git a/docs/sql-data-sources-load-save-functions.md b/docs/sql-data-sources-load-save-functions.md index e4c7b17..4386cae 100644 --- a/docs/sql-data-sources-load-save-functions.md +++ b/docs/sql-data-sources-load-save-functions.md @@ -324,4 +324,4 @@ CLUSTERED BY(name) SORTED BY (favorite_numbers) INTO 42 BUCKETS; `partitionBy` creates a directory structure as described in the [Partition Discovery](sql-data-sources-parquet.html#partition-discovery) section. Thus, it has limited applicability to columns with high cardinality. In contrast `bucketBy` distributes -data across a fixed number of buckets and can be used when a number of unique values is unbounded. +data across a fixed number of buckets and can be used when the number of unique values is unbounded. http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-getting-started.md ---------------------------------------------------------------------- diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md index 8851220..0c3f0fb 100644 --- a/docs/sql-getting-started.md +++ b/docs/sql-getting-started.md @@ -99,7 +99,7 @@ Here we include some basic examples of structured data processing using Datasets <div data-lang="scala" markdown="1"> {% include_example untyped_ops scala/org/apache/spark/examples/sql/SparkSQLExample.scala %} -For a complete list of the types of operations that can be performed on a Dataset refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset). +For a complete list of the types of operations that can be performed on a Dataset, refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset). In addition to simple column references and expressions, Datasets also have a rich library of functions including string manipulation, date arithmetic, common math operations and more. The complete list is available in the [DataFrame Function Reference](api/scala/index.html#org.apache.spark.sql.functions$). 
</div> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index eca8915..9c85a15 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -7,7 +7,7 @@ title: Spark SQL and DataFrames Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to -interact with Spark SQL including SQL and the Dataset API. When computing a result +interact with Spark SQL including SQL and the Dataset API. When computing a result, the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation. http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-pyspark-pandas-with-arrow.md ---------------------------------------------------------------------- diff --git a/docs/sql-pyspark-pandas-with-arrow.md b/docs/sql-pyspark-pandas-with-arrow.md index d04b955..d18ca0b 100644 --- a/docs/sql-pyspark-pandas-with-arrow.md +++ b/docs/sql-pyspark-pandas-with-arrow.md @@ -129,7 +129,7 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`, `ArrayType` of `TimestampType`, and nested `StructType`. `BinaryType` is supported only when -installed PyArrow is equal to or higher then 0.10.0. +installed PyArrow is equal to or higher than 0.10.0. ### Setting Arrow Batch Size http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/sql-reference.md ---------------------------------------------------------------------- diff --git a/docs/sql-reference.md b/docs/sql-reference.md index 9e4239b..88d0596 100644 --- a/docs/sql-reference.md +++ b/docs/sql-reference.md @@ -38,15 +38,15 @@ Spark SQL and DataFrames support the following data types: elements with the type of `elementType`. `containsNull` is used to indicate if elements in a `ArrayType` value can have `null` values. - `MapType(keyType, valueType, valueContainsNull)`: - Represents values comprising a set of key-value pairs. The data type of keys are - described by `keyType` and the data type of values are described by `valueType`. + Represents values comprising a set of key-value pairs. The data type of keys is + described by `keyType` and the data type of values is described by `valueType`. For a `MapType` value, keys are not allowed to have `null` values. `valueContainsNull` is used to indicate if values of a `MapType` value can have `null` values. - `StructType(fields)`: Represents values with the structure described by a sequence of `StructField`s (`fields`). * `StructField(name, dataType, nullable)`: Represents a field in a `StructType`. The name of a field is indicated by `name`. The data type of a field is indicated - by `dataType`. `nullable` is used to indicate if values of this fields can have + by `dataType`. `nullable` is used to indicate if values of these fields can have `null` values. 
<div class="codetabs"> http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/streaming-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 70bee50..94c6120 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -733,7 +733,7 @@ for Java, and [StreamingContext](api/python/pyspark.streaming.html#pyspark.strea <span class="badge" style="background-color: grey">Python API</span> As of Spark {{site.SPARK_VERSION_SHORT}}, out of these sources, Kafka and Kinesis are available in the Python API. -This category of sources require interfacing with external non-Spark libraries, some of them with +This category of sources requires interfacing with external non-Spark libraries, some of them with complex dependencies (e.g., Kafka). Hence, to minimize issues related to version conflicts of dependencies, the functionality to create DStreams from these sources has been moved to separate libraries that can be [linked](#linking) to explicitly when necessary. http://git-wip-us.apache.org/repos/asf/spark/blob/de422815/docs/structured-streaming-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 8cea98c..32d61dc 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1493,7 +1493,7 @@ Additional details on supported joins: ### Streaming Deduplication You can deduplicate records in data streams using a unique identifier in the events. This is exactly same as deduplication on static using a unique identifier column. The query will store the necessary amount of data from previous records such that it can filter duplicate records. Similar to aggregations, you can use deduplication with or without watermarking. -- *With watermark* - If there is a upper bound on how late a duplicate record may arrive, then you can define a watermark on a event time column and deduplicate using both the guid and the event time columns. The query will use the watermark to remove old state data from past records that are not expected to get any duplicates any more. This bounds the amount of the state the query has to maintain. +- *With watermark* - If there is an upper bound on how late a duplicate record may arrive, then you can define a watermark on an event time column and deduplicate using both the guid and the event time columns. The query will use the watermark to remove old state data from past records that are not expected to get any duplicates any more. This bounds the amount of the state the query has to maintain. - *Without watermark* - Since there are no bounds on when a duplicate record may arrive, the query stores the data from all the past records as state. @@ -1577,7 +1577,7 @@ event time seen in each input stream, calculates watermarks based on the corresp and chooses a single global watermark with them to be used for stateful operations. By default, the minimum is chosen as the global watermark because it ensures that no data is accidentally dropped as too late if one of the streams falls behind the others -(for example, one of the streams stop receiving data due to upstream failures). In other words, +(for example, one of the streams stops receiving data due to upstream failures). 
In other words, the global watermark will safely move at the pace of the slowest stream and the query output will be delayed accordingly. @@ -1598,7 +1598,7 @@ Some of them are as follows. - Multiple streaming aggregations (i.e. a chain of aggregations on a streaming DF) are not yet supported on streaming Datasets. -- Limit and take first N rows are not supported on streaming Datasets. +- Limit and take the first N rows are not supported on streaming Datasets. - Distinct operations on streaming Datasets are not supported. @@ -1634,7 +1634,7 @@ returned through `Dataset.writeStream()`. You will have to specify one or more o - *Query name:* Optionally, specify a unique name of the query for identification. -- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has completed. If a trigger time is missed because the previous processing has not completed, then the system will trigger processing immediately. +- *Trigger interval:* Optionally, specify the trigger interval. If it is not specified, the system will check for availability of new data as soon as the previous processing has been completed. If a trigger time is missed because the previous processing has not been completed, then the system will trigger processing immediately. - *Checkpoint location:* For some output sinks where the end-to-end fault-tolerance can be guaranteed, specify the location where the system will write all the checkpoint information. This should be a directory in an HDFS-compatible fault-tolerant file system. The semantics of checkpointing is discussed in more detail in the next section. @@ -2106,7 +2106,7 @@ With `foreachBatch`, you can do the following. ###### Foreach If `foreachBatch` is not an option (for example, corresponding batch data writer does not exist, or -continuous processing mode), then you can express you custom writer logic using `foreach`. +continuous processing mode), then you can express your custom writer logic using `foreach`. Specifically, you can express the data writing logic by dividing it into three methods: `open`, `process`, and `close`. Since Spark 2.4, `foreach` is available in Scala, Java and Python. @@ -2236,8 +2236,8 @@ When the streaming query is started, Spark calls the function or the object’s in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. #### Triggers -The trigger settings of a streaming query defines the timing of streaming data processing, whether -the query is going to executed as micro-batch query with a fixed batch interval or as a continuous processing query. +The trigger settings of a streaming query define the timing of streaming data processing, whether +the query is going to be executed as micro-batch query with a fixed batch interval or as a continuous processing query. Here are the different kinds of triggers that are supported. <table class="table"> @@ -2960,7 +2960,7 @@ the effect of the change is not well-defined.
For all of them: - Addition/deletion/modification of rate limits is allowed: `spark.readStream.format("kafka").option("subscribe", "topic")` to `spark.readStream.format("kafka").option("subscribe", "topic").option("maxOffsetsPerTrigger", ...)` - - Changes to subscribed topics/files is generally not allowed as the results are unpredictable: `spark.readStream.format("kafka").option("subscribe", "topic")` to `spark.readStream.format("kafka").option("subscribe", "newTopic")` + - Changes to subscribed topics/files are generally not allowed as the results are unpredictable: `spark.readStream.format("kafka").option("subscribe", "topic")` to `spark.readStream.format("kafka").option("subscribe", "newTopic")` - *Changes in the type of output sink*: Changes between a few specific combinations of sinks are allowed. This needs to be verified on a case-by-case basis. Here are a few examples. @@ -2974,17 +2974,17 @@ the effect of the change is not well-defined. For all of them: - *Changes in the parameters of output sink*: Whether this is allowed and whether the semantics of the change are well-defined depends on the sink and the query. Here are a few examples. - - Changes to output directory of a file sink is not allowed: `sdf.writeStream.format("parquet").option("path", "/somePath")` to `sdf.writeStream.format("parquet").option("path", "/anotherPath")` + - Changes to output directory of a file sink are not allowed: `sdf.writeStream.format("parquet").option("path", "/somePath")` to `sdf.writeStream.format("parquet").option("path", "/anotherPath")` - - Changes to output topic is allowed: `sdf.writeStream.format("kafka").option("topic", "someTopic")` to `sdf.writeStream.format("kafka").option("topic", "anotherTopic")` + - Changes to output topic are allowed: `sdf.writeStream.format("kafka").option("topic", "someTopic")` to `sdf.writeStream.format("kafka").option("topic", "anotherTopic")` - - Changes to the user-defined foreach sink (that is, the `ForeachWriter` code) is allowed, but the semantics of the change depends on the code. + - Changes to the user-defined foreach sink (that is, the `ForeachWriter` code) are allowed, but the semantics of the change depends on the code. - *Changes in projection / filter / map-like operations**: Some cases are allowed. For example: - Addition / deletion of filters is allowed: `sdf.selectExpr("a")` to `sdf.where(...).selectExpr("a").filter(...)`. - - Changes in projections with same output schema is allowed: `sdf.selectExpr("stringColumn AS json").writeStream` to `sdf.selectExpr("anotherStringColumn AS json").writeStream` + - Changes in projections with same output schema are allowed: `sdf.selectExpr("stringColumn AS json").writeStream` to `sdf.selectExpr("anotherStringColumn AS json").writeStream` - Changes in projections with different output schema are conditionally allowed: `sdf.selectExpr("a").writeStream` to `sdf.selectExpr("b").writeStream` is allowed only if the output sink allows the schema change from `"a"` to `"b"`. @@ -3000,7 +3000,7 @@ the effect of the change is not well-defined. For all of them: - *Streaming deduplication*: For example, `sdf.dropDuplicates("a")`. Any change in number or type of grouping keys or aggregates is not allowed. - *Stream-stream join*: For example, `sdf1.join(sdf2, ...)` (i.e. both inputs are generated with `sparkSession.readStream`). Changes - in the schema or equi-joining columns are not allowed. Changes in join type (outer or inner) not allowed. Other changes in the join condition are ill-defined. 
+ in the schema or equi-joining columns are not allowed. Changes in join type (outer or inner) are not allowed. Other changes in the join condition are ill-defined. - *Arbitrary stateful operation*: For example, `sdf.groupByKey(...).mapGroupsWithState(...)` or `sdf.groupByKey(...).flatMapGroupsWithState(...)`. Any change to the schema of the user-defined state and the type of timeout is not allowed. @@ -3083,7 +3083,7 @@ spark \ </div> </div> -A checkpoint interval of 1 second means that the continuous processing engine will records the progress of the query every second. The resulting checkpoints are in a format compatible with the micro-batch engine, hence any query can be restarted with any trigger. For example, a supported query started with the micro-batch mode can be restarted in continuous mode, and vice versa. Note that any time you switch to continuous mode, you will get at-least-once fault-tolerance guarantees. +A checkpoint interval of 1 second means that the continuous processing engine will record the progress of the query every second. The resulting checkpoints are in a format compatible with the micro-batch engine, hence any query can be restarted with any trigger. For example, a supported query started with the micro-batch mode can be restarted in continuous mode, and vice versa. Note that any time you switch to continuous mode, you will get at-least-once fault-tolerance guarantees. ## Supported Queries {:.no_toc}
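As context for the trigger-related passages amended above, here is a minimal Scala sketch of the two trigger modes the guide describes. It is an illustration rather than part of the patch; the rate source, application name, and 1-second intervals are arbitrary placeholder choices.

{% highlight scala %}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object TriggerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("TriggerSketch").getOrCreate()

    // The built-in rate source (columns `timestamp`, `value`) stands in for a
    // real input such as Kafka.
    val df = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()

    // Micro-batch execution: a new batch is triggered at a fixed 1-second interval.
    val query = df.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("1 second"))
      // For continuous execution, the argument is instead the checkpoint
      // interval, i.e. how often progress is recorded:
      // .trigger(Trigger.Continuous("1 second"))
      .start()

    query.awaitTermination()
  }
}
{% endhighlight %}

Because checkpoints written by either mode share a compatible format, the same query can in principle be restarted with the other trigger, as the amended paragraph above notes.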