Repository: spark
Updated Branches:
  refs/heads/branch-2.2 576fd4c3a -> ab12848d6
[SPARK-21069][SS][DOCS] Add rate source to programming guide.

## What changes were proposed in this pull request?

SPARK-20979 added a new structured streaming source: Rate source.

This patch adds the corresponding documentation to programming guide.

## How was this patch tested?

Tested by running jekyll locally.

Author: Prashant Sharma <[email protected]>
Author: Prashant Sharma <[email protected]>

Closes #18562 from ScrapCodes/spark-21069/rate-source-docs.

(cherry picked from commit d0bfc6733521709e453d643582df2bdd68f28de7)
Signed-off-by: Shixiong Zhu <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ab12848d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ab12848d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ab12848d

Branch: refs/heads/branch-2.2
Commit: ab12848d624f6b74d401e924255c0b4fcc535231
Parents: 576fd4c
Author: Prashant Sharma <[email protected]>
Authored: Fri Jul 7 23:33:12 2017 -0700
Committer: Shixiong Zhu <[email protected]>
Committed: Fri Jul 7 23:33:20 2017 -0700

----------------------------------------------------------------------
 docs/structured-streaming-programming-guide.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/ab12848d/docs/structured-streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3bc377c..8f64faa 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -499,6 +499,8 @@ There are a few built-in sources.
   - **Socket source (for testing)** - Reads UTF8 text data from a socket connection. The listening server socket is at the driver.
 Note that this should be used only for testing as this does not provide end-to-end fault-tolerance guarantees.
+  - **Rate source (for testing)** - Generates data at the specified number of rows per second, each output row contains a `timestamp` and `value`. Where `timestamp` is a `Timestamp` type containing the time of message dispatch, and `value` is of `Long` type containing the message count, starting from 0 as the first row. This source is intended for testing and benchmarking.
+
 Some sources are not fault-tolerant because they do not guarantee that data can be replayed using checkpointed offsets after a failure. See the earlier section on [fault-tolerance semantics](#fault-tolerance-semantics).
@@ -547,6 +549,19 @@ Here are the details of all the sources in Spark.
         <td></td>
     </tr>
     <tr>
+        <td><b>Rate Source</b></td>
+        <td>
+            <code>rowsPerSecond</code> (e.g. 100, default: 1): How many rows should be generated per second.<br/><br/>
+            <code>rampUpTime</code> (e.g. 5s, default: 0s): How long to ramp up before the generating speed becomes <code>rowsPerSecond</code>. Using finer granularities than seconds will be truncated to integer seconds. <br/><br/>
+            <code>numPartitions</code> (e.g. 10, default: Spark's default parallelism): The partition number for the generated rows. <br/><br/>
+
+            The source will try its best to reach <code>rowsPerSecond</code>, but the query may be resource constrained, and <code>numPartitions</code> can be tweaked to help reach the desired speed.
+        </td>
+        <td>Yes</td>
+        <td></td>
+    </tr>
+
+    <tr>
         <td><b>Kafka Source</b></td>
         <td>
             See the <a href="structured-streaming-kafka-integration.html">Kafka Integration Guide</a>.
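
For readers who want to try the options documented in this patch, here is a minimal PySpark sketch. It is not part of the commit. The helper names `rate_source_options` and `run_rate_query`, the application name, and the choice of the console sink are illustrative assumptions. Only the source name `rate` and the three option keys come from the documentation above.

```python
def rate_source_options(rows_per_second=1, ramp_up_seconds=0, num_partitions=None):
    """Build an option map for the rate source (hypothetical helper).

    The defaults mirror the documented ones: 1 row per second and no ramp-up.
    numPartitions is left out unless given, so Spark falls back to its
    default parallelism.
    """
    options = {
        "rowsPerSecond": str(rows_per_second),
        # Pass whole seconds: the documentation says finer granularities
        # are truncated to integer seconds anyway.
        "rampUpTime": f"{int(ramp_up_seconds)}s",
    }
    if num_partitions is not None:
        options["numPartitions"] = str(num_partitions)
    return options


def run_rate_query(options, run_seconds=10):
    """Stream rate-source rows to the console for a short while.

    Requires a working Spark installation; pyspark is imported lazily so the
    option helper above can be used without it.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("RateSourceSketch").getOrCreate()

    # Each row carries `timestamp` and `value`, as described in the patch.
    rows = spark.readStream.format("rate").options(**options).load()

    query = (
        rows.writeStream
        .format("console")
        .option("truncate", "false")
        .start()
    )
    query.awaitTermination(run_seconds)  # timeout is in seconds
    query.stop()
    spark.stop()


# Example call (needs Spark on the machine):
# run_rate_query(rate_source_options(rows_per_second=100,
#                                    ramp_up_seconds=5,
#                                    num_partitions=10))
```

If the console output shows fewer rows per second than requested, raising `num_partitions` is the knob the documentation suggests trying first.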
