Repository: spark
Updated Branches:
  refs/heads/branch-2.2 8b0cb3a7b -> 556ad019f
[DSTREAM][DOC] Add documentation for kinesis retry configurations

## What changes were proposed in this pull request?

The changes were merged as part of https://github.com/apache/spark/pull/17467.
The documentation was missed somewhere in the review iterations. Adding the documentation where it belongs.

## How was this patch tested?

Docs. Not tested.

cc budde, brkyvz

Author: Yash Sharma <ysha...@atlassian.com>

Closes #18028 from yssharma/ysharma/kinesis_retry_docs.

(cherry picked from commit 92580bd0eae5dbf739573093cca1b12fd0c14049)
Signed-off-by: Burak Yavuz <brk...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/556ad019
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/556ad019
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/556ad019

Branch: refs/heads/branch-2.2
Commit: 556ad019fa49deb40ba8da3aa6067484ab3d6331
Parents: 8b0cb3a
Author: Yash Sharma <ysha...@atlassian.com>
Authored: Thu May 18 11:24:33 2017 -0700
Committer: Burak Yavuz <brk...@gmail.com>
Committed: Thu May 18 11:24:44 2017 -0700

----------------------------------------------------------------------
 docs/streaming-kinesis-integration.md | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/556ad019/docs/streaming-kinesis-integration.md
----------------------------------------------------------------------
diff --git a/docs/streaming-kinesis-integration.md b/docs/streaming-kinesis-integration.md
index 6be0b54..9709bd3 100644
--- a/docs/streaming-kinesis-integration.md
+++ b/docs/streaming-kinesis-integration.md
@@ -216,3 +216,7 @@ de-aggregate records during consumption.
 - If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`). This is configurable.
 - `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
 - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.
+
+#### Kinesis retry configuration
+ - `spark.streaming.kinesis.retry.waitTime` : Wait time between Kinesis retries, as a duration string. When reading from Amazon Kinesis, users may hit `ProvisionedThroughputExceededException` errors when consuming faster than 5 transactions/second per shard or exceeding the maximum read rate of 2 MB/second per shard. Increasing this value lengthens the sleep between fetch attempts after a failed fetch, which can reduce these exceptions. Default is "100ms".
+ - `spark.streaming.kinesis.retry.maxAttempts` : Maximum number of retries for Kinesis fetches. This config can also be used to tackle `ProvisionedThroughputExceededException` errors in the scenarios mentioned above; increase it to allow more retries for Kinesis reads. Default is 3.
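For reference, the two keys documented in this patch are ordinary Spark configuration properties, so they can be set on a `SparkConf` before the `StreamingContext` is created. The following is a minimal sketch, not part of this commit, assuming a Scala Spark Streaming application; the object name, app name, local master, 10-second batch interval, and the values "500ms" and 10 are illustrative assumptions, not tuning recommendations.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KinesisRetryConfigExample { // hypothetical example object
  def main(args: Array[String]): Unit = {
    // Illustrative values only: both are raised above the documented
    // defaults ("100ms" wait, 3 attempts) to tolerate Kinesis throttling.
    val conf = new SparkConf()
      .setAppName("KinesisRetryConfigExample") // hypothetical app name
      .setMaster("local[2]")                   // assumption: local run for illustration
      .set("spark.streaming.kinesis.retry.waitTime", "500ms")
      .set("spark.streaming.kinesis.retry.maxAttempts", "10")

    val ssc = new StreamingContext(conf, Seconds(10))

    // A real application would create its Kinesis input DStream(s) here
    // (for example with KinesisUtils.createStream), define output
    // operations, and then call ssc.start() and ssc.awaitTermination().
    println(ssc.sparkContext.getConf.get("spark.streaming.kinesis.retry.waitTime"))

    ssc.stop()
  }
}
```

The same keys can equally be supplied at submit time with `spark-submit --conf spark.streaming.kinesis.retry.waitTime=500ms` or placed in `spark-defaults.conf`, which avoids hard-coding them in application code.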