Repository: spark
Updated Branches:
  refs/heads/branch-2.0 02d584ccb -> 81d7f484a


[MINOR][STREAMING][DOCS] Minor changes on kinesis integration

## What changes were proposed in this pull request?

Some minor changes for documentation page "Spark Streaming + Kinesis Integration".

Moved "streaming-kinesis-arch.png" before the bullet list, not in between the bullets.

## How was this patch tested?

Tested manually on my local machine.

Author: Xin Ren <iamsh...@126.com>

Closes #14097 from keypointt/kinesisDoc.

(cherry picked from commit 05d7151ccbccdd977ec2f2301d5b12566018c988)
Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/81d7f484
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/81d7f484
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/81d7f484

Branch: refs/heads/branch-2.0
Commit: 81d7f484ac3b68792e49e47c2b7c9994cf17487a
Parents: 02d584c
Author: Xin Ren <iamsh...@126.com>
Authored: Mon Jul 11 18:09:14 2016 -0700
Committer: Tathagata Das <tathagata.das1...@gmail.com>
Committed: Mon Jul 11 18:09:27 2016 -0700

----------------------------------------------------------------------
 docs/streaming-kinesis-integration.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/81d7f484/docs/streaming-kinesis-integration.md
----------------------------------------------------------------------
diff --git a/docs/streaming-kinesis-integration.md b/docs/streaming-kinesis-integration.md
index 5b9a755..96198dd 100644
--- a/docs/streaming-kinesis-integration.md
+++ b/docs/streaming-kinesis-integration.md
@@ -111,7 +111,7 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
 
        - `[checkpoint interval]`: The interval (e.g., Duration(2000) = 2 seconds) at which the Kinesis Client Library saves its position in the stream.  For starters, set it to the same as the batch interval of the streaming application.
 
-       - `[initial position]`: Can be either `InitialPositionInStream.TRIM_HORIZON` or `InitialPositionInStream.LATEST` (see Kinesis Checkpointing section and Amazon Kinesis API documentation for more details).
+       - `[initial position]`: Can be either `InitialPositionInStream.TRIM_HORIZON` or `InitialPositionInStream.LATEST` (see [`Kinesis Checkpointing`](#kinesis-checkpointing) section and [`Amazon Kinesis API documentation`](http://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html) for more details).
 
        - `[message handler]`: A function that takes a Kinesis `Record` and outputs generic `T`.
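
For context, a minimal, non-authoritative sketch (not part of this patch) of how the parameters documented above map onto `KinesisUtils.createStream` from `spark-streaming-kinesis-asl`; the app name, stream name, endpoint, region, and the `String` message handler are illustrative placeholders.

```scala
// Sketch only: placeholder names and endpoints, not taken from the patch itself.
import java.nio.charset.StandardCharsets

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import com.amazonaws.services.kinesis.model.Record
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(new SparkConf().setAppName("KinesisSketch"), Seconds(2))

// [message handler]: turns each Kinesis Record into a value of generic type T (here, String).
def handleRecord(record: Record): String = {
  val buf = record.getData
  val bytes = new Array[Byte](buf.remaining())
  buf.get(bytes)
  new String(bytes, StandardCharsets.UTF_8)
}

val kinesisStream = KinesisUtils.createStream(
  ssc,
  "myKinesisApp",                             // [Kinesis app name]: also names the DynamoDB checkpoint table
  "myStream",                                 // [Kinesis stream name]
  "https://kinesis.us-east-1.amazonaws.com",  // [endpoint URL]
  "us-east-1",                                // [region name]
  InitialPositionInStream.LATEST,             // [initial position]
  Duration(2000),                             // [checkpoint interval]; start with the batch interval
  StorageLevel.MEMORY_AND_DISK_2,
  handleRecord _)                             // [message handler]: Record => T
```

With the message handler supplied, the result is a DStream of `String`; the overload without it returns the raw `DStream[Array[Byte]]`.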
 
@@ -128,14 +128,6 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
        Alternatively, you can also download the JAR of the Maven artifact `spark-streaming-kinesis-asl-assembly` from the
        [Maven repository](http://search.maven.org/#search|ga|1|a%3A%22spark-streaming-kinesis-asl-assembly_{{site.SCALA_BINARY_VERSION}}%22%20AND%20v%3A%22{{site.SPARK_VERSION_SHORT}}%22) and add it to `spark-submit` with `--jars`.
 
-       *Points to remember at runtime:*
-
-       - Kinesis data processing is ordered per partition and occurs at-least once per message.
-
-       - Multiple applications can read from the same Kinesis stream.  Kinesis will maintain the application-specific shard and checkpoint info in DynamoDB.
-
-       - A single Kinesis stream shard is processed by one input DStream at a time.
-
        <p style="text-align: center;">
                <img src="img/streaming-kinesis-arch.png"
                        title="Spark Streaming Kinesis Architecture"
@@ -145,6 +137,14 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
                <!-- Images are downsized intentionally to improve quality on retina displays -->
        </p>
 
+       *Points to remember at runtime:*
+
+       - Kinesis data processing is ordered per partition and occurs at-least once per message.
+
+       - Multiple applications can read from the same Kinesis stream.  Kinesis will maintain the application-specific shard and checkpoint info in DynamoDB.
+
+       - A single Kinesis stream shard is processed by one input DStream at a time.
+
        - A single Kinesis input DStream can read from multiple shards of a Kinesis stream by creating multiple KinesisRecordProcessor threads.
 
        - Multiple input DStreams running in separate processes/instances can read from a Kinesis stream.
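
The bullets above (each shard is handled by one input DStream at a time, while one DStream can span several shards) are commonly combined by creating one input DStream per shard and unioning them, roughly the pattern used by the bundled KinesisWordCountASL example. A self-contained sketch under those assumptions, again with placeholder names:

```scala
// Sketch only: shard count, names, endpoint, and region are placeholders.
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(new SparkConf().setAppName("KinesisUnionSketch"), Seconds(2))

val numShards = 2  // hypothetical shard count of the Kinesis stream
val kinesisStreams = (0 until numShards).map { _ =>
  // Each call creates one receiver-backed input DStream; every shard is processed
  // by exactly one of these DStreams at a time.
  KinesisUtils.createStream(
    ssc, "myKinesisApp", "myStream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.LATEST, Duration(2000),
    StorageLevel.MEMORY_AND_DISK_2)
}

// Union the per-receiver streams into a single DStream for downstream processing.
val unionStream = ssc.union(kinesisStreams)
```
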
@@ -173,7 +173,7 @@ To run the example,
 
 - Set up Kinesis stream (see earlier section) within AWS. Note the name of the Kinesis stream and the endpoint URL corresponding to the region where the stream was created.
 
-- Set up the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_KEY with your AWS credentials.
+- Set up the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_KEY` with your AWS credentials.
 
 - In the Spark root directory, run the example as
 
@@ -216,6 +216,6 @@ de-aggregate records during consumption.
 
 - Checkpointing too frequently will cause excess load on the AWS checkpoint storage layer and may lead to AWS throttling.  The provided example handles this throttling with a random-backoff-retry strategy.
 
-- If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (InitialPositionInStream.TRIM_HORIZON) or from the latest tip (InitialPositionInStream.LATEST).  This is configurable.
-- InitialPositionInStream.LATEST could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
-- InitialPositionInStream.TRIM_HORIZON may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.
+- If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`).  This is configurable.
+  - `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
+  - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.
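
To make the trade-off in the re-indented bullets above concrete, a small hedged sketch of choosing the initial position; the `REPROCESS_HISTORY` flag is purely hypothetical, and credentials are assumed to be picked up from the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_KEY` environment variables mentioned earlier via the default AWS credentials provider chain.

```scala
// Sketch only: REPROCESS_HISTORY is a made-up switch for illustration.
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val reprocessHistory = sys.env.get("REPROCESS_HISTORY").contains("true")

val initialPosition =
  if (reprocessHistory) {
    InitialPositionInStream.TRIM_HORIZON  // may reprocess records; keep downstream processing idempotent
  } else {
    InitialPositionInStream.LATEST        // may miss records written while no DStream was running
  }

// Either value only takes effect when no Kinesis checkpoint for the application
// exists in DynamoDB; otherwise the KCL resumes from the stored checkpoint.
```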

