Brian ONeill created SPARK-14421:
------------------------------------
Summary: Kinesis deaggregation with PySpark
Key: SPARK-14421
URL: https://issues.apache.org/jira/browse/SPARK-14421
Project: Spark
Issue Type: Bug
Affects Versions: 1.6.1
Environment: PySpark w/ Kinesis word count example
Reporter: Brian ONeill
I'm creating this issue as a precaution.
We have preliminary evidence that Kinesis Producer Library (KPL) de-aggregation
for Kinesis streams may not work in Spark 1.6.1. Running the PySpark Kinesis
Word Count example with masterUrl = local[16], we receive no records when the
data is produced by the KPL with aggregation turned on.
I also noticed this thread, which may be related:
https://forums.aws.amazon.com/message.jspa?messageID=707122
When I follow the setup instructions here:
http://brianoneill.blogspot.com/2016/03/pyspark-on-amazon-emr-w-kinesis.html
the example works only some of the time. With aggregation disabled, it appears
to always work. I'm going to dig a bit deeper, but thought you might have some
pointers.
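To help narrow this down, here is a minimal Python sketch of a check for
whether a raw Kinesis record payload carries the KPL aggregation envelope. It
assumes the published KPL aggregated record format: a 4-byte magic prefix
(0xF3 0x89 0x9A 0xC2), a protobuf-encoded body, and a trailing 16-byte MD5 of
that body. The function name looks_kpl_aggregated is hypothetical and is not
part of Spark, the KCL, or the KPL.

```python
import hashlib

# Assumption: magic prefix defined by the KPL aggregated record format.
KPL_MAGIC = b"\xf3\x89\x9a\xc2"
# Assumption: the envelope ends with an MD5 digest (16 bytes) of the body.
MD5_LEN = 16


def looks_kpl_aggregated(payload: bytes) -> bool:
    """Return True if payload appears to be a KPL aggregated record.

    Hypothetical diagnostic helper: it only inspects the envelope
    (magic prefix and trailing checksum) and does not decode the
    protobuf body into individual user records.
    """
    # Too short to hold the prefix, a non-empty body, and the checksum.
    if len(payload) <= len(KPL_MAGIC) + MD5_LEN:
        return False
    if not payload.startswith(KPL_MAGIC):
        return False
    body = payload[len(KPL_MAGIC):-MD5_LEN]
    checksum = payload[-MD5_LEN:]
    # The trailing bytes must be the MD5 of the body between prefix and checksum.
    return hashlib.md5(body).digest() == checksum
```

If payloads that reach the receiver pass this check, the records are arriving
but are not being de-aggregated; if nothing arrives at all, the problem is
more likely upstream of de-aggregation.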