[jira] [Created] (SPARK-14421) Kinesis deaggregation with PySpark

Brian ONeill (JIRA) Tue, 05 Apr 2016 18:17:48 -0700

Brian ONeill created SPARK-14421:
------------------------------------

             Summary: Kinesis deaggregation with PySpark
                 Key: SPARK-14421
                 URL: https://issues.apache.org/jira/browse/SPARK-14421
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.1
         Environment: PySpark w/ Kinesis word count example
            Reporter: Brian ONeill



I'm creating this issue as a precaution...

We have some preliminary evidence that Kinesis Producer Library (KPL)
de-aggregation for Kinesis streams may not work in Spark 1.6.1. Using the
PySpark Kinesis word count example with masterUrl = local[16], we don't receive
any records when the data is produced by the KPL with aggregation turned on.
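For anyone reproducing this, one quick check is whether the payloads reaching the job are still in KPL's aggregated form. The sketch below is a hypothetical diagnostic helper (`is_kpl_aggregated` is not part of Spark, PySpark, or the KCL). It assumes the KPL's published aggregation format: a 4-byte magic prefix 0xF3 0x89 0x9A 0xC2, then a protobuf-encoded AggregatedRecord, then a 16-byte MD5 digest of those protobuf bytes. It only checks the framing and does not decode the protobuf.

```python
import hashlib

# Magic prefix of a KPL-aggregated record (assumption: taken from the KPL's
# published aggregation format, not from Spark).
KPL_MAGIC = bytes([0xF3, 0x89, 0x9A, 0xC2])
MD5_LEN = 16


def is_kpl_aggregated(data: bytes) -> bool:
    """Hypothetical helper: True if `data` has the KPL aggregated framing.

    Assumed layout: magic (4 bytes) + protobuf bytes + MD5(protobuf bytes).
    """
    if len(data) < len(KPL_MAGIC) + MD5_LEN or not data.startswith(KPL_MAGIC):
        return False
    body = data[len(KPL_MAGIC):-MD5_LEN]  # the protobuf-encoded AggregatedRecord
    checksum = data[-MD5_LEN:]            # trailing MD5 digest of the body
    return hashlib.md5(body).digest() == checksum


if __name__ == "__main__":
    # Fake protobuf bytes for illustration only; not a real Kinesis record.
    payload = b"\x0a\x03key"
    aggregated = KPL_MAGIC + payload + hashlib.md5(payload).digest()
    print(is_kpl_aggregated(aggregated))      # True
    print(is_kpl_aggregated(b"hello world"))  # False
```

If raw payloads seen by the consumer pass this check, the records were not de-aggregated before reaching user code.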

At the same time, I noticed this thread:
https://forums.aws.amazon.com/message.jspa?messageID=707122

When following the instructions here:
http://brianoneill.blogspot.com/2016/03/pyspark-on-amazon-emr-w-kinesis.html

the example only works some of the time. When aggregation is disabled, it
appears to work every time. I'm going to dig a bit deeper, but I thought you
might have some pointers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

