Brian ONeill created SPARK-14421:
------------------------------------

             Summary: Kinesis deaggregation with PySpark
                 Key: SPARK-14421
                 URL: https://issues.apache.org/jira/browse/SPARK-14421
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.1
         Environment: PySpark w/ Kinesis word count example
            Reporter: Brian ONeill

I'm creating this issue as a precaution.

We have some preliminary evidence that de-aggregation of Kinesis Producer Library (KPL) records may not work for Kinesis streams in Spark 1.6.1. Using the PySpark Kinesis word count example with masterUrl = local[16], we don't receive records when the data is produced by the KPL with aggregation turned on.

At the same time, I noticed this thread:
https://forums.aws.amazon.com/message.jspa?messageID=707122

Following the instructions here:
http://brianoneill.blogspot.com/2016/03/pyspark-on-amazon-emr-w-kinesis.html

the example will sometimes work. When aggregation is disabled, it appears to always work.

I'm going to dig a bit deeper, but thought you might have some pointers.
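
One way to narrow this down is to check whether a raw record payload still carries the KPL aggregation envelope, which would suggest it was never de-aggregated. The following is a minimal sketch, assuming the published KPL aggregated record layout (a 4-byte magic prefix, a protobuf payload, and a trailing 16-byte MD5 of that payload). The function name looks_kpl_aggregated is hypothetical and is not part of Spark, the KCL, or the KPL, and the sample envelope is built by hand rather than taken from a real stream.

```python
import hashlib

# Assumption: magic prefix used by the KPL aggregated record format.
KPL_MAGIC = b"\xf3\x89\x9a\xc2"
# Assumption: the envelope ends with an MD5 digest of the protobuf payload.
MD5_LEN = 16


def looks_kpl_aggregated(data):
    """Return True if `data` has the KPL aggregated-record envelope."""
    # Too short to hold the magic prefix, a non-empty payload, and the digest.
    if len(data) <= len(KPL_MAGIC) + MD5_LEN:
        return False
    if not data.startswith(KPL_MAGIC):
        return False
    payload = data[len(KPL_MAGIC):-MD5_LEN]
    # The trailing bytes must match the MD5 of the payload.
    return hashlib.md5(payload).digest() == data[-MD5_LEN:]


# Hand-built stand-in for an aggregated record; the payload is not real protobuf.
fake_payload = b"not-a-real-protobuf-message"
fake_envelope = KPL_MAGIC + fake_payload + hashlib.md5(fake_payload).digest()

plain_record = b"hello world"
```

Applied to the raw bytes of records read directly from the stream (outside Spark), a check like this could help distinguish "records are aggregated and never unpacked" from "records never arrive at all".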