Hi everyone,

I started looking at the Kinesis integration and it looks promising. However,
I feel it can be improved. Here are my thoughts:

1. It assumes that AWS credentials are provided
by DefaultAWSCredentialsProviderChain, and there is no way to change that
behavior. I would like to be able to supply a different
AWSCredentialsProvider.
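
To illustrate the kind of provider I would like to plug in, here is a
minimal sketch. It assumes the AWS SDK for Java 1.x interface
com.amazonaws.auth.AWSCredentialsProvider; the class name
FixedKeysCredentialsProvider is hypothetical, and nothing in the current
Kinesis integration accepts such a provider today:

```scala
import com.amazonaws.auth.{AWSCredentials, AWSCredentialsProvider, BasicAWSCredentials}

// Hypothetical provider that returns a fixed key pair supplied by the job,
// instead of relying on DefaultAWSCredentialsProviderChain.
class FixedKeysCredentialsProvider(accessKeyId: String, secretKey: String)
    extends AWSCredentialsProvider with Serializable {

  // Build the credentials on demand so that only the two strings are held.
  override def getCredentials(): AWSCredentials =
    new BasicAWSCredentials(accessKeyId, secretKey)

  // The keys are static, so there is nothing to refresh.
  override def refresh(): Unit = ()
}
```

The provider holds only strings and is marked Serializable on the
assumption that it would have to travel with the receiver to an executor.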

2. I feel that the modules in extras should be independent of the Spark
build and should perhaps live in a separate repository (or repositories). I
had to download the most recent checkout of Spark and slap kinesis-asl into
Spark 1.0.2 to create a custom spark-streaming-kinesis-asl_2.10-1.0.2.jar
that I can use in my Spark jobs. Ideally, people would want the extras
modules to be cross-built against different versions of Spark. Independent
repositories would let us deliver builds of the extras packages faster than
Spark releases, and would make them readily available for earlier versions
of Spark. This could also free up Spark developers to focus on enhancements
to the core framework instead of managing spark-* integration pull requests.

3. Maybe it's just me, but I would have liked a Context-like API for
creating Kinesis streams instead of using KinesisUtils. It would be a
little more consistent with the rest of the Spark API. We could have
a KinesisStreamingContext which goes like this:

class KinesisStreamingContext(
    @transient ssc: StreamingContext,
    endpointUrl: String,
    defaultCredentialsProvider: AWSCredentialsProvider) {

  def createStream(
      streamName: String,
      checkpointInterval: Duration,
      initialPositionInStream: InitialPositionInStream,
      storageLevel: StorageLevel,
      credentialsProvider: AWSCredentialsProvider = defaultCredentialsProvider) {...}
}

4. The example KinesisWordCountASL creates numShards receiver instances,
which makes sense. Maybe the API should let the caller specify the
parallelism and default to numShards?
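
As a rough sketch of what I mean in point 4, the default could work like
this. The helper createStreams and its parallelism parameter are made up
for illustration and are not part of KinesisUtils; createOne stands in for
whatever call actually builds a single receiver stream:

```scala
object ReceiverParallelism {

  // Hypothetical helper: build `parallelism` streams, falling back to one
  // stream per shard when the caller does not specify a value.
  def createStreams[S](numShards: Int, parallelism: Option[Int] = None)(
      createOne: Int => S): Seq[S] = {
    val n = parallelism.getOrElse(numShards)
    require(n > 0, s"parallelism must be positive, got $n")
    // One stream per receiver index, 0 until n.
    (0 until n).map(createOne)
  }
}
```

In a real job, createOne would wrap the KinesisUtils.createStream call, and
the resulting streams could then be combined with ssc.union, which I believe
is what the example already does by hand.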

I can submit pull requests for some of the above items, provided the
community agrees and nobody else is working on it.

Thanks,
Aniket
