mosche commented on pull request #16077: URL: https://github.com/apache/beam/pull/16077#issuecomment-1050811277
@aromanenko-dev This should be in good shape as an initial version. In my tests I was able to outperform the KPL-based writer of SDK v1.

Using Spark, I ran into some interesting issues at high scale. These were caused by having classes from multiple Netty 4 versions on the classpath (see [here](https://github.com/aws/aws-sdk-java-v2/issues/1803)). It took me quite a while to figure this out, as it only happened at very high throughput. I'm now enforcing Beam's version of Netty over the more recent one used by AWS. In my tests this worked very well.

Another change is the output type of the writer. As recommended [here](https://docs.google.com/document/d/1V2FkGGunVgvLwi1dKHr-7mtDuwYjTuvuESV_oPzVnfQ/edit?resourcekey=0-KvfQq-5iCcMlu3f3MFJ-GQ#heading=h.xl97lw3dyot7), I changed it from `Void` to `Result` to allow for future backwards-compatible changes. Currently the writer is fail-fast (after retries, of course), but adding a dead-letter output would make sense.

The next step is to provide a more powerful internal partitioner that is aware of the hash key ranges assigned to Kinesis shards. With that, record aggregation would be as powerful as the one provided by KPL.

cc @echauchot I'd be more than happy to get another review if you have the capacity for it.
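
To illustrate what "aware of hash key ranges" could mean for such a partitioner, here is a minimal sketch. It is not code from this PR. It relies only on the documented Kinesis behavior that a partition key is mapped to a 128-bit unsigned integer via MD5 and routed to the shard whose hash key range contains that value. The class `ShardHashKeyLookup` and its methods are hypothetical names, and the sketch assumes the shard ranges are contiguous and registered by their inclusive starting hash key.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Beam or the AWS SDK. */
final class ShardHashKeyLookup {
  // Maps the inclusive starting hash key of each shard's range to its shard id.
  private final TreeMap<BigInteger, String> shardsByStartingHashKey = new TreeMap<>();

  /** Registers a shard by the inclusive start of its hash key range. */
  void addShard(String shardId, BigInteger startingHashKey) {
    shardsByStartingHashKey.put(startingHashKey, shardId);
  }

  /** Computes the 128-bit unsigned MD5 hash key of a partition key. */
  static BigInteger hashKey(String partitionKey) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
      return new BigInteger(1, digest); // signum 1: interpret the bytes as unsigned
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 must be available on every Java platform", e);
    }
  }

  /** Returns the shard whose range contains the key's hash (assumes contiguous ranges). */
  String shardFor(String partitionKey) {
    Map.Entry<BigInteger, String> entry =
        shardsByStartingHashKey.floorEntry(hashKey(partitionKey));
    if (entry == null) {
      throw new IllegalStateException("No shard covers the hash key of: " + partitionKey);
    }
    return entry.getValue();
  }
}
```

With a lookup like this, records that resolve to the same shard could be aggregated together before being written, which is the property the partitioner described above is after. How the shard ranges are obtained and refreshed after resharding is left out of the sketch.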
