mosche commented on pull request #16077:
URL: https://github.com/apache/beam/pull/16077#issuecomment-1050811277


   @aromanenko-dev This should be in good shape as an initial version. In my 
tests I was able to outperform the KPL-based writer of SDK v1.
   
   Using Spark, I ran into some interesting issues at high scale. These were 
caused by having classes from multiple Netty 4 versions on the classpath (see 
[here](https://github.com/aws/aws-sdk-java-v2/issues/1803)). It took me quite a 
while to figure this out, as it only happened at very high throughput. 
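
   For anyone chasing a similar problem, here is a rough sketch of one way to spot duplicate Netty jars at runtime. It is not part of this PR, and the class and method names are made up for illustration. It assumes unshaded Netty 4 jars, which ship a `META-INF/io.netty.versions.properties` resource; shaded or relocated copies may not show up this way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

/** Hypothetical diagnostic helper (not part of the PR). */
final class NettyClasspathCheck {
  // Version metadata resource bundled with Netty 4 jars.
  static final String VERSIONS_RESOURCE = "META-INF/io.netty.versions.properties";

  /** Returns the location of every Netty version file visible to the given class loader. */
  static List<URL> findNettyVersionFiles(ClassLoader loader) {
    List<URL> found = new ArrayList<>();
    try {
      Enumeration<URL> urls = loader.getResources(VERSIONS_RESOURCE);
      while (urls.hasMoreElements()) {
        found.add(urls.nextElement());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return found;
  }

  public static void main(String[] args) {
    // Each printed URL points into one jar; jars from different Netty versions
    // showing up here are a hint that versions are mixed on the classpath.
    for (URL url : findNettyVersionFiles(ClassLoader.getSystemClassLoader())) {
      System.out.println(url);
    }
  }
}
```

   Running this inside the actual Spark executor (rather than locally) matters, since the executor classpath is where the conflicting jars end up.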
   
   I'm enforcing Beam's version of Netty over the more recent one used by AWS. 
In my tests this worked very well.
   
   Another change is the output type of the writer. As recommended 
[here](https://docs.google.com/document/d/1V2FkGGunVgvLwi1dKHr-7mtDuwYjTuvuESV_oPzVnfQ/edit?resourcekey=0-KvfQq-5iCcMlu3f3MFJ-GQ#heading=h.xl97lw3dyot7),
 I changed it from `Void` to `Result` to allow for future backwards-compatible 
changes. Currently the writer is fail-fast (after retries, of course), but 
adding a dead-letter output would make sense...
   
   The next step is to provide a more powerful internal partitioner that is 
aware of the hash key ranges assigned to Kinesis shards.
   With that, record aggregation would be as powerful as the one provided by KPL.
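
   To illustrate the idea, a minimal sketch of a hash-key-range lookup follows. This is not the implementation planned for the PR; the class and method names are hypothetical. It relies on Kinesis mapping a partition key to a shard by taking the MD5 of the key as an unsigned 128-bit integer, and it assumes the shard ranges are contiguous so that only each shard's starting hash key needs to be stored.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of the PR): resolves partition keys to shards by hash key range. */
final class ShardLookupSketch {
  // Starting hash key of each shard's range -> shard id.
  // Assumption: ranges are contiguous and non-overlapping.
  private final TreeMap<BigInteger, String> shardsByStartingHashKey = new TreeMap<>();

  void addShard(String shardId, BigInteger startingHashKey) {
    shardsByStartingHashKey.put(startingHashKey, shardId);
  }

  /** MD5 of the UTF-8 partition key, read as an unsigned 128-bit integer. */
  static BigInteger hashKey(String partitionKey) {
    try {
      byte[] digest =
          MessageDigest.getInstance("MD5").digest(partitionKey.getBytes(StandardCharsets.UTF_8));
      return new BigInteger(1, digest); // signum 1 keeps the value non-negative
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  /** Finds the shard whose range starts at or below the key's hash. */
  String shardFor(String partitionKey) {
    Map.Entry<BigInteger, String> entry =
        shardsByStartingHashKey.floorEntry(hashKey(partitionKey));
    if (entry == null) {
      throw new IllegalStateException("No shard covers the hash key of: " + partitionKey);
    }
    return entry.getValue();
  }
}
```

   Knowing the target shard up front is what would let the writer aggregate only records that end up in the same shard, which is the property the KPL aggregation depends on.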
   
   cc @echauchot I'd be more than happy to get another review if you have the 
capacity for it ...

