[GitHub] [flink-connector-cassandra] echauchot commented on pull request #3: [FLINK-26822] Add Cassandra Source

via GitHub Tue, 28 Feb 2023 07:44:35 -0800


echauchot commented on PR #3:
URL: 
https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1448409889


   @zentol I addressed all your comments and changed the splits architecture:
   I introduced a table size estimation (based on Cassandra's statistical size 
estimates). I added an optional user configuration to specify the maximum split 
memory size. If set, the source generates splits of `maxSplitMemorySize`, with 
protection measures (relative to the task parallelism) on the number of splits.
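
   To make the sizing idea concrete, here is a rough sketch, not the PR's actual code. It assumes the table size is derived from the `partitions_count` and `mean_partition_size` columns of Cassandra's `system.size_estimates` table, scaled by the fraction of the token ring those rows cover. The class name, the method names, the `maxSplitsPerTask` cap, and the exact bounds (at least one split per parallel task, at most `parallelism * maxSplitsPerTask`) are hypothetical choices made for illustration.

```java
// Hypothetical sketch: names and bounds are illustrative, not the connector's API.
class SplitSizingSketch {

    /**
     * Estimates the table size in bytes from per-token-range estimates.
     * partitionsCount[i] and meanPartitionSize[i] describe one range;
     * ringFraction is the share of the token ring covered by those ranges (0 < f <= 1).
     */
    static long estimateTableSize(long[] partitionsCount, long[] meanPartitionSize, double ringFraction) {
        long covered = 0;
        for (int i = 0; i < partitionsCount.length; i++) {
            covered += partitionsCount[i] * meanPartitionSize[i];
        }
        // Extrapolate from the covered part of the ring to the whole ring.
        return Math.round(covered / ringFraction);
    }

    /**
     * Derives the number of splits from the estimated size and the optional
     * maxSplitMemorySize, clamped relative to the task parallelism.
     * Assumption: a non-positive maxSplitMemorySize means "option not set".
     */
    static long numSplits(long tableSizeBytes, long maxSplitMemorySize, int parallelism, int maxSplitsPerTask) {
        if (maxSplitMemorySize <= 0) {
            return parallelism; // assumed default: one split per parallel task
        }
        // Ceiling division: enough splits so that none exceeds maxSplitMemorySize.
        long wanted = (tableSizeBytes + maxSplitMemorySize - 1) / maxSplitMemorySize;
        long upper = (long) parallelism * maxSplitsPerTask;
        return Math.max(parallelism, Math.min(wanted, upper));
    }
}
```

   The upper bound guards against a tiny `maxSplitMemorySize` producing a huge number of splits, and the lower bound keeps every parallel task busy.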
   I now read a split as a whole (no state needed).
   I added the related split and size tests. They require using JMX to force 
a memtable flush on the Cassandra cluster so that the system size estimates 
are updated (as we just wrote test data). The official Cassandra image 
disables JMX; to enable it we need to provide authentication and modify 
cassandra-env.sh, so I had to create my own image (!)
   Also, the flush is very long (30s), so for all split tests I write and flush 
only once (unlike the other tests, which write test data for each test).
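
   For readers unfamiliar with the JMX step, a minimal sketch of such a flush follows; it is not the test code from the PR. It assumes JMX is reachable with password authentication (7199 is Cassandra's default JMX port) and that the flush goes through the `forceKeyspaceFlush` operation of the `StorageService` MBean, which to my understanding is what `nodetool flush` uses. The class name, method names, and parameters are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: illustrates a JMX-triggered memtable flush, not the PR's test code.
class MemtableFlushSketch {

    // Cassandra's StorageService MBean name.
    static final String STORAGE_SERVICE_MBEAN = "org.apache.cassandra.db:type=StorageService";

    /** Builds the standard RMI-based JMX service URL for a host and port. */
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Flushes the memtables of one table to disk so that size estimates can be refreshed. */
    static void flush(String host, int port, String user, String password,
                      String keyspace, String table) throws Exception {
        Map<String, Object> env = new HashMap<>();
        // JMX password authentication expects a {user, password} String array.
        env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        try (JMXConnector connector =
                JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)), env)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            connection.invoke(
                    new ObjectName(STORAGE_SERVICE_MBEAN),
                    "forceKeyspaceFlush",
                    new Object[] {keyspace, new String[] {table}},
                    new String[] {String.class.getName(), String[].class.getName()});
        }
    }
}
```

   A test would call something like `MemtableFlushSketch.flush(host, 7199, user, password, keyspace, table)` once after writing the test data, using whatever credentials the custom image configures.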
   
   PTAL. I hope this will be the last round of review, as I changed a lot and 
spent so much time on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
