echauchot commented on PR #3: URL: https://github.com/apache/flink-connector-cassandra/pull/3#issuecomment-1448409889
@zentol I addressed all your comments and changed the splits architecture:

- I introduced a table size estimation (based on Cassandra's statistical size estimates).
- I added an optional user configuration to specify the maximum split memory size. If set, the source generates splits of `maxSplitMemorySize`, with protection measures on the number of splits (relative to the task parallelism).
- I now read each split as a whole (no state needed).
- I added the related split and size tests. They require JMX to force a memtable flush on the Cassandra cluster so that the system size estimates are updated (as we have just written the test data). The official Cassandra image disables JMX; enabling it requires providing authentication and modifying `cassandra-env.sh`, so I had to create my own image (!). The flush is also very slow (30 s), so for all the split tests I write and flush only once (unlike the other tests, which write test data for each test).

PTAL. I hope this will be the last round of review, as I changed a lot and spent so much time on this.
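
For readers following the thread, the split sizing described in the comment above can be illustrated with a minimal sketch. This is not the code from the PR: the class `SplitCountSketch`, the method `numSplits`, and the cap `MAX_SPLITS_PER_TASK` are hypothetical names, and the exact "protection measures" are not spelled out in the comment, so the clamping rule below is only an assumption. The estimated table size is taken as an input; in the PR it is presumably derived from Cassandra's `system.size_estimates` table.

```java
import java.util.OptionalLong;

/** Hypothetical illustration of split sizing; not the connector's actual code. */
final class SplitCountSketch {

    // ASSUMPTION: upper bound on splits per parallel task, standing in for the
    // "protection measures" mentioned in the comment. The real rule may differ.
    static final int MAX_SPLITS_PER_TASK = 10;

    /**
     * Returns how many splits to generate.
     *
     * @param estimatedTableSizeBytes table size estimate in bytes
     * @param maxSplitMemorySizeBytes optional user setting; empty means "not set"
     * @param parallelism             source task parallelism (at least 1)
     */
    static long numSplits(long estimatedTableSizeBytes,
                          OptionalLong maxSplitMemorySizeBytes,
                          int parallelism) {
        if (parallelism < 1 || estimatedTableSizeBytes < 0) {
            throw new IllegalArgumentException("invalid parallelism or table size");
        }
        // ASSUMPTION: without the optional setting, use one split per parallel task.
        if (!maxSplitMemorySizeBytes.isPresent()) {
            return parallelism;
        }
        long maxSplit = maxSplitMemorySizeBytes.getAsLong();
        if (maxSplit <= 0) {
            throw new IllegalArgumentException("maxSplitMemorySize must be positive");
        }
        // Ceiling division, written to avoid overflow on large sizes.
        long bySize = estimatedTableSizeBytes / maxSplit
                + (estimatedTableSizeBytes % maxSplit == 0 ? 0 : 1);
        long upperBound = (long) parallelism * MAX_SPLITS_PER_TASK;
        // Clamp to [1, upperBound] so that neither an empty table nor a tiny
        // maxSplitMemorySize yields zero splits or an explosion of splits.
        return Math.max(1, Math.min(bySize, upperBound));
    }

    public static void main(String[] args) {
        // 1 GB table, 64 MB splits, parallelism 4: prints 16
        System.out.println(numSplits(1_000_000_000L, OptionalLong.of(64_000_000L), 4));
    }
}
```

Under these assumptions, a very small `maxSplitMemorySize` on a large table is capped at `parallelism * MAX_SPLITS_PER_TASK` splits rather than producing millions of tiny splits, which is one plausible reading of a protection measure tied to task parallelism.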
