echauchot commented on pull request #17849: URL: https://github.com/apache/flink/pull/17849#issuecomment-979979676
> Were you able to replicate the issue locally, or is this more of a "throw-stuff-at-a-wall-and-see-what-sticks" kind of situation? (Which I wouldn't mind for this particular test...)

No. This timeout is a flakiness issue that only happens under load (see [my comment](https://issues.apache.org/jira/browse/FLINK-22775?focusedCommentId=17446552&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17446552) in the ticket), and I did not manage to reproduce it in 30 local runs. But I am fairly confident that sparing the cluster the wait for a replicate on write would avoid the timeout under load.

As explained in my comment in the ticket, I plan to monitor the ITest for a few weeks and see whether it is still flaky with my fix. If it is, we could consider migrating the Cassandra test cluster from the embedded daemon to either Testcontainers (which relies on Docker and so is less sensitive to load) or an ASF v2 licensed test component such as Achilles (which I used in Apache Beam and contributed to), which has a lot of knobs for configuring the cluster.
