Martijn Visser created FLINK-25480:
--------------------------------------

             Summary: Create dashboard/monitoring to see resource usage per E2E test
                 Key: FLINK-25480
                 URL: https://issues.apache.org/jira/browse/FLINK-25480
             Project: Flink
          Issue Type: Improvement
          Components: Test Infrastructure
    Affects Versions: 1.15.0, 1.13.6, 1.14.3
            Reporter: Martijn Visser
             Fix For: 1.15.0, 1.13.6, 1.14.3


Over the past couple of weeks, we've seen multiple tests fail with 
out-of-memory errors and/or exit code 137. This happens on both the Alibaba CI 
machines and the Azure hosted agents. For the Alibaba CI machines, we've 
mitigated the problem by reducing the number of workers per CI machine from 7 
to 5. Each worker can spin up multiple Docker containers, especially now that 
Testcontainers is used more and more. 
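
For context on the exit code: by shell convention, a status above 128 means 
the process was terminated by signal (status - 128), so 137 corresponds to 
signal 9 (SIGKILL), which is what the Linux OOM killer sends. A minimal sketch 
of that decoding follows; the helper name describe_exit_code is hypothetical 
and not part of any Flink tooling, and a process can in principle also return 
such a status on its own.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name, e.g. 9 -> SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

SIGKILL only says that the process was killed, not why; confirming an 
out-of-memory kill still requires the kernel log or the container state.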

If we can get insight into the resource usage per end-to-end test, it will 
help us debug test infrastructure problems more quickly. 
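
As a rough illustration of what such insight could be built on, the sketch 
below computes the peak memory per container from sampled JSON lines. It 
assumes the samples were collected with something like 
docker stats --no-stream --format '{{json .}}' while a test was running, and 
that each line carries Name and MemUsage fields. The function names and the 
sample values are hypothetical, and mapping containers back to individual 
tests is not shown.

```python
import json
from typing import Dict, Iterable

# Binary size suffixes as they are assumed to appear in the MemUsage field.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_mem(value: str) -> int:
    """Convert a size string such as '512MiB' to a number of bytes."""
    value = value.strip()
    # Check longer suffixes first so 'MiB' is not mistaken for plain 'B'.
    for unit in sorted(_UNITS, key=len, reverse=True):
        if value.endswith(unit):
            return int(float(value[: -len(unit)]) * _UNITS[unit])
    raise ValueError(f"unrecognized size: {value!r}")


def peak_memory_by_container(samples: Iterable[str]) -> Dict[str, int]:
    """Return the highest observed memory usage in bytes per container name."""
    peaks: Dict[str, int] = {}
    for line in samples:
        record = json.loads(line)
        # MemUsage is assumed to look like '<used> / <limit>'; keep the used part.
        used = parse_mem(record["MemUsage"].split("/")[0])
        name = record["Name"]
        peaks[name] = max(peaks.get(name, 0), used)
    return peaks


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    demo = [
        '{"Name": "kafka", "MemUsage": "512MiB / 7.6GiB"}',
        '{"Name": "kafka", "MemUsage": "1.5GiB / 7.6GiB"}',
    ]
    print(peak_memory_by_container(demo))
```

Sampling at a fixed interval can miss short spikes, so peaks obtained this way 
are a lower bound on actual usage.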



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
