damondouglas commented on issue #32144: URL: https://github.com/apache/beam/issues/32144#issuecomment-2521820063
Unassigning myself, but relaying my research on this ticket.

# Situation

This workflow's test failed roughly every 2 to 3 days over the past two weeks.

# Background

This workflow is scheduled to run twice daily. Inspection of the latest failures shows a timeout (`Failed: Timeout >1800.0s`) even though the actual Dataflow Job for that execution succeeded. The stack trace differs across the past two weeks' failures. In each build scan's timeline, `:sdks:python:test-suites:dataflow:py39:runPerformanceTest` takes `~30m`, cutting off at the configured timeout. That timeout is set on the `runPerformanceTest` Gradle task via https://github.com/pytest-dev/pytest-timeout (see the sketch below). The Dataflow Jobs for these failed tests take `~10 to 13m`. Successful tests do not print any information about the Dataflow Job to compare against. The `_run_workcount_it` method performs additional tasks beyond running the pipeline, such as cleanup and publishing metrics to BigQuery. Further analysis shows that the cleanup and metrics publishing only require information about artifacts and metadata generated during the test, such as the Job ID, Google Cloud Storage files, etc. Notably, there is a read from InfluxDB followed by a write to BigQuery.

# Assessment

We can rule out failing Dataflow Jobs as a root cause of these failures. Moreover, roughly `~15m` of work outside the Dataflow Job execution is being performed within the test code. There appears to be a lot of unnecessary coupling of post-test functions with running the test itself.

# Recommendations

- Remove the after-test cleanup and consider a Google Cloud Storage wildcard/prefix approach that schedules deletion of test artifacts outside test execution (sketch below).
- Remove the InfluxDB read and the write to BigQuery from the test. Perhaps use a scheduled batch or streaming pipeline to collect these results into BigQuery instead (sketch below).
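For context on the timeout mentioned in the Background: the `Failed: Timeout >1800.0s` message comes from pytest-timeout. A minimal sketch of how such a limit is expressed is below; where exactly Beam sets the value (a marker, an ini option, or a `--timeout` flag passed by the Gradle task) is an assumption here, and the test name is a placeholder.

```python
import pytest


# Requires the pytest-timeout plugin. 1800s matches the "Timeout >1800.0s"
# seen in the failing builds; the placement (marker vs. CLI flag) is assumed.
@pytest.mark.timeout(1800)
def test_wordcount_performance():
    ...  # launch the Dataflow job, wait for completion, then clean up / publish metrics
```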
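For the first recommendation, a minimal sketch of what an out-of-band cleanup could look like, using the `google-cloud-storage` client to delete objects under a prefix older than some age. The bucket name, prefix, and age threshold are hypothetical placeholders; this would run on a schedule (e.g. a cron workflow) rather than inside the test.

```python
from datetime import datetime, timedelta, timezone

from google.cloud import storage


def delete_stale_test_artifacts(bucket_name: str, prefix: str, max_age_hours: int = 24) -> None:
    """Delete test artifacts under `prefix` that are older than `max_age_hours`."""
    client = storage.Client()
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # time_created is a timezone-aware UTC datetime set by GCS.
        if blob.time_created < cutoff:
            blob.delete()


# Hypothetical usage; the bucket and prefix would match wherever the test writes:
# delete_stale_test_artifacts("beam-perf-test-artifacts", "wordcount-it/", max_age_hours=24)
```

A GCS lifecycle rule on the bucket (delete after N days, scoped to the test prefix) would achieve the same result with no code at all.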
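For the second recommendation, a rough sketch of a scheduled Beam pipeline that collects metric rows and appends them to BigQuery, so the test itself no longer does the InfluxDB read and BigQuery write. The table name, schema, and the `beam.Create` placeholder source are assumptions; in practice the source would be wherever the test publishes its measurements.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical table and schema; real values would come from the existing metrics setup.
METRICS_TABLE = "my-project:beam_perf.wordcount_metrics"
METRICS_SCHEMA = "test_id:STRING,metric:STRING,value:FLOAT,timestamp:TIMESTAMP"


def run(argv=None):
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (
            p
            # Placeholder source: a real pipeline would read the published
            # measurements (e.g. from InfluxDB or test output files).
            | "ReadMetrics" >> beam.Create([
                {"test_id": "wordcount-it-example", "metric": "runtime_sec",
                 "value": 812.0, "timestamp": "2024-12-05T00:00:00Z"},
            ])
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                METRICS_TABLE,
                schema=METRICS_SCHEMA,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```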
