[ https://issues.apache.org/jira/browse/DAFFODIL-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Interrante resolved DAFFODIL-2751.
---------------------------------------
Resolution: Fixed
Fixed in commit 04944afce95aefaea61cc74c714661f26d82f59d
(thanks to Steve)
> Occasional network timeout exceptions can hang a CI job now
> -----------------------------------------------------------
>
> Key: DAFFODIL-2751
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2751
> Project: Daffodil
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: 3.5.0
> Reporter: John Interrante
> Assignee: Steve Lawrence
> Priority: Minor
> Fix For: 3.5.0
>
>
> Please see these 2 runs in GitHub Actions:
> [Add Daffodil Developer Guide · apache/daffodil@9d114c3
> (github.com)|https://github.com/apache/daffodil/actions/runs/3464760904/jobs/5786683343]
> [Add Daffodil Developer Guide · apache/daffodil@0bc99e6
> (github.com)|https://github.com/apache/daffodil/actions/runs/3475210535/jobs/5809186675]
> In both runs, one job hung for 5 hours 54 minutes, so GitHub Actions had to
> kill it. Both jobs were running on the same runner (Java 8, Scala
> 2.12.17, ubuntu-20.04) and had failed the following unit tests with the
> same error message:
> org.apache.daffodil.io.TestInputSourceDataInputStream8.networkReadPartial1
> org.apache.daffodil.io.TestSocketPairTestRig.testHangDetection1
> org.apache.daffodil.io.TestSocketPairTestRig.testHangDetection2
> org.apache.daffodil.io.TestSocketPairTestRig.testSocketPairTestRig1
> failed: java.util.concurrent.TimeoutException: Futures timed out after [1000
> milliseconds], took 1.002 sec
> The rest of the jobs ran all of the unit tests successfully without any
> timeout exceptions. We have occasionally seen a timeout exception fail 1 out
> of 6 jobs in a run, but it had never caused the job to hang (the job had
> simply terminated after running the unit tests).
> I do not think there was a change in the GitHub Actions runner. I checked
> the last CI job on the main branch ([Update sbt to 1.8.0 ·
> apache/daffodil@6d4b2b6
> (github.com)|https://github.com/apache/daffodil/actions/runs/3462161126/jobs/5780684309])
> and the runner version numbers in the setup job details were the same. We
> have had several CI jobs since the recent changes to the integration tests, so
> it seems unlikely those changes had anything to do with the new hangs, even
> though hangs can happen when non-daemon threads are still running in a JVM.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)