[ https://issues.apache.org/jira/browse/DAFFODIL-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Interrante resolved DAFFODIL-2751.
---------------------------------------
    Resolution: Fixed

Fixed in commit 04944afce95aefaea61cc74c714661f26d82f59d

(thanks to Steve)

> Occasional network timeout exceptions can hang a CI job now
> -----------------------------------------------------------
>
>                 Key: DAFFODIL-2751
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2751
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: 3.5.0
>            Reporter: John Interrante
>            Assignee: Steve Lawrence
>            Priority: Minor
>             Fix For: 3.5.0
>
>
> Please see these 2 runs in GitHub Actions:
> [Add Daffodil Developer Guide · apache/daffodil@9d114c3 
> (github.com)|https://github.com/apache/daffodil/actions/runs/3464760904/jobs/5786683343]
> [Add Daffodil Developer Guide · apache/daffodil@0bc99e6 
> (github.com)|https://github.com/apache/daffodil/actions/runs/3475210535/jobs/5809186675]
> One job in each run hung for 5 hours 54 minutes, so GitHub Actions had to
> kill the job.  Both jobs were running on the same runner (Java 8, Scala
> 2.12.17, ubuntu-20.04) and had failed the following unit tests with the
> same error message:
> org.apache.daffodil.io.TestInputSourceDataInputStream8.networkReadPartial1 
> org.apache.daffodil.io.TestSocketPairTestRig.testHangDetection1
> org.apache.daffodil.io.TestSocketPairTestRig.testHangDetection2
> org.apache.daffodil.io.TestSocketPairTestRig.testSocketPairTestRig1
> failed: java.util.concurrent.TimeoutException: Futures timed out after
> [1000 milliseconds], took 1.002 sec
> The rest of the jobs ran all of the unit tests successfully without any
> timeout exceptions.  Occasional timeout exceptions have failed 1 out of 6
> jobs in a run before, but they had never caused a job to hang (the job
> simply terminated after running the unit tests).
> I do not think the GitHub Actions runner changed.  I checked the last CI job
> on the main branch ([Update sbt to 1.8.0 ·
> apache/daffodil@6d4b2b6
> (github.com)|https://github.com/apache/daffodil/actions/runs/3462161126/jobs/5780684309])
> and the runner version numbers in the setup job details were the same.  We
> have had several CI jobs since the recent changes to the integration tests,
> so it seems unlikely those changes caused the new hangs, even though a JVM
> can hang if non-daemon threads are still running.
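The failure mode described above can be sketched in Scala. This is an illustrative sketch only, not the change made in commit 04944afce95aefaea61cc74c714661f26d82f59d: the object name `HangSketch`, the thread name, and the latch-based `stalledRead` (standing in for a socket read that never receives data) are all hypothetical. It shows `Await.result` giving up after a time limit with a `java.util.concurrent.TimeoutException` (in Scala 2.12 the message reads "Futures timed out after [...]", as in the log above) while the worker thread stays blocked. Because the worker is created as a daemon thread, it cannot keep the JVM alive once `main` returns; with the default non-daemon thread factory the same program would never exit.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ThreadFactory, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object HangSketch {

  // Hypothetical thread factory that marks every worker as a daemon thread,
  // so a worker still blocked after the caller gives up cannot keep the JVM alive.
  val daemonFactory: ThreadFactory = new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "hypothetical-socket-reader")
      t.setDaemon(true)
      t
    }
  }

  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor(daemonFactory))

  // Stand-in for a network read that never receives data:
  // it blocks on a latch that nobody ever counts down.
  def stalledRead(): Future[Int] = {
    val never = new CountDownLatch(1)
    Future { never.await(); 42 }
  }

  // Waits at most `limit` for the read; on timeout, returns Left with the
  // TimeoutException message while the worker thread remains blocked.
  def readWithTimeout(limit: FiniteDuration): Either[String, Int] =
    try Right(Await.result(stalledRead(), limit))
    catch { case e: TimeoutException => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    println(readWithTimeout(1000.millis))
    // main returns here; the only thread left running is the blocked daemon
    // worker, so the JVM is free to exit instead of hanging.
  }
}
```

Swapping `daemonFactory` for `Executors.defaultThreadFactory()` turns the worker into a non-daemon thread, and the program then prints the timeout but never terminates, which matches the kind of hang the report describes.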



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
