[
https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756210#comment-16756210
]
Sanket Reddy commented on SPARK-25692:
--------------------------------------
Did some further digging.
How to reproduce:
    ./build/mvn test \
      -Dtest=org.apache.spark.network.RequestTimeoutIntegrationSuite,org.apache.spark.network.ChunkFetchIntegrationSuite \
      -DwildcardSuites=None test
The furtherRequestsDelay test in RequestTimeoutIntegrationSuite was holding
onto worker references. The test does close the server context, but the worker
threads are global, and the test sleeps for 60 seconds when fetching a specific
chunk. A worker picks up that chunk and waits for the client to consume it.
The test, however, is checking for a request timeout, and the request times out
after 10 seconds, so as far as I understand the workers are left waiting for
the buffer to be consumed. I don't think this needs to be static, since the
server initializes the TransportContext object only once. I ran some manual
tests and it looks good.
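The reasoning above can be illustrated with a minimal Java sketch. This is not
Spark's code: the Context class, its fields, and secondContextCanRun are
hypothetical stand-ins, and a plain ExecutorService is assumed in place of the
real worker threads. It only shows why a static (JVM-wide) worker pool can stay
blocked after one context is closed, while a per-context pool is released on
close.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WorkerPoolSketch {
  // Daemon threads so a stuck worker never keeps the JVM alive.
  private static final ThreadFactory DAEMON = r -> {
    Thread t = new Thread(r);
    t.setDaemon(true);
    return t;
  };

  // Hypothetical stand-in for a transport context; not Spark's real class.
  static class Context implements AutoCloseable {
    // Shared by every Context in the JVM, so close() cannot safely stop it.
    private static final ExecutorService SHARED =
        Executors.newFixedThreadPool(1, DAEMON);

    private final ExecutorService workers;
    private final boolean owned;

    Context(boolean useStaticWorkers) {
      this.owned = !useStaticWorkers;
      this.workers = useStaticWorkers
          ? SHARED
          : Executors.newFixedThreadPool(1, DAEMON);
    }

    Future<?> submit(Runnable task) {
      return workers.submit(task);
    }

    @Override
    public void close() {
      // Only a pool owned by this context can be torn down with it.
      if (owned) {
        workers.shutdownNow();
      }
    }
  }

  // Returns true if a second context can run a task shortly after a first
  // context, whose worker is stuck waiting on a consumer, has been closed.
  static boolean secondContextCanRun(boolean useStaticWorkers) throws Exception {
    CountDownLatch consumed = new CountDownLatch(1);
    try {
      try (Context first = new Context(useStaticWorkers)) {
        // Simulates a worker waiting for a client that has already timed out.
        first.submit(() -> {
          try {
            consumed.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      try (Context second = new Context(useStaticWorkers)) {
        Future<?> next = second.submit(() -> { });
        try {
          next.get(500, TimeUnit.MILLISECONDS);
          return true;
        } catch (TimeoutException e) {
          return false;
        }
      }
    } finally {
      consumed.countDown(); // release the stuck worker before returning
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("static workers: " + secondContextCanRun(true));
    System.out.println("per-context workers: " + secondContextCanRun(false));
  }
}
```

In this sketch the static pool leaves the second context's task queued behind
the stuck worker, while the per-context pool is interrupted on close and the
second context starts with fresh threads.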
> Flaky test: ChunkFetchIntegrationSuite.fetchBothChunks
> ------------------------------------------------------
>
> Key: SPARK-25692
> URL: https://issues.apache.org/jira/browse/SPARK-25692
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Shixiong Zhu
> Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot
> 2018-11-01 at 10.17.16 AM.png
>
>
> Looks like the whole test suite is pretty flaky. See:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/
> This may be a regression in 3.0 as this didn't happen in 2.4 branch.