[ 
https://issues.apache.org/jira/browse/FLINK-30328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678727#comment-17678727
 ] 

Roman Khachatryan commented on FLINK-30328:
-------------------------------------------

Sorry for the late reply, [~mapohl] 

I've checked it locally and made sure that it is a test issue, not a production 
code issue. In particular, I've found no issues with:
 - memory sharing (the feature that's tested)
 - concurrency - except metric reading, which is expected

The problem in the test is caused by the timing of metrics collection.

The collection interval is currently hardcoded to 5s in Flink, and even 
changing it would not solve the problem completely. There also seems to be 
no better alternative to testing via metrics.

So I opened a draft PR that checks the deviation instead of min/max, which 
should eliminate false positives: https://github.com/apache/flink/pull/21733
I'll try to finalize it this week.
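
For illustration only, here is a minimal sketch of what a deviation-based 
check could look like. This is not the code from the PR: the class name, the 
method names and the 1% tolerance are hypothetical, and the three samples are 
just the min/average/max values from the failure report below.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

public class DeviationCheck {

    /** Largest relative distance of any sample from the mean (0.0 if all samples are equal). */
    static double maxRelativeDeviation(double[] values) {
        DoubleSummaryStatistics stats = Arrays.stream(values).summaryStatistics();
        double avg = stats.getAverage();
        if (values.length == 0 || avg == 0.0) {
            // Nothing meaningful to compare against; treat as "no deviation".
            return 0.0;
        }
        // The farthest sample from the mean is either the max or the min.
        double farthest = Math.max(stats.getMax() - avg, avg - stats.getMin());
        return farthest / Math.abs(avg);
    }

    /** True if every sample lies within the given relative tolerance of the mean. */
    static boolean withinTolerance(double[] values, double tolerance) {
        return maxRelativeDeviation(values) <= tolerance;
    }

    public static void main(String[] args) {
        // Illustrative samples: min, average and max from the reported failure.
        double[] samples = {189045056.0, 189176192.0, 189569600.0};
        // Hypothetical 1% tolerance; a strict min == max check fails on these values.
        System.out.println(withinTolerance(samples, 0.01));
    }
}
```

With a tolerance like this, small differences caused by metrics being read 
at slightly different times would no longer fail the test, while tasks 
using entirely separate caches would presumably still show up as a large 
deviation.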

> TaskManagerWideRocksDbMemorySharingITCase.testBlockCache failed
> ---------------------------------------------------------------
>
>                 Key: FLINK-30328
>                 URL: https://issues.apache.org/jira/browse/FLINK-30328
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / State Backends
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: test-stability
>
> {{TaskManagerWideRocksDbMemorySharingITCase.testBlockCache}} failed in this 
> build: 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=43763&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=9836]
> {code:java}
> Dec 06 16:33:59 [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 12.926 s <<< FAILURE! - in 
> org.apache.flink.test.state.TaskManagerWideRocksDbMemorySharingITCase
> Dec 06 16:33:59 [ERROR] 
> org.apache.flink.test.state.TaskManagerWideRocksDbMemorySharingITCase.testBlockCache
>   Time elapsed: 12.907 s  <<< FAILURE!
> Dec 06 16:33:59 java.lang.AssertionError: 
> Dec 06 16:33:59 Block cache usage reported by different tasks varies too 
> much: DoubleSummaryStatistics{count=20, sum=3783523840.000000, 
> min=189045056.000000, average=189176192.000000, max=189569600.000000}
> Dec 06 16:33:59 That likely mean that they use different cache objects 
> expected:<1.895696E8> but was:<1.89045056E8>
> Dec 06 16:33:59       at org.junit.Assert.fail(Assert.java:89)
> Dec 06 16:33:59       at org.junit.Assert.failNotEquals(Assert.java:835)
> Dec 06 16:33:59       at org.junit.Assert.assertEquals(Assert.java:555)
> Dec 06 16:33:59       at 
> org.apache.flink.test.state.TaskManagerWideRocksDbMemorySharingITCase.testBlockCache(TaskManagerWideRocksDbMemorySharingITCase.java:133)
> [...] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
