[ 
https://issues.apache.org/jira/browse/FLUME-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327321#comment-14327321
 ] 

Santiago M. Mola commented on FLUME-2625:
-----------------------------------------

[~trex58] Could you provide specific errors (with stacktraces) for 
TestLoadBalancingRpcClient and TestThriftClient?

> There are several unstable tests within FLUME
> ---------------------------------------------
>
>                 Key: FLUME-2625
>                 URL: https://issues.apache.org/jira/browse/FLUME-2625
>             Project: Flume
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: v1.5.0.1
>         Environment: RHEL 7.1 / x86_64 / Open JDK 1.7
>            Reporter: Tony Reix
>
> Hi,
> I'm working on porting FLUME in a RHEL 7.1 / PPC64LE / IBM JVM 1.7 
> environment.
> As an example, I've found that the test .source.TestSyslogUdpSource fails, 
> but not always, only 7 times out of 10 tries. Testing on RHEL 7.1 / x86_64 / 
> IBM JVM, I've also had random failures.
> Running the same .source.TestSyslogUdpSource test in RHEL 7.1 / x86_64 / Open 
> JDK 1.7 environment, I've found that this test fails only once out of 30 
> tries: it is an "unstable" test.
> In order to find which test issues are specific to PPC64 or IBMJVM 
> environment, I've run 10 times all the FLUME tests in the RHEL 7.1 / x86_64 / 
> Open JDK 1.7 environment, which I call my "reference" environment.
> Then, using a tool that compares all the results, I've found that there are 
> 16 tests that are "unstable" in my "reference" (x86_64/OpenJDK) environment.
> By "unstable", I mean to say that the results vary, though the environment is 
> exactly the same.
> These tests are:
> .api.TestLoadBalancingRpcClient
> .api.TestThriftRpcClient
> .channel.file.TestFileChannelRestart
> .channel.TestSpillableMemoryChannel
> .instrumentation.http.TestHTTPMetricsServer
> .sink.TestAvroSink
> .sink.TestThriftSink
> .source.avroLegacy.TestLegacyAvroSource
> .source.http.TestHTTPSource
> .source.TestAvroSource
> .source.TestExecSource
> .source.TestMultiportSyslogTCPSource
> .source.TestSyslogTcpSource
> .source.TestSyslogUdpSource
> .source.TestThriftSource
> .source.thriftLegacy.TestThriftLegacySource
> About the ".source.TestSyslogUdpSource" test, my analysis is that the test 
> code is not reliable, since the test checks that some data is correct without 
> first checking that all the "messages" have arrived (sometimes a message has 
> not arrived in time, and a reference is NULL).
> After adding "sleep(1000)" to the test with the IBM JVM, the test then failed 
> only 3 times out of 10.
> So, I think that several FLUME tests are coded in a way that is not 100% 
> reliable. Or it could also be that some core code of FLUME is not 100% 
> reliable.
> I mean to say that some code may have been written based on the specific 
> behaviour of the OpenJDK Java Virtual Machine, which was used for testing. 
> A change in the order in which threads are launched, or in the time needed 
> to send messages in the JVM/OS, may lead to issues that are not correctly 
> handled by the code (mainly test code, but maybe core code too). 
> And it seems that, although it is perfectly correct, the IBM JVM does not 
> behave the same way as OpenJDK.
> So, this is a pain. Mainly in my PPC64LE/IBMJVM environment.
> I think that these 16 tests must be analysed and improved.
> Also, running tests with both OpenJDK and the IBM JVM in your development and 
> test/Jenkins environments would help to expose these random issues.
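
To illustrate the quoted analysis of .source.TestSyslogUdpSource: instead of a fixed sleep(1000), a test could poll until the expected number of messages has arrived or a deadline passes, and only then check the data. The following is a minimal sketch under that assumption. The class AwaitEvents, the method awaitCount, and the in-memory queue standing in for the channel are hypothetical names for illustration; they are not Flume APIs and not the actual test code.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

public class AwaitEvents {

  /**
   * Polls until the queue holds at least `expected` items or the timeout
   * expires. Returns true if the expected count was reached in time.
   * (Hypothetical helper, not part of Flume.)
   */
  public static boolean awaitCount(Queue<?> received, int expected,
      long timeoutMillis) throws InterruptedException {
    long deadline = System.nanoTime()
        + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (received.size() < expected) {
      // Compare by difference so the check is safe against nanoTime overflow.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(10); // short poll interval instead of one long fixed sleep
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    // Thread-safe queue standing in for wherever the test collects events.
    final Queue<String> received = new ConcurrentLinkedQueue<String>();

    // Simulated sender that delivers messages with some delay.
    Thread sender = new Thread(new Runnable() {
      public void run() {
        for (int i = 0; i < 3; i++) {
          try {
            Thread.sleep(50);
          } catch (InterruptedException e) {
            return;
          }
          received.add("msg-" + i);
        }
      }
    });
    sender.start();

    // Wait for all three messages before inspecting any of them.
    boolean allArrived = awaitCount(received, 3, 5000);
    sender.join();
    System.out.println("all arrived: " + allArrived
        + ", count=" + received.size());
  }
}
```

With this pattern a test fails with an explicit timeout when a message is missing, rather than with a NULL reference, and a fast environment does not pay for the full sleep.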



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
