[ 
https://issues.apache.org/jira/browse/TEZ-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4364:
------------------------------
    Description: 
TLDR: after TEZ-4388, 

TestFaultTolerance has recently become flakier. This needs to be
investigated, because a unit test failure could also indicate a product bug
in the handling of failure scenarios.

According to a jstack of the surefire process, the timeout is reproduced only
by TestFaultTolerance.testBasicInputFailureWithoutExitDeadline
[^surefire_jstack.log]:
{code}
"Thread-1355" #1569 prio=5 os_prio=31 tid=0x00007fe76660c800 nid=0x43d07 waiting on condition [0x000070002ab38000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:155)
        at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:142)
        at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:138)
        at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExitDeadline(TestFaultTolerance.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

This is the point where the test thread sleeps while waiting for the DAG to finish.
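
For readers who do not have the test source open, the waiting pattern that the top frames suggest (a Thread.sleep inside runDAGAndVerify) can be sketched roughly as below. This is only an illustration under assumptions: DagWaitSketch, State, and waitForTerminalState are hypothetical names, not the actual TestFaultTolerance or Tez client code, and the explicit deadline is an addition for the sketch, not something the trace confirms.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; not the actual TestFaultTolerance or Tez client code.
final class DagWaitSketch {

    // Stand-in for a DAG status; the names are illustrative only.
    enum State { RUNNING, SUCCEEDED, FAILED }

    // Polls until the state is no longer RUNNING, sleeping between polls,
    // and gives up with an exception once timeoutMillis has elapsed.
    static State waitForTerminalState(Supplier<State> poll, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        State state = poll.get();
        while (state == State.RUNNING) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("DAG still RUNNING after " + timeoutMillis + " ms");
            }
            try {
                // A thread dumped here shows up as TIMED_WAITING (sleeping).
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the DAG", e);
            }
            state = poll.get();
        }
        return state;
    }

    public static void main(String[] args) {
        AtomicInteger polls = new AtomicInteger();
        // Fake status source: RUNNING for the first two polls, then SUCCEEDED.
        State result = waitForTerminalState(
            () -> polls.incrementAndGet() < 3 ? State.RUNNING : State.SUCCEEDED, 10, 1_000);
        System.out.println(result + " after " + polls.get() + " polls");
    }
}
```

If the DAG never reaches a terminal state, a loop of this shape without its own deadline keeps sleeping until the JUnit timeout (the FailOnTimeout frame at the bottom of the trace) interrupts it, which would match the thread state in the dump.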

  was:
TestFaultTolerance test becomes flakier recently.  It's important to be 
investigated because a unit test failure could also imply a product bug while 
handling failure scenarios.

According to surefire process' jstack, it can be reproduced only by 
TestFaultTolerance.testBasicInputFailureWithoutExitDeadline 
[^surefire_jstack.log]
{code}
"Thread-1355" #1569 prio=5 os_prio=31 tid=0x00007fe76660c800 nid=0x43d07 waiting on condition [0x000070002ab38000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:155)
        at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:142)
        at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:138)
        at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExitDeadline(TestFaultTolerance.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

this is when it waits for the DAG to finish


> TestFaultTolerance timeout on master - TestInput fix after TEZ-4338
> -------------------------------------------------------------------
>
>                 Key: TEZ-4364
>                 URL: https://issues.apache.org/jira/browse/TEZ-4364
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: surefire_jstack.log, 
> syslog_attempt_1640554229092_0001_1_01_000002_0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h


