We replace System.out/err to capture user logs and forward the logs for the
Dataflow worker[1]. It could be that this test[2] is not resetting it
afterwards which leaves it at null and then some future code causes it to
fail.

1:
https://github.com/apache/beam/blob/e69d69d72dc5b9c3d6069c0b71825c3c2b0b4e61/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java#L132
2:
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java

On Fri, Apr 5, 2019 at 1:42 AM Michael Luckey <[email protected]> wrote:

> FWIW, the TimerRecieverTest is also failing consistently on my macOS.
> Running on my ubuntu VM, they pass.
>
> Now the stacktrace indicates an NullPinterException thrown out of the
> finally block [1]
>
> As this is really bad and of course would hide the cause, I added some
>
> diff --git
> a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
> b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>
> index 708b669112..8c21928da1 100644
>
> ---
> a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>
> +++
> b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>
> @@ -169,7 +169,12 @@ public class FnHarness {
>
>        LOG.info("Entering instruction processing loop");
>
>        control.processInstructionRequests(options.as
> (GcsOptions.class).getExecutorService());
>
>      } finally {
>
> -      System.out.println("Shutting SDK harness down.");
>
> +      try {
>
> +        System.out.println("Shutting SDK harness down.");
>
> +      } catch (NullPointerException npe) {
>
> +        LOG.warn("NPE sys.out=" + System.out, npe);
>
> +      }
>
>      }
>
>    }
>
>  }
>
> No my test shows outputs
>
> Apr 05, 2019 9:29:59 AM org.apache.beam.fn.harness.FnHarness main
> WARNING: NPE  sys.out=null
> java.lang.NullPointerException
>       at org.apache.beam.fn.harness.FnHarness.main(FnHarness.java:173)
>       at 
> org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.lambda$setUp$0(TimerReceiverTest.java:123)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>
>
>
> and pass (sic!)
>
> Something weird is going on here....
>
> Now replacing that 'System.out' with 'LOG.info' seems also to be working.
> At least I could not reproduce the failure trying several times. I am lost
> here, as there is probably a good reason to use sys out here.
>
> Btw. After the first failure with NullPointerExceptions. successive runs
> seem to fail for different reasons. Getting timeout in test setup. Unsure,
> might indicate some grpc port/server startup issue because previous run did
> not do proper cleanup.
>
> best,
>
> michel
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L172
>
> On Thu, Apr 4, 2019 at 10:42 PM Lukasz Cwik <[email protected]> wrote:
>
>> I looked at the failures you were experiencing and the error message
>> doesn't provide enough information to figure out why it is failing.
>>
>> On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai <[email protected]> wrote:
>>
>>> Oh, I just missed it then :)
>>> Thank you Lukasz for connecting us.
>>>
>>> Yeah, the two TimerReceiverTest tests fail reliably for me.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik <[email protected]> wrote:
>>>
>>>> +Ahmed
>>>>
>>>> I have added you as a contributor.
>>>>
>>>> It seems as though Ahmed had just picked up BEAM-3489 yesterday. Reach
>>>> out to Ahmed if you would like to help them out with the task.
>>>>
>>>> Was TimerReceiverTest failing reliably when performing a parallel build
>>>> or is it flaky?
>>>>
>>>> I have asked Chamikara to take a look for PR 8180.
>>>>
>>>>
>>>> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai <[email protected]> wrote:
>>>>
>>>>> Hi All!
>>>>>
>>>>> I am Csabi, I would be happy to contribute to Beam.
>>>>> Could you grant me contributor role and assign issue BEAM-3489
>>>>> <https://issues.apache.org/jira/browse/BEAM-3489>  to me? My user
>>>>> name is "csabakassai".
>>>>>
>>>>> After I checked out the code and tried to do a gradle check I found
>>>>> these issues:
>>>>>
>>>>>    1. *jUnit tests fails:* the TimerReceiverTest fails in the
>>>>>    ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
>>>>>    ":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks. 
>>>>> When I
>>>>>    execute tests independently everything is fine, so I disabled the 
>>>>> parallel
>>>>>    build and this solves the problem. I have not investigated further, do 
>>>>> you
>>>>>    have any more insights on this issue? I have attached the test reports.
>>>>>    2. *python test fail*: there is a python test which fails if the
>>>>>    current offset of your timezone differs from the offset in 1970. In my 
>>>>> case
>>>>>    the Singapore is now GMT+8 and it was GMT+7:30 in 1970. I created a 
>>>>> ticket
>>>>>    for this issue where I I describe the problem in details:
>>>>>    https://jira.apache.org/jira/browse/BEAM-6947. Could you assign
>>>>>    the ticket to me? Also I created a PR with a possible fix:
>>>>>    https://github.com/apache/beam/pull/8180. Could you suggest me a
>>>>>    reviewer?
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Csabi
>>>>>
>>>>>
>>>>>
>>>>>

Reply via email to