Yes, it looks like the reset() method resets System.out even if it was never
initialized. It seems like a simple fix to make reset() safe to call at
all times. Csaba or Michael, would either of you like to open a PR and send
it my way?
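
For illustration, here is a minimal sketch of what "safe to call at all times" could look like. It assumes the initializer keeps the original streams in static fields and tracks whether it has run. The class SafeStdoutReset and all of its members are hypothetical names for this example; the real DataflowWorkerLoggingInitializer may be structured differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/**
 * Hypothetical stand-in for the logging initializer discussed in this thread.
 * The class and its fields are illustrative assumptions, not Beam's actual code.
 */
final class SafeStdoutReset {
  // Streams saved by initialize(); they stay null until initialize() has run.
  private static PrintStream originalOut;
  private static PrintStream originalErr;
  private static boolean initialized;

  /** Saves the current System.out/err and installs the given replacements. */
  static synchronized void initialize(PrintStream capturedOut, PrintStream capturedErr) {
    if (initialized) {
      return; // Already initialized; keep the streams saved the first time.
    }
    originalOut = System.out;
    originalErr = System.err;
    System.setOut(capturedOut);
    System.setErr(capturedErr);
    initialized = true;
  }

  /** Restores the saved streams; a no-op when initialize() was never called. */
  static synchronized void reset() {
    if (!initialized) {
      return; // Nothing was saved, so there is nothing to restore.
    }
    System.setOut(originalOut);
    System.setErr(originalErr);
    originalOut = null;
    originalErr = null;
    initialized = false;
  }

  public static void main(String[] args) {
    PrintStream before = System.out;

    reset(); // Called before any initialize(): must leave System.out alone.
    System.out.println("System.out untouched: " + (System.out == before));

    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    initialize(new PrintStream(buffer, true), new PrintStream(buffer, true));
    System.out.println("captured line"); // Goes to the buffer, not the console.
    reset();
    System.out.println("System.out restored: " + (System.out == before));
  }
}
```

The point of the guard in reset() is that an unguarded System.setOut(originalOut) with a still-null field would install null as System.out, and any later System.out.println call would then throw a NullPointerException.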

On Fri, Apr 5, 2019 at 1:39 PM Michael Luckey <[email protected]> wrote:

> Ah...
>
> I have not debugged this yet. But wouldn't [1] mean setting System.out to
> 'null' on the first call to @setup, as there was no previous call to
> DataflowWorkerLoggingInitializer.initialize?
>
> [1]
> https://github.com/apache/beam/blame/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java#L81
>
> On Fri, Apr 5, 2019 at 10:12 PM Lukasz Cwik <[email protected]> wrote:
>
>> We replace System.out/err to capture user logs and forward the logs for
>> the Dataflow worker[1]. It could be that this test[2] is not resetting it
>> afterwards, which would leave it at null and cause some later code to
>> fail.
>>
>> 1:
>> https://github.com/apache/beam/blob/e69d69d72dc5b9c3d6069c0b71825c3c2b0b4e61/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializer.java#L132
>> 2:
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/logging/DataflowWorkerLoggingInitializerTest.java
>>
>> On Fri, Apr 5, 2019 at 1:42 AM Michael Luckey <[email protected]>
>> wrote:
>>
>>> FWIW, the TimerReceiverTest is also failing consistently on my macOS
>>> machine. Running on my Ubuntu VM, the tests pass.
>>>
>>> The stack trace indicates a NullPointerException thrown out of the
>>> finally block [1].
>>>
>>> As this is really bad and of course would hide the cause, I added some
>>> exception handling:
>>>
>>> diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>> index 708b669112..8c21928da1 100644
>>> --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>> +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java
>>> @@ -169,7 +169,12 @@ public class FnHarness {
>>>        LOG.info("Entering instruction processing loop");
>>>        control.processInstructionRequests(options.as(GcsOptions.class).getExecutorService());
>>>      } finally {
>>> -      System.out.println("Shutting SDK harness down.");
>>> +      try {
>>> +        System.out.println("Shutting SDK harness down.");
>>> +      } catch (NullPointerException npe) {
>>> +        LOG.warn("NPE sys.out=" + System.out, npe);
>>> +      }
>>>      }
>>>    }
>>>  }
>>>
>>> Now my test outputs
>>>
>>> Apr 05, 2019 9:29:59 AM org.apache.beam.fn.harness.FnHarness main
>>> WARNING: NPE  sys.out=null
>>> java.lang.NullPointerException
>>>     at org.apache.beam.fn.harness.FnHarness.main(FnHarness.java:173)
>>>     at org.apache.beam.runners.dataflow.worker.fn.control.TimerReceiverTest.lambda$setUp$0(TimerReceiverTest.java:123)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>>
>>>
>>>
>>> and passes (sic!)
>>>
>>> Something weird is going on here....
>>>
>>> Replacing that 'System.out' with 'LOG.info' also seems to work. At least
>>> I could not reproduce the failure in several attempts. I am lost here, as
>>> there is probably a good reason to use System.out here.
>>>
>>> Btw, after the first failure with NullPointerExceptions, successive runs
>>> seem to fail for different reasons: I get a timeout in test setup. I am
>>> unsure, but this might indicate some gRPC port/server startup issue
>>> because the previous run did not clean up properly.
>>>
>>> best,
>>>
>>> michel
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L172
>>>
>>> On Thu, Apr 4, 2019 at 10:42 PM Lukasz Cwik <[email protected]> wrote:
>>>
>>>> I looked at the failures you were experiencing and the error message
>>>> doesn't provide enough information to figure out why it is failing.
>>>>
>>>> On Wed, Apr 3, 2019 at 9:23 PM Csaba Kassai <[email protected]> wrote:
>>>>
>>>>> Oh, I just missed it then :)
>>>>> Thank you Lukasz for connecting us.
>>>>>
>>>>> Yeah, the two TimerReceiverTest tests fail reliably for me.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 2 Apr 2019 at 23:53, Lukasz Cwik <[email protected]> wrote:
>>>>>
>>>>>> +Ahmed
>>>>>>
>>>>>> I have added you as a contributor.
>>>>>>
>>>>>> It seems as though Ahmed had just picked up BEAM-3489 yesterday.
>>>>>> Reach out to Ahmed if you would like to help them out with the task.
>>>>>>
>>>>>> Was TimerReceiverTest failing reliably when performing a parallel
>>>>>> build or is it flaky?
>>>>>>
>>>>>> I have asked Chamikara to take a look at PR 8180.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 2, 2019 at 8:33 AM Csaba Kassai <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All!
>>>>>>>
>>>>>>> I am Csabi, and I would be happy to contribute to Beam.
>>>>>>> Could you grant me the contributor role and assign issue BEAM-3489
>>>>>>> <https://issues.apache.org/jira/browse/BEAM-3489> to me? My user
>>>>>>> name is "csabakassai".
>>>>>>>
>>>>>>> After I checked out the code and ran a gradle check, I found
>>>>>>> these issues:
>>>>>>>
>>>>>>>    1. *JUnit tests fail:* the TimerReceiverTest fails in the
>>>>>>>    ":beam-runners-google-cloud-dataflow-java-fn-api-worker:test" and the
>>>>>>>    ":beam-runners-google-cloud-dataflow-java-legacy-worker:test" tasks.
>>>>>>>    When I execute the tests independently everything is fine, so I
>>>>>>>    disabled the parallel build, which solves the problem. I have not
>>>>>>>    investigated further. Do you have any more insights on this issue?
>>>>>>>    I have attached the test reports.
>>>>>>>    2. *Python test fails:* there is a Python test which fails if the
>>>>>>>    current offset of your timezone differs from its offset in 1970. In
>>>>>>>    my case, Singapore is now GMT+8 and it was GMT+7:30 in 1970. I
>>>>>>>    created a ticket for this issue where I describe the problem in
>>>>>>>    detail: https://jira.apache.org/jira/browse/BEAM-6947. Could you
>>>>>>>    assign the ticket to me? I also created a PR with a possible fix:
>>>>>>>    https://github.com/apache/beam/pull/8180. Could you suggest a
>>>>>>>    reviewer?
>>>>>>>
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Csabi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
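
The timezone mismatch described in item 2 of the original message can be checked independently of Beam. The sketch below is written in Java, to match the other code in this thread, rather than in Python, and it does not reproduce the failing test itself. It only compares the UTC offset of Asia/Singapore at the Unix epoch with the offset around the date of the report. The class EpochOffsetCheck and the helper offsetAt are hypothetical names for this example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical helper for comparing a zone's UTC offset at two instants. */
final class EpochOffsetCheck {
  /** Returns the UTC offset that applied in the given zone at the given instant. */
  static ZoneOffset offsetAt(String zoneId, Instant instant) {
    return ZoneId.of(zoneId).getRules().getOffset(instant);
  }

  public static void main(String[] args) {
    Instant epoch = Instant.EPOCH; // 1970-01-01T00:00:00Z
    Instant reportDate = Instant.parse("2019-04-02T00:00:00Z"); // date of the original report

    ZoneOffset then = offsetAt("Asia/Singapore", epoch);
    ZoneOffset now = offsetAt("Asia/Singapore", reportDate);

    System.out.println("Offset in 1970: " + then);
    System.out.println("Offset in 2019: " + now);
    System.out.println("Offsets differ: " + !then.equals(now));
  }
}
```

With the time zone data shipped in standard JDKs this should report +07:30 for 1970 and +08:00 for 2019, so any code that applies the current offset to a timestamp near the epoch will be off by 30 minutes in this zone.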
