[ 
https://issues.apache.org/jira/browse/BEAM-4649?focusedWorklogId=116624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116624
 ]

ASF GitHub Bot logged work on BEAM-4649:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jun/18 21:40
            Start Date: 27/Jun/18 21:40
    Worklog Time Spent: 10m 
      Work Description: aaltay closed pull request #5780: [BEAM-4649] Fix 
pipeline failures by waiting for logging server to be ready
URL: https://github.com/apache/beam/pull/5780
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/worker/log_handler.py 
b/sdks/python/apache_beam/runners/worker/log_handler.py
index 152659e0a3f..8e55aa52f45 100644
--- a/sdks/python/apache_beam/runners/worker/log_handler.py
+++ b/sdks/python/apache_beam/runners/worker/log_handler.py
@@ -49,9 +49,10 @@ class FnApiLogRecordHandler(logging.Handler):
 
   def __init__(self, log_service_descriptor):
     super(FnApiLogRecordHandler, self).__init__()
-    self._log_channel = grpc.intercept_channel(
-        grpc.insecure_channel(log_service_descriptor.url),
-        WorkerIdInterceptor())
+    # Make sure the channel is ready to avoid [BEAM-4649]
+    ch = grpc.insecure_channel(log_service_descriptor.url)
+    grpc.channel_ready_future(ch).result(timeout=60)
+    self._log_channel = grpc.intercept_channel(ch, WorkerIdInterceptor())
     self._logging_stub = beam_fn_api_pb2_grpc.BeamFnLoggingStub(
         self._log_channel)
     self._log_entry_queue = queue.Queue()
diff --git a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py 
b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
index 46670e88964..cbd28568343 100644
--- a/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
+++ b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py
@@ -93,6 +93,7 @@ def main(unused_argv):
     # TODO(vikasrk): This should be picked up from pipeline options.
     logging.getLogger().setLevel(logging.INFO)
     logging.getLogger().addHandler(fn_log_handler)
+    logging.info('Logging handler created.')
   else:
     fn_log_handler = None
 


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 116624)
    Time Spent: 50m  (was: 40m)

> Failures in beam_PostCommit_Py_ValCont due to exception in 
> read_log_control_messages
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-4649
>                 URL: https://issues.apache.org/jira/browse/BEAM-4649
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-harness
>    Affects Versions: 2.6.0
>            Reporter: Alan Myrvold
>            Assignee: Alan Myrvold
>            Priority: Major
>             Fix For: 2.6.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Example
> https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-06-26_11_04_04-14383422363721841333?project=apache-beam-testing
>  
> All of the failures have the same exception in the docker logs:
> I  2018/06/26 18:05:12 Executing: python -m 
> apache_beam.runners.worker.sdk_worker_main
> I  Exception in thread read_log_control_messages:
> I  Traceback (most recent call last):
> I    File "/usr/local/lib/python2.7/threading.py", line 801, in 
> __bootstrap_inner
> I      self.run()
> I    File "/usr/local/lib/python2.7/threading.py", line 754, in run
> I      self.__target(*self.__args, **self.__kwargs)
> I    File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/log_handler.py",
>  line 61, in <lambda>
> I      target=lambda: self._read_log_control_messages(log_control_messages),
> I    File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/log_handler.py",
>  line 107, in _read_log_control_messages
> I      for _ in log_control_iterator:
> I    File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 
> 344, in next
> I      return self._next()
> I    File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 
> 335, in _next
> I      raise self
> I  _Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.UNAVAILABLE, Connect Failed)>
> I  
> I  Traceback (most recent call last):
> I    File "/usr/local/lib/python2.7/runpy.py", line 174, in 
> _run_module_as_main
> I      "__main__", fname, loader, pkg_name)
> I    File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
> I      exec code in run_globals
> I    File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 195, in <module>
> I      main(sys.argv)
> I    File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 134, in main
> I      worker_count=_get_worker_count(sdk_pipeline_options)).run()
> I    File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 104, in run
> I      for work_request in control_stub.Control(get_responses()):
> I    File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 
> 344, in next
> I      return self._next()
> I    File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 
> 324, in _next
> I      raise self
> I  grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with 
> (StatusCode.UNAVAILABLE, Connect Failed)>
> Looks like a race condition?
> Harness log with startup message is appearing *after* docker log with 
> connection exception.
> Harness log:
> 2018-06-26 10:40:59.328 PDT Launched Beam Fn Logging service url: 
> "localhost:12370"
> Docker log:
> 2018-06-26 10:40:53.361 PDT Exception in thread read_log_control_messages:



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to