[ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=315236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315236
 ]

ASF GitHub Bot logged work on BEAM-3342:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Sep/19 18:49
            Start Date: 19/Sep/19 18:49
    Worklog Time Spent: 10m 
      Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-533260992
 
 
   During the latest series of tests using Dataflow, the following error appears:
   
   ```
   ...
   INFO:root:2019-09-19T12:19:07.484Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 10 based on the rate of progress in the currently running step(s).
   INFO:root:2019-09-19T12:20:15.500Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
   INFO:root:2019-09-19T12:20:15.528Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
   INFO:root:2019-09-19T12:24:39.237Z: JOB_MESSAGE_DETAILED: Checking permissions granted to controller Service Account.
   INFO:root:2019-09-19T12:30:39.238Z: JOB_MESSAGE_DETAILED: Checking permissions granted to controller Service Account.
   INFO:root:2019-09-19T12:32:51.436Z: JOB_MESSAGE_BASIC: Finished operation Bigtable Read/Impulse+Bigtable Read/Split+Bigtable Read/Reshuffle/AddRandomKeys+Bigtable Read/Reshuffle/ReshufflePerKey/Map(reify_timestamps)+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Reify+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Write
   INFO:root:2019-09-19T12:32:51.616Z: JOB_MESSAGE_DEBUG: Executing failure step failure32
   INFO:root:2019-09-19T12:32:51.677Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S03:Bigtable Read/Impulse+Bigtable Read/Split+Bigtable Read/Reshuffle/AddRandomKeys+Bigtable Read/Reshuffle/ReshufflePerKey/Map(reify_timestamps)+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Reify+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
     bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
         Root cause: The worker lost contact with the service.,
     bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
         Root cause: The worker lost contact with the service.,
     bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
         Root cause: The worker lost contact with the service.,
     bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
         Root cause: The worker lost contact with the service.
   INFO:root:2019-09-19T12:32:51.797Z: JOB_MESSAGE_DETAILED: Cleaning up.
   INFO:root:2019-09-19T12:32:51.877Z: JOB_MESSAGE_DEBUG: Starting worker pool teardown.
   INFO:root:2019-09-19T12:32:51.903Z: JOB_MESSAGE_BASIC: Stopping worker pool...
   INFO:root:2019-09-19T12:37:45.412Z: JOB_MESSAGE_DETAILED: Autoscaling: Reduced the number of workers to 0 based on the rate of progress in the currently running step(s).
   INFO:root:2019-09-19T12:37:45.506Z: JOB_MESSAGE_BASIC: Worker pool stopped.
   INFO:root:2019-09-19T12:37:45.544Z: JOB_MESSAGE_DEBUG: Tearing down pending resources...
   INFO:root:Job 2019-09-19_05_18_33-1351245893518858257 is in state JOB_STATE_FAILED
   ERROR
   ...
   ```
   
   The code used in this test is essentially the same code that ran successfully a couple of months ago. What has changed since then is the test code in `bigtableio_it_test.py`, which was refactored to eliminate redundant steps when parsing the pipeline parameters from the command-line arguments. The parameters themselves were not changed and, based on the log output, are properly set on the `beam.Pipeline` object that runs the pipeline during the test. Judging by the message timings, the pipeline is able to start the workers, after which everything hangs. The "Checking permissions..." message could suggest an authorization issue, but a pipeline of the same type with the same parameters (except for the job name) completes the write sequence during the 'write' part of the same test without any problem.
   
   Reverting the code to the very first version of the current PR, which had been tested several times prior to submission and shown to work, now produces a pickling error:
   
   ```
   ...
   ======================================================================
   ERROR: test_bigtable_io (__main__.BigtableIOTest)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "c:\git\beam-MF\sdks\python\apache_beam\io\gcp\bigtableio_it_test.py", line 120, in test_bigtable_io
       self.result.wait_until_finish()
     File "C:\Python27\Beam-MF\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 1338, in wait_until_finish
       (self.state, getattr(self._runner, 'last_error_msg', None)), self)
   DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
   java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -31: Traceback (most recent call last):
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 157, in _execute
       response = task()
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 190, in <lambda>
       self._execute(lambda: worker.do_instruction(work), work)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
       request.instruction_id)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 362, in process_bundle
       instruction_id, request.process_bundle_descriptor_reference)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 305, in get
       self.data_channel_factory)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 508, in __init__
       self.ops = self.create_execution_tree(self.process_bundle_descriptor)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 552, in create_execution_tree
       descriptor.transforms, key=topological_height, reverse=True)])
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 476, in wrapper
       result = cache[args] = func(*args)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 535, in get_operation
       in descriptor.transforms[transform_id].outputs.items()
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 534, in <dictcomp>
       for tag, pcoll_id
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 476, in wrapper
       result = cache[args] = func(*args)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 538, in get_operation
       transform_id, transform_consumers)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 797, in create_operation
       return creator(self, transform_id, transform_proto, payload, consumers)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 1040, in create
       serialized_fn, parameter)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 1078, in _create_pardo_operation
       dofn_data = pickler.loads(serialized_fn)
     File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 258, in loads
       return dill.loads(s)
     File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 317, in loads
       return load(file, ignore)
     File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 305, in load
       obj = pik.load()
     File "/usr/lib/python2.7/pickle.py", line 864, in load
       dispatch[key](self)
     File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
       klass = self.find_class(module, name)
     File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 474, in find_class
       return StockUnpickler.find_class(self, module, name)
     File "/usr/lib/python2.7/pickle.py", line 1132, in find_class
       klass = getattr(mod, name)
   AttributeError: 'module' object has no attribute 'BigtableSource'
   ...
   ```
   
   The above error log was produced by exactly the same code that used to run successfully some months ago in virtually the same local environment (same PC, IDE, Python version, etc.). Such flaky behavior suggests factors outside of the connector code that may adversely affect its otherwise successful execution.
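
   One external factor that would match the `AttributeError` above is how the transform gets pickled for the workers: dill can only resolve `BigtableSource` on a worker if the module it was pickled from is importable there (or captured in the pickled main session). Below is a minimal sketch of the usual mitigations, assuming the class is defined in the launching script rather than in an installed package (`argv` is illustrative):

   ```
   from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

   options = PipelineOptions(argv)
   # Either ship the package that defines BigtableSource to the workers
   # (e.g. pass --setup_file pointing at its setup.py), or pickle the
   # launching script's main session so the workers can resolve names
   # defined there:
   options.view_as(SetupOptions).save_main_session = True
   ```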
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 315236)
    Time Spent: 39h  (was: 38h 50m)

> Create a Cloud Bigtable IO connector for Python
> -----------------------------------------------
>
>                 Key: BEAM-3342
>                 URL: https://issues.apache.org/jira/browse/BEAM-3342
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
>            Priority: Major
>          Time Spent: 39h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
