[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=315236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-315236 ]
ASF GitHub Bot logged work on BEAM-3342:
----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Sep/19 18:49
Start Date: 19/Sep/19 18:49
Worklog Time Spent: 10m
Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-533260992
During the latest series of tests on Dataflow, the following error appears:
```
...
INFO:root:2019-09-19T12:19:07.484Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 10 based on the rate of progress in the currently running step(s).
INFO:root:2019-09-19T12:20:15.500Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
INFO:root:2019-09-19T12:20:15.528Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
INFO:root:2019-09-19T12:24:39.237Z: JOB_MESSAGE_DETAILED: Checking permissions granted to controller Service Account.
INFO:root:2019-09-19T12:30:39.238Z: JOB_MESSAGE_DETAILED: Checking permissions granted to controller Service Account.
INFO:root:2019-09-19T12:32:51.436Z: JOB_MESSAGE_BASIC: Finished operation Bigtable Read/Impulse+Bigtable Read/Split+Bigtable Read/Reshuffle/AddRandomKeys+Bigtable Read/Reshuffle/ReshufflePerKey/Map(reify_timestamps)+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Reify+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Write
INFO:root:2019-09-19T12:32:51.616Z: JOB_MESSAGE_DEBUG: Executing failure step failure32
INFO:root:2019-09-19T12:32:51.677Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S03:Bigtable Read/Impulse+Bigtable Read/Split+Bigtable Read/Reshuffle/AddRandomKeys+Bigtable Read/Reshuffle/ReshufflePerKey/Map(reify_timestamps)+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Reify+Bigtable Read/Reshuffle/ReshufflePerKey/GroupByKey/Write failed., The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
Root cause: The worker lost contact with the service.,
bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
Root cause: The worker lost contact with the service.,
bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
Root cause: The worker lost contact with the service.,
bigtableio-it-test-10k-20-09190518-70i4-harness-h8n9
Root cause: The worker lost contact with the service.
INFO:root:2019-09-19T12:32:51.797Z: JOB_MESSAGE_DETAILED: Cleaning up.
INFO:root:2019-09-19T12:32:51.877Z: JOB_MESSAGE_DEBUG: Starting worker pool teardown.
INFO:root:2019-09-19T12:32:51.903Z: JOB_MESSAGE_BASIC: Stopping worker pool...
INFO:root:2019-09-19T12:37:45.412Z: JOB_MESSAGE_DETAILED: Autoscaling: Reduced the number of workers to 0 based on the rate of progress in the currently running step(s).
INFO:root:2019-09-19T12:37:45.506Z: JOB_MESSAGE_BASIC: Worker pool stopped.
INFO:root:2019-09-19T12:37:45.544Z: JOB_MESSAGE_DEBUG: Tearing down pending resources...
INFO:root:Job 2019-09-19_05_18_33-1351245893518858257 is in state JOB_STATE_FAILED
ERROR
...
```
The code used in this test is essentially the same code that ran successfully a couple of months ago. The only change since then is in the test code of the `bigtableio_it_test.py` file, which was refactored to eliminate redundant steps when parsing the pipeline parameters from the command-line arguments. The parameters themselves were not changed and, judging by the log output, are properly set within the `beam.Pipeline` object that runs the pipeline during the test. Based on the message timings, the pipeline appears to start the workers successfully, after which everything hangs. The "Checking permissions..." message could suggest an authorization issue, but a pipeline of the same type with the same parameters (except for the job name) completes the write sequence during the 'write' part of the same test without any problem.
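For reference, a minimal sketch of the refactored pattern, assuming the test hands the unparsed flags straight to `PipelineOptions`; the `--instance` and `--table` flags are hypothetical stand-ins for the connector parameters, not the actual test flags:
```python
# Sketch only, not the actual test code: parse test-specific flags and pass
# everything else to Beam unmodified.
import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def make_pipeline(argv=None):
  parser = argparse.ArgumentParser()
  parser.add_argument('--instance', required=True)  # hypothetical flag
  parser.add_argument('--table', required=True)     # hypothetical flag
  known_args, pipeline_args = parser.parse_known_args(argv)

  # Remaining flags (--project, --job_name, --runner, ...) go straight into
  # PipelineOptions instead of being re-parsed and re-assembled by hand.
  options = PipelineOptions(pipeline_args)
  return beam.Pipeline(options=options), known_args
```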
Reverting the code to the initial version of this PR, which had been tested several times prior to submission and shown to work, now produces a pickling error:
```
...
======================================================================
ERROR: test_bigtable_io (__main__.BigtableIOTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\git\beam-MF\sdks\python\apache_beam\io\gcp\bigtableio_it_test.py", line 120, in test_bigtable_io
    self.result.wait_until_finish()
  File "C:\Python27\Beam-MF\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 1338, in wait_until_finish
    (self.state, getattr(self._runner, 'last_error_msg', None)), self)
DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -31: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 157, in _execute
    response = task()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 190, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 362, in process_bundle
    instruction_id, request.process_bundle_descriptor_reference)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 305, in get
    self.data_channel_factory)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 508, in __init__
    self.ops = self.create_execution_tree(self.process_bundle_descriptor)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 552, in create_execution_tree
    descriptor.transforms, key=topological_height, reverse=True)])
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 476, in wrapper
    result = cache[args] = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 535, in get_operation
    in descriptor.transforms[transform_id].outputs.items()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 534, in <dictcomp>
    for tag, pcoll_id
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 476, in wrapper
    result = cache[args] = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 538, in get_operation
    transform_id, transform_consumers)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 797, in create_operation
    return creator(self, transform_id, transform_proto, payload, consumers)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 1040, in create
    serialized_fn, parameter)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 1078, in _create_pardo_operation
    dofn_data = pickler.loads(serialized_fn)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 258, in loads
    return dill.loads(s)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 317, in loads
    return load(file, ignore)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 305, in load
    obj = pik.load()
  File "/usr/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
  File "/usr/lib/python2.7/pickle.py", line 1132, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'BigtableSource'
...
```
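A note on the `AttributeError`: `pickle` (and hence `dill`) serializes a class by reference, storing only a module name plus an attribute name, so unpickling on a worker succeeds only if that attribute still exists in the worker's copy of the module. A minimal standalone sketch of this failure mode (plain `pickle`, not the Beam code itself):
```python
# Demonstration: pickle records classes by reference, so loading fails
# if the name is missing on the loading side.
import pickle


class BigtableSource(object):
  """Stand-in class; only the module-level name matters here."""


payload = pickle.dumps(BigtableSource)  # stores ('__main__', 'BigtableSource')

# Simulate a worker whose copy of the module no longer exposes the name
# (e.g. the class was renamed/moved, or a stale package is installed there).
del BigtableSource

try:
  pickle.loads(payload)
except AttributeError as exc:
  print(exc)  # Python 2: 'module' object has no attribute 'BigtableSource'
```
If the harness container unpickles against a staged or preinstalled copy of the SDK that differs from the local one, exactly this error can appear even though the local code is unchanged.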
The error log above was produced by exactly the same code that used to run successfully some months ago in virtually the same local environment (same PC, IDE, Python version, etc.). Such flaky behavior points to factors outside the connector code that may adversely affect its otherwise successful execution.
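One hedged sanity check, assuming `BigtableSource` is meant to live in `apache_beam/io/gcp/bigtableio.py` as this PR's layout suggests, would be to verify the name against the exact interpreter/environment the workers use:
```python
# Hypothetical check: does the installed module actually expose the class?
import apache_beam.io.gcp.bigtableio as bigtableio

print(bigtableio.__file__)                    # which copy gets imported
print(hasattr(bigtableio, 'BigtableSource'))  # False reproduces the AttributeError
```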
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 315236)
Time Spent: 39h (was: 38h 50m)
> Create a Cloud Bigtable IO connector for Python
> -----------------------------------------------
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Solomon Duskis
> Assignee: Solomon Duskis
> Priority: Major
> Time Spent: 39h
> Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)