[ https://issues.apache.org/jira/browse/BEAM-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368449#comment-15368449 ]
Ahmet Altay commented on BEAM-391:
----------------------------------

Another type of exception that results in the same behavior:

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 160, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 563, in _start_upload
    self.client.objects.Insert(self.insert_request, upload=self.upload)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/clients/storage/storage_v1_client.py", line 970, in Insert
    download=download)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 687, in _RunMethod
    http_request, client=self.client)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/transfer.py", line 838, in InitializeUpload
    retries=self.num_retries)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 351, in MakeRequest
    max_retry_wait, total_wait_sec))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 341, in MakeRequest
    check_response_func=check_response_func)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 391, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 616, in new_request
    self._refresh(request_orig)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/auth.py", line 90, in _refresh
    token_data = json.loads(urllib2.urlopen(req).read())
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)

The error comes from _refresh() in auth.py. That call may need retries, depending on the type of error.

> Exceptions in gcsio upload thread causes pipeline to stall
> ----------------------------------------------------------
>
>                 Key: BEAM-391
>                 URL: https://issues.apache.org/jira/browse/BEAM-391
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Ahmet Altay
>
> gcsio got stuck with invalid bucket name
> GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket
> does not exist. This causes the upload thread to fail silently. It logs the
> exception, but this neither stops the pipeline nor closes the receiving end
> of the multiprocessing.Pipe(). A later call to write() then blocks at
> self.conn.send_bytes(). Note that send may block if the buffer is full.
> The upload thread should have a finally clause that closes the socket
> connection or, better, should propagate the exception to its parent. This is
> true for other types of exceptions also.
> Another small issue is in GcsBufferedWriter.close(): it does not set
> self.closed to True.
> reproduction: python -m apache_beam.examples.wordcount --output
> gs://no-such-thing/
> Prints the exception but runs forever. Ctrl+C interrupts the main thread and
> shows where it got stuck.
> Similarly reproducible on the service.
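
The fix suggested in the description (a finally clause on the upload thread, plus propagating the exception to the parent) could look roughly like the sketch below. This is not the Beam implementation: SketchWriter, upload_fn, upload_error and _raise_if_failed are hypothetical names made up for illustration, and the sketch is written for Python 3 rather than the Python 2.7 shown in the traceback. It only assumes the pattern described in the issue, a writer feeding an upload thread through a multiprocessing.Pipe().

```python
import multiprocessing
import threading


class SketchWriter(object):
    """Hypothetical stand-in for a buffered writer; not the Beam class."""

    def __init__(self, upload_fn):
        # One-way pipe: the upload thread reads, write() sends.
        self.recv_conn, self.send_conn = multiprocessing.Pipe(duplex=False)
        self.upload_fn = upload_fn   # assumed callable taking the read end
        self.upload_error = None     # set by the upload thread on failure
        self.closed = False
        self.upload_thread = threading.Thread(target=self._start_upload)
        self.upload_thread.daemon = True
        self.upload_thread.start()

    def _start_upload(self):
        try:
            self.upload_fn(self.recv_conn)
        except Exception as e:  # remember any failure for the parent
            self.upload_error = e
        finally:
            # Always close the receiving end, so a pending or later
            # send_bytes() fails instead of blocking forever.
            self.recv_conn.close()

    def _raise_if_failed(self):
        if self.upload_error is not None:
            raise IOError('upload failed: %r' % (self.upload_error,))

    def write(self, data):
        self._raise_if_failed()
        try:
            self.send_conn.send_bytes(data)
        except OSError:
            # The reader went away between the check and the send.
            self.upload_thread.join()
            self._raise_if_failed()
            raise

    def close(self):
        self.closed = True
        self.send_conn.close()
        self.upload_thread.join()
        self._raise_if_failed()
```

With this shape, an upload function that raises (for example because the bucket does not exist) makes the next write() or close() raise in the calling thread, so the caller sees the failure instead of stalling. Whether to wrap the original exception or re-raise it unchanged is a design choice this sketch does not settle.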