[ https://issues.apache.org/jira/browse/BEAM-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368449#comment-15368449 ]

Ahmet Altay commented on BEAM-391:
----------------------------------

Another type of exception that results in the same behavior:

Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 160, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 563, in _start_upload
    self.client.objects.Insert(self.insert_request, upload=self.upload)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/clients/storage/storage_v1_client.py", line 970, in Insert
    download=download)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 687, in _RunMethod
    http_request, client=self.client)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/transfer.py", line 838, in InitializeUpload
    retries=self.num_retries)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 351, in MakeRequest
    max_retry_wait, total_wait_sec))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 341, in MakeRequest
    check_response_func=check_response_func)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/http_wrapper.py", line 391, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 616, in new_request
    self._refresh(request_orig)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/auth.py", line 90, in _refresh
    token_data = json.loads(urllib2.urlopen(req).read())
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)

The error comes from _refresh() in auth.py. That call may need retries,
depending on the type of error.
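A minimal sketch of what type-dependent retries around the token fetch could look like. It assumes a policy (not stated in the issue) that connection-level failures, such as the plain URLError at the bottom of the traceback, and HTTP 5xx responses are transient, while everything else, including HTTP 4xx, is not. It is written for Python 3, where Python 2.7's urllib2.URLError and urllib2.HTTPError live in urllib.error. The names call_with_retries and is_transient_url_error are hypothetical and are not part of apache_beam.internal.auth or of Beam's own retry.py wrapper seen in the traceback.

```python
import time
import urllib.error


def is_transient_url_error(exc):
    """Assumed policy (hypothetical): HTTP 5xx and connection-level URLErrors
    are transient; everything else, including HTTP 4xx, is not."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return 500 <= exc.code < 600
    return isinstance(exc, urllib.error.URLError)


def call_with_retries(fn, max_attempts=4, initial_delay_secs=1.0,
                      is_retryable=is_transient_url_error, sleep=time.sleep):
    """Hypothetical helper: call fn() and retry retryable errors with
    exponential backoff; re-raise the last error once attempts run out."""
    delay = initial_delay_secs
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on a non-retryable error.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(delay)
            delay *= 2
```

In a hypothetically patched _refresh(), the failing line would wrap the equivalent of json.loads(urllib2.urlopen(req).read()) in call_with_retries(lambda: ...), so a one-off connection error no longer kills the upload thread, while an authorization failure (HTTP 4xx) still surfaces immediately. The sleep parameter is injectable so the backoff can be exercised without real delays.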

> Exceptions in gcsio upload thread causes pipeline to stall
> ----------------------------------------------------------
>
>                 Key: BEAM-391
>                 URL: https://issues.apache.org/jira/browse/BEAM-391
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Ahmet Altay
>
> gcsio got stuck with an invalid bucket name.
> GcsBufferedWriter._start_upload (gcsio.py) raises an exception if the bucket
> does not exist. This causes the upload thread to fail silently: it logs the
> exception, but this neither stops the pipeline nor closes the receiving end
> of the multiprocessing.Pipe(). A later call to write() then blocks in
> self.conn.send_bytes(), because send may block once the pipe buffer is full.
> The upload thread should have a finally clause that closes its end of the
> connection or, better, it should propagate the exception to its parent. The
> same holds for other types of exceptions.
> Another small issue is in GcsBufferedWriter.close(): it does not set
> self.closed to True.
> Reproduction: python -m apache_beam.examples.wordcount --output gs://no-such-thing/
> This prints the exception but then runs forever. Pressing Ctrl+C interrupts
> the main thread and shows where it got stuck.
> The problem is similarly reproducible on the service.
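A minimal sketch of the two fixes proposed in the issue, using a simplified writer that mirrors the thread-plus-multiprocessing.Pipe() pattern described above rather than the real GcsBufferedWriter code. BufferedUploadWriter, UploadFailedError and the upload_fn callback are hypothetical names; upload_fn stands in for _start_upload and is assumed to read from the connection until EOFError. The upload thread records any exception and closes the receiving end in a finally clause, so a write() blocked in send_bytes() fails with a broken pipe instead of hanging; write() and close() then re-raise the recorded cause in the caller's thread. close() also sets self.closed.

```python
import multiprocessing
import threading


class UploadFailedError(Exception):
    """Hypothetical error re-raised in the writer's thread after an upload failure."""


class BufferedUploadWriter(object):
    """Hypothetical, simplified stand-in for the GcsBufferedWriter pattern:
    write() sends bytes through a multiprocessing.Pipe() to an upload thread."""

    def __init__(self, upload_fn):
        # upload_fn(conn) stands in for _start_upload; it is assumed to read
        # with conn.recv_bytes() until EOFError.
        self._recv_conn, self._send_conn = multiprocessing.Pipe(duplex=False)
        self._upload_error = None
        self.closed = False
        self._thread = threading.Thread(target=self._run_upload, args=(upload_fn,))
        self._thread.daemon = True
        self._thread.start()

    def _run_upload(self, upload_fn):
        try:
            upload_fn(self._recv_conn)
        except Exception as e:  # record the failure instead of only logging it
            self._upload_error = e
        finally:
            # Closing the receiving end makes a pending or later send_bytes()
            # fail with a broken pipe instead of blocking forever.
            self._recv_conn.close()

    def _raise_if_upload_failed(self):
        if self._upload_error is not None:
            raise UploadFailedError('upload thread failed: %r' % (self._upload_error,))

    def write(self, data):
        self._raise_if_upload_failed()
        try:
            self._send_conn.send_bytes(data)
        except (IOError, OSError):
            # Most likely a broken pipe after the upload thread died: wait for
            # it, then surface the original cause to the caller.
            self._thread.join()
            self._raise_if_upload_failed()
            raise

    def close(self):
        if self.closed:
            return
        self.closed = True  # the issue notes the original close() never set this
        self._send_conn.close()  # signals EOF to the upload thread
        self._thread.join()
        self._raise_if_upload_failed()
```

Closing the receiving end on its own only turns the hang into a broken-pipe error; re-raising the stored exception is what lets the original cause (for example, the missing bucket) reach the pipeline. The unblocking relies on POSIX pipe semantics: writing to a pipe whose reader is closed fails with EPIPE, which Python reports as BrokenPipeError because it ignores SIGPIPE by default.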



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
