[ 
https://issues.apache.org/jira/browse/BEAM-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu resolved BEAM-6154.
----------------------------
       Resolution: Fixed
    Fix Version/s: 2.11.0

> Gcsio batch delete broken in Python 3
> -------------------------------------
>
>                 Key: BEAM-6154
>                 URL: https://issues.apache.org/jira/browse/BEAM-6154
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Mark Liu
>            Assignee: Mark Liu
>            Priority: Major
>             Fix For: 2.11.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> I'm running the Python SDK against GCP on Python 3.5 and got the following 
> gcsio error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1077, in <genexpr>
>     window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 315, in finalize_write
>     num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", 
> line 145, in run_using_threadpool
>     return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
>     raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
>     return list(map(*args))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 299, in _rename_batch
>     FileSystems.rename(source_files, destination_files)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 252, in rename
>     return filesystem.rename(source_file_names, destination_file_names)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", 
> line 229, in rename
>     copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 322, in copy_batch
>     api_calls = batch_request.Execute(self.client._http)  # pylint: 
> disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 222, in Execute
>     batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 480, in Execute
>     self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", 
> line 450, in _Execute
>     mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into the related code in the apitools library, I found that 
> response.content, as returned by the HTTP request to GCS, is bytes, and 
> apitools doesn't handle this case: it concatenates the str header with the 
> bytes body, which Python 3 rejects. This can be a blocker for any pipeline 
> that depends on gcsio, and it apparently blocks all Dataflow jobs on 
> Python 3.
> This could be another reason to move off the apitools dependency, as tracked 
> in [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].
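
The failure mode described above can be reproduced outside apitools. The following is a minimal sketch, not the actual apitools or Beam fix: parse_batch_response is a hypothetical helper, and the UTF-8 default encoding is an assumption. It only illustrates that the bytes body has to be decoded before it is joined to the str header and handed to the email parser.

```python
from email.parser import Parser


def parse_batch_response(header, content, encoding="utf-8"):
    """Hypothetical helper: parse a MIME batch response body.

    `header` is a str; `content` may be bytes (as on Python 3) or str.
    The default encoding is an assumption, not taken from apitools.
    """
    if isinstance(content, bytes):
        # Decode first: on Python 3, str + bytes raises TypeError.
        content = content.decode(encoding)
    return Parser().parsestr(header + content)


# Sample data invented for illustration only.
header = 'Content-Type: multipart/mixed; boundary="abc"\n\n'
content = b"--abc\nContent-Type: text/plain\n\nhello\n--abc--\n"

try:
    header + content  # what the traceback's last frame effectively does
    concat_failed = False
except TypeError:
    concat_failed = True

message = parse_batch_response(header, content)
```

With the sample data, the plain concatenation fails on Python 3 while the helper returns a multipart message whose single part carries the text body.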



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
