[
https://issues.apache.org/jira/browse/BEAM-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826278#comment-16826278
]
Valentyn Tymofieiev edited comment on BEAM-7137 at 4/25/19 5:51 PM:
--------------------------------------------------------------------
[~yoshiki.obata], thanks a lot for trying out Beam on Python 3 and reporting
this!
We have to either encode the header to bytes
[here|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L393],
or require the header to be bytes in the first place. Looking through the IO
codebase, it seems that we assume the header to be a string in quite a few
places, starting from the [WriteToText
PTransform|https://github.com/apache/beam/blob/2cb44a81b258a64544fd8ca387305b2d5ccce13b/sdks/python/apache_beam/io/textio.py#L599],
so encoding the header to bytes may be the path of least resistance. We can
revise this if we find a strong reason to require the header to be bytes.
cc: [~chamikara]
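To make the first option concrete, here is a minimal sketch that uses only the
standard library. It is not the actual textio.py change: encode_header is a
hypothetical helper, io.BytesIO stands in for the binary file handle the sink
opens, and UTF-8 and the sample header are assumed for illustration.

```python
import io


def encode_header(header, encoding="utf-8"):
    """Hypothetical helper: return the header as bytes, accepting str or bytes."""
    if header is None:
        return None
    if isinstance(header, bytes):
        return header
    return header.encode(encoding)


# Stand-in for the binary file handle that a sink's open() would write to.
file_handle = io.BytesIO()

# Reproduces the reported failure: on Python 3 a str cannot be written to a
# binary handle.
try:
    file_handle.write("id,name")
    raised = False
except TypeError:
    raised = True

# Encoding first lets the same str header be written; the trailing newline
# here is only for illustration.
file_handle.write(encode_header("id,name"))
file_handle.write(b"\n")
written = file_handle.getvalue()
```

Accepting both str and bytes in the helper would keep existing callers that
already pass bytes working, which matters if we later decide to revisit the
choice.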
Also, we should find out why this was not caught by our postcommit integration
tests, and improve test coverage so that we have confidence that both the
write and read paths work correctly.
cc: [~Juta]
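As a rough idea of the kind of coverage meant here, the sketch below round-trips
a header through a write and a read using only the standard library. write_text
and read_text are hypothetical stand-ins for the sink and source, not Beam APIs;
a real test would exercise WriteToText and ReadFromText in a pipeline instead.

```python
import os
import tempfile
import unittest


def write_text(path, lines, header=None):
    """Hypothetical stand-in for a text sink: optional header, then lines."""
    with open(path, "wb") as handle:
        if header is not None:
            if isinstance(header, str):
                header = header.encode("utf-8")
            handle.write(header + b"\n")
        for line in lines:
            handle.write(line.encode("utf-8") + b"\n")


def read_text(path, skip_header_lines=0):
    """Hypothetical stand-in for a text source: decoded lines minus the header."""
    with open(path, "rb") as handle:
        lines = handle.read().decode("utf-8").splitlines()
    return lines[skip_header_lines:]


class HeaderRoundTripTest(unittest.TestCase):
    def _round_trip(self, header):
        # Write to a temporary file, then read it back with and without
        # skipping the header line.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "out.txt")
            write_text(path, ["a", "b"], header=header)
            return read_text(path), read_text(path, skip_header_lines=1)

    def test_str_header(self):
        everything, body = self._round_trip("id")
        self.assertEqual(everything, ["id", "a", "b"])
        self.assertEqual(body, ["a", "b"])

    def test_bytes_header(self):
        everything, body = self._round_trip(b"id")
        self.assertEqual(everything, ["id", "a", "b"])
        self.assertEqual(body, ["a", "b"])
```

Running the same assertions for both a str and a bytes header is the point:
whichever type we end up requiring, the test documents the behavior on Python 3.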
> TypeError caused by using str variable as header argument in
> apache_beam.io.textio.WriteToText
> ----------------------------------------------------------------------------------------------
>
> Key: BEAM-7137
> URL: https://issues.apache.org/jira/browse/BEAM-7137
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Affects Versions: 2.11.0
> Environment: Python 3.5.6
> macOS Mojave 10.14.4
> Reporter: yoshiki obata
> Assignee: yoshiki obata
> Priority: Major
>
> Passing a str as the header argument to apache_beam.io.textio.WriteToText
> causes a TypeError on Python 3.5.6 (and possibly later versions), even though
> the docstring says header is a str.
> The error occurs because the header is written to the file without being
> encoded to bytes in apache_beam.io.textio._TextSink.open.
>
> {code:java}
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 727, in apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 555, in apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 625, in apache_beam.runners.common.PerWindowInvoker._invoke_per_window
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 1033, in process
>     self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/options/value_provider.py", line 137, in _f
>     return fnc(self, *args, **kwargs)
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 185, in open_writer
>     return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 389, in __init__
>     self.temp_handle = self.sink.open(temp_shard_path)
>   File "/Users/yob/.local/share/virtualenvs/test/lib/python3.5/site-packages/apache_beam/io/textio.py", line 393, in open
>     file_handle.write(self._header)
> TypeError: a bytes-like object is required, not 'str'
> {code}
>