quentin lhoest created BEAM-10022:
-------------------------------------
Summary: [Python] Error with `WriteToParquet` with empty buffer
Key: BEAM-10022
URL: https://issues.apache.org/jira/browse/BEAM-10022
Project: Beam
Issue Type: Bug
Components: io-py-parquet
Affects Versions: 2.20.0
Reporter: quentin lhoest
While using `WriteToParquet`, I encounter the following error:
{noformat}
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1066, in finish_bundle
    self.writer.close(),
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 423, in close
    self.sink.close(self.temp_handle)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 538, in close
    self._flush_buffer()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 570, in _flush_buffer
    size = size + b.size
AttributeError: 'NoneType' object has no attribute 'size'
{noformat}
This happens because an empty array instantiated with `array = pa.array([])`
returns `[None]` from `array.buffers()`. However, `_flush_buffer` currently
assumes that no buffer is `None` when incrementing `size`.
One simple fix would be to add `if b is not None:` before incrementing
`size`.
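
A minimal sketch of the proposed guard, written as a standalone function so it runs without Beam or pyarrow installed. `FakeBuffer` and `total_buffer_size` are hypothetical names used only for illustration; the real loop lives in `_flush_buffer` in `parquetio.py`, and its exact shape may differ from what is shown here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for a pyarrow buffer; only `size` matters here."""
    size: int


def total_buffer_size(buffers: Iterable[Optional[FakeBuffer]]) -> int:
    """Sum buffer sizes in bytes, skipping entries that are None."""
    size = 0
    for b in buffers:
        if b is not None:  # the guard proposed in this report
            size = size + b.size
    return size


# An empty array is reported above to yield `[None]` from `buffers()`.
print(total_buffer_size([None]))  # 0
print(total_buffer_size([FakeBuffer(8), None, FakeBuffer(4)]))  # 12
```

Without the `if b is not None:` check, the first call would raise the same `AttributeError` as in the traceback above.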