[ https://issues.apache.org/jira/browse/BEAM-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
quentin lhoest closed BEAM-10022.
---------------------------------
Fix Version/s: 2.22.0
Resolution: Fixed
> [Python] Error with `WriteToParquet` with empty buffer
> ------------------------------------------------------
>
> Key: BEAM-10022
> URL: https://issues.apache.org/jira/browse/BEAM-10022
> Project: Beam
> Issue Type: Bug
> Components: io-py-parquet
> Affects Versions: 2.20.0
> Reporter: quentin lhoest
> Priority: P2
> Fix For: 2.22.0
>
>
> While using `WriteToParquet`, I encountered this error:
> {noformat}
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1066, in finish_bundle
>     self.writer.close(),
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 423, in close
>     self.sink.close(self.temp_handle)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 538, in close
>     self._flush_buffer()
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 570, in _flush_buffer
>     size = size + b.size
> AttributeError: 'NoneType' object has no attribute 'size'
> {noformat}
> This happens because, for an empty array such as `array = pa.array([])`,
> `array.buffers()` returns `[None]`. However, `_flush_buffer` currently
> assumes that every entry returned by `buffers()` is a buffer object (never
> `None`) when it increments `size`.
> One simple fix would be to add an `if b is not None:` check before
> incrementing `size`.
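The proposed guard can be sketched as follows. This is a minimal, self-contained sketch, not Beam's actual `_flush_buffer`. `FakeBuffer`, `FakeArray`, and `buffered_byte_size` are hypothetical stand-ins that mimic pyarrow's `Array.buffers()` and `Buffer.size`, so the sizing loop runs without pyarrow installed. The buffer sizes are arbitrary illustration values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for a pyarrow Buffer; only `size` is modelled."""
    size: int


@dataclass
class FakeArray:
    """Hypothetical stand-in for a pyarrow Array whose buffers() may hold None."""
    buffer_list: List[Optional[FakeBuffer]]

    def buffers(self) -> List[Optional[FakeBuffer]]:
        return self.buffer_list


def buffered_byte_size(arrays: Iterable[FakeArray]) -> int:
    """Sum the sizes of all buffers, skipping None entries such as the
    [None] reported by an empty array."""
    size = 0
    for array in arrays:
        for b in array.buffers():
            if b is not None:  # the guard proposed in this issue
                size = size + b.size
    return size


if __name__ == "__main__":
    empty = FakeArray([None])  # mimics pa.array([]).buffers() == [None]
    filled = FakeArray([FakeBuffer(8), FakeBuffer(24)])  # arbitrary sizes
    print(buffered_byte_size([empty, filled]))  # prints 32
```

With the guard in place, the `None` reported by an empty array contributes nothing to the byte count instead of raising `AttributeError`.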
--
This message was sent by Atlassian Jira
(v8.3.4#803005)