[ 
https://issues.apache.org/jira/browse/BEAM-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

quentin lhoest closed BEAM-10022.
---------------------------------
    Fix Version/s: 2.22.0
       Resolution: Fixed

> [Python] Error with `WriteToParquet` with empty buffer
> ------------------------------------------------------
>
>                 Key: BEAM-10022
>                 URL: https://issues.apache.org/jira/browse/BEAM-10022
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-parquet
>    Affects Versions: 2.20.0
>            Reporter: quentin lhoest
>            Priority: P2
>             Fix For: 2.22.0
>
>
> While using `WriteToParquet`, I encounter this error:
> {noformat}
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1066, in finish_bundle
>     self.writer.close(),
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 423, in close
>     self.sink.close(self.temp_handle)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 538, in close
>     self._flush_buffer()
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 570, in _flush_buffer
>     size = size + b.size
> AttributeError: 'NoneType' object has no attribute 'size'
> {noformat}
> This happens because, for an empty array created with `array = pa.array([])`, 
> `array.buffers()` returns `[None]`. However, `_flush_buffer` currently assumes 
> that no buffer is `None` when incrementing `size`.
> One simple fix would be to add an `if b is not None:` check before incrementing 
> `size`.
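
The proposed guard can be illustrated with a minimal sketch. This is not the actual Beam patch: `FakeBuffer` and `total_buffer_size` are hypothetical names introduced here so the example runs without pyarrow, and `FakeBuffer` models only the `size` attribute that the traceback shows being read.

```python
from typing import Iterable, List, Optional


class FakeBuffer:
    """Hypothetical stand-in for a buffer object; only `size` is modelled."""

    def __init__(self, size: int) -> None:
        self.size = size


def total_buffer_size(buffer_lists: Iterable[List[Optional[FakeBuffer]]]) -> int:
    """Sum buffer sizes across arrays, skipping buffers that are None.

    Each inner list plays the role of one `array.buffers()` result, which
    the report says can be `[None]` for an empty array.
    """
    size = 0
    for buffers in buffer_lists:
        for b in buffers:
            # The guard suggested in the report: without it, `b.size`
            # raises AttributeError when b is None.
            if b is not None:
                size = size + b.size
    return size


if __name__ == "__main__":
    empty_array_buffers = [None]  # shape reported for an empty array
    regular_buffers = [FakeBuffer(8), None, FakeBuffer(32)]
    print(total_buffer_size([empty_array_buffers, regular_buffers]))
```

With the guard in place, an input consisting only of `[None]` contributes zero to the total instead of raising `AttributeError`.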


