Joris Van den Bossche created ARROW-10482:
---------------------------------------------
Summary: [Python] Specifying compression type on a column basis
when writing Parquet not working
Key: ARROW-10482
URL: https://issues.apache.org/jira/browse/ARROW-10482
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Joris Van den Bossche
From
>https://stackoverflow.com/questions/64666270/using-per-column-compression-codec-in-parquet-write-table
According to the docs, you can specify the compression type on a
column-by-column basis, but that doesn't seem to be working:
{code}
In [5]: table = pa.table([[1, 2], [3, 4], [5, 6]], names=["foo", "bar", "baz"])
In [6]: pq.write_table(table, 'test1.parquet',
   ...:                compression=dict(foo='zstd', bar='snappy', baz='brotli'))
...
~/scipy/repos/arrow/python/pyarrow/_parquet.cpython-37m-x86_64-linux-gnu.so in
string.from_py.__pyx_convert_string_from_py_std__in_string()
TypeError: expected bytes, str found
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)