[ https://issues.apache.org/jira/browse/ARROW-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Thøgersen updated ARROW-15484:
-------------------------------------
Description:
Passing keyword arguments such as `basename_template` or `existing_data_behavior`
to `pyarrow.parquet.write_to_dataset()` with `use_legacy_dataset=False` fails
with a `TypeError`, as shown below.
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
df = pd.DataFrame({
    'int': [1, 2],
    'str': ['a', 'b']
})
table = pa.Table.from_pandas(df)
"""
**kwargs : dict,
Additional kwargs for write_table function. See docstring for write_table
or ParquetWriter for more information.
"""
pq.write_to_dataset(table, root_path='foo',
                    use_legacy_dataset=False,
                    # kwargs:
                    basename_template="prefix-{i}.parquet",
                    existing_data_behavior="error"
)
{code}
{noformat}
TypeError Traceback (most recent call last)
...test.py in <module>
16 Additional kwargs for write_table function. See docstring for
write_table or ParquetWriter for more information.
17 """
---> 18 pq.write_to_dataset(table, root_path='foo',
19 use_legacy_dataset=False,
20 # kwargs:
...lib/python3.8/site-packages/pyarrow/parquet.py in write_to_dataset(table,
root_path, partition_cols, partition_filename_cb, filesystem,
use_legacy_dataset, **kwargs)
2144 # map format arguments
2145 parquet_format = ds.ParquetFileFormat()
-> 2146 write_options = parquet_format.make_write_options(**kwargs)
2147
2148 # map old filesystems to new one
...lib/python3.8/site-packages/pyarrow/_dataset.pyx in
pyarrow._dataset.ParquetFileFormat.make_write_options()
...lib/python3.8/site-packages/pyarrow/_dataset.pyx in
pyarrow._dataset.ParquetFileWriteOptions.update()
TypeError: unexpected parquet write option: basename_template
{noformat}
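A possible workaround, offered as a minimal sketch rather than a confirmed fix: the traceback shows every keyword being forwarded to `ParquetFileFormat.make_write_options()`, which rejects anything that is not a Parquet write option. The sketch below separates the two kinds of keywords and calls `pyarrow.dataset.write_dataset()` directly. The names `DATASET_LEVEL_KEYS`, `split_write_kwargs` and `write_parquet_dataset` are hypothetical helpers introduced here for illustration, and the sketch assumes the installed `write_dataset()` accepts `basename_template` and `existing_data_behavior`.

```python
# Keywords assumed (for this sketch) to belong to ds.write_dataset() itself
# rather than to the Parquet file writer.
DATASET_LEVEL_KEYS = {"basename_template", "existing_data_behavior"}


def split_write_kwargs(kwargs):
    """Split kwargs into (dataset-level, Parquet-writer-level) dicts."""
    dataset_kwargs = {k: v for k, v in kwargs.items() if k in DATASET_LEVEL_KEYS}
    parquet_kwargs = {k: v for k, v in kwargs.items() if k not in DATASET_LEVEL_KEYS}
    return dataset_kwargs, parquet_kwargs


def write_parquet_dataset(table, root_path, **kwargs):
    """Hypothetical wrapper that bypasses pq.write_to_dataset()."""
    # Imported lazily so split_write_kwargs() has no pyarrow dependency.
    import pyarrow.dataset as ds

    dataset_kwargs, parquet_kwargs = split_write_kwargs(kwargs)
    parquet_format = ds.ParquetFileFormat()
    ds.write_dataset(
        table,
        root_path,
        format=parquet_format,
        # Only genuine Parquet write options reach make_write_options().
        file_options=parquet_format.make_write_options(**parquet_kwargs),
        # Dataset-level keywords are passed to write_dataset() itself.
        **dataset_kwargs,
    )
```

With the `table` from the reproduction above, this would be called as `write_parquet_dataset(table, 'foo', basename_template="prefix-{i}.parquet", existing_data_behavior="error")`.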
was:
When supplying `kwargs` such as `basename_template` or
`existing_data_behaviour` to `pyarrow.parquet.write_to_dataset()`, it fails as
below.
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
df = pd.DataFrame({
    'int': [1, 2],
    'str': ['a', 'b']
})
table = pa.Table.from_pandas(df)
"""
**kwargs : dict,
Additional kwargs for write_table function. See docstring for write_table
or ParquetWriter for more information.
"""
pq.write_to_dataset(table, root_path='foo',
                    use_legacy_dataset=False,
                    # kwargs:
                    basename_template="prefix-{i}.parquet",
                    existing_data_behaviour="error"
)
{code}
{noformat}
TypeError Traceback (most recent call last)
...test.py in <module>
16 Additional kwargs for write_table function. See docstring for
write_table or ParquetWriter for more information.
17 """
---> 18 pq.write_to_dataset(table, root_path='foo',
19 use_legacy_dataset=False,
20 # kwargs:
...lib/python3.8/site-packages/pyarrow/parquet.py in write_to_dataset(table,
root_path, partition_cols, partition_filename_cb, filesystem,
use_legacy_dataset, **kwargs)
2144 # map format arguments
2145 parquet_format = ds.ParquetFileFormat()
-> 2146 write_options = parquet_format.make_write_options(**kwargs)
2147
2148 # map old filesystems to new one
...lib/python3.8/site-packages/pyarrow/_dataset.pyx in
pyarrow._dataset.ParquetFileFormat.make_write_options()
...lib/python3.8/site-packages/pyarrow/_dataset.pyx in
pyarrow._dataset.ParquetFileWriteOptions.update()
TypeError: unexpected parquet write option: basename_template
{noformat}
> [Python] kwargs fails for pyarrow.parquet.write_to_dataset()
> ------------------------------------------------------------
>
> Key: ARROW-15484
> URL: https://issues.apache.org/jira/browse/ARROW-15484
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet, Python
> Affects Versions: 6.0.1
> Reporter: Martin Thøgersen
> Priority: Major
>