[ https://issues.apache.org/jira/browse/ARROW-18370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Zhu reassigned ARROW-18370:
------------------------------
Assignee: (was: Yu Zhu)
Description:
`ds.write_dataset` allows specifying Parquet compression, for example:
```
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df = pa.Table.from_pandas(df)
ds.write_dataset(
    df,
    base_dir='test',
    format='parquet',
    file_options=ds.ParquetFileFormat().make_write_options(compression='snappy'))
```
However, the same trick (the `file_options` argument) doesn't work for Feather;
the following code gives me an error:
```
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df = pa.Table.from_pandas(df)
ds.write_dataset(
    df,
    base_dir='test',
    format='feather',
    file_options=ds.FeatherFileFormat().make_write_options(compression='uncompressed'))
```
The error: `TypeError: FeatherFileFormat.make_write_options() takes no keyword arguments`
was:
`ds.write_dataset` allows specifying Parquet compression, for example:
```
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
df = pd.DataFrame(\{'a': [1, 2, 3], 'b': [4, 5, 6]})
df = pa.Table.from_pandas(df)
ds.write_dataset(
df,
base_dir='test',
format='parquet',
file_options=ds.ParquetFileFormat().make_write_options(compression='snappy'))
```
However, such tick (the `file_options` argument) doesn't work for feather, as
the following code gives me an error:
```
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
df = pd.DataFrame(\{'a': [1, 2, 3], 'b': [4, 5, 6]})
df = pa.Table.from_pandas(df)
ds.write_dataset(
df,
base_dir='test',
format='feather',
file_options=ds.FeatherFileFormat().make_write_options(compression='uncompressed'))
```
The error: `TypeError: FeatherFileFormat.make_write_options() takes no keyword
arguments`
> [Python] `ds.write_dataset` doesn't allow feather compression
> -------------------------------------------------------------
>
> Key: ARROW-18370
> URL: https://issues.apache.org/jira/browse/ARROW-18370
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 10.0.0
> Environment: Ubuntu 22.04
> Reporter: Yu Zhu
> Priority: Major
>
> `ds.write_dataset` allows specifying Parquet compression, for example:
> ```
> import pandas as pd
> import pyarrow as pa
> import pyarrow.dataset as ds
> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
> df = pa.Table.from_pandas(df)
> ds.write_dataset(
>     df,
>     base_dir='test',
>     format='parquet',
>     file_options=ds.ParquetFileFormat().make_write_options(compression='snappy'))
> ```
> However, the same trick (the `file_options` argument) doesn't work for Feather;
> the following code gives me an error:
> ```
> import pandas as pd
> import pyarrow as pa
> import pyarrow.dataset as ds
> df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
> df = pa.Table.from_pandas(df)
> ds.write_dataset(
>     df,
>     base_dir='test',
>     format='feather',
>     file_options=ds.FeatherFileFormat().make_write_options(compression='uncompressed'))
> ```
> The error: `TypeError: FeatherFileFormat.make_write_options() takes no keyword arguments`
--
This message was sent by Atlassian Jira
(v8.20.10#820010)