[ https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche reassigned ARROW-8290:
--------------------------------------------
Assignee: Joris Van den Bossche
> [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor
> -----------------------------------------------------------------------
>
> Key: ARROW-8290
> URL: https://issues.apache.org/jira/browse/ARROW-8290
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Labels: dataset
>
> Currently, to manually create a FileSystemDataset, you can do something like:
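> {code}
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.fs  # makes pa.fs available
> # `schema` is assumed to be an existing pa.Schema describing the files
> dataset = ds.FileSystemDataset(
>     schema, None, ds.ParquetFileFormat(), pa.fs.LocalFileSystem(),
>     ["data_file1.parquet", "data_file2.parquet"],
>     [ds.field('file') == 1, ds.field('file') == 2])
> {code}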
> There are some usability improvements we can make, though:
> - Allow passing the arguments by name to improve readability of the calling
> code (currently they all need to be passed positionally, due to the way they
> are declared in Cython as {{not None}})
> - I would possibly change the order of the arguments (e.g. start with the
> paths; we don't need to match the order of the C++ constructor)
> - Potentially allow {{partitions}} to be optional, in which case it needs to
> default to a list of {{ScalarExpression(True)}} values, one per path.
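A minimal pure-Python sketch of what the proposed ergonomics could look like, assuming paths come first, the remaining arguments get defaults so they can be passed by name, and a missing {{partitions}} is filled with one always-true expression per path. The class name `SketchFileSystemDataset`, the keyword names, and the `TRUE_EXPR` placeholder (standing in for `ScalarExpression(True)`) are all hypothetical; this is not pyarrow's API, only an illustration of the argument handling.

```python
from typing import Any, Optional, Sequence

# Hypothetical placeholder standing in for ScalarExpression(True);
# this is a plain string, not a pyarrow object.
TRUE_EXPR = "ScalarExpression(True)"


class SketchFileSystemDataset:
    """Hypothetical constructor illustrating the proposed ergonomics:
    paths first, every other argument passable by name, and
    partitions optional."""

    def __init__(
        self,
        paths: Sequence[str],
        schema: Any = None,
        format: Any = None,
        filesystem: Any = None,
        partitions: Optional[Sequence[Any]] = None,
    ):
        self.paths = list(paths)
        if partitions is None:
            # Default: one always-true partition expression per path.
            partitions = [TRUE_EXPR] * len(self.paths)
        partitions = list(partitions)
        if len(partitions) != len(self.paths):
            raise ValueError(
                "partitions must have one expression per path "
                f"(got {len(partitions)} for {len(self.paths)} paths)")
        self.partitions = partitions
        self.schema = schema
        self.format = format
        self.filesystem = filesystem


# Usage: only the paths are required; everything else is passed by name.
dataset = SketchFileSystemDataset(
    ["data_file1.parquet", "data_file2.parquet"],
    format="parquet",  # a real call would pass e.g. ds.ParquetFileFormat()
)
print(dataset.partitions)  # ['ScalarExpression(True)', 'ScalarExpression(True)']
```

Filling the default inside the constructor (rather than using a mutable default argument) keeps the number of partition expressions tied to the number of paths, and the length check catches mismatched lists early.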
--
This message was sent by Atlassian Jira
(v8.3.4#803005)