[ https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche reassigned ARROW-8290:
--------------------------------------------

    Assignee: Joris Van den Bossche

> [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor
> -----------------------------------------------------------------------
>
>                 Key: ARROW-8290
>                 URL: https://issues.apache.org/jira/browse/ARROW-8290
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: dataset
>
> Currently, to manually create a FileSystemDataset, you can do something like:
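> {code}
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.fs  # makes pa.fs.LocalFileSystem available
>
> # `schema` is a pyarrow.Schema defined elsewhere; all arguments are positional
> dataset = ds.FileSystemDataset(
>         schema, None, ds.ParquetFileFormat(), pa.fs.LocalFileSystem(),
>         ["data_file1.parquet", "data_file2.parquet"],
>         [ds.field('file') == 1, ds.field('file') == 2])
> {code}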
> There are some usability improvements we could make, though:
> - Allow passing the arguments by name to improve the readability of the calling
> code (currently they all need to be passed positionally, due to the way they are
> implemented in Cython as {{not None}})
> - I would consider changing the order of the arguments (e.g. start with the paths;
> we don't need to match the order of the C++ constructor)
> - Potentially allow {{partitions}} to be optional, in which case they would
> be set to a list of ScalarExpression(True) values, one per path.
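The three proposals above (keyword arguments, paths first, optional partitions) can be sketched in plain Python to show what the normalized constructor arguments might look like. This is a hypothetical illustration under stated assumptions, not pyarrow's API: `make_dataset_spec`, `DatasetSpec`, and the string `TRUE_EXPRESSION` (a stand-in for `ScalarExpression(True)`) are invented names, and the schema, format, and filesystem are treated as opaque values.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence

# Hypothetical placeholder for an always-true partition expression
# (the issue suggests ScalarExpression(True)); this is NOT a pyarrow object.
TRUE_EXPRESSION = "ScalarExpression(True)"


@dataclass
class DatasetSpec:
    """Normalized constructor arguments (hypothetical, for illustration only)."""
    paths: List[str]
    schema: Any
    file_format: Any
    filesystem: Any
    partitions: List[Any]


def make_dataset_spec(
    paths: Sequence[str],
    schema: Any,
    file_format: Any,
    filesystem: Any,
    partitions: Optional[Sequence[Any]] = None,
) -> DatasetSpec:
    # Paths come first; every parameter can be passed positionally or by name.
    # Explicit None checks replace Cython-style `not None` annotations.
    for name, value in (("schema", schema), ("file_format", file_format),
                        ("filesystem", filesystem)):
        if value is None:
            raise TypeError(f"{name} must not be None")
    paths = list(paths)
    if partitions is None:
        # Optional partitions: default to one always-true expression per path.
        partitions = [TRUE_EXPRESSION] * len(paths)
    elif len(partitions) != len(paths):
        raise ValueError("partitions must have the same length as paths")
    return DatasetSpec(paths, schema, file_format, filesystem, list(partitions))


spec = make_dataset_spec(
    paths=["data_file1.parquet", "data_file2.parquet"],
    schema="schema", file_format="parquet", filesystem="local",
)
print(spec.partitions)  # one TRUE_EXPRESSION per path
```

The explicit `is None` checks stand in for the Cython {{not None}} annotations mentioned in the issue, so the None validation is kept while the calling code can name each argument.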



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
