[jira] [Commented] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

Ben Kietzman (Jira) Tue, 31 Mar 2020 07:55:18 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071858#comment-17071858
 ]


Ben Kietzman commented on ARROW-8290:
-------------------------------------

Small amenity: if an empty vector is passed for {{partitions}}, we will populate 
it with {{scalar(true)}} automatically.
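
A rough sketch of that defaulting, written in plain Python rather than against the Arrow API. The names {{fill_partitions}} and {{ALWAYS_TRUE}} are hypothetical stand-ins for illustration, and the length check is an assumption of the sketch, not something stated in this issue:

```python
from typing import List, Optional, Sequence

# Hypothetical placeholder for an always-true partition expression
# (what the comment calls scalar(true)); a plain bool, not an Arrow object.
ALWAYS_TRUE = True


def fill_partitions(paths: Sequence[str],
                    partitions: Optional[Sequence[object]] = None) -> List[object]:
    """Return one partition expression per path (illustrative helper only)."""
    if not partitions:
        # None or empty: every file gets the always-true expression.
        return [ALWAYS_TRUE] * len(paths)
    if len(partitions) != len(paths):
        # Assumption of this sketch: a non-empty list must match the paths.
        raise ValueError("expected one partition expression per path")
    return list(partitions)


paths = ["data_file1.parquet", "data_file2.parquet"]
print(fill_partitions(paths, []))  # [True, True]
```

With something like this in place, a caller who has no per-file partition information can leave the argument empty instead of building the list of always-true expressions by hand.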

> [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor
> -----------------------------------------------------------------------
>
>                 Key: ARROW-8290
>                 URL: https://issues.apache.org/jira/browse/ARROW-8290
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: dataset
>
> Currently, to manually create a FileSystemDataset, you can do something like:
> {code}
> dataset = ds.FileSystemDataset(
>         schema, None, ds.ParquetFileFormat(), pa.fs.LocalFileSystem(),
>         ["data_file1.parquet", "data_file2.parquet"],
>         [ds.field('file') == 1, ds.field('file') == 2])
> {code}
> There are some usability improvements we can make, though:
> - Allow passing the arguments by name to improve readability of the calling 
> code (now they all need to be passed positionally, due to the way they are 
> implemented in cython as {{not None}})
> - I would maybe change the order of the arguments (eg start with the paths, 
> we don't need to match the order of the C++ constructor)
> - Potentially allow {{partitions}} to be optional, in which case they need to 
> be set to a list of ScalarExpression(True) values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
