[
https://issues.apache.org/jira/browse/ARROW-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487097#comment-17487097
]
Dewey Dunnington commented on ARROW-15317:
------------------------------------------
My use-case should definitely be revisited once I understand a bit more about
Substrait and Dataset!
> [R] Expose API to create Dataset from Fragments
> -----------------------------------------------
>
> Key: ARROW-15317
> URL: https://issues.apache.org/jira/browse/ARROW-15317
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 6.0.1
> Reporter: Will Jones
> Priority: Minor
>
> Third-party packages may define dataset factories for table formats like
> Delta Lake and Apache Iceberg. These formats store metadata like schema, file
> lists, and file-level statistics on the side, so a dataset can be constructed
> without a discovery process. Python already exposes enough API to do this
> successfully for [a Delta Lake dataset reader
> here|https://github.com/delta-io/delta-rs/blob/6a8195d6e3cbdcb0c58a14a3ffccc472dd094de0/python/deltalake/table.py#L267-L280].
> I propose adding the following to the R API:
> * Expose {{Fragment}} as an R6 object
> * Add the {{MakeFragment}} method to various file format objects. It's key
> that {{partition_expression}} is included as an argument. ([See Python
> equivalent
> here|https://github.com/apache/arrow/blob/ab86daf3f7c8a67bee6a175a749575fd40417d27/python/pyarrow/_dataset_parquet.pyx#L209-L210])
> * Add a dataset constructor that takes a list of {{Fragments}}