[
https://issues.apache.org/jira/browse/ARROW-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17486786#comment-17486786
]
Weston Pace commented on ARROW-15317:
-------------------------------------
[~willjones127] makes some good points regarding a table format probably being
overkill for this problem. Also, I do agree the use case is valid.
Mostly I'm just trying to keep the number of specs as low as possible. Is it
possible that Substrait alone is an answer for this? It sounds like the key thing
missing is the ability to attach a partition expression (i.e. guarantee) to a
piece of input data. I wonder if we could add that into Substrait's "in memory
table" spec or something. Or am I still missing something?
> [R] Expose API to create Dataset from Fragments
> -----------------------------------------------
>
> Key: ARROW-15317
> URL: https://issues.apache.org/jira/browse/ARROW-15317
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 6.0.1
> Reporter: Will Jones
> Priority: Minor
>
> Third-party packages may define dataset factories for table formats like
> Delta Lake and Apache Iceberg. These formats store metadata like schema, file
> lists, and file-level statistics on the side, and can construct a dataset
> without needing a discovery process. Python exposed enough API to do this
> successfully for [a Delta Lake dataset reader
> here|https://github.com/delta-io/delta-rs/blob/6a8195d6e3cbdcb0c58a14a3ffccc472dd094de0/python/deltalake/table.py#L267-L280].
> I propose adding the following to the R API:
> * Expose {{Fragment}} as an R6 object
> * Add the {{MakeFragment}} method to various file format objects. It's key
> that {{partition_expression}} is included as an argument. ([See Python
> equivalent
> here|https://github.com/apache/arrow/blob/ab86daf3f7c8a67bee6a175a749575fd40417d27/python/pyarrow/_dataset_parquet.pyx#L209-L210])
> * Add a dataset constructor that takes a list of {{Fragments}}