[ https://issues.apache.org/jira/browse/ARROW-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460970#comment-17460970 ]
Weston Pace commented on ARROW-15135:
-------------------------------------
Off the top of my head, I think this integration might take the form of a
dataset factory.
Given an Iceberg table, the dataset factory would consult the Iceberg
metadata. From that metadata we can get:
- The list of files
- The format of the files (Parquet vs. ORC)
- The partitioning scheme
- Potentially the filesystem?
We could then take those pieces and create an ordinary FileSystemDataset.
Alternatively, we could create an IcebergDataset and IcebergFragment, but I'm
not sure there would be anything to gain by doing so.
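To make the FileSystemDataset route a bit more concrete, here is a rough
Python sketch. It is only an illustration under assumptions: the `metadata`
dict is a hypothetical, already-parsed summary (real Iceberg metadata lives in
manifest lists and manifests and looks different), and `DatasetPlan`,
`plan_from_metadata` and `build_dataset` are made-up names. The only Arrow
call it leans on is `pyarrow.dataset.FileSystemDataset.from_paths`, and the
schema and filesystem are assumed to be supplied by the caller.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class DatasetPlan:
    """Hypothetical container for what the factory pulled out of the metadata."""
    paths: list[str]
    file_format: str                  # "parquet" or "orc"
    partitions: list[dict[str, Any]]  # one {column: value} dict per file


def plan_from_metadata(metadata: dict) -> DatasetPlan:
    # `metadata` is an assumed, simplified shape:
    #   {"files": [{"path": ..., "format": ..., "partition": {...}}, ...]}
    files = metadata["files"]
    formats = {f["format"].lower() for f in files}
    if len(formats) != 1:
        raise ValueError(f"expected exactly one file format, got {sorted(formats)}")
    return DatasetPlan(
        paths=[f["path"] for f in files],
        file_format=formats.pop(),
        partitions=[dict(f.get("partition", {})) for f in files],
    )


def build_dataset(plan: DatasetPlan, schema, filesystem):
    """Turn a plan into an ordinary FileSystemDataset (requires pyarrow)."""
    import pyarrow.dataset as ds  # imported lazily so planning works without Arrow

    if plan.file_format == "parquet":
        fmt = ds.ParquetFileFormat()
    elif plan.file_format == "orc":
        fmt = ds.OrcFileFormat()
    else:
        raise ValueError(f"unsupported file format: {plan.file_format}")

    # One partition expression per file, e.g. (year == 2021).
    partitions = []
    for values in plan.partitions:
        expr = ds.scalar(True)
        for name, value in values.items():
            expr = expr & (ds.field(name) == value)
        partitions.append(expr)

    return ds.FileSystemDataset.from_paths(
        plan.paths,
        schema=schema,
        format=fmt,
        filesystem=filesystem,
        partitions=partitions,
    )
```

Keeping the plan separate from the Arrow call means the metadata handling can
be exercised without any files on disk. Since the partition values here come
from the metadata rather than from directory names, each file gets an explicit
partition expression instead of relying on path-based discovery.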
> [C++][R][Python] Support reading from Apache Iceberg tables
> -----------------------------------------------------------
>
> Key: ARROW-15135
> URL: https://issues.apache.org/jira/browse/ARROW-15135
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Will Jones
> Priority: Major
>
> This is an umbrella issue for supporting the Apache Iceberg table
> format (https://iceberg.apache.org/).
> Dremio has a good overview of the format here:
> https://www.dremio.com/apache-iceberg-an-architectural-look-under-the-covers/