[ 
https://issues.apache.org/jira/browse/ARROW-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470951#comment-17470951
 ] 

QP Hou commented on ARROW-14730:
--------------------------------

Sounds good to me, Will. Please feel free to ping me if you need anything.

> [C++][R][Python] Support reading from Delta Lake tables
> -------------------------------------------------------
>
>                 Key: ARROW-14730
>                 URL: https://issues.apache.org/jira/browse/ARROW-14730
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Will Jones
>            Priority: Major
>
> [Delta Lake|https://delta.io/] is a Parquet-based table format that supports ACID 
> transactions. It was popularized by Databricks, which uses it as the default 
> table format in their platform. Previously, it was readable only from 
> Spark, but now there is an effort in 
> [delta-rs|https://github.com/delta-io/delta-rs] to make it accessible from 
> elsewhere. There is already some integration with DataFusion (see: 
> https://github.com/apache/arrow-datafusion/issues/525).
> The delta-rs Python bindings already provide [a method to read Delta Lake tables into Arrow 
> tables in 
> Python|https://delta-io.github.io/delta-rs/python/api_reference.html#deltalake.table.DeltaTable.to_pyarrow_table],
> which includes filtering by partitions.
> Is there a good way we could integrate this functionality with Arrow C++ 
> Dataset and expose that in Python and R? Would that be something that should 
> be implemented in Arrow libraries or in delta-rs?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
