[ 
https://issues.apache.org/jira/browse/ARROW-18063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-18063:
---------------------------------
    Component/s: C++

> [C++][Python] 
> --------------
>
>                 Key: ARROW-18063
>                 URL: https://issues.apache.org/jira/browse/ARROW-18063
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Priority: Major
>
> [Mailing list 
> thread|https://lists.apache.org/thread/r484sqrd6xjdd058prbrcwh3t5vg91so]
> The goal is to:
> - generate a substrait plan in Python using Ibis
> - ... wherein tables are specified using custom URLs
> - use the python API {{run_query}} to execute the plan
> - ... against source data which is *streamed* from those URLs rather than 
> pulled fully into local memory
> The obstacles include:
> - The API for constructing a data stream from the custom URLs is only 
> available in c++
> - The python {{run_query}} function requires tables as input and cannot 
> accept a RecordBatchReader even if one could be constructed from a custom URL
> - Writing custom cython is not preferred
> Some potential solutions:
> - Use ExecuteSerializedPlan() directly usable from c++ so that construction 
> of data sources need not be handled in python. Passing a buffer from 
> python/ibis down to C++ is much simpler and can be navigated without writing 
> cython
> - Refactor NamedTableProvider from a lambda mapping {{names -> data source}} 
> into a registry so that data source factories can be added from c++ then 
> referenced by name from python
> - Extend {{run_query}} to support non-Table sources and require the user to 
> write a python mapping from URLs to {{pa.RecordBatchReader}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to