Weston Pace created ARROW-17521:
-----------------------------------
Summary: [Python] Add python bindings for NamedTableProvider for Substrait consumer
Key: ARROW-17521
URL: https://issues.apache.org/jira/browse/ARROW-17521
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Weston Pace
The C++ Substrait consumer currently supports a named table provider to handle
the NamedTable relation:
{noformat}
using NamedTableProvider =
    std::function<Result<compute::Declaration>(const std::vector<std::string>&)>;
static NamedTableProvider kDefaultNamedTableProvider;

/// Options that control the conversion between Substrait and Acero representations of a
/// plan.
struct ConversionOptions {
  /// \brief How strictly the converter should adhere to the structure of the input.
  ConversionStrictness strictness = ConversionStrictness::BEST_EFFORT;
  /// \brief A custom strategy to be used for providing named tables
  ///
  /// The default behavior will return an invalid status if the plan has any
  /// named table relations.
  NamedTableProvider named_table_provider = kDefaultNamedTableProvider;
};
{noformat}
This is very useful for testing and experimenting as it allows you to provide
tables from memory (using a table_source node for example). We should add
pyarrow bindings. I don't think they need to expose the full
compute::DeclarationInfo range of table sources. A simple approach might be a
function that, given a list of names, returns either a table, an iterable of
batches, or a record batch reader.
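
To make the proposed callback shape concrete, here is a minimal sketch of what a Python-side provider might look like. The names TableProvider and make_table_provider are hypothetical and are not part of pyarrow. Plain dicts stand in for tables so the sketch runs without pyarrow installed; a real provider would presumably return a pyarrow Table, an iterable of batches, or a record batch reader, as described above.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical callback shape: receives the name components of a NamedTable
# relation and returns some table-like object.
TableProvider = Callable[[Sequence[str]], Any]


def make_table_provider(tables: Dict[Tuple[str, ...], Any]) -> TableProvider:
    """Build a provider that resolves NamedTable names from an in-memory dict."""

    def provider(names: Sequence[str]) -> Any:
        key = tuple(names)
        if key not in tables:
            # Fail loudly for unknown names, in the spirit of the C++ default
            # provider, which rejects every named table relation.
            raise KeyError(f"no table registered for {'.'.join(key)!r}")
        return tables[key]

    return provider


# Plain dicts are stand-ins for in-memory tables (an assumption for this sketch).
provider = make_table_provider({("db", "orders"): {"id": [1, 2, 3]}})
orders = provider(["db", "orders"])
```

Keying the lookup on the full tuple of names keeps multi-part names such as (schema, table) unambiguous, which a single joined string might not.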