westonpace commented on code in PR #35320:
URL: https://github.com/apache/arrow/pull/35320#discussion_r1192739068
##########
docs/source/cpp/acero/substrait.rst:
##########
@@ -0,0 +1,248 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. default-domain:: cpp
+.. highlight:: cpp
+.. cpp:namespace:: arrow::engine::substrait
+
+.. _acero-substrait:
+
+==========================
+Using Acero with Substrait
+==========================
+
+To use Acero, you need to create an execution plan. This is the
+model that describes the computation you want to apply to your data. Acero has
+its own internal representation for execution plans, but most users should not
+interact with it directly, because doing so couples their code to Acero.
+
+`Substrait <https://substrait.io>`_ is an open standard for execution plans.
+Acero implements the Substrait "consumer" interface. This means that Acero can
+accept a Substrait plan and fulfill it, loading the requested data and
+applying the desired computation. By using Substrait plans, users can easily
+switch to a different execution engine later.
+ +Substrait Conformance +--------------------- + +Substrait defines a broad set of operators and functions for many different +situations and it is unlikely that Acero will ever completely satisfy all +defined Substrait operators and functions. To help understand what features +are available the following sections define which features have been currently +implemented in Acero and any caveats that apply. + +Plans +^^^^^ + + * A plan should have a single top-level relation. + * The consumer is currently based on version 0.20.0 of Substrait. + Any features added that are newer will not be supported. + * Due to a breaking change in 0.20.0 any Substrait plan older than 0.20.0 + will be rejected. + +Extensions +^^^^^^^^^^ + + * If a plan contains any extension type variations it will be rejected. + * Advanced extensions can be provided by supplying a custom implementation of + :class:`arrow::engine::ExtensionProvider`. + +Relations (in general) +^^^^^^^^^^^^^^^^^^^^^^ + + * Any relation not explicitly listed below will not be supported + and will cause the plan to be rejected. + +Read Relations +^^^^^^^^^^^^^^ + + * The ``projection`` property is not supported and plans containing this + property will be rejected. + * The ``VirtualTable`` and ``ExtensionTable`` read types are not supported. + Plans containing these types will be rejected. + * Only the parquet and arrow file formats are currently supported. + * All URIs must use the ``file`` scheme + * ``partition_index``, ``start``, and ``length`` are not supported. Plans containing + non-default values for these properties will be rejected. + * The Substrait spec requires that a ``filter`` be completely satisfied by a read + relation. However, Acero only uses a read filter for pushdown projection and + it may not be fully satisfied. Users should generally attach an additional + filter relation with the same filter expression after the read relation. Review Comment: I think that's a fair point. 
   Let's address in a follow-up.
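The last bullet of the quoted hunk says that a read filter may only be partially applied. It therefore recommends repeating the same expression in a filter relation after the read. The C++ sketch below models why that matters. It is a hypothetical illustration, not Acero code. The names `RowGroup`, `ScanWithPushdown`, `ApplyFilter`, and `RunExample` are made up. The sketch also assumes that pushdown prunes whole row groups using min/max statistics, which is one common way a pushed-down filter ends up only partially satisfied.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a Parquet-like file: each row group carries min/max
// statistics for a single int64 column plus the actual values.
struct RowGroup {
  int64_t min;
  int64_t max;
  std::vector<int64_t> values;
};

// The predicate shared by the read filter and the follow-up filter
// relation: keep rows where value > threshold.
bool Matches(int64_t value, int64_t threshold) { return value > threshold; }

// Pushdown (sketch): skip row groups whose statistics prove that no row can
// match, but emit every row of the surviving groups unfiltered. The filter
// is therefore only partially satisfied.
std::vector<int64_t> ScanWithPushdown(const std::vector<RowGroup>& groups,
                                      int64_t threshold) {
  std::vector<int64_t> out;
  for (const RowGroup& g : groups) {
    // If even the group's max cannot match, no row in it can.
    if (!Matches(g.max, threshold)) continue;
    out.insert(out.end(), g.values.begin(), g.values.end());
  }
  return out;
}

// Filter relation (sketch): evaluate the same predicate row by row.
std::vector<int64_t> ApplyFilter(const std::vector<int64_t>& rows,
                                 int64_t threshold) {
  std::vector<int64_t> out;
  for (int64_t v : rows) {
    if (Matches(v, threshold)) out.push_back(v);
  }
  return out;
}

// Example: a read with pushdown followed by a filter, threshold 10.
std::vector<int64_t> RunExample() {
  const std::vector<RowGroup> groups = {
      {1, 5, {1, 3, 5}},     // pruned: max 5 cannot exceed 10
      {8, 20, {8, 12, 20}},  // kept: may contain matching rows
  };
  const int64_t threshold = 10;
  std::vector<int64_t> scanned = ScanWithPushdown(groups, threshold);
  assert(scanned.size() == 3);  // 8 survives pushdown even though 8 <= 10
  return ApplyFilter(scanned, threshold);  // {12, 20}
}
```

`RunExample()` returns `{12, 20}`. The value 8 survives the pushdown step because its row group's statistics admit a possible match. Only the explicit filter step removes it.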
