Hello all,

As requested during community meetings, I have finalized the first draft of
the proposed protocol changes in two languages.

For Java, the change is contained in a single repo:
https://github.com/apache/arrow-java/pull/1064
It applies the change in the JDBC driver for *Statement.execute()* and
*PreparedStatement.execute()*.

Three PRs were necessary for Python ADBC Flight SQL, with the end goal of
fixing *cursor.execute()*:

   1. Making the change to the .proto:
   https://github.com/apache/arrow/pull/49498
   2. Exposing the new field in Prepared Statements in Go Flight SQL:
   https://github.com/apache/arrow-go/pull/732
   3. Exposing the new field from the Go ADBC Flight SQL driver (which the
   Python driver wraps), and using the new field in the Python driver:
   https://github.com/apache/arrow-adbc/pull/4161
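
To make the end goal concrete, here is a minimal sketch (not the code from
the PRs above) of how a client could pick the execution path once the new
field is exposed. The function and constant names are hypothetical.
`is_update` is modeled as `None` when the server did not set the field, in
which case the driver keeps whatever choice it made before the change.

```python
from typing import Optional

# The two Flight SQL commands a client can send after creating a prepared statement.
QUERY = "CommandPreparedStatementQuery"
UPDATE = "CommandPreparedStatementUpdate"


def choose_command(is_update: Optional[bool], legacy_choice: str) -> str:
    """Pick the execution path for a prepared statement (hypothetical helper).

    is_update: value of the proposed field, or None if the server left it unset.
    legacy_choice: the command the driver would have used before the change.
    """
    if is_update is None:
        # Old server: keep the driver's previous behavior.
        return legacy_choice
    return UPDATE if is_update else QUERY


# cursor.execute() in the Python driver previously always used the query path,
# so QUERY is passed as the legacy choice here.
print(choose_command(True, legacy_choice=QUERY))   # server flags an update
print(choose_command(None, legacy_choice=QUERY))   # field absent, old behavior
```

Passing the legacy choice in as an argument keeps the sketch driver-agnostic:
a driver that already uses a heuristic can compute it first and hand the
result to the same function.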

I'm sharing this to get early feedback on the protocol change's direction.
The Java PR is ready for a full review. The ADBC PR needs some code quality
improvements (highlighted in the PR). I will address them if the consensus
supports continuing with the proposed change.

I tested both drivers end to end against server implementations with and
without the change to ensure backward compatibility (I used Dremio as the
backend).
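
The compatibility scenarios can also be illustrated at the wire level. The
sketch below hand-encodes three responses and reads them with a tiny protobuf
parser written only for this example (it is not the generated code any driver
uses). It assumes the existing fields of ActionCreatePreparedStatementResult
keep numbers 1 to 3 and that the new field is `optional bool is_update = 4`,
as proposed earlier in this thread. The handle value `h1` is made up.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict:
    """Return {field_number: raw_value} for varint and length-delimited fields."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:            # varint (used for bool)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:          # length-delimited (used for bytes)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[number] = value
    return fields


# Field numbers an old client knows about (assumed from the current message).
OLD_KNOWN = {1: "prepared_statement_handle", 2: "dataset_schema", 3: "parameter_schema"}


def old_client_view(buf: bytes) -> dict:
    """An old client keeps known fields and silently skips unknown ones."""
    return {OLD_KNOWN[n]: v for n, v in parse_fields(buf).items() if n in OLD_KNOWN}


def new_client_is_update(buf: bytes):
    """A new client sees True/False if field 4 is present, otherwise None."""
    fields = parse_fields(buf)
    return bool(fields[4]) if 4 in fields else None


OLD_SERVER = b"\x0a\x02h1"             # field 1 = b"h1", no field 4
NEW_SERVER_TRUE = b"\x0a\x02h1\x20\x01"   # field 1 = b"h1", field 4 = true
NEW_SERVER_FALSE = b"\x0a\x02h1\x20\x00"  # field 1 = b"h1", field 4 = false

print(old_client_view(NEW_SERVER_TRUE))
print(new_client_is_update(OLD_SERVER))
print(new_client_is_update(NEW_SERVER_FALSE))
```

The old client drops field 4 without error, and the new client can tell an
unset field apart from an explicit false, which is the presence tracking that
`optional` provides in proto3.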

Integration tests also exist in both Arrow-Java and Arrow-Go.

Let me know what you think.

Best,

Pedro




On Tue, Mar 10, 2026 at 10:32 PM Pedro Matias <[email protected]>
wrote:

> *Compatibility chart of driver implementations for APIs that support
> queries and updates in the same function*
>
> As Martin Prammer requested in the previous biweekly meeting, I analyzed
> the Arrow JDBC Flight SQL driver, the ODBC Flight SQL driver, and the ADBC
> Flight SQL drivers in Python and Go to list how current implementations
> decide whether to use CommandPreparedStatementQuery or
> CommandPreparedStatementUpdate for API method calls that allow both.
>
> All drivers except the JDBC driver use CommandPreparedStatementQuery for
> these methods, indicating a lack of DML support.
>
> Below are links to the implementations:
>
>
>    1. ODBC Flight SQL driver: For ODBC SQLExecute(), the driver always
>    uses CommandPreparedStatementQuery:
>    
> https://github.com/apache/arrow/blob/ca6845248b014db7131ba6dccec5f91b04b4543d/cpp/src/arrow/flight/sql/client.cc#L644
>    2. JDBC Flight SQL driver: For Statement.execute(), the driver uses
>    CommandPreparedStatementQuery if the dataset_schema of the
>    ActionCreatePreparedStatementResult is not empty, otherwise
>    CommandPreparedStatementUpdate:
>    
> https://github.com/apache/arrow-java/blob/7390f551267798d4670eae6b2894c527dbc90403/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java#L458
>
>    3. JDBC Flight SQL driver: For PreparedStatement.execute(), the last
>    released driver version, 18.3.0, always uses
>    CommandPreparedStatementQuery. A merged PR applies the same heuristic
>    as Statement.execute() but has not yet been released:
>    https://github.com/apache/arrow-java/pull/811
>    4. Python ADBC Flight SQL driver: DB-API 2.0 allows for both in
>    cursor.execute() (see https://peps.python.org/pep-0249/#id20). The
>    driver always uses CommandPreparedStatementQuery:
>    
> https://github.com/apache/arrow-adbc/blob/b0611a123166b1e3778e26258e75c8a46b0e903b/python/adbc_driver_manager/adbc_driver_manager/dbapi.py#L817
>
>    5. Go ADBC Flight SQL driver: I don't think there is any API method
>    that allows for both result set generating queries and result counts. Based
>    on the docs at
>    https://pkg.go.dev/github.com/apache/arrow-adbc/go/adbc#Statement, it
>    seems like Statement.ExecuteQuery() is only for result set generating
>    queries and Statement.ExecuteUpdate() is for updates.
>
> Let me know if I should expand this list with other implementations; I
> only checked the ones I am aware of.
>
> *Backward compatibility of the proposed change*
> The change being proposed (adding a boolean field to
> ActionCreatePreparedStatementResult to determine the network flow used by
> Flight SQL clients) is fully backward compatible. This follows directly
> from using a new proto3 optional field. See the section "Adding new
> fields is safe" in
> https://protobuf.dev/programming-guides/editions/?utm_source=chatgpt.com#wire-safe-changes
>
>
> For ease of understanding, I will outline the scenarios below:
>
> New client <-> Old server
> If a server does not set the new field that the client expects, the client
> can detect the field's absence (directly from the protobuf generated files)
> and follow the logic it previously used to determine the network flow. I
> have a draft PR in the JDBC driver that exemplifies this; I updated the
> driver but didn't change the server. All tests pass locally, and I tested
> it successfully end to end with a backend server that wasn't updated:
> https://github.com/apache/arrow-java/pull/1064
>
> Old client <-> New server
> A client implementation that receives an unknown field will merely ignore
> it during parsing.
>
> Best,
>
> Pedro
>
> On Tue, Mar 3, 2026 at 10:50 PM David Li <[email protected]> wrote:
>
>> Sounds good. I think it would also be reasonable to raise a PR with the
>> spec change for discussion as well.
>>
>> I would much prefer to not cram more things into existing endpoints, but
>> I suppose it's not clear to me if it's possible to fix that at this point.
>>
>> On Wed, Mar 4, 2026, at 04:38, Pedro Matias wrote:
>> > I agree with consolidating the two execution modes. I don't think these
>> > approaches are mutually exclusive: we can fix the current execution split
>> > for correctness (which should be an easier and quicker fix) and introduce a
>> > new consolidated endpoint to include row counts in query cases as well.
>> >
>> > Having the same endpoint allows us to use it for ad hoc queries, which
>> > reduces the number of roundtrips per query in Statement.execute().
>> >
>> > Do you intend to use DoExchange for this new endpoint?
>> >
>> > In the meantime I'm working on some action items raised in the last sync
>> > regarding my proposed fix. I will send an email highlighting backward
>> > compatibility concerns and the current status of the different drivers
>> > before the next meeting.
>> >
>> > Pedro
>> >
>> > On Thu, Feb 26, 2026 at 2:24 AM David Li <[email protected]> wrote:
>> >
>> >> It seems reasonable to me if you want to raise a pull request to discuss,
>> >> but we could consider consolidating the two execution modes? API-wise I
>> >> feel it would be better to just have one endpoint and let the server return
>> >> what is appropriate. (Also because interfaces like PEP 249 and protocols
>> >> like Postgres's allow for a row count in both query and update cases,
>> >> albeit JDBC does not.)
>> >>
>> >> On Mon, Feb 23, 2026, at 10:17, Pedro Matias wrote:
>> >> > Hello all,
>> >> >
>> >> > @Hélder Gregório <[email protected]> and I identified a gap
>> >> > between common database API execution patterns and Arrow Flight SQL
>> >> > prepared statements. To address this, we propose adding an optional
>> >> > boolean field to ActionCreatePreparedStatementResult.
>> >> >
>> >> > *Background*
>> >> >
>> >> > A common pattern in database APIs is:
>> >> >
>> >> >    1. Create a prepared statement
>> >> >    2. Execute the prepared statement, returning either a result set or
>> >> >    an update count
>> >> >
>> >> > This pattern exists in:
>> >> >
>> >> >    - *JDBC* (Connection.prepareStatement() + PreparedStatement.execute())
>> >> >    - *Python PEP 249* (both steps condensed in cursor.execute())
>> >> >    - *ODBC* (SQLPrepare() + SQLExecute())
>> >> >
>> >> > In Arrow Flight SQL, there are two mutually exclusive communication paths
>> >> > for executing prepared statements. Both begin with
>> >> > ActionCreatePreparedStatementRequest, after which the client must choose
>> >> > between:
>> >> >
>> >> >    - CommandPreparedStatementQuery (returns a result set), or
>> >> >    - CommandPreparedStatementUpdate (returns an update count).
>> >> >
>> >> > (For simplicity, we ignore parameter binding here.)
>> >> >
>> >> > The issue is that ActionCreatePreparedStatementResult, returned by the
>> >> > server in the first call, does not contain information indicating which
>> >> > execution path the client should take.
>> >> >
>> >> > *Proposal*
>> >> >
>> >> > We propose adding the following field to
>> >> > ActionCreatePreparedStatementResult:
>> >> >
>> >> > optional bool is_update = 4;
>> >> >
>> >> >
>> >> >    - true → clients should use CommandPreparedStatementUpdate
>> >> >    - false → clients should use CommandPreparedStatementQuery
>> >> >
>> >> > This makes the intended execution path explicit.
>> >> >
>> >> > The behavior of clients when the server does not set this field is outside
>> >> > the scope of this proposal, though discussion is welcome. We would be happy
>> >> > to open follow-up PRs to standardize client behavior across drivers if
>> >> > desired.
>> >> >
>> >> > *Current state of driver implementations*
>> >> >
>> >> >    - The Arrow Flight SQL JDBC driver uses a heuristic to choose the
>> >> >    execution path:
>> >> >    https://github.com/apache/arrow-java/issues/797
>> >> >    - The PEP 249 Python Flight SQL driver (in ADBC) always uses
>> >> >    CommandPreparedStatementQuery in cursor.execute().
>> >> >
>> >> > We believe making the execution path explicit improves protocol
>> >> > completeness and alignment with widely used database APIs.
>> >> >
>> >> > Let us know your thoughts.
>> >> >
>> >> > Best,
>> >> > Pedro Matias
>> >>
>>
>
