Hello all,

As requested during community meetings, I have finalized the first draft
of the proposed protocol changes in 2 languages.

For Java the change is contained in a single repo:
https://github.com/apache/arrow-java/pull/1064
This applies the change in the JDBC driver for *Statement.execute()* and
*PreparedStatement.execute()*

Three PRs were necessary for Python ADBC Flight SQL, with the end goal of
fixing *cursor.execute()*:

1. Making the change to the .proto:
   https://github.com/apache/arrow/pull/49498
2. Exposing the new field in Prepared Statements in Go Flight SQL:
   https://github.com/apache/arrow-go/pull/732
3. Exposing the new field from the Go ADBC Flight SQL driver (which the
   Python driver wraps), and using the new field in the Python driver:
   https://github.com/apache/arrow-adbc/pull/4161

I'm sharing this to get early feedback on the protocol change's direction.
The Java PR is ready for a full review. The ADBC PR needs some code quality
improvements (highlighted in the PR). I will address them if the consensus
supports continuing with the proposed change.

I tested both drivers end-to-end with server implementations both with and
without the change to ensure backwards compatibility (I used Dremio as a
backend). Integration tests also exist in both Arrow-Java and Arrow-Go.

Let me know what you think.

Best,
Pedro

On Tue, Mar 10, 2026 at 10:32 PM Pedro Matias <[email protected]> wrote:

> *Compatibility chart of driver implementations for APIs that support
> queries and updates in the same function*
>
> As Martin Prammer requested in the previous biweekly meeting, I analyzed
> the Arrow JDBC Flight SQL driver, the ODBC Flight SQL driver, and the ADBC
> Flight SQL drivers in Python and Go to list how current implementations
> decide whether to use CommandPreparedStatementQuery or
> CommandPreparedStatementUpdate for API method calls that allow both.
>
> All drivers except the JDBC driver use CommandPreparedStatementQuery for
> these methods, indicating a lack of DML support.
>
> Below are links to the implementations:
>
> 1. ODBC Flight SQL driver: For ODBC SQLExecute(), the driver always
>    uses CommandPreparedStatementQuery:
>
>    https://github.com/apache/arrow/blob/ca6845248b014db7131ba6dccec5f91b04b4543d/cpp/src/arrow/flight/sql/client.cc#L644
> 2. JDBC Flight SQL driver: For Statement.execute(), the driver uses
>    CommandPreparedStatementQuery if the dataset_schema of the
>    ActionCreatePreparedStatementResult is not empty, otherwise
>    CommandPreparedStatementUpdate:
>
>    https://github.com/apache/arrow-java/blob/7390f551267798d4670eae6b2894c527dbc90403/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java#L458
>
> 3. JDBC Flight SQL driver: For PreparedStatement.execute(), the last
>    released driver version, 18.3.0, always uses
>    CommandPreparedStatementQuery. There is a merged PR that uses the same
>    heuristic as Statement.execute(), see
>    https://github.com/apache/arrow-java/pull/811, which has not yet been
>    released
> 4. Python ADBC Flight SQL driver: DB-API 2.0 allows for both in
>    cursor.execute(), see https://peps.python.org/pep-0249/#id20 The
>    driver always uses CommandPreparedStatementQuery:
>
>    https://github.com/apache/arrow-adbc/blob/b0611a123166b1e3778e26258e75c8a46b0e903b/python/adbc_driver_manager/adbc_driver_manager/dbapi.py#L817
>
> 5. Go ADBC Flight SQL driver: I don't think there is any API method
>    that allows for both result set generating queries and result counts. Based
>    on the docs at
>    https://pkg.go.dev/github.com/apache/arrow-adbc/go/adbc#Statement, it
>    seems like Statement.ExecuteQuery() is only for result set generating
>    queries and Statement.ExecuteUpdate() is for updates.
>
> Let me know if I should expand this list with other implementations, I
> only checked the ones I am aware of.
>
> *Backward compatibility of the proposed change*
> The change being proposed (adding a boolean field to
> ActionCreatePreparedStatementResult to determine the network flow used by
> Flight SQL clients) is fully backward compatible. This follows directly
> from using a new proto3 optional field. See the section "Adding new
> fields is safe" in
> https://protobuf.dev/programming-guides/editions/?utm_source=chatgpt.com#wire-safe-changes
>
> For ease of understanding, I will outline the scenarios below:
>
> New client <-> Old server
> If a server does not set the new field that the client expects, the client
> can detect the field's absence (directly from the protobuf generated files)
> and follow the logic it previously used to determine the network flow. I
> have a draft PR in the JDBC driver that exemplifies this; I updated the
> driver but didn't change the server. All tests pass locally, and I tested
> it successfully end to end with a backend server that wasn't updated:
> https://github.com/apache/arrow-java/pull/1064
>
> Old client <-> New server
> A client implementation that receives an unknown field will merely ignore
> it during parsing.
>
> Best,
>
> Pedro
>
> On Tue, Mar 3, 2026 at 10:50 PM David Li <[email protected]> wrote:
>
>> Sounds good. I think it would also be reasonable to raise a PR with the
>> spec change for discussion as well.
>>
>> I would much prefer to not cram more things into existing endpoints, but
>> I suppose it's not clear to me if it's possible to fix that at this point.
>>
>> On Wed, Mar 4, 2026, at 04:38, Pedro Matias wrote:
>> > I agree with consolidating the two execution modes. I don't think
>> > these approaches are mutually exclusive: we can fix the current
>> > execution split for correctness (which should be an easier and
>> > quicker fix) and introduce a new consolidated endpoint to include
>> > row counts in query cases as well.
>> >
>> > Having the same endpoint allows us to use it for ad hoc queries, which
>> > reduces the number of roundtrips per query in Statement.execute().
>> >
>> > Do you intend to use DoExchange for this new endpoint?
>> >
>> > In the meantime I'm working on some action items raised in the last sync
>> > regarding my proposed fix. I will send an email highlighting backward
>> > compatibility concerns and the current status of the different drivers
>> > before the next meeting.
>> >
>> > Pedro
>> >
>> > On Thu, Feb 26, 2026 at 2:24 AM David Li <[email protected]> wrote:
>> >
>> >> It seems reasonable to me if you want to raise a pull request to
>> >> discuss, but we could consider consolidating the two execution
>> >> modes? API-wise I feel it would be better to just have one endpoint
>> >> and let the server return what is appropriate. (Also because
>> >> interfaces like PEP 249 and protocols like Postgres's allow for a
>> >> row count in both query and update cases, albeit JDBC does not.)
>> >>
>> >> On Mon, Feb 23, 2026, at 10:17, Pedro Matias wrote:
>> >> > Hello all,
>> >> >
>> >> > @Hélder Gregório <[email protected]> and I identified a gap
>> >> > between common database API execution patterns and Arrow Flight SQL
>> >> > prepared statements. To address this, we propose adding an optional
>> >> > boolean field to ActionCreatePreparedStatementResult.
>> >> >
>> >> > Background
>> >> >
>> >> > A common pattern in database APIs is:
>> >> >
>> >> > 1. Create a prepared statement
>> >> > 2. Execute the prepared statement, returning either a result set
>> >> >    or an update count
>> >> >
>> >> > This pattern exists in:
>> >> >
>> >> > - *JDBC* (Connection.prepareStatement() + PreparedStatement.execute())
>> >> > - *Python PEP 249* (both steps condensed in cursor.execute())
>> >> > - *ODBC* (SQLPrepare() + SQLExecute())
>> >> >
>> >> > In Arrow Flight SQL, there are two mutually exclusive communication
>> >> > paths for executing prepared statements. Both begin with
>> >> > ActionCreatePreparedStatementRequest, after which the client must
>> >> > choose between:
>> >> >
>> >> > - CommandPreparedStatementQuery (returns a result set), or
>> >> > - CommandPreparedStatementUpdate (returns an update count).
>> >> >
>> >> > (For simplicity, we ignore parameter binding here.)
>> >> >
>> >> > The issue is that ActionCreatePreparedStatementResult, returned by
>> >> > the server in the first call, does not contain information
>> >> > indicating which execution path the client should take.
>> >> >
>> >> > *Proposal*
>> >> >
>> >> > We propose adding the following field to
>> >> > ActionCreatePreparedStatementResult:
>> >> >
>> >> > optional bool is_update = 4;
>> >> >
>> >> > - true → clients should use CommandPreparedStatementUpdate
>> >> > - false → clients should use CommandPreparedStatementQuery
>> >> >
>> >> > This makes the intended execution path explicit.
>> >> >
>> >> > The behavior of clients when the server does not set this field is
>> >> > outside the scope of this proposal, though discussion is welcome.
>> >> > We would be happy to open follow-up PRs to standardize client
>> >> > behavior across drivers if desired.
>> >> >
>> >> > Current state of driver implementations
>> >> >
>> >> > - The Arrow Flight SQL JDBC driver uses a heuristic to choose the
>> >> >   execution path:
>> >> >   https://github.com/apache/arrow-java/issues/797
>> >> >   <https://github.com/apache/arrow-java/issues/797?utm_source=chatgpt.com>
>> >> > - The PEP 249 Python Flight SQL driver (in ADBC) always uses
>> >> >   CommandPreparedStatementQuery in cursor.execute().
>> >> >
>> >> > We believe making the execution path explicit improves protocol
>> >> > completeness and alignment with widely used database APIs.
>> >> >
>> >> > Let us know your thoughts.
>> >> >
>> >> > Best,
>> >> > Pedro Matias
>> >>
>> >
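
For readers following the thread, here is a rough Python sketch of the client-side decision discussed above: use is_update when the server sets it, otherwise fall back to the dataset_schema heuristic described for the JDBC driver. It is not code from any of the linked PRs. The PreparedStatementResult class and the choose_command function are hypothetical stand-ins, and None is used to model "field not set". With real protobuf-generated Python classes, the presence of a proto3 optional field would normally be checked with HasField("is_update").

```python
from dataclasses import dataclass
from typing import Optional

QUERY = "CommandPreparedStatementQuery"
UPDATE = "CommandPreparedStatementUpdate"


@dataclass
class PreparedStatementResult:
    """Hypothetical stand-in for ActionCreatePreparedStatementResult."""

    dataset_schema: bytes = b""
    # None models a server that did not set the proposed optional field.
    is_update: Optional[bool] = None


def choose_command(result: PreparedStatementResult) -> str:
    """Pick the execution path for a prepared statement (illustrative only)."""
    # New server: the explicit flag decides the path.
    if result.is_update is not None:
        return UPDATE if result.is_update else QUERY
    # Old server: fall back to the heuristic described in the thread,
    # where a non-empty dataset_schema means the statement is a query.
    return QUERY if result.dataset_schema else UPDATE


if __name__ == "__main__":
    # New client <-> new server: the flag wins over the schema heuristic.
    print(choose_command(PreparedStatementResult(b"schema", is_update=True)))
    # New client <-> old server: field absent, the heuristic applies.
    print(choose_command(PreparedStatementResult(b"schema")))
    print(choose_command(PreparedStatementResult(b"")))
```

The old client <-> new server case needs no client code at all, since an unknown field is ignored during parsing.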
