tokoko commented on code in PR #4317:
URL: https://github.com/apache/arrow-adbc/pull/4317#discussion_r3322056269
##########
c/include/arrow-adbc/adbc.h:
##########
@@ -1946,6 +1989,235 @@ AdbcStatusCode AdbcConnectionReadPartition(struct AdbcConnection* connection,
/// @}
+/// \defgroup adbc-connection-ingest-partition Partitioned Bulk Ingest
+/// @{
+
+/// \brief Driver-owned bytes returned by
+/// AdbcConnectionBeginIngestPartitions.
+///
+/// The bytes are opaque and serializable: the caller may copy
+/// `bytes[0..length)` and ship that copy to workers (other processes
+/// or hosts) which can pass it to AdbcConnectionWriteIngestPartition
+/// directly.
+///
+/// The struct itself is owned by the driver. Call `release` exactly
+/// once to free it. Releasing the handle does NOT roll back the
+/// ingest — call AdbcConnectionAbortIngestPartitions for that.
+///
+/// \since ADBC API revision 1.2.0
+struct AdbcIngestHandle {
+ /// \brief The length of `bytes`.
+ size_t length;
+
+ /// \brief The serialized handle bytes (driver-owned).
+ const uint8_t* bytes;
+
+ /// \brief Private driver state.
+ void* private_data;
+
+ /// \brief Release the handle's memory. Sets `release` to NULL.
+ void (*release)(struct AdbcIngestHandle* self);
+};
+
+/// \brief Driver-owned bytes returned by
+/// AdbcConnectionWriteIngestPartition.
+///
+/// Mirror of AdbcIngestHandle: opaque, serializable, single-use
+/// `release`. Releasing a receipt does NOT discard the underlying
+/// write; that happens at Commit (commit it) or Abort (drop it).
+///
+/// \since ADBC API revision 1.2.0
+struct AdbcIngestReceipt {
+ /// \brief The length of `bytes`.
+ size_t length;
+
+ /// \brief The serialized receipt bytes (driver-owned).
+ const uint8_t* bytes;
+
+ /// \brief Private driver state.
+ void* private_data;
+
+ /// \brief Release the receipt's memory. Sets `release` to NULL.
+ void (*release)(struct AdbcIngestReceipt* self);
+};
+/// @}
+
+/// \addtogroup adbc-connection-ingest-partition
+/// Some drivers can accept bulk writes from a distributed writer: a
+/// coordinator configures an ingest, many workers write partitions in
+/// parallel (possibly from different processes or hosts), and the
+/// coordinator commits or aborts atomically.
+///
+/// This mirrors the read-side partitioned execution model. The
+/// coordinator calls AdbcConnectionBeginIngestPartitions to obtain an
+/// opaque, serializable handle. The handle is shipped to workers by
+/// the caller (e.g. a Spark driver sending it to executors). Workers
+/// call AdbcConnectionWriteIngestPartition on their own connections —
+/// the connection does not have to be the same one that created the
+/// handle. Each write returns an opaque receipt. The coordinator
+/// collects receipts and calls AdbcConnectionCommitIngestPartitions
+/// (or AdbcConnectionAbortIngestPartitions on failure).
+///
+/// Handles and receipts are driver-defined opaque byte strings. They
+/// are safe to transmit between processes and to use concurrently
+/// from multiple connections.
+///
+/// Drivers are not required to support partitioned ingest.
+///
+/// \since ADBC API revision 1.2.0
+///
+/// @{
+
+/// \brief Begin a partitioned bulk ingest.
+///
+/// Uses the same semantics as the ADBC_INGEST_OPTION_* options on
+/// AdbcStatement. For ADBC_INGEST_OPTION_MODE_CREATE,
+/// ADBC_INGEST_OPTION_MODE_CREATE_APPEND, and
+/// ADBC_INGEST_OPTION_MODE_REPLACE, `schema` is required and the
+/// driver creates (or recreates) the target table at this call. For
+/// ADBC_INGEST_OPTION_MODE_APPEND, `schema` is optional; if provided,
+/// the driver validates it against the target and returns
+/// ADBC_STATUS_ALREADY_EXISTS on mismatch.
+///
+/// The returned handle is opaque, serializable, and usable from any
+/// connection that can open the same database. The caller releases
+/// it via `out_handle->release`; the bytes can be copied and shipped
+/// to workers before release.
+///
+/// \since ADBC API revision 1.2.0
+/// \param[in] connection The coordinator's connection.
+/// \param[in] target_catalog Catalog of the target table, or NULL.
+/// \param[in] target_db_schema Schema of the target table, or NULL.
+/// \param[in] target_table Name of the target table. Required.
+/// \param[in] mode One of ADBC_INGEST_OPTION_MODE_*. Required.
+/// \param[in] schema Arrow schema of the data to be written.
+/// Required for create/replace/create_append modes; optional for
+/// append.
+/// \param[out] out_handle Driver-owned handle. Must be released by
+/// the caller via `out_handle->release`.
+/// \param[out] error Error details, if any.
+/// \return ADBC_STATUS_INVALID_ARGUMENT if mode requires a schema
+/// but none was provided.
+/// \return ADBC_STATUS_ALREADY_EXISTS if append mode is requested
+/// and the target schema disagrees with the provided schema.
+/// \return ADBC_STATUS_NOT_IMPLEMENTED if the driver does not
+/// support partitioned ingest.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionBeginIngestPartitions(
+ struct AdbcConnection* connection, const char* target_catalog,
+ const char* target_db_schema, const char* target_table, const char* mode,
Review Comment:
One caveat: the existing ingest options are statement-level in practice, not
connection-level, while all of these new functions are connection-level. Are
you also suggesting moving all (or some) of them to statement-level?
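
For reference, the ownership rule documented on `AdbcIngestHandle` in the hunk above (copy `bytes[0..length)`, then call `release` exactly once) could look roughly like this on the caller side. This is only a sketch: the struct is mirrored locally so the snippet compiles without `adbc.h`, and `CopyAndReleaseHandle`, `FakeDriverMakeHandle`, and `FakeRelease` are hypothetical names invented for illustration, not part of the PR or of ADBC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Local mirror of the struct proposed in the diff, so this sketch is
// self-contained. Real code would include adbc.h instead.
struct AdbcIngestHandle {
  size_t length;
  const uint8_t* bytes;
  void* private_data;
  void (*release)(struct AdbcIngestHandle* self);
};

// Hypothetical stand-in for a driver's release callback: frees the
// driver-owned buffer and sets `release` to NULL, as the doc comment requires.
static void FakeRelease(struct AdbcIngestHandle* self) {
  assert(self != NULL);
  free(self->private_data);
  self->bytes = NULL;
  self->length = 0;
  self->private_data = NULL;
  self->release = NULL;
}

// Hypothetical stand-in for the driver side: fills `out` with a
// driver-owned copy of `data`. Returns 0 on success, -1 on allocation failure.
int FakeDriverMakeHandle(struct AdbcIngestHandle* out, const void* data,
                         size_t length) {
  uint8_t* owned = malloc(length > 0 ? length : 1);
  if (owned == NULL) return -1;
  if (length > 0) memcpy(owned, data, length);
  out->length = length;
  out->bytes = owned;
  out->private_data = owned;
  out->release = FakeRelease;
  return 0;
}

// Hypothetical caller-side helper: copies the opaque bytes into a
// caller-owned buffer (to ship to workers) and then releases the handle.
// The helper takes ownership of the handle, so it releases it even when
// the copy fails. Returns NULL on bad arguments or allocation failure;
// the caller frees the returned buffer with free().
uint8_t* CopyAndReleaseHandle(struct AdbcIngestHandle* handle,
                              size_t* out_length) {
  if (handle == NULL || handle->release == NULL || out_length == NULL) {
    return NULL;
  }
  uint8_t* copy = malloc(handle->length > 0 ? handle->length : 1);
  if (copy != NULL) {
    if (handle->length > 0) memcpy(copy, handle->bytes, handle->length);
    *out_length = handle->length;
  }
  handle->release(handle);  // exactly once; does not roll back the ingest
  return copy;
}
```

The copy is taken before `release` because the driver owns `bytes`. Whether this lives at connection level or statement level would not change the rule.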