alamb commented on issue #1107: URL: https://github.com/apache/arrow-adbc/issues/1107#issuecomment-1738918032
Thank you @lidavidm and @joellubi -- I also think this basic idea makes sense.

In terms of the modes, the SQL systems I am familiar with typically support two, which are dialect specific and fairly complex:

1. `INSERT`/`COPY`: appends the new rows to the target table
2. `UPSERT`/`MERGE`: potentially updates existing rows if present, and inserts new rows if not (for example, [Snowflake](https://community.snowflake.com/s/article/how-to-perform-a-mergeupsert-from-a-flat-file-staged-on-s3))

To implement `UPSERT`/`MERGE` you typically need to specify the criteria for what qualifies as an update. In some systems this is done via a `PRIMARY KEY` declaration, but in many others you can also specify a custom matching condition (see the `MERGE` sketch below).

Thus I suggest supporting bulk insert in a generic way via a SQL query, rather than an enum and table names, which would constrain how this feature gets used. Perhaps we could add something like the following (to mirror `Update`). It would likely make sense to have a prepared statement version of this as well (see the second sketch below).

```protobuf
/*
 * Represents a SQL bulk insert / upsert query. Used in the command member of FlightDescriptor
 * for the RPC call DoPut to cause the server to execute the included SQL
 * INSERT/COPY/UPSERT/MERGE or similar command, with the data in the batches in the DoPut call.
 */
message CommandStatementInsert {
  option (experimental) = true;

  // The SQL syntax.
  string query = 1;
  // Include the query as part of this transaction (if unset, the query is auto-committed).
  optional bytes transaction_id = 2;
}
```
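To make the matching-condition point concrete, here is a sketch of the kind of statement a client might pass in `query` (the table and column names are hypothetical, the exact `MERGE` syntax varies by dialect, and how the streamed `DoPut` batches would be referenced from the SQL is an open design question):

```sql
-- Hypothetical example: upsert incoming rows into `target`, keyed on `id`.
-- `staged` stands in for the rows supplied in the DoPut stream.
MERGE INTO target USING staged
  ON target.id = staged.id                           -- custom matching condition
  WHEN MATCHED THEN UPDATE SET val = staged.val      -- update existing rows
  WHEN NOT MATCHED THEN INSERT (id, val)             -- insert new rows
    VALUES (staged.id, staged.val);
```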
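For the prepared statement version, one possibility (purely a sketch -- the message and field names here are hypothetical, following the handle-based pattern of the existing `CommandPreparedStatementQuery`) would be:

```protobuf
/*
 * Hypothetical sketch: a prepared-statement counterpart to CommandStatementInsert.
 * The client would first create the prepared statement, then reference it by
 * handle in the DoPut call that carries the data batches.
 */
message CommandPreparedStatementInsert {
  option (experimental) = true;

  // Opaque handle for the prepared statement on the server.
  bytes prepared_statement_handle = 1;
}
```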
