joellubi commented on code in PR #38385:
URL: https://github.com/apache/arrow/pull/38385#discussion_r1388725396
##########
format/FlightSql.proto:
##########
@@ -1778,6 +1794,47 @@ message CommandPreparedStatementUpdate {
bytes prepared_statement_handle = 1;
}
+/*
+ * Represents a bulk ingestion request. Used in the command member of FlightDescriptor
+ * for the the RPC call DoPut to cause the server load the contents of the stream's
+ * FlightData into the target destination.
+ */
+message CommandStatementIngest {
+ option (experimental) = true;
+
+ // Describes the behavior for loading bulk data.
+ enum IngestMode {
+ // Ingestion behavior unspecified.
+ INGEST_MODE_UNSPECIFIED = 0;
+ // Create the target table. Fail if the target table already exists.
+ INGEST_MODE_CREATE = 1;
+ // Append to an existing target table. Fail if the target table does not exist.
+ INGEST_MODE_APPEND = 2;
+ // Drop the target table if it exists. Then follow INGEST_MODE_CREATE behavior.
+ INGEST_MODE_REPLACE = 3;
+ // Create the target table if it does not exist. Then follow INGEST_MODE_APPEND behavior.
+ INGEST_MODE_CREATE_APPEND = 4;
+ }
+
+ // The ingestion behavior.
+ IngestMode mode = 1;
+ // The table to load data into.
+ string table = 2;
+ // The db_schema of the destination table to load data into. If unset, a backend-specific default may be used.
+ optional string schema = 3;
+ // The catalog of the destination table to load data into. If unset, a backend-specific default may be used.
+ optional string catalog = 4;
+ // Use a temporary table.
+ optional bool temporary = 5;
Review Comment:
I took some time to map out how this could work in the various ingestion
scenarios and ran into some trouble with certain edge cases. After reviewing
how several DB engines I'm familiar with handle these scenarios, I can see the
benefit of making an opinionated choice regarding the semantics.
Based on my observations, there are at least **two** qualities of temporary
tables that most clients would expect:
1. They are dropped automatically by the DB at the end of the session. The
scope of the session could be a connection, transaction, etc. The specific
scope seems to be backend-specific, and sometimes configurable.
2. Another quality that seems common to implementations I've seen is that
you treat temp tables _exactly the same_ as a "regular" table after they've
been created, for the duration of the session. This means that the only time a
client would specify that a table is temporary is during creation (i.e.
`CREATE` vs `CREATE TEMP`). For this reason I take back my statement implying
that `temporary` is unrelated to table creation. On the contrary, it seems that
table creation is the only place where `temporary` should have an effect.
This is based on DBs I'm familiar with, so I would appreciate your
perspectives on whether these expectations are broadly applicable.
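As one concrete data point, both qualities can be observed in SQLite through Python's standard `sqlite3` module. This is only a sketch for a single engine whose temp-table scope is the connection; the `demo` helper and the table name `t` are made up for illustration, and other backends may behave differently.

```python
import sqlite3


def demo(db_path):
    """Return (rows seen via temp t, rows in main.t, rows after reconnect)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    # TEMP is only spelled out at creation. In SQLite the temp table may
    # share its name with a regular table and shadows it afterwards.
    conn.execute("CREATE TEMP TABLE t (x INTEGER)")
    # From here on the temp table is used like any other table.
    conn.execute("INSERT INTO t VALUES (2)")
    temp_rows = conn.execute("SELECT x FROM t").fetchall()
    main_rows = conn.execute("SELECT x FROM main.t").fetchall()
    conn.commit()
    conn.close()

    # A new connection is a new session: the temp table is gone and the
    # unqualified name resolves to the regular table again.
    conn = sqlite3.connect(db_path)
    after_reconnect = conn.execute("SELECT x FROM t").fetchall()
    conn.close()
    return temp_rows, main_rows, after_reconnect
```

Calling `demo` with the path of a fresh database file shows the temp table shadowing the regular one within the session and disappearing once the connection is closed.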
Putting these observations together, I would propose the following semantics
for each ingest mode when `temporary` is set:
- `INGEST_MODE_CREATE`: Create a temp table. Fail if it already exists.
Should create even if a non-temp table with the same name already exists.
- `INGEST_MODE_APPEND`: Ignore the `temporary` argument. Append to the table
specified; the backend can tell whether it's temporary or not. (I'm OK with
returning an error here instead if there's a good reason to do so.)
- `INGEST_MODE_REPLACE`: Fail if the existing target is not a temp table (might
help avoid unexpected data loss). Otherwise, drop the table if it exists, then
follow `INGEST_MODE_CREATE` behavior.
- `INGEST_MODE_CREATE_APPEND`: Create a temp table if it does not exist.
Then follow INGEST_MODE_APPEND behavior.
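To make the proposal easier to poke at, here is a rough Python sketch of the rules above against a toy in-memory backend. Everything in it (`ToyBackend`, `IngestError`, the two dictionaries) is hypothetical and not part of Flight SQL. It also assumes that a temp table shadows a regular table of the same name, and it picks one possible reading of `INGEST_MODE_CREATE_APPEND` (create only when the name resolves to nothing).

```python
from enum import Enum


class IngestMode(Enum):
    CREATE = 1
    APPEND = 2
    REPLACE = 3
    CREATE_APPEND = 4


class IngestError(Exception):
    """Raised when an ingest request violates the sketched rules."""


class ToyBackend:
    """Hypothetical in-memory backend: table name -> list of rows."""

    def __init__(self):
        self.regular = {}
        self.temp = {}

    def _resolve(self, table):
        # Assumption: a temp table shadows a regular table of the same name.
        if table in self.temp:
            return self.temp
        if table in self.regular:
            return self.regular
        return None

    def ingest(self, mode, table, rows, temporary=False):
        target = self.temp if temporary else self.regular
        if mode is IngestMode.CREATE:
            # Only a clash within the same namespace is an error, so a temp
            # table may be created next to a regular table of the same name.
            if table in target:
                raise IngestError(f"{table!r} already exists")
            target[table] = list(rows)
        elif mode is IngestMode.APPEND:
            # `temporary` is ignored; the backend resolves the name itself.
            existing = self._resolve(table)
            if existing is None:
                raise IngestError(f"{table!r} does not exist")
            existing[table].extend(rows)
        elif mode is IngestMode.REPLACE:
            # Refuse to drop a regular table when a temp table was requested.
            if temporary and table not in self.temp and table in self.regular:
                raise IngestError(f"{table!r} is not a temp table")
            target[table] = list(rows)  # drop (if present) and recreate
        elif mode is IngestMode.CREATE_APPEND:
            # One reading: create only if the name resolves to nothing.
            if self._resolve(table) is None:
                target[table] = []
            self._resolve(table)[table].extend(rows)
        else:
            raise IngestError(f"unsupported mode: {mode!r}")
```

In this sketch `temporary` only matters on the paths that create a table, plus the data-loss guard in `REPLACE`; `APPEND` never consults it.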
Thoughts on these conclusions?