liujiwen-up opened a new pull request, #660:
URL: https://github.com/apache/doris-flink-connector/pull/660
# Proposed changes
Issue Number: close #xxx
## Problem Summary:
This PR changes Doris Flink Connector `INSERT OVERWRITE` sink behavior to
avoid truncating the target table before data is successfully written.
Previously, `INSERT OVERWRITE` executed `TRUNCATE TABLE` before writing. If
the Flink job failed after truncate but before the write completed, the target
table could be left empty.
The new implementation uses a staging-table based flow:
1. Create a staging table with `CREATE TABLE staging LIKE target`.
2. Write data into the staging table through Stream Load 2PC.
3. After all committed data is available, finalize the overwrite with:
`ALTER TABLE target REPLACE WITH TABLE staging
PROPERTIES('swap'='false')`.
This implementation is currently limited to bounded `INSERT OVERWRITE` with
`STREAM_LOAD` and 2PC enabled. It rejects unsafe configurations such as
streaming overwrite, non-Stream Load write modes,
`sink.ignore.commit-error=true`, missing `sink.label-prefix`, missing
`jdbc-url`, and pre-existing staging tables.
Additional guards were added to require Doris table id metadata from
`information_schema.metadata_name_ids`, so the connector can detect
target-table changes before finalization and identify already-finalized
overwrite attempts.
## Checklist(Required)
1. Does it affect the original behavior: Yes
2. Has unit tests been added: Yes
3. Has document been added or modified: No Need
4. Does it need to update dependencies: No
5. Are there any changes that cannot be rolled back: No
## Further comments
This change intentionally chooses a conservative first version:
- Only bounded overwrite is supported.
- Only Stream Load 2PC is supported.
- Existing staging tables are not reused to avoid publishing stale or mixed
data.
- Table id metadata is required instead of silently degrading safety checks.
Alternatives considered include continuing to use `TRUNCATE TABLE`, reusing
existing staging tables during recovery, or supporting more write modes
immediately. These were not chosen because they either preserve the original
data-loss risk or make failure/recovery semantics harder to prove safe.
Validation performed:
```bash
mvn -Pflink1 -pl flink-doris-connector-base -Dtest=TestDorisOverwriteManager
test
mvn -Pflink1 -pl flink-doris-connector-flink1 -am -DskipTests compile
mvn -Pflink2 -pl flink-doris-connector-flink2 -am -DskipTests compile
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]