wwj6591812 opened a new pull request, #6324:
URL: https://github.com/apache/paimon/pull/6324
### Purpose
We have a business requirement to compute the primary keys of records that
were inserted (+I) or deleted (-D) between yesterday's and today's partitions
of an ODPS (Alibaba Cloud MaxCompute) table using Paimon.
We ran a POC in which a daily Flink batch job reads a partition from
the ODPS table and writes it to a Paimon primary key table. The job is
configured with the following options:
`'full-compaction.delta-commits'='1',
'changelog-producer'='full-compaction', 'tag.automatic-creation' = 'batch',
'tag.batch.customized-name'='ds=20250922'`
This produces two snapshots and one tag per day.
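
For illustration, the configuration above could be written as Flink SQL DDL roughly as follows. This is only a sketch: the table name `sku_snapshot`, the column types, and the choice of primary key are assumptions and are not taken from the actual job. Only the four options in the `WITH` clause come from the description above.

```sql
-- Hypothetical Paimon primary key table; name, column types and key are assumed.
CREATE TABLE sku_snapshot (
    item_id BIGINT,
    sku_id  BIGINT,
    ds      STRING,
    PRIMARY KEY (item_id, sku_id) NOT ENFORCED
) WITH (
    -- Options as listed in the description above.
    'changelog-producer' = 'full-compaction',
    'full-compaction.delta-commits' = '1',
    'tag.automatic-creation' = 'batch',
    -- The tag name carries the partition date, so it has to change on each daily run.
    'tag.batch.customized-name' = 'ds=20250922'
);
```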
We found that the changelog can be read by specifying snapshot IDs, as in
the SQL below, but not by specifying tags.
``SELECT rowkind, item_id, sku_id, ds
FROM `alake`.`omega_alake`.`kk_invalid_sku_prediction_v4$audit_log`
/*+ OPTIONS('scan.parallelism'='128',
'incremental-between-scan-mode'='changelog', 'incremental-between'='2,4') */
WHERE rowkind = '+U' OR rowkind = '-U'
LIMIT 100;``
So, this PR adds support for reading the incremental changelog between two
specified tags.
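
With this change, the same kind of query could presumably be written with tag names instead of snapshot IDs. The sketch below is an assumption about the resulting usage, not confirmed syntax: it assumes `'incremental-between'` takes two comma-separated tag names in changelog scan mode, mirroring the snapshot-ID form. The tag names `ds=20250921` and `ds=20250922` are hypothetical daily tags, and the filter targets the +I/-D records described in the purpose.

```sql
-- Sketch only: tag-based variant of the snapshot-ID query above.
-- Tag names are hypothetical; the option syntax is assumed to mirror the snapshot form.
SELECT rowkind, item_id, sku_id, ds
FROM `alake`.`omega_alake`.`kk_invalid_sku_prediction_v4$audit_log`
/*+ OPTIONS('scan.parallelism'='128',
'incremental-between-scan-mode'='changelog',
'incremental-between'='ds=20250921,ds=20250922') */
WHERE rowkind = '+I' OR rowkind = '-D'
LIMIT 100;
```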
<!-- Linking this pull request to the issue -->
Linked issue: close #xxx
<!-- What is the purpose of the change -->
### Tests
<!-- List UT and IT cases to verify this change -->
### API and Format
<!-- Does this change affect API or storage format -->
### Documentation
<!-- Does this change introduce a new feature -->