JNSimba opened a new pull request, #63618:
URL: https://github.com/apache/doris/pull/63618
### What problem does this PR solve?
Problem Summary:
When a Postgres CDC streaming job ingests rows whose timestamp / date
columns hold historical values (pre-1970 with sub-millisecond precision, or
pre-1582 / pre-1901 dates), two independent bugs in cdc-client cause data
corruption or a task crash:
1. `DebeziumJsonDeserializer.convertTimestamp` applies Java's truncating `/`
and `%` to negative `micros` / `nanos`, producing a negative
`nanoOfMillisecond` and tripping Flink `TimestampData`'s
`checkArgument(nanoOfMillisecond >= 0)`. Result: the ingestion task crashes
whenever a pre-1970 timestamp with sub-millisecond precision flows through
(e.g. `1969-12-31 23:59:59.999123`).
2. The snapshot path reads column values via `rs.getObject()`, which routes
through PG JDBC's `TimestampUtils` + `GregorianCalendar`. For pre-1582
timestamps the hybrid Julian/Gregorian cutover in `GregorianCalendar` shifts
values by N days relative to the proleptic Gregorian calendar; for pre-1901
timestamps the JVM time zone's LMT offset shifts values by the LMT difference
(e.g. ~343s in `Asia/Shanghai`). Result: the same PG value (e.g. `0001-01-01
00:00:00`) yields different Doris values depending on whether the row was
synced via snapshot or via binlog.
This PR fixes both:
1. Use `Math.floorDiv` / `Math.floorMod` so the millisecond / nanosecond
split stays valid for negative epoch values.
2. Dispatch `TIMESTAMP` / `TIMESTAMPTZ` / `DATE` columns through
`LocalDateTime` / `OffsetDateTime` / `LocalDate` in the snapshot reader,
bypassing `GregorianCalendar` entirely. Preserve the legacy
`Timestamp(Long.MAX/MIN_VALUE)` sentinel for `+/-infinity`.
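To illustrate fix 1, here is a minimal sketch of the floor-based split for an epoch value in microseconds. It is not the actual `convertTimestamp` code: the class `EpochMicrosSplit` and its method names are hypothetical, and it avoids the Flink dependency by returning the two components that `TimestampData` would be built from.

```java
// Hypothetical helper for illustration only; not part of cdc-client.
public final class EpochMicrosSplit {
    private EpochMicrosSplit() {}

    /** Whole milliseconds since the epoch, rounded toward negative infinity. */
    public static long millis(long epochMicros) {
        return Math.floorDiv(epochMicros, 1_000L);
    }

    /** Nanoseconds within that millisecond; always in [0, 999_000]. */
    public static int nanoOfMillisecond(long epochMicros) {
        return (int) (Math.floorMod(epochMicros, 1_000L) * 1_000L);
    }

    public static void main(String[] args) {
        // 1969-12-31 23:59:59.999123 UTC is 877 microseconds before the epoch.
        long micros = -877L;

        // Truncating operators: millis = 0, remainder = -877 (negative nanos).
        System.out.println((micros / 1_000L) + " ms, " + (micros % 1_000L) * 1_000L + " ns");

        // Floor operators: millis = -1, nanoOfMillisecond = 123000.
        System.out.println(millis(micros) + " ms, " + nanoOfMillisecond(micros) + " ns");
    }
}
```

With the floor-based split, `-1 ms + 123000 ns` still equals `-877 µs`, and the nanosecond part is non-negative, which is what the `checkArgument` in `TimestampData` requires.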
### Release note
Fix Postgres CDC streaming job ingestion crashes and value drift for
timestamp / date columns that hold historical values.
### Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
- Behavior changed:
    - [x] No.
- Does this need documentation?
    - [x] No.