wgtmac commented on code in PR #653:
URL: https://github.com/apache/iceberg-cpp/pull/653#discussion_r3272689931
##########
src/iceberg/util/bucket_util.cc:
##########
@@ -63,6 +63,16 @@ int32_t HashLiteral<TypeId::kTimestampTz>(const Literal&
literal) {
return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
}
+template <>
+int32_t HashLiteral<TypeId::kTimestampNs>(const Literal& literal) {
+ return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
+}
+
+template <>
+int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
+ return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
+}
Review Comment:
According to the Iceberg V3 spec and the Java implementation
(`BucketTimestampNano.java`), nanosecond timestamps must be converted to
microseconds (divided by 1000) before hashing. This ensures that bucket
partitioning is consistent between microsecond and nanosecond precision types
for the same logical time.
```cpp
return BucketUtils::HashLong(std::get<int64_t>(literal.value()) / 1000);
```
##########
src/iceberg/util/transform_util.cc:
##########
@@ -117,6 +199,21 @@ std::string TransformUtil::HumanTimestamp(int64_t
timestamp_micros) {
}
}
+std::string TransformUtil::HumanTimestampNs(int64_t timestamp_nanos) {
+ auto tp = std::chrono::time_point<std::chrono::system_clock,
std::chrono::seconds>{
+ std::chrono::seconds(timestamp_nanos / kNanosPerSecond)};
+ auto nanos = timestamp_nanos % kNanosPerSecond;
Review Comment:
For negative timestamps (pre-1970), C++'s division (`/`) and modulo (`%`)
operators truncate towards zero. This causes `ParseTimestampNs` and
`HumanTimestampNs` to compute an incorrect base time point and a negative
fractional part, breaking the string formatting and parsing.
For example, `1969-12-31T23:59:59.123456789` parses to `-876543211` nanos.
Passing this back here yields `0` for seconds and `-876543211` for nanos,
resulting in `1970-01-01T00:00:00.-876543211`.
Consider using `std::chrono::floor` to handle the negative values correctly
(note: the original microsecond `HumanTimestamp` and `ParseTimestamp` also
suffer from this issue).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]