BentsiLeviav commented on code in PR #38510:
URL: https://github.com/apache/beam/pull/38510#discussion_r3369680648
##########
sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse/TableSchema.java:
##########
@@ -238,6 +257,14 @@ public abstract static class ColumnType implements
Serializable {
public abstract @Nullable Map<String, ColumnType> tupleTypes();
+ /** Sub-second precision (0–9) of {@code DateTime64}. {@code null} for
other types. */
+ public abstract @Nullable Integer precision();
+
+ /**
+ * Optional timezone of {@code DateTime64}; semantically display-only.
{@code null} otherwise.
+ */
+ public abstract @Nullable String timezone();
Review Comment:
Is preserving it adding value, or is it a dead state? I see no usage for that
##########
sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOIT.java:
##########
@@ -480,6 +482,79 @@ public void testPojo() throws Exception {
assertEquals(12L, sum1);
}
+ @Test
+ public void testDateTime64Millis() throws Exception {
+ Schema schema = Schema.of(Schema.Field.of("ts", FieldType.DATETIME));
+ DateTime ts = new DateTime(2026, 5, 15, 12, 34, 56, 789, DateTimeZone.UTC);
+ Row row = Row.withSchema(schema).addValue(ts).build();
+
+ executeSql("CREATE TABLE test_datetime64_ms (ts DateTime64(3, 'UTC'))
ENGINE=Log");
+
+
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_ms"));
+ pipeline.run().waitUntilFinish();
+
+ // toUnixTimestamp64Milli returns the underlying tick count, which is the
most stable thing to
+ // assert across CH versions (string formatting may include trailing zeros
depending on
+ // version).
+ long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Milli(ts) FROM
test_datetime64_ms");
+ assertEquals(ts.getMillis(), ticks);
+ }
+
+ @Test
+ public void testDateTime64Micros() throws Exception {
+ Schema schema = Schema.of(Schema.Field.of("ts",
FieldType.logicalType(SqlTypes.TIMESTAMP)));
+ // 2026-05-15T12:34:56.789012Z — exactly micros, so MicrosInstant accepts
it.
+ java.time.Instant ts = java.time.Instant.ofEpochSecond(1_778_071_696L,
789_012_000L);
+ Row row = Row.withSchema(schema).addValue(ts).build();
+
+ executeSql("CREATE TABLE test_datetime64_us (ts DateTime64(6))
ENGINE=Log");
+
+
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_us"));
+ pipeline.run().waitUntilFinish();
+
+ long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Micro(ts) FROM
test_datetime64_us");
+ assertEquals(1_778_071_696L * 1_000_000L + 789_012L, ticks);
+ }
+
+ @Test
+ public void testDateTime64Nanos() throws Exception {
+ // DateTime64(9) must preserve full nanosecond precision. Use NanosInstant
directly
+ // because SqlTypes.TIMESTAMP (MicrosInstant) rejects non-micro-aligned
nanos like the
+ // trailing 345 below.
+ Schema schema = Schema.of(Schema.Field.of("ts", FieldType.logicalType(new
NanosInstant())));
+ java.time.Instant ts = java.time.Instant.ofEpochSecond(1_778_071_696L,
789_012_345L);
+ Row row = Row.withSchema(schema).addValue(ts).build();
+
+ executeSql("CREATE TABLE test_datetime64_ns (ts DateTime64(9))
ENGINE=Log");
+
+
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_ns"));
+ pipeline.run().waitUntilFinish();
+
+ long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Nano(ts) FROM
test_datetime64_ns");
+ assertEquals(1_778_071_696L * 1_000_000_000L + 789_012_345L, ticks);
+ }
+
+ @Test
+ public void testNullableDateTime64() throws Exception {
Review Comment:
Nullable and non-nullable take different code paths in the writer - can we
also have a nullable assertion with a nanos tedt?
##########
sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseWriterTest.java:
##########
Review Comment:
Great coverage. Can we also add:
* Overflow test - `Math.multiplyExact` will throw `ArithmeticException` for
a `DateTime64(9)` timestamp past ~year 2262.
* Null input test - The reject test covers a wrong type, but not null ( The
method has a specific `value == null ? "null" : …` branch in its error message
that's currently unexercised.)
##########
sdks/java/io/clickhouse/src/main/javacc/ColumnTypeParser.jj:
##########
@@ -214,10 +216,28 @@ private ColumnType primitive() :
(<FIXEDSTRING> <LPAREN> ( size = integer() ) <RPAREN>) {
return ColumnType.fixedString(Integer.valueOf(size));
}
+ |
+ (ct = dateTime64()) { return ct; }
)
}
+private ColumnType dateTime64() :
+{
+ String precision;
+ String timezone = null;
+}
+{
+ (
+ <DATETIME64> <LPAREN> ( precision = integer() )
Review Comment:
Bare DateTime64 (no precision) won't parse. The `<LPAREN> integer()
<RPAREN>` is mandatory, but ClickHouse accepts a bare DateTime64 that defaults
to precision 3. To be fully covered, can we make the whole argument list
optional and default to 3?
##########
sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOIT.java:
##########
@@ -480,6 +482,79 @@ public void testPojo() throws Exception {
assertEquals(12L, sum1);
}
+ @Test
+ public void testDateTime64Millis() throws Exception {
+ Schema schema = Schema.of(Schema.Field.of("ts", FieldType.DATETIME));
+ DateTime ts = new DateTime(2026, 5, 15, 12, 34, 56, 789, DateTimeZone.UTC);
+ Row row = Row.withSchema(schema).addValue(ts).build();
+
+ executeSql("CREATE TABLE test_datetime64_ms (ts DateTime64(3, 'UTC'))
ENGINE=Log");
+
+
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_ms"));
+ pipeline.run().waitUntilFinish();
+
+ // toUnixTimestamp64Milli returns the underlying tick count, which is the
most stable thing to
+ // assert across CH versions (string formatting may include trailing zeros
depending on
+ // version).
+ long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Milli(ts) FROM
test_datetime64_ms");
+ assertEquals(ts.getMillis(), ticks);
+ }
+
+ @Test
+ public void testDateTime64Micros() throws Exception {
+ Schema schema = Schema.of(Schema.Field.of("ts",
FieldType.logicalType(SqlTypes.TIMESTAMP)));
+ // 2026-05-15T12:34:56.789012Z — exactly micros, so MicrosInstant accepts
it.
+ java.time.Instant ts = java.time.Instant.ofEpochSecond(1_778_071_696L,
789_012_000L);
+ Row row = Row.withSchema(schema).addValue(ts).build();
+
+ executeSql("CREATE TABLE test_datetime64_us (ts DateTime64(6))
ENGINE=Log");
+
+
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_us"));
+ pipeline.run().waitUntilFinish();
+
+ long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Micro(ts) FROM
test_datetime64_us");
+ assertEquals(1_778_071_696L * 1_000_000L + 789_012L, ticks);
Review Comment:
styling comment - can we move those magic numbers (also in unit tests) to a
meaningful consts? will be easier to understand
##########
sdks/java/io/clickhouse/src/main/javacc/ColumnTypeParser.jj:
##########
@@ -214,10 +216,28 @@ private ColumnType primitive() :
(<FIXEDSTRING> <LPAREN> ( size = integer() ) <RPAREN>) {
return ColumnType.fixedString(Integer.valueOf(size));
}
+ |
+ (ct = dateTime64()) { return ct; }
)
}
+private ColumnType dateTime64() :
+{
+ String precision;
+ String timezone = null;
+}
+{
+ (
+ <DATETIME64> <LPAREN> ( precision = integer() )
+ ( <COMMA> ( timezone = string() ) )?
+ <RPAREN>
+ )
+ {
+ return ColumnType.dateTime64(Integer.parseInt(precision), timezone);
Review Comment:
Minor comment - a typo like `DateTime64(abc)` will result in
`IllegalArgumentException: "failed to parse"`, while a bad precision like
`DateTime64(-1)` will result in a different exception with a different message,
from a different layer. Should we catch and rethrow so all bad input funnels
through the same `failed to parse` error?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]