BentsiLeviav commented on code in PR #38510:
URL: https://github.com/apache/beam/pull/38510#discussion_r3369680648


##########
sdks/java/io/clickhouse/src/main/java/org/apache/beam/sdk/io/clickhouse/TableSchema.java:
##########
@@ -238,6 +257,14 @@ public abstract static class ColumnType implements 
Serializable {
 
     public abstract @Nullable Map<String, ColumnType> tupleTypes();
 
+    /** Sub-second precision (0–9) of {@code DateTime64}. {@code null} for 
other types. */
+    public abstract @Nullable Integer precision();
+
+    /**
+     * Optional timezone of {@code DateTime64}; semantically display-only. 
{@code null} otherwise.
+     */
+    public abstract @Nullable String timezone();

Review Comment:
   Is preserving it adding value, or is it a dead state? I see no usage for that



##########
sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOIT.java:
##########
@@ -480,6 +482,79 @@ public void testPojo() throws Exception {
     assertEquals(12L, sum1);
   }
 
+  @Test
+  public void testDateTime64Millis() throws Exception {
+    Schema schema = Schema.of(Schema.Field.of("ts", FieldType.DATETIME));
+    DateTime ts = new DateTime(2026, 5, 15, 12, 34, 56, 789, DateTimeZone.UTC);
+    Row row = Row.withSchema(schema).addValue(ts).build();
+
+    executeSql("CREATE TABLE test_datetime64_ms (ts DateTime64(3, 'UTC')) 
ENGINE=Log");
+
+    
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_ms"));
+    pipeline.run().waitUntilFinish();
+
+    // toUnixTimestamp64Milli returns the underlying tick count, which is the 
most stable thing to
+    // assert across CH versions (string formatting may include trailing zeros 
depending on
+    // version).
+    long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Milli(ts) FROM 
test_datetime64_ms");
+    assertEquals(ts.getMillis(), ticks);
+  }
+
+  @Test
+  public void testDateTime64Micros() throws Exception {
+    Schema schema = Schema.of(Schema.Field.of("ts", 
FieldType.logicalType(SqlTypes.TIMESTAMP)));
+    // 2026-05-15T12:34:56.789012Z — exactly micros, so MicrosInstant accepts 
it.
+    java.time.Instant ts = java.time.Instant.ofEpochSecond(1_778_071_696L, 
789_012_000L);
+    Row row = Row.withSchema(schema).addValue(ts).build();
+
+    executeSql("CREATE TABLE test_datetime64_us (ts DateTime64(6)) 
ENGINE=Log");
+
+    
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_us"));
+    pipeline.run().waitUntilFinish();
+
+    long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Micro(ts) FROM 
test_datetime64_us");
+    assertEquals(1_778_071_696L * 1_000_000L + 789_012L, ticks);
+  }
+
+  @Test
+  public void testDateTime64Nanos() throws Exception {
+    // DateTime64(9) must preserve full nanosecond precision. Use NanosInstant 
directly
+    // because SqlTypes.TIMESTAMP (MicrosInstant) rejects non-micro-aligned 
nanos like the
+    // trailing 345 below.
+    Schema schema = Schema.of(Schema.Field.of("ts", FieldType.logicalType(new 
NanosInstant())));
+    java.time.Instant ts = java.time.Instant.ofEpochSecond(1_778_071_696L, 
789_012_345L);
+    Row row = Row.withSchema(schema).addValue(ts).build();
+
+    executeSql("CREATE TABLE test_datetime64_ns (ts DateTime64(9)) 
ENGINE=Log");
+
+    
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_ns"));
+    pipeline.run().waitUntilFinish();
+
+    long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Nano(ts) FROM 
test_datetime64_ns");
+    assertEquals(1_778_071_696L * 1_000_000_000L + 789_012_345L, ticks);
+  }
+
+  @Test
+  public void testNullableDateTime64() throws Exception {

Review Comment:
    Nullable and non-nullable take different code paths in the writer - can we 
also have a nullable assertion with a nanos tedt?



##########
sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseWriterTest.java:
##########


Review Comment:
   Great coverage. Can we also add:
   * Overflow test - `Math.multiplyExact` will throw `ArithmeticException` for 
a `DateTime64(9)` timestamp past ~year 2262. 
   * Null input test -  The reject test covers a wrong type, but not null ( The 
method has a specific `value == null ? "null" : …` branch in its error message 
that's currently unexercised.)



##########
sdks/java/io/clickhouse/src/main/javacc/ColumnTypeParser.jj:
##########
@@ -214,10 +216,28 @@ private ColumnType primitive() :
       (<FIXEDSTRING> <LPAREN> ( size = integer() ) <RPAREN>) {
         return ColumnType.fixedString(Integer.valueOf(size));
       }
+    |
+      (ct = dateTime64()) { return ct; }
     )
 
 }
 
+private ColumnType dateTime64() :
+{
+    String precision;
+    String timezone = null;
+}
+{
+    (
+        <DATETIME64> <LPAREN> ( precision = integer() )

Review Comment:
    Bare DateTime64 (no precision) won't parse. The `<LPAREN> integer() 
<RPAREN>` is mandatory, but ClickHouse accepts a bare DateTime64 that defaults 
to precision 3. To be fully covered, can we make the whole argument list 
optional and default to 3?



##########
sdks/java/io/clickhouse/src/test/java/org/apache/beam/sdk/io/clickhouse/ClickHouseIOIT.java:
##########
@@ -480,6 +482,79 @@ public void testPojo() throws Exception {
     assertEquals(12L, sum1);
   }
 
+  @Test
+  public void testDateTime64Millis() throws Exception {
+    Schema schema = Schema.of(Schema.Field.of("ts", FieldType.DATETIME));
+    DateTime ts = new DateTime(2026, 5, 15, 12, 34, 56, 789, DateTimeZone.UTC);
+    Row row = Row.withSchema(schema).addValue(ts).build();
+
+    executeSql("CREATE TABLE test_datetime64_ms (ts DateTime64(3, 'UTC')) 
ENGINE=Log");
+
+    
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_ms"));
+    pipeline.run().waitUntilFinish();
+
+    // toUnixTimestamp64Milli returns the underlying tick count, which is the 
most stable thing to
+    // assert across CH versions (string formatting may include trailing zeros 
depending on
+    // version).
+    long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Milli(ts) FROM 
test_datetime64_ms");
+    assertEquals(ts.getMillis(), ticks);
+  }
+
+  @Test
+  public void testDateTime64Micros() throws Exception {
+    Schema schema = Schema.of(Schema.Field.of("ts", 
FieldType.logicalType(SqlTypes.TIMESTAMP)));
+    // 2026-05-15T12:34:56.789012Z — exactly micros, so MicrosInstant accepts 
it.
+    java.time.Instant ts = java.time.Instant.ofEpochSecond(1_778_071_696L, 
789_012_000L);
+    Row row = Row.withSchema(schema).addValue(ts).build();
+
+    executeSql("CREATE TABLE test_datetime64_us (ts DateTime64(6)) 
ENGINE=Log");
+
+    
pipeline.apply(Create.of(row).withRowSchema(schema)).apply(write("test_datetime64_us"));
+    pipeline.run().waitUntilFinish();
+
+    long ticks = executeQueryAsLong("SELECT toUnixTimestamp64Micro(ts) FROM 
test_datetime64_us");
+    assertEquals(1_778_071_696L * 1_000_000L + 789_012L, ticks);

Review Comment:
   styling comment - can we move those magic numbers (also in unit tests) to a 
meaningful consts? will be easier to understand



##########
sdks/java/io/clickhouse/src/main/javacc/ColumnTypeParser.jj:
##########
@@ -214,10 +216,28 @@ private ColumnType primitive() :
       (<FIXEDSTRING> <LPAREN> ( size = integer() ) <RPAREN>) {
         return ColumnType.fixedString(Integer.valueOf(size));
       }
+    |
+      (ct = dateTime64()) { return ct; }
     )
 
 }
 
+private ColumnType dateTime64() :
+{
+    String precision;
+    String timezone = null;
+}
+{
+    (
+        <DATETIME64> <LPAREN> ( precision = integer() )
+        ( <COMMA> ( timezone = string() ) )?
+        <RPAREN>
+    )
+    {
+        return ColumnType.dateTime64(Integer.parseInt(precision), timezone);

Review Comment:
   Minor comment - a typo like `DateTime64(abc)` will result in 
`IllegalArgumentException: "failed to parse"`, while a bad precision like 
`DateTime64(-1)` will result in a different exception with a different message, 
from a different layer. Should we catch and rethrow so all bad input funnels 
through the same `failed to parse` error?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to