voonhous commented on code in PR #8418:
URL: https://github.com/apache/hudi/pull/8418#discussion_r1162484156
##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/cluster/ITTestHoodieFlinkClustering.java:
##########
@@ -419,4 +425,179 @@ public void testHoodieFlinkClusteringScheduleAfterArchive() throws Exception {
.stream().anyMatch(fg -> fg.getSlices()
.stream().anyMatch(s ->
s.getDataFilePath().contains(firstClusteringInstant))));
}
+
+  /**
+   * Test to ensure that creating a table with a column of type TIMESTAMP(9) throws an error.
+   */
+  @Test
+  public void testHoodieFlinkClusteringWithTimestampNanos() {
+    // create hoodie table and insert data into it
Review Comment:
> Can the append mode write timestamp(9) then?
Nope, APPEND can't write TIMESTAMP(9).
> BTW, Spark use the INT96 as the default output timestamp type in their parquet writer: https://github.com/apache/spark/blob/0a63a496bdced946a5d4825ca66df12de51d3a87/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L970
I don't think we are using INT96 by default; writing a nanos timestamp with Hudi-on-Spark falls back to INT64 (TIMESTAMP_MICROS).
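For context, the setting the linked SQLConf line defines is `spark.sql.parquet.outputTimestampType` (default `INT96`, with `TIMESTAMP_MICROS` and `TIMESTAMP_MILLIS` as alternatives). A hedged config sketch of overriding it, should anyone want Spark itself to emit INT64 micros (the key and values are from the linked SQLConf; placing it in `spark-defaults.conf` is just one option):

```
# spark-defaults.conf -- override Spark's default parquet timestamp output
spark.sql.parquet.outputTimestampType  TIMESTAMP_MICROS
```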
Example:
```sql
CREATE TABLE `dev_hudi`.`timestamp_test`
(
`id` INTEGER,
`bigint_col` BIGINT,
`string_col` STRING,
`double_col` DOUBLE,
`timestamp_col` TIMESTAMP,
`operation` STRING
) USING hudi
TBLPROPERTIES (
'primaryKey' = 'id',
'type' = 'cow',
'preCombineField' = 'bigint_col'
)
LOCATION 'hdfs://path/to/timestamp_test';
-- use nanos; however, this will fall back to micros
INSERT INTO `dev_hudi`.`timestamp_test`
VALUES (1, 1000, "string_col_1", 1.1, TIMESTAMP "1970-01-01 00:00:01.001001001", "init"),
       (2, 2000, "string_col_2", 2.2, TIMESTAMP "1970-01-01 00:00:02.001001001", "init");

SELECT * FROM `dev_hudi`.`timestamp_test`;
```
Query output (note `timestamp_col` is truncated to microsecond precision):
```
20230411163354949  20230411163354949_0_0  1  5ea1112a-3f7d-4c6a-8f20-5275055ee330-0_0-17-20_20230411163354949.parquet  1  1000  string_col_1  1.1  1970-01-01 00:00:01.001001  init
20230411163354949  20230411163354949_0_1  2  5ea1112a-3f7d-4c6a-8f20-5275055ee330-0_0-17-20_20230411163354949.parquet  2  2000  string_col_2  2.2  1970-01-01 00:00:02.001001  init
```
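The truncation visible in the query output can be reproduced with plain `java.time` (a standalone sketch, not Hudi code): TIMESTAMP_MICROS cannot hold the last three nano digits, so `.001001001` becomes `.001001`.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class MicrosFallbackDemo {
    public static void main(String[] args) {
        // Nano-precision literal from the INSERT above.
        Instant nanos = Instant.parse("1970-01-01T00:00:01.001001001Z");
        // TIMESTAMP_MICROS keeps microsecond precision only, so the
        // trailing three nano digits are dropped on write.
        Instant micros = nanos.truncatedTo(ChronoUnit.MICROS);
        System.out.println(micros); // 1970-01-01T00:00:01.001001Z
    }
}
```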
parquet-tools snippet:
```
############ Column(timestamp_col)[row group 0] ############
name: timestamp_col
path: timestamp_col
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=true, timeUnit=microseconds, is_from_converted_type=true, force_set_converted_type=false)
converted_type (legacy): TIMESTAMP_MICROS
compression: GZIP (space_saved: -20%)
total_compressed_size: 100
total_uncompressed_size: 83
```
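To spell out what that INT64 physical type holds: TIMESTAMP_MICROS stores microseconds since the Unix epoch, so the first row's value is 1,001,001 µs. A quick standalone check (again plain `java.time`, not Hudi or Parquet code):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class EpochMicrosDemo {
    public static void main(String[] args) {
        // The micros value actually stored for row 1 after the fallback.
        Instant t = Instant.parse("1970-01-01T00:00:01.001001Z");
        // TIMESTAMP_MICROS is a signed INT64 of microseconds since epoch.
        long epochMicros = ChronoUnit.MICROS.between(Instant.EPOCH, t);
        System.out.println(epochMicros); // 1001001
    }
}
```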
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]