Re: [PR] refactor: Update ParquetWriteSupport for Rows to match Avro writer behavior [hudi]

via GitHub Sat, 20 Sep 2025 12:08:56 -0700


the-other-tim-brown commented on code in PR #13882:
URL: https://github.com/apache/hudi/pull/13882#discussion_r2345550720



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowParquetWriteSupport.java:
##########
@@ -73,6 +175,173 @@ public void add(UTF8String recordKey) {
         bloomFilterWriteSupport.addKey(recordKey));
   }
 
+  @FunctionalInterface
+  private interface ValueWriter {
+    void write(SpecializedGetters row, int ordinal);
+  }
+
+  private void consumeMessage(Runnable writer) {
+    recordConsumer.startMessage();
+    writer.run();
+    recordConsumer.endMessage();
+  }
+
+  private void consumeGroup(Runnable writer) {
+    recordConsumer.startGroup();
+    writer.run();
+    recordConsumer.endGroup();
+  }
+
+  private void consumeField(String field, int index, Runnable writer) {
+    recordConsumer.startField(field, index);
+    writer.run();
+    recordConsumer.endField(field, index);
+  }
+
+  private void writeFields(InternalRow row, Schema schema, ValueWriter[] fieldWriters) {
+    for (int i = 0; i < fieldWriters.length; i++) {
+      int index = i;
+      if (!row.isNullAt(i)) {
+        Schema.Field field = schema.getFields().get(index);
+        consumeField(field.name(), index, () -> fieldWriters[index].write(row, index));
+      }
+    }
+  }
+
+  private ValueWriter makeWriter(Schema avroSchema, DataType dataType) {
+    Schema resolvedSchema = resolveNullableSchema(avroSchema);
+    Schema.Type type = resolvedSchema.getType();
+    LogicalType logicalType = resolvedSchema.getLogicalType();
+    switch (type) {
+      case BOOLEAN:
+        return (row, ordinal) -> recordConsumer.addBoolean(row.getBoolean(ordinal));
+      case INT:
+        if (logicalType != null) {
+          if (logicalType.getName().equals(LogicalTypes.date().getName())) {
+            return (row, ordinal) -> recordConsumer.addInteger((Integer) dateRebaseFunction.apply(row.getInt(ordinal)));
+          }
+        }
+        return (row, ordinal) -> recordConsumer.addInteger(row.getInt(ordinal));
+      case LONG:
+        if (logicalType != null) {
+          if (logicalType.getName().equals(LogicalTypes.timestampMillis().getName())) {
+            return (row, ordinal) -> recordConsumer.addLong(DateTimeUtils.microsToMillis((long) timestampRebaseFunction.apply(row.getLong(ordinal))));

Review Comment:
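   Yes. Based on my understanding, Spark always stores timestamps as 
microseconds under the hood, so the value has to be converted to 
milliseconds before it is written as a timestamp-millis field.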
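
   For illustration, here is a minimal standalone sketch of the 
micros-to-millis step. It is not the PR's code: the class name 
`TimestampMillisSketch` and its helper are hypothetical, and the use of 
floor division is my assumption about how Spark's 
`DateTimeUtils.microsToMillis` behaves, so treat it as a sketch rather 
than a reference.

```java
public class TimestampMillisSketch {
  // Number of microseconds in one millisecond.
  private static final long MICROS_PER_MILLIS = 1_000L;

  // Hypothetical helper. Floor division (rather than plain `/`) keeps
  // pre-epoch (negative) values rounding toward negative infinity instead
  // of toward zero; assumed here to match Spark's behavior.
  static long microsToMillis(long micros) {
    return Math.floorDiv(micros, MICROS_PER_MILLIS);
  }

  public static void main(String[] args) {
    // A microsecond-precision value, as a Spark row would hold it.
    long micros = 1_758_395_336_123_456L;
    System.out.println(microsToMillis(micros)); // prints 1758395336123

    // One microsecond before the epoch lands in millisecond -1, not 0.
    System.out.println(microsToMillis(-1L)); // prints -1
  }
}
```

   The sub-millisecond digits are dropped by the conversion, which is 
expected when the target logical type is timestamp-millis.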



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
