vinothchandar commented on code in PR #17768:
URL: https://github.com/apache/hudi/pull/17768#discussion_r2696215008


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/common/model/HoodieSparkRecord.java:
##########
@@ -327,7 +328,15 @@ public Option<HoodieAvroIndexedRecord> toIndexedRecord(HoodieSchema recordSchema
 
   @Override
   public ByteArrayOutputStream getAvroBytes(HoodieSchema recordSchema, Properties props) throws IOException {
-    throw new UnsupportedOperationException();
+    // Convert Spark InternalRow to Avro GenericRecord
+    if (data == null) {

Review Comment:
   This change is not Lance-specific, so I'd love to understand why it becomes 
necessary.



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkLanceWriter.java:
##########
@@ -113,26 +114,26 @@ public void writeRowWithMetadata(HoodieKey key, InternalRow row) throws IOExcept
     if (populateMetaFields) {
       UTF8String recordKey = UTF8String.fromString(key.getRecordKey());
       updateRecordMetadata(row, recordKey, key.getPartitionPath(), getWrittenRecordCount());
-      super.write(row);
+      super.write(row.copy());

Review Comment:
   So is this an existing issue with the Lance writer in general, unrelated to MoR?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -516,9 +516,6 @@ class HoodieSparkSqlWriterInternal {
             // scalastyle:on
 
             val writeConfig = client.getConfig
-            if (writeConfig.getRecordMerger.getRecordType == HoodieRecordType.SPARK && tableType == MERGE_ON_READ && writeConfig.getLogDataBlockFormat.orElse(HoodieLogBlockType.AVRO_DATA_BLOCK) != HoodieLogBlockType.PARQUET_DATA_BLOCK) {

Review Comment:
   +1. I can take a closer look at why we need getAvroBytes now, and at how 
Parquet log blocks vs. Avro log blocks work for existing tables.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
