rahil-c commented on code in PR #17768:
URL: https://github.com/apache/hudi/pull/17768#discussion_r2697157911


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/common/model/HoodieSparkRecord.java:
##########
@@ -327,7 +328,15 @@ public Option<HoodieAvroIndexedRecord> toIndexedRecord(HoodieSchema recordSchema
 
   @Override
  public ByteArrayOutputStream getAvroBytes(HoodieSchema recordSchema, Properties props) throws IOException {
-    throw new UnsupportedOperationException();
+    // Convert Spark InternalRow to Avro GenericRecord
+    if (data == null) {

Review Comment:
   @vinothchandar 
   Originally I hit the following exception in `TestLanceDataSource#testBasicUpsertModifyExistingRow` when trying to upsert an existing row in the MOR case (where the base file should be Lance but the log file should be Avro):
   ```
   Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to /var/folders/lm/0j1q1s_n09b4wgqkdqbzpbkm0000gn/T/junit-11448262777148643233/dataset/test_lance_upsert_merge_on_read/.3169035e-e73a-49ec-be8f-c7045242bf56-0_20260115220744098.log.1_0-38-60
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:511)
        at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:470)
        at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:82)
        at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:358)
        ... 35 more
   Caused by: java.lang.UnsupportedOperationException
        at org.apache.hudi.common.model.HoodieSparkRecord.getAvroBytes(HoodieSparkRecord.java:331)
        at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock.serializeRecords(HoodieAvroDataBlock.java:122)
        at org.apache.hudi.common.table.log.block.HoodieDataBlock.getContentBytes(HoodieDataBlock.java:132)
        at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:147)
        at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:503)
        ... 38 more
   ```
   When examining the frames of the stack trace, I can see that it is going through the `upsert` path into `HoodieAppendHandle`
   <img width="864" height="283" alt="Screenshot 2026-01-15 at 10 12 28 PM" src="https://github.com/user-attachments/assets/5edd8d24-dbb8-42c4-ae73-a9ca106e4915" />
   and then attempts to write a log file in `HoodieAppendHandle#appendDataAndDeleteBlocks`, at the following code pointer: https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java#L503
   
   The actual block seems to be the `HoodieAvroDataBlock` 
   <img width="1231" height="248" alt="Screenshot 2026-01-15 at 10 17 29 PM" src="https://github.com/user-attachments/assets/eef2960b-61ca-44ed-bff3-49d487e5af00" />
   
   which contains a method called `serializeRecords`
   
https://github.com/apache/hudi/blob/30029e37017f64b1a4d682f08c99021fadede70b/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L122
   
   The actual record being used in this case is `HoodieSparkRecord`, which did not previously have a `getAvroBytes` implementation, which is why I implemented it for now.
   <img width="1150" height="439" alt="Screenshot 2026-01-15 at 10 21 44 PM" src="https://github.com/user-attachments/assets/0f0fef7d-cdc8-411a-86de-faefc11522d2" />
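
   For reference, the shape of the new `getAvroBytes` override is roughly the sketch below. This is a simplified illustration, not the exact PR code: `toAvroSchema` and `internalRowToAvro` are hypothetical stand-ins for whatever schema accessor and `InternalRow`-to-`GenericRecord` converter the change actually uses; only the encoder calls are standard Avro APIs.

   ```java
   @Override
   public ByteArrayOutputStream getAvroBytes(HoodieSchema recordSchema, Properties props) throws IOException {
     if (data == null) {
       throw new IOException("Cannot serialize a null InternalRow to Avro bytes");
     }
     Schema avroSchema = recordSchema.toAvroSchema();                // hypothetical accessor
     // Convert the Spark InternalRow into an Avro GenericRecord.
     GenericRecord avroRecord = internalRowToAvro(data, avroSchema); // hypothetical converter
     // Encode the GenericRecord with the standard Avro binary encoder.
     ByteArrayOutputStream baos = new ByteArrayOutputStream();
     BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
     new GenericDatumWriter<GenericRecord>(avroSchema).write(avroRecord, encoder);
     encoder.flush();
     return baos;
   }
   ```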
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
