hudi-agent commented on code in PR #18384:
URL: https://github.com/apache/hudi/pull/18384#discussion_r3253214755


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java:
##########


Review Comment:
   🤖 **Line 211:** I believe this is still the same NPE scenario from last 
round — when only `_hoodie_partition_path` is excluded but `_hoodie_record_key` 
is not, `populateMetaFields = flags.isRecordKeyPopulated() = true`, and 
downstream `processDataBlock` (line 472-474) builds 
`recordKeyPartitionPathFieldPair = Option.empty()`. That flows into 
`SpillableMapUtils.convertToHoodieRecordPayload` line 141 which does 
`record.get(PARTITION_PATH_METADATA_FIELD).toString()` on the now-null column → 
NPE. The existing tests only cover the case where record_key and partition_path 
are excluded together (via `EXCLUDE_ALL_EXCEPT_COMMIT_TIME`), so this edge case 
isn't exercised. Could the `populateMetaFields` flag here gate on both 
`isRecordKeyPopulated() && isPartitionPathPopulated()`, or could the field pair 
always be sourced from the already-resolved 
`recordKeyField`/`partitionPathFieldOpt`?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteMergeHandle.java:
##########


Review Comment:
   🤖 **Line 421:** I think this still has the issue I flagged last round — the 
preserve path only nullifies `_hoodie_file_name`, and `fileWriter.write(...)` 
routes to `writeAvro(...)` (verified in `HoodieAvroParquetWriter` line 85), 
which does NOT call `prepRecordWithMetadata` and therefore doesn't consult 
`HoodieMetaFieldFlags`. So if a transition grows the exclude list to include 
e.g. `_hoodie_record_key`, the first compaction reads old records (with the key 
populated) and writes them verbatim into the new file, leaving the new file 
inconsistent with the table's exclusion intent. Could you also null out the 
other 4 excluded meta fields here (the same `updateMetaField` pattern as 
file_name), or alternatively route this through a path that goes through 
`prepRecordWithMetadata`?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to