hudi-agent commented on code in PR #18776:
URL: https://github.com/apache/hudi/pull/18776#discussion_r3272595614


##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/hadoop/HoodieAvroHFileWriter.java:
##########
@@ -105,8 +106,14 @@ public HoodieAvroHFileWriter(String instantTime, 
StoragePath file, HoodieHFileCo
         .build();
     StorageConfiguration<Configuration> storageConf = new 
HadoopStorageConfiguration(conf);
     StoragePath filePath = new StoragePath(this.file.toUri());
-    OutputStream outputStream =  HoodieStorageUtils.getStorage(filePath, 
storageConf).create(filePath);
-    this.writer = new HFileWriterImpl(context, outputStream);
+    OutputStream outputStream = HoodieStorageUtils.getStorage(filePath, 
storageConf).create(filePath);
+    try {
+      this.writer = new HFileWriterImpl(context, outputStream);
+    } finally {
+      if (this.writer == null) {
+        closeQuietly(outputStream);

Review Comment:
   🤖 The `try { ... } finally { if (this.writer == null) 
closeQuietly(outputStream); }` only protects the `HFileWriterImpl` constructor. 
If `writer.appendFileInfo(...)` on line 116-117 throws (or any later 
constructor line), `this.writer` is non-null but the constructor exits, leaving 
the writer (and its underlying output stream) leaked since the caller never 
gets a reference. Could the try region cover the rest of the constructor, or 
close `this.writer` itself on failure?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieBinaryCopyHandle.java:
##########
@@ -118,14 +118,26 @@ public void write() {
       log.info("Schema evolution enabled for binary copy: {}", 
schemaEvolutionEnabled);
       records = this.writer.binaryCopy(inputFiles, 
Collections.singletonList(path), writeScheMessageType, schemaEvolutionEnabled);
     } catch (IOException e) {
+      closeWriterAfterFailure(e);
       throw new HoodieIOException(e.getMessage(), e);
+    } catch (RuntimeException e) {
+      closeWriterAfterFailure(e);
+      throw e;
     } finally {
       this.recordsWritten = records;
       this.insertRecordsWritten = records;
     }
     log.info("Finish rewriting " + this.path + ". Using " + timer.endTimer() + 
" mills");
   }
 
+  private void closeWriterAfterFailure(Throwable failure) {
+    try {
+      this.writer.close();

Review Comment:
   🤖 This only catches `IOException` — if `writer.close()` throws a 
`RuntimeException`, it propagates out of `closeWriterAfterFailure` and replaces 
the original failure passed in via `failure`. Could we route this through 
`AutoCloseableUtils.closeQuietlyWithSuppressed(writer, failure)` for 
consistency with the other handles?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieBinaryCopyHandle.java:
##########
@@ -118,14 +118,26 @@ public void write() {
       log.info("Schema evolution enabled for binary copy: {}", 
schemaEvolutionEnabled);
       records = this.writer.binaryCopy(inputFiles, 
Collections.singletonList(path), writeScheMessageType, schemaEvolutionEnabled);
     } catch (IOException e) {
+      closeWriterAfterFailure(e);
       throw new HoodieIOException(e.getMessage(), e);
+    } catch (RuntimeException e) {
+      closeWriterAfterFailure(e);
+      throw e;
     } finally {
       this.recordsWritten = records;
       this.insertRecordsWritten = records;
     }
     log.info("Finish rewriting " + this.path + ". Using " + timer.endTimer() + 
" mills");
   }
 
+  private void closeWriterAfterFailure(Throwable failure) {
+    try {

Review Comment:
   🤖 nit: this handle re-implements the same close-and-suppress logic instead 
of reusing the new `AutoCloseableUtils.closeQuietlyWithSuppressed` that the 
other handles in this PR adopted — could you switch to the shared util for 
consistency?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -531,6 +532,7 @@ protected void doWrite(HoodieRecord record, HoodieSchema 
schema, TypedProperties
     } catch (Throwable t) {
       log.error("Error writing record " + record, t);
       if (!config.getIgnoreWriteFailed()) {
+        closeLogWriterQuietly(t);

Review Comment:
   🤖 Closing `writer` here makes the handle inconsistent: `recordList` may 
still hold buffered records from earlier successful writes, and if a caller (or 
finally-block) later invokes `close()`, `appendDataAndDeleteBlocks` will hit 
`writer.appendBlocks(blocks)` with a null writer, then the catch block at line 
519 will dereference `writer.getLogFile().getPath()` and throw a second NPE 
that replaces the original exception. Could we either skip closing the writer 
here, or also clear/flush `recordList` so the subsequent `close()` no-ops?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-hadoop-common/src/main/java/org/apache/hudi/parquet/io/HoodieParquetBinaryCopyBase.java:
##########
@@ -147,17 +154,42 @@ protected void initFileWriter(Path outPutFile, 
CompressionCodecName newCodecName
 
   @Override
   public void close() throws IOException {
+    if (writer == null) {
+      return;
+    }
     Map<String, String> extraMetaData = finalizeMetadata();
     extraMetaData = extraMetaData == null ? new HashMap<>() : extraMetaData;
     extraMetaData.remove("parquet.avro.schema");
     extraMetaData.remove("org.apache.spark.sql.parquet.row.metadata");
-    writer.end(extraMetaData);
-    // Release the buffer
-    reusableBlockBuffer = null;
+    try {
+      writer.end(extraMetaData);
+    } finally {
+      writer = null;
+      // Release the buffer
+      reusableBlockBuffer = null;
+    }
   }
 
   protected abstract Map<String, String> finalizeMetadata();
 
+  private void closeParquetFileWriterQuietly(ParquetFileWriter 
parquetFileWriter) {
+    if (parquetFileWriter == null) {
+      return;
+    }
+    Method closeMethod;
+    try {
+      // ParquetFileWriter does not implement Closeable in all versions, so 
attempt close() reflectively.
+      closeMethod = parquetFileWriter.getClass().getMethod("close");
+    } catch (NoSuchMethodException e) {
+      return;

Review Comment:
   🤖 nit: the comment explains *what* the reflection does but not *why* — could 
you call out which Parquet versions are missing `close()` here? Otherwise a 
future reader is likely to delete the reflection thinking it's gratuitous.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteMergeHandle.java:
##########
@@ -485,10 +483,30 @@ public List<WriteStatus> close() {
 
       return Collections.singletonList(writeStatus);
     } catch (IOException e) {
+      closeFileWriterQuietly(e);
       throw new HoodieUpsertException("Failed to close UpdateHandle", e);
+    } catch (RuntimeException e) {
+      closeFileWriterQuietly(e);
+      throw e;
+    } finally {
+      keyToNewRecords = null;
+      writtenRecordKeys = null;

Review Comment:
   🤖 nit: `keyToNewRecords` / `writtenRecordKeys` are already nulled inside the 
try-block on the success path — duplicating the assignment in `finally` is 
harmless but a bit noisy. Could the finally just rely on the try-block, or 
alternatively drop the success-path assignment and keep only the finally?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to