hudi-agent commented on code in PR #18776:
URL: https://github.com/apache/hudi/pull/18776#discussion_r3272595614
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/hadoop/HoodieAvroHFileWriter.java:
##########
@@ -105,8 +106,14 @@ public HoodieAvroHFileWriter(String instantTime,
StoragePath file, HoodieHFileCo
.build();
StorageConfiguration<Configuration> storageConf = new
HadoopStorageConfiguration(conf);
StoragePath filePath = new StoragePath(this.file.toUri());
- OutputStream outputStream = HoodieStorageUtils.getStorage(filePath,
storageConf).create(filePath);
- this.writer = new HFileWriterImpl(context, outputStream);
+ OutputStream outputStream = HoodieStorageUtils.getStorage(filePath,
storageConf).create(filePath);
+ try {
+ this.writer = new HFileWriterImpl(context, outputStream);
+ } finally {
+ if (this.writer == null) {
+ closeQuietly(outputStream);
Review Comment:
🤖 The `try { ... } finally { if (this.writer == null)
closeQuietly(outputStream); }` only protects the `HFileWriterImpl` constructor.
If `writer.appendFileInfo(...)` on line 116-117 throws (or any later
constructor line), `this.writer` is non-null but the constructor exits, leaving
the writer (and its underlying output stream) leaked since the caller never
gets a reference. Could the try region cover the rest of the constructor, or
close `this.writer` itself on failure?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieBinaryCopyHandle.java:
##########
@@ -118,14 +118,26 @@ public void write() {
log.info("Schema evolution enabled for binary copy: {}",
schemaEvolutionEnabled);
records = this.writer.binaryCopy(inputFiles,
Collections.singletonList(path), writeScheMessageType, schemaEvolutionEnabled);
} catch (IOException e) {
+ closeWriterAfterFailure(e);
throw new HoodieIOException(e.getMessage(), e);
+ } catch (RuntimeException e) {
+ closeWriterAfterFailure(e);
+ throw e;
} finally {
this.recordsWritten = records;
this.insertRecordsWritten = records;
}
log.info("Finish rewriting " + this.path + ". Using " + timer.endTimer() +
" mills");
}
+ private void closeWriterAfterFailure(Throwable failure) {
+ try {
+ this.writer.close();
Review Comment:
🤖 This only catches `IOException` — if `writer.close()` throws a
`RuntimeException`, it propagates out of `closeWriterAfterFailure` and replaces
the original failure passed in via `failure`. Could we route this through
`AutoCloseableUtils.closeQuietlyWithSuppressed(writer, failure)` for
consistency with the other handles?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieBinaryCopyHandle.java:
##########
@@ -118,14 +118,26 @@ public void write() {
log.info("Schema evolution enabled for binary copy: {}",
schemaEvolutionEnabled);
records = this.writer.binaryCopy(inputFiles,
Collections.singletonList(path), writeScheMessageType, schemaEvolutionEnabled);
} catch (IOException e) {
+ closeWriterAfterFailure(e);
throw new HoodieIOException(e.getMessage(), e);
+ } catch (RuntimeException e) {
+ closeWriterAfterFailure(e);
+ throw e;
} finally {
this.recordsWritten = records;
this.insertRecordsWritten = records;
}
log.info("Finish rewriting " + this.path + ". Using " + timer.endTimer() +
" mills");
}
+ private void closeWriterAfterFailure(Throwable failure) {
+ try {
Review Comment:
🤖 nit: this handle re-implements the same close-and-suppress logic instead
of reusing the new `AutoCloseableUtils.closeQuietlyWithSuppressed` that the
other handles in this PR adopted — could you switch to the shared util for
consistency?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -531,6 +532,7 @@ protected void doWrite(HoodieRecord record, HoodieSchema
schema, TypedProperties
} catch (Throwable t) {
log.error("Error writing record " + record, t);
if (!config.getIgnoreWriteFailed()) {
+ closeLogWriterQuietly(t);
Review Comment:
🤖 Closing `writer` here makes the handle inconsistent: `recordList` may
still hold buffered records from earlier successful writes, and if a caller (or
finally-block) later invokes `close()`, `appendDataAndDeleteBlocks` will hit
`writer.appendBlocks(blocks)` with a null writer, then the catch block at line
519 will dereference `writer.getLogFile().getPath()` and throw a second NPE
that replaces the original exception. Could we either skip closing the writer
here, or also clear/flush `recordList` so the subsequent `close()` no-ops?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/parquet/io/HoodieParquetBinaryCopyBase.java:
##########
@@ -147,17 +154,42 @@ protected void initFileWriter(Path outPutFile,
CompressionCodecName newCodecName
@Override
public void close() throws IOException {
+ if (writer == null) {
+ return;
+ }
Map<String, String> extraMetaData = finalizeMetadata();
extraMetaData = extraMetaData == null ? new HashMap<>() : extraMetaData;
extraMetaData.remove("parquet.avro.schema");
extraMetaData.remove("org.apache.spark.sql.parquet.row.metadata");
- writer.end(extraMetaData);
- // Release the buffer
- reusableBlockBuffer = null;
+ try {
+ writer.end(extraMetaData);
+ } finally {
+ writer = null;
+ // Release the buffer
+ reusableBlockBuffer = null;
+ }
}
protected abstract Map<String, String> finalizeMetadata();
+ private void closeParquetFileWriterQuietly(ParquetFileWriter
parquetFileWriter) {
+ if (parquetFileWriter == null) {
+ return;
+ }
+ Method closeMethod;
+ try {
+ // ParquetFileWriter does not implement Closeable in all versions, so
attempt close() reflectively.
+ closeMethod = parquetFileWriter.getClass().getMethod("close");
+ } catch (NoSuchMethodException e) {
+ return;
Review Comment:
🤖 nit: the comment explains *what* the reflection does but not *why* — could
you call out which Parquet versions are missing `close()` here? Otherwise a
future reader is likely to delete the reflection thinking it's gratuitous.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteMergeHandle.java:
##########
@@ -485,10 +483,30 @@ public List<WriteStatus> close() {
return Collections.singletonList(writeStatus);
} catch (IOException e) {
+ closeFileWriterQuietly(e);
throw new HoodieUpsertException("Failed to close UpdateHandle", e);
+ } catch (RuntimeException e) {
+ closeFileWriterQuietly(e);
+ throw e;
+ } finally {
+ keyToNewRecords = null;
+ writtenRecordKeys = null;
Review Comment:
🤖 nit: `keyToNewRecords` / `writtenRecordKeys` are already nulled inside the
try-block on the success path — duplicating the assignment in `finally` is
harmless but a bit noisy. Could the finally just rely on the try-block, or
alternatively drop the success-path assignment and keep only the finally?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]