hudi-agent commented on code in PR #17785:
URL: https://github.com/apache/hudi/pull/17785#discussion_r3346422725
##########
hudi-hadoop-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java:
##########
@@ -207,23 +206,29 @@ private int getLogBlockLength(int contentLength, int
headerLength, int footerLen
private void rolloverIfNeeded() throws IOException {
// Roll over if the size is past the threshold
- if (getCurrentSize() > sizeThreshold) {
- log.info("CurrentSize {} has reached threshold {}. Rolling over to the
next version", getCurrentSize(), sizeThreshold);
+ if (getCurrentSize() > getSizeThreshold()) {
+ log.info("CurrentSize {} has reached threshold {}. Rolling over to the
next version", getCurrentSize(), getSizeThreshold());
rollOver();
}
}
private void rollOver() throws IOException {
closeStream();
- this.logFile = logFile.rollOver(rolloverLogWriteToken);
+ this.logFile = getLogFile().rollOver(getLogWriteToken());
this.closed = false;
}
private void createNewFile() throws IOException {
- fileCreationHook.preFileCreation(this.logFile);
- this.output = new FSDataOutputStream(
- storage.create(this.logFile.getPath(), false, bufferSize, replication,
WriterBuilder.DEFAULT_SIZE_THRESHOLD),
- new FileSystem.Statistics(storage.getScheme())
+ getFileCreationCallback().preFileCreation(this.getLogFile());
+ this.outputStream = new FSDataOutputStream(
+ getStorage().create(
+ this.getLogFile().getPath(),
+ false,
+ getBufferSize(),
+ getReplication(),
Review Comment:
🤖 Heads-up: this changes the HDFS block size argument from the previous
fixed `WriterBuilder.DEFAULT_SIZE_THRESHOLD` (512 MB) to the user-configurable
`getSizeThreshold()`. Since `HoodieHadoopStorage.create()` passes this through
to `FileSystem.create(..., blockSize, ...)`, a user (or test) setting a small
`withSizeThreshold(...)` would now create the underlying file with that tiny
block size, which HDFS rejects when it falls below
`dfs.namenode.fs-limits.min-block-size` (default 1 MB). Could you keep using
the 512 MB constant here as the block size, since that's a different concept
from the log rollover threshold? @yihua
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormat.java:
##########
@@ -60,31 +60,140 @@ public interface HoodieLogFormat {
String DEFAULT_WRITE_TOKEN = "0-0-0";
- String DEFAULT_LOG_FORMAT_WRITER =
"org.apache.hudi.common.table.log.HoodieLogFormatWriter";
-
/**
- * Writer interface to allow appending block to this file format.
+ * Abstract base class for appending blocks to the Hoodie log format.
+ * Subclasses provide specific implementations for writing to different
storage layers.
*/
- interface Writer extends Closeable {
+ @Getter
+ @Slf4j
+ abstract class Writer implements Closeable {
+
+ // Default max log file size 512 MB
+ public static final long DEFAULT_SIZE_THRESHOLD = 512 * 1024 * 1024L;
+
+ // Buffer size
+ protected Integer bufferSize;
+ // FileSystem
+ protected HoodieStorage storage;
+ // Size threshold for the log file. Useful when used with a rolling log
appender
+ protected Long sizeThreshold;
+ // Log File extension. Could be .avro.delta or .avro.commits etc
+ protected String fileExtension;
+ // File Id
+ protected String logFileId;
+ // File Commit Time stamp
+ protected String instantTime;
+ // version number for this log file. If not specified, then the current
version will be
+ // computed by inspecting the file system
+ protected Integer logVersion;
+ // file len of this log file
Review Comment:
🤖 nit: the comment just above (`// file len of this log file`) is a leftover
from when this field was called `fileLen` — could you update it to match the
new `fileSize` name? On the same line, the `= 0L` inline initializer is
redundant since the constructor always assigns `this.fileSize = fileSize !=
null ? fileSize : 0L;`; all other fields in this class have no inline default,
so this one stands out and could mislead a reader into thinking the default is
set here rather than in the constructor.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java:
##########
@@ -311,9 +312,9 @@ List<Pair<String, HoodieRollbackStat>>
maybeDeleteAndCollectStats(HoodieEngineCo
// Let's emit markers for rollback as well. markers are emitted
under rollback instant time.
WriteMarkers writeMarkers =
WriteMarkersFactory.get(config.getMarkersType(), table, instantTime);
- HoodieLogFormat.WriterBuilder writerBuilder =
HoodieLogFormat.newWriterBuilder()
-
.onParentPath(FSUtils.constructAbsolutePath(metaClient.getBasePath(),
partitionPath))
- .withFileId(fileId)
+ HoodieLogFormatWriter.HoodieLogFormatWriterBuilder writerBuilder =
HoodieLogFormatWriter.builder()
Review Comment:
🤖 nit: `HoodieLogFormatWriter.HoodieLogFormatWriterBuilder` is a
Lombok-generated inner-class name leaking into production code — callers are
now coupled to that implementation detail. The old code used
`HoodieLogFormat.WriterBuilder` as a stable, publicly-declared type. Could we
keep a named builder type in `HoodieLogFormat` (even a simple alias or thin
interface) so call sites like this one don't need to reference the Lombok
internals?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]