[ https://issues.apache.org/jira/browse/HADOOP-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703520#comment-17703520 ]
ASF GitHub Bot commented on HADOOP-18458:
-----------------------------------------

wujinhu commented on code in PR #4912:
URL: https://github.com/apache/hadoop/pull/4912#discussion_r1144329309


##########
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSBlockOutputStream.java:
##########
@@ -138,64 +168,74 @@ public synchronized void write(int b) throws IOException {
   @Override
   public synchronized void write(byte[] b, int off, int len) throws IOException {
-    if (closed) {
-      throw new IOException("Stream closed.");
+    OSSDataBlocks.validateWriteArgs(b, off, len);
+    checkOpen();
+    if (len == 0) {
+      return;
     }
-    blockStream.write(b, off, len);
-    blockWritten += len;
-    if (blockWritten >= blockSize) {
-      uploadCurrentPart();
-      blockWritten = 0L;
+    OSSDataBlocks.DataBlock block = createBlockIfNeeded();
+    int written = block.write(b, off, len);
+    blockWritten += written;
+    int remainingCapacity = block.remainingCapacity();
+    if (written < len) {
+      // not everything was written — the block has run out
+      // of capacity
+      // Trigger an upload then process the remainder.
+      LOG.debug("writing more data than block has capacity -triggering upload");
+      uploadCurrentBlock();
+      // tail recursion is mildly expensive, but given buffer sizes must be MB.
+      // it's unlikely to recurse very deeply.
+      this.write(b, off + written, len - written);

Review Comment:
   Good suggestion, will optimize the code.

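The suggestion being acknowledged is not quoted in this message. Purely as an illustration, the recursive call in the hunk above can be written as a loop that keeps filling blocks until the caller's buffer is drained. `FixedBlock` and `LoopingBlockWriter` are hypothetical stand-ins written for this sketch, not the `OSSDataBlocks` API, and the in-memory list of "uploaded" blocks replaces the real upload to OSS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fixed-capacity block; a stand-in, not OSSDataBlocks.DataBlock. */
final class FixedBlock {
  private final byte[] buf;
  private int used;

  FixedBlock(int capacity) {
    buf = new byte[capacity];
  }

  /** Copies as many bytes as fit and returns how many were accepted. */
  int write(byte[] b, int off, int len) {
    int n = Math.min(len, buf.length - used);
    System.arraycopy(b, off, buf, used, n);
    used += n;
    return n;
  }

  byte[] toByteArray() {
    return Arrays.copyOf(buf, used);
  }
}

/** Hypothetical writer that replaces the recursive call with a loop. */
final class LoopingBlockWriter {
  private final int blockSize;
  private final List<byte[]> uploaded = new ArrayList<>();
  private FixedBlock current;

  LoopingBlockWriter(int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.blockSize = blockSize;
  }

  synchronized void write(byte[] b, int off, int len) {
    while (len > 0) {
      if (current == null) {
        current = new FixedBlock(blockSize);
      }
      int written = current.write(b, off, len);
      off += written;
      len -= written;
      if (len > 0) {
        // The block ran out of capacity: hand it off, then loop for the remainder.
        uploadCurrentBlock();
      }
    }
  }

  /** Hands off whatever is buffered in the current block. */
  synchronized void close() {
    if (current != null) {
      uploadCurrentBlock();
    }
  }

  List<byte[]> uploadedBlocks() {
    return uploaded;
  }

  private void uploadCurrentBlock() {
    // Stand-in for the real upload: just remember the block contents.
    uploaded.add(current.toByteArray());
    current = null;
  }

  public static void main(String[] args) {
    LoopingBlockWriter w = new LoopingBlockWriter(4);
    w.write(new byte[10], 0, 10);
    w.close();
    System.out.println(w.uploadedBlocks().size() + " blocks handed off");
  }
}
```

The loop keeps the same trigger as the hunk (hand the block off only when some of the input did not fit), so a block that is exactly full stays current until the next write or `close()`.
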
> AliyunOSS: AliyunOSSBlockOutputStream to support heap/off-heap buffer before uploading data to OSS
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18458
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18458
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/oss
>    Affects Versions: 3.0.3, 3.1.4, 2.10.2, 3.2.4, 3.3.4
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Major
>              Labels: pull-request-available
>
> Recently, our customers raised a requirement: AliyunOSSBlockOutputStream
> should support heap/off-heap buffers before uploading data to OSS.
> Currently, AliyunOSSBlockOutputStream buffers data in a local directory before
> uploading to OSS, which is less efficient than buffering in memory.
> Changes:
> # Adds heap/off-heap buffers
> # Adds a limit on the memory used, with fallback to disk
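
The memory limit with fallback to disk listed in the issue description can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in the pull request: `BufferBudget`, its `Kind` enum, and the byte-count accounting are all hypothetical names introduced here for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical accounting helper: memory while under the limit, disk otherwise. */
final class BufferBudget {
  enum Kind { MEMORY, DISK }

  private final long memoryLimit;
  private final AtomicLong memoryInUse = new AtomicLong();

  BufferBudget(long memoryLimit) {
    if (memoryLimit < 0) {
      throw new IllegalArgumentException("memoryLimit must not be negative");
    }
    this.memoryLimit = memoryLimit;
  }

  /** Reserves memory for one block if it fits under the limit, else reports DISK. */
  Kind reserve(long blockSize) {
    while (true) {
      long used = memoryInUse.get();
      if (used + blockSize > memoryLimit) {
        return Kind.DISK; // over budget: the caller buffers this block on disk
      }
      if (memoryInUse.compareAndSet(used, used + blockSize)) {
        return Kind.MEMORY;
      }
      // Another thread changed the counter; retry with the fresh value.
    }
  }

  /** Returns a memory reservation once its block has been uploaded. */
  void release(long blockSize) {
    memoryInUse.addAndGet(-blockSize);
  }

  long memoryInUse() {
    return memoryInUse.get();
  }

  public static void main(String[] args) {
    BufferBudget budget = new BufferBudget(10);
    System.out.println(budget.reserve(4));
    System.out.println(budget.reserve(4));
    System.out.println(budget.reserve(4));
  }
}
```

In this sketch only blocks that were granted `MEMORY` should call `release`, so the counter tracks in-memory blocks alone.
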