huozhanfeng commented on a change in pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308#discussion_r677506934
##########
File path:
fs_brokers/apache_hdfs_broker/src/main/java/org/apache/doris/broker/hdfs/FileSystemManager.java
##########
@@ -561,22 +561,25 @@ public ByteBuffer pread(TBrokerFD fd, long offset, long length) {
currentStreamOffset, offset);
}
}
- ByteBuffer buf;
+ // Avoid using the ByteBuffer based read for Hadoop because some FSDataInputStream
+ // implementations are not ByteBufferReadable,
+ // See https://issues.apache.org/jira/browse/HADOOP-14603
+ byte[] buf;
if (length > readBufferSize) {
- buf = ByteBuffer.allocate(readBufferSize);
+ buf = new byte[readBufferSize];
Review comment:
Hmm, I don't think the choice of `ByteBuffer` is what causes this problem. How
much we read depends only on the size of the buffer we allocate and on whether
the read fills it, so `ByteBuffer` should behave the same as a `byte array` here.
I tested both `ByteBuffer` and `byte array`, and they behave the same when
`readBufferSize` is larger than 128 KB: neither of them reads more than 128 KB
of data per call. Here is the debug code and part of the log.
<pre>
logger.info("read buffer from input stream, request.length " + length + ", readBufferSize:" + readBufferSize + ", buffer size:" + buf.length + ", read length:" + readLength);
2021-07-27 09:57:04 [ pool-2-thread-4:31261 ] - [ INFO ] read buffer from input stream, request.length 131072, readBufferSize:1048576, buffer size:131072, read length:131072
2021-07-27 09:57:04 [ pool-2-thread-4:31268 ] - [ INFO ] read buffer from input stream, request.length 131072, readBufferSize:1048576, buffer size:131072, read length:131072
2021-07-27 09:57:04 [ pool-2-thread-4:31273 ] - [ INFO ] read buffer from input stream, request.length 17612, readBufferSize:1048576, buffer size:17612, read length:17612
2021-07-27 09:57:04 [ pool-2-thread-4:31275 ] - [ INFO ] read buffer from input stream, request.length 186, readBufferSize:1048576, buffer size:186, read length:186
2021-07-27 09:57:04 [ pool-2-thread-4:31277 ] - [ INFO ] read buffer from input stream, request.length 680, readBufferSize:1048576, buffer size:680, read length:680
</pre>
I see that the default value of `readBufferSize` is 128 KB, so there may be
some logic elsewhere in Doris that depends on this size.
<pre>
private int readBufferSize = 128 << 10; // 128k
private int writeBufferSize = 128 << 10; // 128k
</pre>
My guess is that the root cause is that `TBrokerPReadRequest.length` in the RPC
request never exceeds 128 KB, and that value is controlled by the client. I
don't have a BE dev environment right now, so could you help take a look? 😁
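
To make the guess concrete, here is a minimal standalone sketch (not the broker code). It assumes the buffer is sized as the smaller of the requested length and `readBufferSize`, which matches the `buffer size` values in the log above. `ReadSizeSketch`, `bufferSizeFor` and `readFully` are hypothetical names used only for illustration, and the 128 KB request cap is my assumption about the client.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadSizeSketch {
    // Hypothetical stand-in for the buffer sizing in pread: the buffer is
    // never larger than the requested length, whatever readBufferSize is.
    static int bufferSizeFor(long requestLength, int readBufferSize) {
        return (int) Math.min(requestLength, (long) readBufferSize);
    }

    // Reads until the buffer is full or the stream ends; returns bytes read.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        int readBufferSize = 1 << 20;   // 1048576, as in the log above
        long requestLength = 128 << 10; // 131072, assumed client-side cap
        byte[] data = new byte[4 << 20]; // 4 MB of source data, more than enough

        byte[] buf = new byte[bufferSizeFor(requestLength, readBufferSize)];
        int readLength = readFully(new ByteArrayInputStream(data), buf);

        // prints "buffer size:131072, read length:131072"
        System.out.println("buffer size:" + buf.length + ", read length:" + readLength);
    }
}
```

If this is what happens, raising `readBufferSize` alone cannot increase the bytes returned per call as long as the request length stays at 128 KB, regardless of whether the buffer is a `ByteBuffer` or a `byte array`.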
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.