huozhanfeng commented on a change in pull request #6308:
URL: https://github.com/apache/incubator-doris/pull/6308#discussion_r681770206
##########
File path:
fs_brokers/apache_hdfs_broker/src/main/java/org/apache/doris/broker/hdfs/FileSystemManager.java
##########
@@ -561,22 +561,25 @@ public ByteBuffer pread(TBrokerFD fd, long offset, long length) {
currentStreamOffset, offset);
}
}
- ByteBuffer buf;
+ // Avoid using the ByteBuffer based read for Hadoop because some FSDataInputStream
+ // implementations are not ByteBufferReadable,
+ // See https://issues.apache.org/jira/browse/HADOOP-14603
+ byte[] buf;
if (length > readBufferSize) {
- buf = ByteBuffer.allocate(readBufferSize);
+ buf = new byte[readBufferSize];
Review comment:
I tested this with an ORC file: the read returns the full requested length
when `readBufferSize` is larger than 128 KB. In ORC mode, the 128 KB limit
comes from the following logic:
<pre>
file: be/src/exec/orc_scanner.cpp
/**
 * Get the natural size for reads.
 * @return the number of bytes that should be read at once
 */
uint64_t getNaturalReadSize() const override {
    return 128 * 1024;
}
</pre>
See https://github.com/apache/orc/blob/main/c++/src/OrcFile.cc for more
details.
Part of the log follows:
<pre>
2021-08-03 09:33:38 [ pool-2-thread-2:25353 ] - [ INFO ] read buffer from input stream, request.length 99, readBufferSize:1048576, buffer size:99, read length:99
2021-08-03 09:33:38 [ pool-2-thread-2:25364 ] - [ INFO ] read buffer from input stream, request.length 67876, readBufferSize:1048576, buffer size:67876, read length:67876
2021-08-03 09:33:38 [ pool-2-thread-2:25389 ] - [ INFO ] read buffer from input stream, request.length 22518, readBufferSize:1048576, buffer size:22518, read length:22518
2021-08-03 09:33:38 [ pool-2-thread-2:25417 ] - [ INFO ] read buffer from input stream, request.length 279756, readBufferSize:1048576, buffer size:279756, read length:279756
2021-08-03 09:33:38 [ pool-2-thread-2:25434 ] - [ INFO ] read buffer from input stream, request.length 186, readBufferSize:1048576, buffer size:186, read length:186
</pre>
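For reference, the behaviour the log suggests can be sketched as below. This is only an illustration under assumptions, not the code in `FileSystemManager.java`: the class `PreadSketch` and the method `readCapped` are hypothetical names, and a plain `java.io.InputStream` stands in for `FSDataInputStream`. It allocates a `byte[]` of `min(length, readBufferSize)`, loops until that many bytes are read or EOF is hit, and wraps only the bytes actually read in a `ByteBuffer`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch; names are illustrative and not taken from the Doris broker.
final class PreadSketch {

    /**
     * Reads up to min(length, readBufferSize) bytes into a byte[] and wraps
     * the bytes actually read in a ByteBuffer. Uses the byte[] based read so
     * it does not depend on the stream being ByteBufferReadable.
     */
    public static ByteBuffer readCapped(InputStream in, long length, int readBufferSize)
            throws IOException {
        if (length < 0 || readBufferSize < 0) {
            throw new IllegalArgumentException("length and readBufferSize must be non-negative");
        }
        int capacity = (int) Math.min(length, (long) readBufferSize);
        byte[] buf = new byte[capacity];
        int total = 0;
        // A single read() may return fewer bytes than requested, so keep
        // reading until the buffer is full or the stream reports EOF.
        while (total < capacity) {
            int n = in.read(buf, total, capacity - total);
            if (n < 0) {
                break; // EOF before the buffer was filled
            }
            total += n;
        }
        return ByteBuffer.wrap(buf, 0, total);
    }

    public static void main(String[] args) throws IOException {
        // Same request size as one of the log lines above, with a 1 MB readBufferSize.
        InputStream in = new ByteArrayInputStream(new byte[279756]);
        ByteBuffer result = readCapped(in, 279756L, 1048576);
        System.out.println("read length:" + result.remaining());
    }
}
```

With a request larger than `readBufferSize`, this sketch returns at most `readBufferSize` bytes, so a caller would need to issue a further read for the remainder.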
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.