[
https://issues.apache.org/jira/browse/ORC-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739256#comment-17739256
]
Dongjoon Hyun commented on ORC-1458:
------------------------------------
Thank you for reporting, [~mamingchen]. Please feel free to make a PR for that.
> reduce namenode getFileinfo rpc
> --------------------------------
>
> Key: ORC-1458
> URL: https://issues.apache.org/jira/browse/ORC-1458
> Project: ORC
> Issue Type: Wish
> Components: Java, Reader
> Reporter: Mingchen_Ma
> Priority: Minor
>
> In the ReaderImpl.java code, there is the following logic:
> {code:java}
> if (maxFileLength == Long.MAX_VALUE) {
>   FileStatus fileStatus = fs.getFileStatus(path);
>   size = fileStatus.getLen();
>   modificationTime = fileStatus.getModificationTime();
> }
> {code}
> The logic above obtains the file length so that the ORC footer can be read.
> As a result, when we read an ORC file on HDFS, each open operation triggers
> an additional getFileInfo RPC to the NameNode by default (unless we set the
> file length through ReaderOptions.set before opening the ORC file).
> Because the file has already been opened in ReaderImpl, can we avoid this
> NameNode RPC in the following way? In a high-load cluster, the pressure on
> the NameNode can be significantly reduced:
> {code:java}
> if (maxFileLength == Long.MAX_VALUE) {
>   // Assumes the wrapped stream is a DFSInputStream, i.e. the file is on HDFS.
>   size = ((DFSInputStream) file.getWrappedStream()).getFileLength();
>   modificationTime = -1;
> }
> {code}
>
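The unconditional cast in the proposal only holds when the file is on HDFS, so a guarded fallback is probably needed for other file systems. Below is a minimal, self-contained sketch of that control flow. It is not ORC or Hadoop code: StatusSource, OpenStream, LengthAwareStream and resolveLength are hypothetical stand-ins, where LengthAwareStream plays the role of DFSInputStream and StatusSource.lookupLength plays the role of fs.getFileStatus(path).getLen().

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FileLengthSketch {
    /** Hypothetical stand-in for a file system; each lookup models one NameNode RPC. */
    public interface StatusSource {
        long lookupLength(String path);
    }

    /** Hypothetical stand-in for an already-open input stream. */
    public interface OpenStream {}

    /** Hypothetical stand-in for a stream that already knows its length. */
    public interface LengthAwareStream extends OpenStream {
        long getFileLength();
    }

    /**
     * Resolves the file length in order of increasing cost:
     * caller-supplied length, then the open stream, then a status lookup.
     */
    public static long resolveLength(long maxFileLength, OpenStream stream,
                                     StatusSource fs, String path) {
        if (maxFileLength != Long.MAX_VALUE) {
            return maxFileLength; // the caller already supplied the length
        }
        if (stream instanceof LengthAwareStream) {
            // No extra lookup: the open stream already knows the length.
            return ((LengthAwareStream) stream).getFileLength();
        }
        // Fallback for any other stream type: pay for the status lookup.
        return fs.lookupLength(path);
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        StatusSource fs = path -> {
            lookups.incrementAndGet();
            return 2048L;
        };
        LengthAwareStream hdfsLike = () -> 1024L;
        OpenStream other = new OpenStream() {};

        long a = resolveLength(Long.MAX_VALUE, hdfsLike, fs, "/tmp/a.orc");
        long b = resolveLength(Long.MAX_VALUE, other, fs, "/tmp/b.orc");
        System.out.println(a + " " + b + " lookups=" + lookups.get());
    }
}
```

In this sketch only the second call performs a status lookup. Note that the stream path yields no modification time, which is why the proposal sets modificationTime to -1.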
--
This message was sent by Atlassian Jira
(v8.20.10#820010)