[ https://issues.apache.org/jira/browse/ORC-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739256#comment-17739256 ]

Dongjoon Hyun commented on ORC-1458:
------------------------------------

Thank you for reporting, [~mamingchen]. Please feel free to make a PR for that.

> reduce namenode getFileinfo rpc 
> --------------------------------
>
>                 Key: ORC-1458
>                 URL: https://issues.apache.org/jira/browse/ORC-1458
>             Project: ORC
>          Issue Type: Wish
>          Components: Java, Reader
>            Reporter: Mingchen_Ma
>            Priority: Minor
>
> In the ReaderImpl.java code, there is the following logic:
> {code:java}
> if (maxFileLength == Long.MAX_VALUE) {
>   FileStatus fileStatus = fs.getFileStatus(path);
>   size = fileStatus.getLen();
>   modificationTime = fileStatus.getModificationTime();
> }
> {code}
> This logic obtains the file length so that the reader can locate and read the
> ORC footer at the end of the file. As a consequence, every open of an ORC file
> on HDFS issues an additional getFileInfo RPC to the NameNode by default (unless
> the file length is supplied through ReaderOptions, e.g. maxLength(...), before
> the file is opened). Since ReaderImpl has already opened the file at this point,
> could we avoid this NameNode RPC as follows? On a heavily loaded cluster, this
> could noticeably reduce the load on the NameNode:
> {code:java}
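> if (maxFileLength == Long.MAX_VALUE) {
>   // Reuse the already-open stream; the cast only succeeds if it wraps a DFSInputStream
>   size = ((DFSInputStream) file.getWrappedStream()).getFileLength();
>   modificationTime = -1; // not available without a FileStatus
> }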
> {code}
>  
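For context on why ReaderImpl needs the length at all: according to the ORC file format specification, the last byte of an ORC file holds the length of the PostScript, and the PostScript in turn records the footer length. The sketch below illustrates that first step under this assumption; it is not ORC's reader code, and `OrcTailSketch`, `PositionedReader`, and `locatePostScript` are hypothetical names. It shows that the very first tail read is at offset `fileLength - 1`, so the length must come from somewhere: the getFileStatus call, the caller, or the open stream.

```java
/**
 * Minimal sketch (not ORC's actual reader code): locate the PostScript at the
 * end of an ORC file. The final byte stores the PostScript length, so the
 * reader must know the file length before it can read anything in the tail.
 * PositionedReader is a hypothetical stand-in for a positioned read on an
 * open stream.
 */
public class OrcTailSketch {

    /** Hypothetical: read one byte at an absolute file offset. */
    public interface PositionedReader {
        byte readByte(long position);
    }

    /** Offset and length of the PostScript within the file. */
    public static final class PostScriptLocation {
        public final long offset;
        public final int length;

        public PostScriptLocation(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    public static PostScriptLocation locatePostScript(PositionedReader in, long fileLength) {
        if (fileLength < 1) {
            throw new IllegalArgumentException("an empty file has no ORC tail");
        }
        // Last byte = PostScript length, stored as an unsigned byte (< 256).
        int psLen = in.readByte(fileLength - 1) & 0xFF;
        // The PostScript ends right before that final length byte.
        long psOffset = fileLength - 1 - psLen;
        if (psOffset < 0) {
            throw new IllegalStateException("PostScript length exceeds file size");
        }
        return new PostScriptLocation(psOffset, psLen);
    }

    public static void main(String[] args) {
        // Fake 100-byte "file" whose last byte says the PostScript is 20 bytes long.
        byte[] file = new byte[100];
        file[99] = 20;
        PostScriptLocation loc = locatePostScript(pos -> file[(int) pos], file.length);
        System.out.println(loc.offset + " " + loc.length); // prints "79 20"
    }
}
```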

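The proposal amounts to choosing where the file length comes from. The following is a minimal sketch of that choice with a fallback, assuming hypothetical `OpenStream` and `MetadataService` interfaces in place of the real `FSDataInputStream`/`DFSInputStream` and `FileSystem.getFileStatus` calls; `FileLengthSketch`, `TailInfo`, and `resolve` are likewise illustrative names. It counts lookups to show that the extra RPC is made only when the stream cannot report its length.

```java
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of the proposed change, using hypothetical types rather than the
 * real Hadoop/ORC classes. If the caller gave no length
 * (maxFileLength == Long.MAX_VALUE), take the length from the already-open
 * stream when it can report one locally; only otherwise fall back to a
 * metadata lookup, which on HDFS would be the getFileInfo RPC.
 */
public class FileLengthSketch {

    /** Hypothetical: an open stream that may know its own length. */
    public interface OpenStream {
        OptionalLong knownLength();
    }

    /** Hypothetical: metadata lookup; each call models one NameNode RPC. */
    public interface MetadataService {
        TailInfo lookup(String path);
    }

    /** File length plus modification time (-1 when unknown). */
    public static final class TailInfo {
        public final long size;
        public final long modificationTime;

        public TailInfo(long size, long modificationTime) {
            this.size = size;
            this.modificationTime = modificationTime;
        }
    }

    public static TailInfo resolve(long maxFileLength, String path,
                                   OpenStream stream, MetadataService meta) {
        if (maxFileLength != Long.MAX_VALUE) {
            // Length supplied by the caller: no lookup at all.
            return new TailInfo(maxFileLength, -1);
        }
        OptionalLong local = stream.knownLength();
        if (local.isPresent()) {
            // Length from the open stream: no RPC, but no modification time either.
            return new TailInfo(local.getAsLong(), -1);
        }
        // Fallback: one metadata RPC, which the current code always makes.
        return meta.lookup(path);
    }

    public static void main(String[] args) {
        AtomicInteger rpcs = new AtomicInteger();
        MetadataService nameNode = path -> {
            rpcs.incrementAndGet();
            return new TailInfo(4096, 1_700_000_000_000L);
        };
        // A stream that knows its length, like the wrapped DFSInputStream in the issue.
        TailInfo a = resolve(Long.MAX_VALUE, "/data/a.orc", () -> OptionalLong.of(4096), nameNode);
        // A stream that cannot report its length, forcing the fallback lookup.
        TailInfo b = resolve(Long.MAX_VALUE, "/data/b.orc", OptionalLong::empty, nameNode);
        System.out.println(a.size + " " + b.size + " rpcs=" + rpcs.get()); // prints "4096 4096 rpcs=1"
    }
}
```

Keeping a fallback means other FileSystem implementations would still work instead of failing on the cast. As in the proposal, the fast path reports a modification time of -1; whether every caller tolerates that is something a PR would need to check.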


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
