[ 
https://issues.apache.org/jira/browse/ORC-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739786#comment-17739786
 ] 

Xiaoqiao He commented on ORC-1458:
----------------------------------

Great catch. This will clearly reduce the request load on the NameNode when 
using ORC on HDFS. My concern is whether getting the file size from the input 
stream could introduce any issues. On the HDFS side, this file size does not 
include the last block if it is under construction. 
https://github.com/apache/hadoop/blob/6042d599042ed12f1c6ccb286c7f1421b34fa126/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirStatAndListingOp.java#L165C23-L165C23
My first guess is that it could work, but to be honest I am not familiar with 
the ORC format, so I am not sure that is actually true. cc [~mamingchen], 
[~dongjoon] Any thoughts?
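
To make the concern concrete: in an ORC file, the last byte records the length 
of the postscript, so the reader locates the file tail from the file length. 
Below is a minimal sketch in plain Java, not ORC or HDFS code. TailLocator and 
postscriptLength are hypothetical names, and the file is modeled as an 
in-memory byte array. It only shows that a reported length shorter than the 
real one would make a reader interpret the wrong byte.

```java
/** Hypothetical helper for illustration only; not part of ORC or Hadoop. */
final class TailLocator {
    private TailLocator() {}

    /**
     * Returns the unsigned value of the byte at position reportedLength - 1,
     * which is where a reader expects to find the postscript length.
     */
    static int postscriptLength(byte[] fileBytes, long reportedLength) {
        if (reportedLength <= 0 || reportedLength > fileBytes.length) {
            throw new IllegalArgumentException("reported length out of range: " + reportedLength);
        }
        // Mask to treat the byte as unsigned (0..255).
        return fileBytes[(int) (reportedLength - 1)] & 0xff;
    }
}
```

With the 7-byte array {79, 82, 67, 10, 20, 30, 3}, a reported length of 7 
yields 3, while a reported length of 6 yields 30, which is data rather than 
the postscript length.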

> reduce namenode getFileinfo rpc 
> --------------------------------
>
>                 Key: ORC-1458
>                 URL: https://issues.apache.org/jira/browse/ORC-1458
>             Project: ORC
>          Issue Type: Wish
>          Components: Java, Reader
>            Reporter: Mingchen_Ma
>            Priority: Minor
>
> In the ReaderImpl.java code, there is the following logic:
> {code:java}
> if (maxFileLength == Long.MAX_VALUE) {
>   FileStatus fileStatus = fs.getFileStatus(path);
>   size = fileStatus.getLen();
>   modificationTime = fileStatus.getModificationTime();
> }
> {code}
> The above logic obtains the length of the file so that the ORC footer can be 
> read. Because of this, when we read an ORC file on HDFS, each open operation 
> causes an additional getFileInfo RPC by default (unless we set the file 
> length through ReaderOptions.set before opening the ORC file).
> Since the file is already open in ReaderImpl, can we eliminate this NameNode 
> RPC in the following way? In a high-load cluster, this can significantly 
> reduce the pressure on the NameNode:
> {code:java}
> if (maxFileLength == Long.MAX_VALUE) {
>   size = ((DFSInputStream) file.getWrappedStream()).getFileLength();
>   modificationTime = -1;
> }
> {code}
>  
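
An unconditional cast to DFSInputStream would throw a ClassCastException when 
the wrapped stream is of another type, for example on a non-HDFS file system. 
One possible guard is to use the stream's length only when the wrapped stream 
can provide it, and to fall back to the status lookup otherwise. The following 
is a minimal sketch in plain Java under stated assumptions: LengthAwareStream 
and FileLengthResolver are hypothetical stand-ins, not Hadoop or ORC classes, 
and the LongSupplier stands in for the fs.getFileStatus(path).getLen() call.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a stream that knows its own length. */
interface LengthAwareStream {
    long getFileLength();
}

/** Hypothetical helper for illustration only; not part of ORC. */
final class FileLengthResolver {
    private FileLengthResolver() {}

    /**
     * Prefers the length reported by the already-open stream and only
     * invokes the (more expensive) status lookup when the stream cannot
     * report a length.
     */
    static long resolveLength(Object wrappedStream, LongSupplier statusLookup) {
        if (wrappedStream instanceof LengthAwareStream) {
            return ((LengthAwareStream) wrappedStream).getFileLength();
        }
        // Fallback path: corresponds to the existing getFileStatus call.
        return statusLookup.getAsLong();
    }
}
```

In this sketch the lookup is passed as a supplier so that it is never invoked 
on the fast path, which is the point of the proposed optimization.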



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
