[
https://issues.apache.org/jira/browse/HDDS-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943983#comment-16943983
]
Aravindan Vijayan commented on HDDS-2188:
-----------------------------------------
After discussing with [~msingh], we identified the following work items that
need to be implemented to make this work correctly. I will create separate
JIRAs to handle the work items not covered by this one.
*This JIRA will cover the following*
* The getFileStatus call on the Ozone file system will compute and return an
instance of LocatedFileStatus. The status will therefore carry the block
locations for the file (if present), and MapReduce applications will pick
them up automatically.
* For the Ozone Manager to include block locations in the getFileStatus
response, we need a flag like the "refreshPipeline" flag that tells it to obtain
the information from the SCM. Whenever the flag is set, OM will fetch the block
locations from SCM and include them in the returned file status.
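The flag-driven flow described in the two items above can be sketched roughly as follows. This is only an illustration under stated assumptions: BlockLocationSource, SketchFileStatus, SketchOzoneManager and the refreshLocations parameter are hypothetical names invented for the sketch, not the actual OM or SCM classes, and datanodes are reduced to plain host strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for SCM: maps a block ID to the datanodes holding it. */
interface BlockLocationSource {
    List<String> locate(long blockId);
}

/** Hypothetical file status that may carry per-block datanode hosts. */
final class SketchFileStatus {
    final String path;
    final List<List<String>> blockHosts; // empty when locations were not requested

    SketchFileStatus(String path, List<List<String>> blockHosts) {
        this.path = path;
        this.blockHosts = blockHosts;
    }
}

/** Hypothetical stand-in for the Ozone Manager side of getFileStatus. */
final class SketchOzoneManager {
    private final Map<String, List<Long>> blocksByPath = new HashMap<>();
    private final BlockLocationSource scm;

    SketchOzoneManager(BlockLocationSource scm) {
        this.scm = scm;
    }

    void addFile(String path, List<Long> blockIds) {
        blocksByPath.put(path, new ArrayList<>(blockIds));
    }

    /** Returns the status; block locations are fetched only when the flag is set. */
    SketchFileStatus getFileStatus(String path, boolean refreshLocations) {
        List<Long> blockIds = blocksByPath.get(path);
        if (blockIds == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        if (!refreshLocations) {
            return new SketchFileStatus(path, Collections.emptyList());
        }
        List<List<String>> hosts = new ArrayList<>();
        for (long id : blockIds) {
            hosts.add(scm.locate(id)); // one lookup per block in this sketch
        }
        return new SketchFileStatus(path, hosts);
    }
}
```

A caller that needs locality information would pass true for the flag, while callers that only need the basic status can skip the extra SCM lookups.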
*New JIRAs will be created for the following.*
* Currently, we get block location info from SCM one block at a time, so
fetching the blocks of a single file takes multiple SCM RPC calls. We can
implement a batch GET API in SCM that returns the block location info for all
the blocks of a file.
* As an optimization, we can cache the block info in the FileSystem client layer
and reuse it instead of making an RPC call. An expiry-based Guava cache is one
candidate.
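To make the two follow-up items concrete, here is a minimal sketch that combines them: a client-side cache with write-time expiry that resolves all missing blocks through a single batched call. Everything in it is an assumption for illustration. BatchBlockLocator and ExpiringBlockLocationCache are hypothetical names, and the cache is hand-rolled on the standard library only to keep the example self-contained; a Guava cache configured with an expire-after-write policy would be the more likely choice in practice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical batch API: one call resolves the locations of many blocks. */
interface BatchBlockLocator {
    Map<Long, List<String>> locateAll(List<Long> blockIds);
}

/** Client-side cache whose entries expire a fixed time after they are written. */
final class ExpiringBlockLocationCache {
    private static final class CachedHosts {
        final List<String> hosts;
        final long expiresAtMillis;

        CachedHosts(List<String> hosts, long expiresAtMillis) {
            this.hosts = hosts;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<Long, CachedHosts> entries = new HashMap<>();
    private final BatchBlockLocator locator;
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    ExpiringBlockLocationCache(BatchBlockLocator locator, long ttlMillis, LongSupplier clock) {
        this.locator = locator;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns hosts for the given blocks, fetching only missing or expired ones. */
    synchronized Map<Long, List<String>> get(List<Long> blockIds) {
        long now = clock.getAsLong();
        Map<Long, List<String>> result = new HashMap<>();
        List<Long> missing = new ArrayList<>();
        for (Long id : blockIds) {
            CachedHosts cached = entries.get(id);
            if (cached != null && cached.expiresAtMillis > now) {
                result.put(id, cached.hosts);
            } else {
                missing.add(id);
            }
        }
        if (!missing.isEmpty()) {
            // A single batched call replaces one RPC per block.
            Map<Long, List<String>> fetched = locator.locateAll(missing);
            for (Map.Entry<Long, List<String>> e : fetched.entrySet()) {
                entries.put(e.getKey(), new CachedHosts(e.getValue(), now + ttlMillis));
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

The expiry interval trades freshness for RPC load: a short interval keeps the cached locations close to what SCM reports, while a long one saves more calls but may return stale datanodes after blocks move.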
> Implement LocatedFileStatus & getFileBlockLocations to provide
> node/localization information to Yarn/Mapreduce
> --------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-2188
> URL: https://issues.apache.org/jira/browse/HDDS-2188
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Filesystem
> Affects Versions: 0.5.0
> Reporter: Mukul Kumar Singh
> Assignee: Aravindan Vijayan
> Priority: Major
>
> For applications like Hive/MapReduce to take advantage of the data locality
> in Ozone, Ozone should return the location of the Ozone blocks. This is
> needed for better read performance for Hadoop Applications.
> {code}
> if (file instanceof LocatedFileStatus) {
>   blkLocations = ((LocatedFileStatus) file).getBlockLocations();
> } else {
>   blkLocations = fs.getFileBlockLocations(file, 0, length);
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)