[ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482631 ]
Konstantin Shvachko commented on HADOOP-894:
--------------------------------------------
As I understand the problem, many clients open the same file and read its
first block, e.g. in streaming, and then each reads a specific part of the
file. So each client does not need to receive a block map for the whole file,
only the block locations in a specified range.
I propose to modify ClientProtocol.open() to
    OpenFileInfo open( String src, int numBlocks )
where
    src - the path;
    numBlocks - the number of blocks whose locations the client wants
                open() to return.
@returns
    OpenFileInfo : extends DFSFileInfo {
        LocatedBlock[ numBlocks ];
    }
DFSFileInfo contains file information including file length and replication.
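
To make the open() proposal concrete, here is a minimal sketch of how the
returned value might be assembled. Everything in it is an assumption for
illustration: OpenSketch, Block and OpenInfo are hypothetical stand-ins for the
namenode logic, LocatedBlock and OpenFileInfo, not the real Hadoop classes, and
the field names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the real Hadoop classes.
class OpenSketch {
    /** Stand-in for LocatedBlock: a block id plus the datanodes holding it. */
    static final class Block {
        final long id;
        final String[] hosts;
        Block(long id, String... hosts) { this.id = id; this.hosts = hosts; }
    }

    /** Stand-in for OpenFileInfo: file metadata plus the leading block locations. */
    static final class OpenInfo {
        final long length;
        final short replication;
        final List<Block> firstBlocks;
        OpenInfo(long length, short replication, List<Block> firstBlocks) {
            this.length = length;
            this.replication = replication;
            this.firstBlocks = firstBlocks;
        }
    }

    /** Returns file metadata and at most numBlocks leading block locations. */
    static OpenInfo open(long length, short replication,
                         List<Block> allBlocks, int numBlocks) {
        if (numBlocks < 0) {
            throw new IllegalArgumentException("numBlocks must be >= 0");
        }
        // Never return more blocks than the file actually has.
        int n = Math.min(numBlocks, allBlocks.size());
        return new OpenInfo(length, replication,
                            new ArrayList<>(allBlocks.subList(0, n)));
    }

    public static void main(String[] args) {
        List<Block> all = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            all.add(new Block(i, "dn" + i));
        }
        OpenInfo info = open(5L * 64, (short) 3, all, 2);
        System.out.println(info.firstBlocks.size()); // prints 2
    }
}
```

With numBlocks = 1 this would cover the streaming case described above, where
every client only reads the first block at open time.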
ClientProtocol should also contain
    public LocatedBlock[] getBlockLocations(String src, int offset, int length)
        throws IOException;
where
    offset - the starting offset in the file;
    length - the number of bytes the client intends to read.
Class LocatedBlock should include an additional field
    + long startFrom;
which gives the offset within the block at which the desired region of bytes
begins.
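
As a worked example of the byte-range lookup, the sketch below maps
(offset, length) to block indices and a startFrom value. It rests on several
assumptions that are not part of the proposal: every block except the last is
exactly blockSize bytes, startFrom is non-zero only for the first block in the
range, and offsets are long rather than int so that files over 2 GB fit.
BlockRangeSketch and Located are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes fixed-size blocks (only the last may be short).
class BlockRangeSketch {
    /** Stand-in for LocatedBlock carrying the proposed startFrom field. */
    static final class Located {
        final int blockIndex;
        final long startFrom; // offset within this block where the range begins
        Located(int blockIndex, long startFrom) {
            this.blockIndex = blockIndex;
            this.startFrom = startFrom;
        }
    }

    /** Returns the blocks overlapping [offset, offset + length), in file order. */
    static List<Located> getBlockLocations(long fileLength, long blockSize,
                                           long offset, long length) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be > 0");
        }
        List<Located> result = new ArrayList<>();
        if (offset < 0 || length <= 0 || offset >= fileLength) {
            return result; // nothing to read
        }
        // Exclusive end of the range, clamped to the file length without overflow.
        long end = (length > fileLength - offset) ? fileLength : offset + length;
        int first = (int) (offset / blockSize);
        int last = (int) ((end - 1) / blockSize);
        for (int i = first; i <= last; i++) {
            long startFrom = (i == first) ? offset % blockSize : 0L;
            result.add(new Located(i, startFrom));
        }
        return result;
    }

    public static void main(String[] args) {
        // 200-byte file, 64-byte blocks, read 80 bytes starting at offset 100.
        for (Located b : getBlockLocations(200, 64, 100, 80)) {
            System.out.println(b.blockIndex + " startFrom=" + b.startFrom);
        }
        // prints:
        // 1 startFrom=36
        // 2 startFrom=0
    }
}
```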
Then we will need to reimplement seeks and reads for DFSInputStream using that
API.
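
One way the client side of this could look is sketched below: a cache of
block locations that is filled lazily, one window of blocks per call, whenever
a seek or read lands on a block that has not been fetched yet. This is only an
illustration under stated assumptions, not a DFSInputStream implementation:
LazyBlockMapSketch, Fetcher, prefetch and rpcCount are hypothetical names, the
fetcher stands in for the getBlockLocations() call, and block sizes are assumed
to be fixed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lazy block-location fetching on the client.
class LazyBlockMapSketch {
    /** Stand-in for the namenode call: locations for blocks [first, first + count). */
    interface Fetcher {
        String[] fetch(int firstBlock, int count);
    }

    private final Map<Integer, String> cache = new HashMap<>();
    private final Fetcher fetcher;
    private final long blockSize;
    private final int prefetch; // blocks requested per call
    int rpcCount = 0;           // number of fetch calls made so far

    LazyBlockMapSketch(Fetcher fetcher, long blockSize, int prefetch) {
        this.fetcher = fetcher;
        this.blockSize = blockSize;
        this.prefetch = prefetch;
    }

    /** Location of the block containing pos; fetches a window on a cache miss. */
    String locate(long pos) {
        int index = (int) (pos / blockSize);
        String loc = cache.get(index);
        if (loc == null) {
            String[] fetched = fetcher.fetch(index, prefetch);
            rpcCount++;
            for (int i = 0; i < fetched.length; i++) {
                cache.put(index + i, fetched[i]);
            }
            loc = cache.get(index); // stays null if the fetcher returned nothing
        }
        return loc;
    }

    public static void main(String[] args) {
        Fetcher fake = (first, count) -> {
            String[] out = new String[count];
            for (int i = 0; i < count; i++) {
                out[i] = "dn-for-block-" + (first + i);
            }
            return out;
        };
        LazyBlockMapSketch map = new LazyBlockMapSketch(fake, 64, 4);
        map.locate(0);    // miss: fetches blocks 0..3
        map.locate(130);  // hit: block 2 is already cached
        map.locate(1000); // miss: block 15, fetches blocks 15..18
        System.out.println(map.rpcCount); // prints 2
    }
}
```

In this sketch a sequential reader pays one call per prefetch blocks, while a
reader that seeks far ahead pays one call per seek, so the prefetch size is the
knob that the per-call default would set.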
What would be a good default for the number of blocks that getBlockLocations()
would fetch per call if the file is read from start to finish?
> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>
> Key: HADOOP-894
> URL: https://issues.apache.org/jira/browse/HADOOP-894
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Reporter: Owen O'Malley
> Assigned To: Wendy Chien
>
> I think that the HDFS client protocol should change like:
>   /** The meta-data about a file that was opened. */
>   class OpenFileInfo {
>     /** the info for the first block */
>     public LocatedBlockInfo getBlockInfo();
>     public long getBlockSize();
>     public long getLength();
>   }
>   interface ClientProtocol extends VersionedProtocol {
>     public OpenFileInfo open(String name) throws IOException;
>     /** get block info for any range of blocks */
>     public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset,
>                                            int blockLength) throws IOException;
>   }
> so that the client can decide how much block info to request and when.
> Currently, when the file is opened or an error occurs, the entire block list
> is requested and sent.