Jihoon Son created HDFS-4931:
--------------------------------
Summary: Extend the block placement policy interface to utilize
the location information of previously stored files
Key: HDFS-4931
URL: https://issues.apache.org/jira/browse/HDFS-4931
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Jihoon Son
I am currently implementing a locality-preserving block placement policy that
stores the files of a directory on the same datanode. That is, given a root
directory, the files under it are grouped by the path of their parent
directory, and all files in a group are stored on the same datanode.
When a new file is written to HDFS, the block placement policy chooses the
target datanode based on the locations of previously stored files.
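As a rough illustration of the grouping described above, a minimal Java sketch
might look like the following. It is not the attached patch: the class and
method names are hypothetical, datanodes are represented by plain strings, and
falling back to the first live datanode is an assumption made only to keep the
example short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HDFS: remembers which datanode holds each
// directory group and reuses it for new files of the same group.
class DirectoryLocalityIndex {
    // Parent directory path -> name of the datanode that holds the group.
    private final Map<String, String> groupToDatanode = new HashMap<>();

    // The group key of a file is the path of its parent directory.
    static String parentOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash <= 0 ? "/" : filePath.substring(0, slash);
    }

    // Returns the datanode already bound to the file's group if it is still
    // live; otherwise binds the group to the first live datanode (assumption).
    String chooseTarget(String filePath, List<String> liveDatanodes) {
        if (liveDatanodes.isEmpty()) {
            throw new IllegalArgumentException("no live datanodes");
        }
        String group = parentOf(filePath);
        String known = groupToDatanode.get(group);
        if (known != null && liveDatanodes.contains(known)) {
            return known;
        }
        String chosen = liveDatanodes.get(0);
        groupToDatanode.put(group, chosen);
        return chosen;
    }
}
```

With this sketch, two files under /root/a are sent to the same datanode even
if the candidate list is ordered differently on the second call.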
The current block placement policy interface has two problems for this use
case. The first is that it offers no way to keep the location information of
previously stored files across an HDFS restart. The location information of
all files has to be restored while the namenode is still in safe mode.
To solve the first problem, I modified the block placement policy interface and
FSNamesystem so that, before the namenode leaves safe mode, all necessary
location information is passed to the block placement policy.
However, my implementation changes too many access modifiers from private to
public, which may violate the design of the interface.
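One way to picture the restore step is a callback that the startup path
invokes once per known file before safe mode is left. The sketch below is only
an assumption about how such a hook could look: LocationAware,
restoreLocation, and StartupSketch are hypothetical names, and none of them
exists in the real BlockPlacementPolicy or FSNamesystem.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback; the real block placement policy has no such method.
interface LocationAware {
    void restoreLocation(String filePath, String datanode);
}

// Hypothetical policy that rebuilds its group-to-datanode map on restart.
class RestoringPolicy implements LocationAware {
    final Map<String, String> groupToDatanode = new HashMap<>();

    @Override
    public void restoreLocation(String filePath, String datanode) {
        int slash = filePath.lastIndexOf('/');
        String group = slash <= 0 ? "/" : filePath.substring(0, slash);
        // Keep the first datanode reported for each group.
        groupToDatanode.putIfAbsent(group, datanode);
    }
}

// Stand-in for the namenode startup path; not the real FSNamesystem.
class StartupSketch {
    // Replays every known file location into the policy. In the scenario
    // described above this would run before the namenode leaves safe mode.
    static void replayBeforeLeavingSafeMode(Map<String, String> fileToDatanode,
                                            LocationAware policy) {
        for (Map.Entry<String, String> e : fileToDatanode.entrySet()) {
            policy.restoreLocation(e.getKey(), e.getValue());
        }
    }
}
```

Because the caller only depends on the small callback interface in this
sketch, no private member has to be made public for the replay itself.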
The second problem occurs when blocks are moved by the balancer or because of
node failures. In that case, the block placement policy should take the
current block locations into account and return a new datanode to move the
blocks to. However, the current interface does not support this.
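To make the missing capability concrete, the following Java sketch shows what
a relocation hook could do. It is purely illustrative: RelocationSketch and
chooseRelocationTarget are hypothetical names, datanodes are plain strings,
and picking the first live datanode other than the source is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an HDFS API: picks a destination for a block that
// has to leave its current datanode while keeping the group together.
class RelocationSketch {
    // Directory group -> datanode currently holding the group.
    private final Map<String, String> groupToDatanode = new HashMap<>();

    void bind(String group, String datanode) {
        groupToDatanode.put(group, datanode);
    }

    // Called when a block of `group` must be moved away from `source`.
    String chooseRelocationTarget(String group, String source,
                                  List<String> liveDatanodes) {
        String home = groupToDatanode.get(group);
        // Prefer the group's datanode if it is live and is not the source.
        if (home != null && !home.equals(source) && liveDatanodes.contains(home)) {
            return home;
        }
        // Otherwise rebind the group to the first live datanode that is not
        // the source (assumption made for brevity).
        for (String dn : liveDatanodes) {
            if (!dn.equals(source)) {
                groupToDatanode.put(group, dn);
                return dn;
            }
        }
        throw new IllegalStateException("no datanode available for " + group);
    }
}
```

In this sketch, once a group has been rebound after a failure, later moves of
blocks from the same group are directed to the same new datanode.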
The attached patch addresses only the first problem and, as mentioned above,
it may violate the design of the interface.
Do you have any good ideas?