[
https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957075#comment-13957075
]
Sergey Shelukhin commented on HBASE-10216:
------------------------------------------
Was the HDFS issue ever filed?
> Change HBase to support local compactions
> -----------------------------------------
>
> Key: HBASE-10216
> URL: https://issues.apache.org/jira/browse/HBASE-10216
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Environment: All
> Reporter: David Witten
>
> As I understand it, compactions read data from the DFS and write back to the
> DFS. This means that even when the reading occurs on the local host (because
> the region server has a local replica), all of the writing must go over the
> network to the other replicas. This proposal suggests that HBase would
> perform much better if all of the reading and writing occurred locally and
> none of it went over the network.
> I propose that the DFS interface be extended with a method that merges
> files, so that the merging and deleting can be performed on the local data
> nodes with no file contents moving over the network. The method would take
> the list of paths to be merged and deleted, the path of the merged output
> file, and an indication of a file-format-aware class that would be run on
> each data node to perform the merge. The merge method provided by this
> merging class would be passed one file open for reading for each of the
> files to be merged and one file open for writing, and would read all the
> input files and append to the output file using some standard API that
> works across all DFS implementations. The DFS would ensure that the merge
> had completed properly on all replicas before returning to the caller.
> Greater resiliency might be achieved by implementing the deletion as a
> separate phase that runs only after enough of the replicas have completed
> the merge.
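> A minimal sketch of what such an extended DFS interface might look like,
> purely as an illustration: every name here (FileMerger, MergingFileSystem,
> mergeFiles) is hypothetical and not an existing HDFS API.
> {code:java}
> // Hypothetical illustration only; no such API exists in HDFS today.
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.List;
>
> import org.apache.hadoop.fs.Path;
>
> /** File-format-aware merge logic, shipped to and run on each data node. */
> interface FileMerger {
>   /**
>    * Reads every input stream and appends the merged result to the output,
>    * using only stream-level operations so it works on any DFS.
>    */
>   void merge(List<InputStream> inputs, OutputStream output) throws IOException;
> }
>
> /** Hypothetical extension of the DFS client interface. */
> interface MergingFileSystem {
>   /**
>    * Merges the given source files into target on each data node holding a
>    * replica, then deletes the sources; returns only once all replicas (or
>    * a configurable quorum) have completed the merge.
>    */
>   void mergeFiles(List<Path> sources, Path target,
>       Class<? extends FileMerger> mergerClass) throws IOException;
> }
> {code}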
> HBase would be changed to use the new merge method for compactions, and would
> provide an implementation of the merging class that works with HFiles.
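> One possible shape for that HBase-provided merging class, again purely
> illustrative: it implements the hypothetical FileMerger interface sketched
> above, and the real HFile reader/writer wiring is only indicated in comments.
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.List;
>
> /** Illustrative skeleton of an HFile-aware merger run on the data nodes. */
> class HFileCompactionMerger implements FileMerger {
>   @Override
>   public void merge(List<InputStream> inputs, OutputStream output)
>       throws IOException {
>     // 1. Wrap each input in an HFile scanner positioned at its first cell.
>     // 2. Run a heap-based merge sort over the scanners, dropping deleted
>     //    and expired cells exactly as a normal compaction would.
>     // 3. Append the surviving cells, in order, to an HFile writer backed
>     //    by the output stream, then write the trailer and close.
>   }
> }
> {code}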
> This proposal would require custom code that understands the file format to
> be runnable on the data nodes to manage the merge. So there would need to be
> a facility for loading classes into the DFS, if such a facility does not
> already exist. Or, less generally, HDFS could build in support for HFile
> merging.
> The merge method might be optional. If the DFS implementation did not
> provide it, a generic version that performs the merge on top of the regular
> DFS interfaces would be used instead.
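> A sketch of what that generic fallback might look like on top of the
> ordinary DFS client API (FileSystem.open, create, and delete are real Hadoop
> calls; FileMerger and this helper class are hypothetical):
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> /** Fallback used when the DFS offers no local-merge support. */
> class GenericMergeFallback {
>   static void mergeFiles(FileSystem fs, List<Path> sources, Path target,
>       FileMerger merger) throws IOException {
>     List<InputStream> inputs = new ArrayList<InputStream>();
>     try {
>       for (Path src : sources) {
>         inputs.add(fs.open(src));        // reads may cross the network
>       }
>       OutputStream out = fs.create(target);
>       try {
>         merger.merge(inputs, out);       // writes replicate over the network
>       } finally {
>         out.close();
>       }
>     } finally {
>       for (InputStream in : inputs) {
>         in.close();
>       }
>     }
>     for (Path src : sources) {
>       fs.delete(src, false);             // remove the merged-away inputs
>     }
>   }
> }
> {code}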
> It may be that this method needs to be tweaked or bypassed when the region
> server does not have a local copy of the data, so that, as happens
> currently, one copy of the data moves to the region server.
--
This message was sent by Atlassian JIRA
(v6.2#6252)