[
https://issues.apache.org/jira/browse/HBASE-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582923#comment-15582923
]
Umesh Agashe commented on HBASE-16789:
--------------------------------------
[~busbey], here are a few points that were discussed:
* This is an offline Compaction Tool (CT). Without the MR option, CT compacts
files for the input table/ region/ column family on the local node where CT is
run.
* The current CT decides which node to run MR jobs on based on the location of
the first block of the first file in an input directory.
* This can be improved to consider nodes based on last known region
assignments, falling back to the location of the first block of the first file
in a table/ region/ column family. This will provide better locality.
* Even with the improved logic, locality cannot be guaranteed.
* So, whether to run with MR, and which nodes the MR job runs on, can be
determined by code outside of CT or by a user. CT will be responsible only for
compacting files for the input table/ region/ cf, without deciding on MR or
node selection for MR.
* CT may query/ consider local regions and only compact files belonging to
local regions. A -force option can be provided as a workaround to get the
current default behavior.
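
As a rough illustration of the points above, the proposed host preference
(last known region assignment first, first-block location as fallback) and the
local-region check with a force override might be sketched as follows. This is
only a sketch: the class, the method names and the plain Map/ List/ Set inputs
are hypothetical and are not part of CompactionTool or any HBase API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch only; none of these names exist in HBase or CompactionTool.
class CompactionPlacementSketch {

    // Prefer the host from the last known region assignment. If the region has
    // no known assignment, fall back to the first host holding the first block
    // of the first file. Returns empty when neither is known, since locality
    // cannot be guaranteed.
    static Optional<String> preferredHost(String region,
                                          Map<String, String> lastKnownAssignments,
                                          List<String> firstBlockHosts) {
        String assigned = lastKnownAssignments.get(region);
        if (assigned != null) {
            return Optional.of(assigned);
        }
        return firstBlockHosts.isEmpty()
                ? Optional.empty()
                : Optional.of(firstBlockHosts.get(0));
    }

    // Compact only regions that are local to this node, unless force is set,
    // in which case every requested region is compacted.
    static boolean shouldCompact(String region, Set<String> localRegions, boolean force) {
        return force || localRegions.contains(region);
    }

    public static void main(String[] args) {
        // Hypothetical region and host names, for demonstration only.
        Map<String, String> assignments = Map.of("region-a", "rs1.example.com");
        List<String> blockHosts = List.of("dn7.example.com");
        System.out.println(preferredHost("region-a", assignments, blockHosts));
        System.out.println(preferredHost("region-b", assignments, blockHosts));
        System.out.println(shouldCompact("region-b", Set.of("region-a"), false));
    }
}
```

Keeping both decisions as small functions over plain inputs matches the idea
that node selection can live outside CT, with CT only compacting what it is
given.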
> Remove directory layout/ filesystem references from CompactionTool
> ------------------------------------------------------------------
>
> Key: HBASE-16789
> URL: https://issues.apache.org/jira/browse/HBASE-16789
> Project: HBase
> Issue Type: Sub-task
> Components: Filesystem Integration
> Reporter: Umesh Agashe
> Assignee: Umesh Agashe
> Attachments: HBASE-16789-hbase-14439.v1.patch
>
>
> Remove directory layout/ filesystem references from CompactionTool and use
> APIs provided by MasterStorage/ RegionStorage instead.