[ https://issues.apache.org/jira/browse/HBASE-20361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuki Tawara updated HBASE-20361:
--------------------------------
    Attachment: HBASE-20361.master.002.patch

> Non-successive TableInputSplits may wrongly be merged by auto balancing feature
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20361
>                 URL: https://issues.apache.org/jira/browse/HBASE-20361
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Yuki Tawara
>            Priority: Major
>         Attachments: HBASE-20361.master.001.patch, HBASE-20361.master.002.patch
>
>
> The TableInputFormatBase class lets users exclude specific splits from the
> list returned by TableInputFormatBase#getSplits by overriding
> TableInputFormatBase#includeRegionInSplit.
> It also offers a feature called "auto balancing", which mitigates data skew
> by splitting large splits and merging small splits.
> If a user overrides TableInputFormatBase#includeRegionInSplit, the i-th split
> and the (i+1)-th split may not be successive (the i-th split's end key is
> smaller than the (i+1)-th split's start key).
> If the user also enables auto balancing, non-successive splits can be merged.
> The merged split then covers the key range of the excluded splits lying
> between them, so the excluded splits are effectively "revived".
> To avoid such cases, we should not merge non-successive splits.
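
The proposed behaviour can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use the HBase API: the `Split` class, `isSuccessive`, `mergeSmallSplits`, and the `maxMergedSize` threshold are hypothetical names chosen for illustration, and splits are assumed to arrive sorted by start key. The point it shows is only the adjacency check: two splits are merged when the first one's end key equals the next one's start key, and never across a gap left by an excluded region.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SuccessiveSplitMerge {

    /** Hypothetical stand-in for a table split: a [start, end) row-key range plus a size estimate. */
    static final class Split {
        final byte[] start;
        final byte[] end;
        final long size;

        Split(byte[] start, byte[] end, long size) {
            this.start = start;
            this.end = end;
            this.size = size;
        }
    }

    /** Two splits are successive when the first one's end key equals the next one's start key. */
    static boolean isSuccessive(Split prev, Split next) {
        return Arrays.equals(prev.end, next.start);
    }

    /**
     * Merges runs of small splits, but never across a gap left by an excluded region.
     * Assumes the input list is sorted by start key.
     */
    static List<Split> mergeSmallSplits(List<Split> splits, long maxMergedSize) {
        List<Split> result = new ArrayList<>();
        Split current = null;
        for (Split s : splits) {
            if (current != null
                    && isSuccessive(current, s)
                    && current.size + s.size <= maxMergedSize) {
                // Adjacent and still small enough: extend the current merged split.
                current = new Split(current.start, s.end, current.size + s.size);
            } else {
                // Gap or size limit reached: close the current split and start a new one.
                if (current != null) {
                    result.add(current);
                }
                current = s;
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Regions [a,b), [b,c), [c,d); the middle one was excluded by the user.
        List<Split> kept = Arrays.asList(
                new Split(key("a"), key("b"), 10),
                new Split(key("c"), key("d"), 10));
        // The two remaining splits are not successive, so they stay separate.
        System.out.println(mergeSmallSplits(kept, 100).size()); // prints 2
    }
}
```

Without the `isSuccessive` check, the two splits in `main` would be merged into a single [a,d) range, which would silently bring the excluded [b,c) region back into the job's input.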