[jira] [Updated] (HBASE-26242) Allow split when store file count larger than the configured blocking file count

Xiaolin Ha (Jira) Tue, 01 Mar 2022 00:56:05 -0800


     [ https://issues.apache.org/jira/browse/HBASE-26242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Xiaolin Ha updated HBASE-26242:
-------------------------------
    Description: 
Currently, a region will not split once the number of store files in one of its
stores reaches the blocking count configured by `hbase.hstore.blockingStoreFiles`.
The relevant code is as follows.

CompactSplit#requestSplit() (called by the MemstoreFlusher and
CompactionRunner) checks the compaction priority of the region; if the
compaction priority is lower than PRIORITY_USER, the region will not split.
{code:java}
public synchronized boolean requestSplit(final Region r) {
  // don't split regions that are blocking
  HRegion hr = (HRegion)r;
  try {
    if (shouldSplitRegion() && hr.getCompactPriority() >= PRIORITY_USER) {
      byte[] midKey = hr.checkSplit().orElse(null);
      if (midKey != null) {
        requestSplit(r, midKey);
        return true;
      }
    }
.... {code}
However, the region's compaction priority is the minimum over all of its stores.
When the number of store files in a store reaches or exceeds the configured
`hbase.hstore.blockingStoreFiles`, the compaction priority becomes zero or
negative, while PRIORITY_USER = 1.
{code:java}
public int getStoreCompactionPriority() {
  int priority = blockingFileCount - storefiles.size();
  return (priority == HStore.PRIORITY_USER) ? priority + 1 : priority;
} {code}
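The two snippets above combine as follows. This is a minimal standalone sketch, not HBase code: the class and method names are hypothetical, PRIORITY_USER is taken to be 1 as stated above, a blocking count of 16 is assumed purely for illustration, and the `shouldSplitRegion()` part of the check is left out so that only the priority gate is shown.

```java
import java.util.List;

public class SplitPriorityDemo {
  // Mirrors HStore.PRIORITY_USER as quoted in the issue (value 1).
  static final int PRIORITY_USER = 1;

  // Same arithmetic as getStoreCompactionPriority() quoted above.
  static int storePriority(int blockingFileCount, int storeFileCount) {
    int priority = blockingFileCount - storeFileCount;
    return (priority == PRIORITY_USER) ? priority + 1 : priority;
  }

  // The region priority is the minimum over all of its stores.
  static int regionPriority(int blockingFileCount, List<Integer> storeFileCounts) {
    int min = Integer.MAX_VALUE;
    for (int count : storeFileCounts) {
      min = Math.min(min, storePriority(blockingFileCount, count));
    }
    return min;
  }

  // Only the priority part of the condition in requestSplit().
  static boolean priorityAllowsSplit(int regionPriority) {
    return regionPriority >= PRIORITY_USER;
  }

  public static void main(String[] args) {
    int blocking = 16; // assumed value of hbase.hstore.blockingStoreFiles
    for (int files : new int[] {10, 15, 16, 20}) {
      int p = storePriority(blocking, files);
      System.out.println(files + " files -> priority " + p
          + ", split allowed: " + priorityAllowsSplit(p));
    }
    // One store over the limit blocks the split for the whole region.
    int rp = regionPriority(blocking, List.of(3, 20));
    System.out.println("region priority " + rp
        + ", split allowed: " + priorityAllowsSplit(rp));
  }
}
```

With the assumed blocking count of 16, a store with 15 files still allows a split (priority 1 is bumped to 2), while 16 files give priority 0 and 20 files give -4. A single store with 20 files makes the whole region's priority -4 even if its other store holds only 3 files.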
As a result, when a region has reached the split size threshold but compaction
reduces its file count more slowly than new files are generated (e.g. by
compacting L0 files into stripes, bulk loads, or memstore flushes), the region
will never split.
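A toy model makes the "never split" outcome concrete. It is an illustrative assumption rather than HBase behavior: each round adds a fixed number of files (from flushes, bulk loads or L0-to-stripe compaction), compaction removes a fixed number, the region is assumed to be already over the split size, and only the priority gate from getStoreCompactionPriority() and requestSplit() decides. `NeverSplitDemo` and `firstSplitRound` are hypothetical names.

```java
public class NeverSplitDemo {
  // Mirrors PRIORITY_USER (1) from the description.
  static final int PRIORITY_USER = 1;

  // Same arithmetic as getStoreCompactionPriority() quoted above.
  static int storePriority(int blockingFileCount, int storeFileCount) {
    int priority = blockingFileCount - storeFileCount;
    return (priority == PRIORITY_USER) ? priority + 1 : priority;
  }

  /**
   * Toy model: every round, addedPerRound files arrive and compaction removes
   * removedPerRound files, then the split check runs. Returns the first round
   * in which the priority gate lets the split through, or -1 if it never does.
   */
  static int firstSplitRound(int blockingFileCount, int startFiles,
      int addedPerRound, int removedPerRound, int maxRounds) {
    int files = startFiles;
    for (int round = 1; round <= maxRounds; round++) {
      // A store never drops below one file in this model.
      files = Math.max(1, files + addedPerRound - removedPerRound);
      if (storePriority(blockingFileCount, files) >= PRIORITY_USER) {
        return round;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(firstSplitRound(16, 18, 4, 6, 100)); // 2
    System.out.println(firstSplitRound(16, 18, 6, 4, 100)); // -1
  }
}
```

When compaction removes more files per round than arrive (4 in, 6 out, starting at 18 files with a blocking count of 16), the check passes in round 2. When files arrive at least as fast as they are removed, it never passes.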

The problem is especially visible with StripeStoreEngine. Although memstore
flushes are delayed once the store file count reaches the blocking count, each
L0 compaction may write a new file into every stripe, i.e. as many new files as
there are stripes. In this scenario, because the store keeps prioritizing
compaction over splitting, the stripe count keeps growing, compactions generate
more and more new files, and the region never splits. Splitting, by contrast,
would divide the compaction pressure: 1 parent compaction plus 2 child
compactions can be reduced to just 2 child compactions.

We can add a configuration option that allows splitting while a store is
blocking; this keeps the original behavior available while also supporting more
flexible control.
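One possible shape for the proposed switch, sketched under stated assumptions: the key `example.split.when.blocking.enabled` is a hypothetical name (not necessarily what the patch uses), the configuration is modeled as a plain `Map` instead of Hadoop's `Configuration`, and `shouldSplitRegion` stands in for the existing size-based check. With the switch absent or false, the original priority rule is unchanged.

```java
import java.util.Map;

public class SplitWhenBlockingDemo {
  // Mirrors PRIORITY_USER (1) from the description.
  static final int PRIORITY_USER = 1;
  // Hypothetical configuration key, for illustration only.
  static final String SPLIT_WHEN_BLOCKING_KEY = "example.split.when.blocking.enabled";

  /**
   * Decide whether a split may be requested.
   * shouldSplitRegion: result of the existing split-policy check.
   * regionCompactPriority: minimum compaction priority over the region's stores.
   */
  static boolean allowSplit(Map<String, String> conf, boolean shouldSplitRegion,
      int regionCompactPriority) {
    if (!shouldSplitRegion) {
      return false;
    }
    boolean splitWhenBlocking =
        Boolean.parseBoolean(conf.getOrDefault(SPLIT_WHEN_BLOCKING_KEY, "false"));
    // Switch off (the default here): keep the original
    // "don't split regions that are blocking" rule.
    return splitWhenBlocking || regionCompactPriority >= PRIORITY_USER;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = Map.of();
    Map<String, String> enabled = Map.of(SPLIT_WHEN_BLOCKING_KEY, "true");
    System.out.println(allowSplit(defaults, true, -4)); // false
    System.out.println(allowSplit(enabled, true, -4));  // true
    System.out.println(allowSplit(enabled, false, -4)); // false
  }
}
```

Keeping the size-based check in front means the switch only relaxes the blocking rule; it never forces a split on a region that the split policy would not split anyway.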

 

  was:
In the requestSplit() function (called by the MemstoreFlusher and
CompactionRunner), the compaction priority of the region is checked. If the
compaction priority is lower than PRIORITY_USER, the region will not split.
{code:java}
public synchronized boolean requestSplit(final Region r) {
  // don't split regions that are blocking
  HRegion hr = (HRegion)r;
  try {
    if (shouldSplitRegion() && hr.getCompactPriority() >= PRIORITY_USER) {
      byte[] midKey = hr.checkSplit().orElse(null);
      if (midKey != null) {
        requestSplit(r, midKey);
        return true;
      }
    }
....{code}
However, the region's compaction priority is the minimum over all of its stores.
When the number of store files in a store is larger than the configured
`hbase.hstore.blockingStoreFiles`, the priority becomes negative, while the
priority it is compared against in requestSplit() is 1 (PRIORITY_USER).
{code:java}
public int getStoreCompactionPriority() {
  int priority = blockingFileCount - storefiles.size();
  return (priority == HStore.PRIORITY_USER) ? priority + 1 : priority;
}
{code}
As a result, when a region should split but compaction reduces its file count
more slowly than new files are generated (e.g. by compacting L0 files into
stripes, bulk loads, or memstore flushes), the region will never split.
Splitting, by contrast, would divide the compaction pressure: 1 parent
compaction plus 2 child compactions can be reduced to just 2 child compactions.

The problem is especially visible with StripeStoreEngine. Although memstore
flushes are delayed once the store file count reaches the blocking count, each
L0 compaction may write a new file into every stripe, i.e. as many new files as
there are stripes. In this scenario, because the store keeps prioritizing
compaction over splitting, the stripe count keeps growing, compactions generate
more and more new files, and the region never splits.

 

 


> Allow split when store file count larger than the configured blocking file 
> count
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-26242
>                 URL: https://issues.apache.org/jira/browse/HBASE-26242
>             Project: HBase
>          Issue Type: Wish
>    Affects Versions: 3.0.0-alpha-1, 1.4.0, 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
