[ 
https://issues.apache.org/jira/browse/HBASE-16162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-16162:
-----------------------------------
    Description: 
We have flow like this
{code}
protected void checkActiveSize() {
    if (shouldFlushInMemory()) {
         InMemoryFlushRunnable runnable = new InMemoryFlushRunnable();
      }
      getPool().execute(runnable);
    }
  }
private boolean shouldFlushInMemory() {
    if(getActive().getSize() > inmemoryFlushSize) {
      // size above flush threshold
      return (allowCompaction.get() && !inMemoryFlushInProgress.get());
    }
    return false;
  }

void flushInMemory() throws IOException {
    // Phase I: Update the pipeline
    getRegionServices().blockUpdates();
    try {
      MutableSegment active = getActive();
      pushActiveToPipeline(active);
    } finally {
      getRegionServices().unblockUpdates();
    }
    // Phase II: Compact the pipeline
    try {
      if (allowCompaction.get() && inMemoryFlushInProgress.compareAndSet(false, 
true)) {
        // setting the inMemoryFlushInProgress flag again for the case this 
method is invoked
        // directly (only in tests) in the common path setting from true to 
true is idempotent
        // Speculative compaction execution, may be interrupted if flush is 
forced while
        // compaction is in progress
        compactor.startCompaction();
      }
{code}
So every write of cell will produce the check checkActiveSize().   When we are 
at border of in mem flush,  many threads doing writes to this memstore can get 
this checkActiveSize () to pass.  Yes the AtomicBoolean is still false only. It 
is turned ON after some time once the new thread is started run and it push the 
active to pipeline etc.
In the new thread code of inMemFlush, we dont have any size check. It just 
takes the active segment and pushes that to pipeline. Yes we dont allow any new 
writes to memstore at this time.     But before that write lock on region, 
other handler thread also might have added entry to this thread pool.  When the 
1st one finishes, it releases the lock on region and handler threads trying for 
write to memstore, might get lock and add some data. Now this 2nd in mem flush 
thread may get a chance and get the lock and so it just takes current active 
segment and flush that in memory !    This will produce very small sized 
segments to pipeline.


  was:
We have flow like this
{code}
protected void checkActiveSize() {
    if (shouldFlushInMemory()) {
         InMemoryFlushRunnable runnable = new InMemoryFlushRunnable();
      }
      getPool().execute(runnable);
    }
  }
private boolean shouldFlushInMemory() {
    if(getActive().getSize() > inmemoryFlushSize) {
      // size above flush threshold
      return (allowCompaction.get() && !inMemoryFlushInProgress.get());
    }
    return false;
  }

void flushInMemory() throws IOException {
    // Phase I: Update the pipeline
    getRegionServices().blockUpdates();
    try {
      MutableSegment active = getActive();
      pushActiveToPipeline(active);
    } finally {
      getRegionServices().unblockUpdates();
    }
    // Phase II: Compact the pipeline
    try {
      if (allowCompaction.get() && inMemoryFlushInProgress.compareAndSet(false, 
true)) {
        // setting the inMemoryFlushInProgress flag again for the case this 
method is invoked
        // directly (only in tests) in the common path setting from true to 
true is idempotent
        // Speculative compaction execution, may be interrupted if flush is 
forced while
        // compaction is in progress
        compactor.startCompaction();
      }
{code}
So every write of cell will produce the check checkActiveSize().   When we are 
at border of in mem flush,  many threads doing writes to this memstore can get 
this check to pass.  Yes the AtomicBoolean is still false only. It is turned ON 
after some time once the new thread is started run and it push the active to 
pipeline etc.
The time gap btw the size check return true and the block updates.   In btw all 
updates to memstore would have kicked this in memory flush
So all these threads can push the same active to pipeline.  We unnecessarily 
passing through same segment many times. 


> Unnecessary push active segments to pipeline
> --------------------------------------------
>
>                 Key: HBASE-16162
>                 URL: https://issues.apache.org/jira/browse/HBASE-16162
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>            Priority: Critical
>
> We have flow like this
> {code}
> protected void checkActiveSize() {
>     if (shouldFlushInMemory()) {
>          InMemoryFlushRunnable runnable = new InMemoryFlushRunnable();
>       }
>       getPool().execute(runnable);
>     }
>   }
> private boolean shouldFlushInMemory() {
>     if(getActive().getSize() > inmemoryFlushSize) {
>       // size above flush threshold
>       return (allowCompaction.get() && !inMemoryFlushInProgress.get());
>     }
>     return false;
>   }
> void flushInMemory() throws IOException {
>     // Phase I: Update the pipeline
>     getRegionServices().blockUpdates();
>     try {
>       MutableSegment active = getActive();
>       pushActiveToPipeline(active);
>     } finally {
>       getRegionServices().unblockUpdates();
>     }
>     // Phase II: Compact the pipeline
>     try {
>       if (allowCompaction.get() && 
> inMemoryFlushInProgress.compareAndSet(false, true)) {
>         // setting the inMemoryFlushInProgress flag again for the case this 
> method is invoked
>         // directly (only in tests) in the common path setting from true to 
> true is idempotent
>         // Speculative compaction execution, may be interrupted if flush is 
> forced while
>         // compaction is in progress
>         compactor.startCompaction();
>       }
> {code}
> So every write of cell will produce the check checkActiveSize().   When we 
> are at border of in mem flush,  many threads doing writes to this memstore 
> can get this checkActiveSize () to pass.  Yes the AtomicBoolean is still 
> false only. It is turned ON after some time once the new thread is started 
> run and it push the active to pipeline etc.
> In the new thread code of inMemFlush, we dont have any size check. It just 
> takes the active segment and pushes that to pipeline. Yes we dont allow any 
> new writes to memstore at this time.     But before that write lock on 
> region, other handler thread also might have added entry to this thread pool. 
>  When the 1st one finishes, it releases the lock on region and handler 
> threads trying for write to memstore, might get lock and add some data. Now 
> this 2nd in mem flush thread may get a chance and get the lock and so it just 
> takes current active segment and flush that in memory !    This will produce 
> very small sized segments to pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to