[ 
https://issues.apache.org/jira/browse/HBASE-22293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinhui Xu updated HBASE-22293:
------------------------------
    Description: 
Trying region replica feature recently on version 2.1.3 and 2.2.0. And saw some 
of region servers were killed due to OOM after changing REGION_REPLICATION from 
3 to 2 followed by some put operations. (the operations I performed are kind of 
similar with https://issues.apache.org/jira/browse/HBASE-20908)

I looked into the code and had following findings (based on the code of version 
2.2.0)

1. A WriterThread, which is responsible for writing per region wal entries to 
target replicas,  would just exit if it meets any exception and write its error 
to controller. One more thing to note, once an exception is written into the 
controller it will never be erased.
{code:java}
public void run()  {
  try {
    doRun();
  } catch (Throwable t) {
    LOG.error("Exiting thread", t);
    controller.writerThreadError(t);
  }
}

private void doRun() throws IOException {
  LOG.trace("Writer thread starting");
  while (true) {
    RegionEntryBuffer buffer = entryBuffers.getChunkToWrite();

    // ...

    assert buffer != null;
    try {
      writeBuffer(buffer);
    } finally {
      entryBuffers.doneWriting(buffer);
    }
  }
}
{code}
{code:java}
public static class PipelineController {
  AtomicReference<Throwable> thrown = new AtomicReference<>();

  void writerThreadError(Throwable t) {
    thrown.compareAndSet(null, t);
  }

  void checkForErrors() throws IOException {
    Throwable thrown = this.thrown.get();
    if (thrown == null) return;
    if (thrown instanceof IOException) {
      throw new IOException(thrown);
    } else {
      throw new RuntimeException(thrown);
    }
  }
}
{code}
2. For the replicate() logic in RegionReplicaReplicationEndpoint class, it will 
repeatedly append the same entry again if it meets any IOException.
{code:java}
public boolean replicate(ReplicateContext replicateContext) {
  while (this.isRunning()) {
    try {
      for (Entry entry: replicateContext.getEntries()) {
        entryBuffers.appendEntry(entry);
      }
      // ...
    } catch (IOException e) {
      LOG.warn("Received IOException while trying to replicate"
          + StringUtils.stringifyException(e));
    }
  }
{code}
3. While for the appendEntry() logic, it does aim to block the thread when the 
buffer already used exceed a pre-defined limit, but the buggy part is it's only 
valid when there is no error in the controller (controller.thrown.get() == 
null). In other words, once the controller has any error, the blocking logic 
will lose effect and it will eventually cause OOM.
{code:java}
public void appendEntry(Entry entry) throws InterruptedException, IOException {
  WALKey key = entry.getKey();

  RegionEntryBuffer buffer;
  long incrHeap;
  synchronized (this) {
    buffer = buffers.get(key.getEncodedRegionName());
    if (buffer == null) {
      buffer = new RegionEntryBuffer(key.getTableName(), 
key.getEncodedRegionName());
      buffers.put(key.getEncodedRegionName(), buffer);
    }
    incrHeap= buffer.appendEntry(entry);
  }

  synchronized (controller.dataAvailable) {
    totalBuffered += incrHeap;
    while (totalBuffered > maxHeapUsage && controller.thrown.get() == null) {
      LOG.debug("Used {} bytes of buffered edits, waiting for IO threads", 
totalBuffered);
      controller.dataAvailable.wait(2000);
    }
    controller.dataAvailable.notifyAll();
  }
  controller.checkForErrors();
}{code}
4. The throwable is rooted in the implementation of 
RegionReplicaSinkWriter.append(). And most of them are region location errors 
as I see. (again like the one mentioned in 
https://issues.apache.org/jira/browse/HBASE-20908)

Below are some of my inmature thoughts on how-to-fix: 
 # EntryBuffers's appendEntry interface should be a real blocking interface 
regardless of any errors
 # a WriterThread is the only consumer of EntryBuffers, it shouldn't exit when 
seeing any errors, as these errors might be transient and can be later retryed
 # there is no need to write any error to the controller, except any 
DoNotRetryExceptions I am not aware of as of now

just to arouse more discussions

 

  was:
Trying region replica feature recently on version 2.1.3 and 2.2.0. And saw some 
of region servers were killed due to OOM after changing REGION_REPLICATION from 
3 to 2 followed by some put operations. (the operations I performed are kind of 
similar with https://issues.apache.org/jira/browse/HBASE-20908)

I looked into the code and had following findings (based on the code of version 
2.2.0)

1. A WriterThread, which is responsible for writing per region wal entries to 
target replicas,  would just exit if it meets any exception and write its error 
to controller. One more thing to note, once an exception is written into the 
controller it will never be erased.

 
{code:java}
public void run()  {
  try {
    doRun();
  } catch (Throwable t) {
    LOG.error("Exiting thread", t);
    controller.writerThreadError(t);
  }
}

private void doRun() throws IOException {
  LOG.trace("Writer thread starting");
  while (true) {
    RegionEntryBuffer buffer = entryBuffers.getChunkToWrite();

    // ...

    assert buffer != null;
    try {
      writeBuffer(buffer);
    } finally {
      entryBuffers.doneWriting(buffer);
    }
  }
}
{code}
 
{code:java}
public static class PipelineController {
  AtomicReference<Throwable> thrown = new AtomicReference<>();

  void writerThreadError(Throwable t) {
    thrown.compareAndSet(null, t);
  }

  void checkForErrors() throws IOException {
    Throwable thrown = this.thrown.get();
    if (thrown == null) return;
    if (thrown instanceof IOException) {
      throw new IOException(thrown);
    } else {
      throw new RuntimeException(thrown);
    }
  }
}
{code}
 

2. For the replicate() logic in RegionReplicaReplicationEndpoint class, it will 
repeatedly append the same entry again if it meets any IOException.
{code:java}
public boolean replicate(ReplicateContext replicateContext) {
  while (this.isRunning()) {
    try {
      for (Entry entry: replicateContext.getEntries()) {
        entryBuffers.appendEntry(entry);
      }
      // ...
    } catch (IOException e) {
      LOG.warn("Received IOException while trying to replicate"
          + StringUtils.stringifyException(e));
    }
  }
{code}
3. While for the appendEntry() logic, it does aim to block the thread when the 
buffer already used exceed a pre-defined limit, but the buggy part is it's only 
valid when there is no error in the controller (controller.thrown.get() == 
null). In other words, once the controller has any error, the blocking logic 
will lose effect and it will eventually cause OOM.

 

 
{code:java}
// code placeholder
public void appendEntry(Entry entry) throws InterruptedException, IOException {
  WALKey key = entry.getKey();

  RegionEntryBuffer buffer;
  long incrHeap;
  synchronized (this) {
    buffer = buffers.get(key.getEncodedRegionName());
    if (buffer == null) {
      buffer = new RegionEntryBuffer(key.getTableName(), 
key.getEncodedRegionName());
      buffers.put(key.getEncodedRegionName(), buffer);
    }
    incrHeap= buffer.appendEntry(entry);
  }

  synchronized (controller.dataAvailable) {
    totalBuffered += incrHeap;
    while (totalBuffered > maxHeapUsage && controller.thrown.get() == null) {
      LOG.debug("Used {} bytes of buffered edits, waiting for IO threads", 
totalBuffered);
      controller.dataAvailable.wait(2000);
    }
    controller.dataAvailable.notifyAll();
  }
  controller.checkForErrors();
}{code}
 

 

4. The throwable is rooted in the implementation of 
RegionReplicaSinkWriter.append(). And most of them are region location errors 
as I see. (again like the one mentioned in 
https://issues.apache.org/jira/browse/HBASE-20908)

 

Below are some of my inmature thoughts on how-to-fix: 
 # EntryBuffers's appendEntry interface should be a real blocking interface 
regardless of any errors
 # a WriterThread is the only consumer of EntryBuffers, it shouldn't exit when 
seeing any errors, as these errors might be transient and can be later retryed
 # there is no need to write any error to the controller, except any 
DoNotRetryExceptions I am not aware of as of now

 

just to arouse more discussions

 

 


> region server OOM risk when doing replica replication
> -----------------------------------------------------
>
>                 Key: HBASE-22293
>                 URL: https://issues.apache.org/jira/browse/HBASE-22293
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 2.2.0, 2.1.3
>            Reporter: Xinhui Xu
>            Priority: Major
>
> Trying region replica feature recently on version 2.1.3 and 2.2.0. And saw 
> some of region servers were killed due to OOM after changing 
> REGION_REPLICATION from 3 to 2 followed by some put operations. (the 
> operations I performed are kind of similar with 
> https://issues.apache.org/jira/browse/HBASE-20908)
> I looked into the code and had following findings (based on the code of 
> version 2.2.0)
> 1. A WriterThread, which is responsible for writing per region wal entries to 
> target replicas,  would just exit if it meets any exception and write its 
> error to controller. One more thing to note, once an exception is written 
> into the controller it will never be erased.
> {code:java}
> public void run()  {
>   try {
>     doRun();
>   } catch (Throwable t) {
>     LOG.error("Exiting thread", t);
>     controller.writerThreadError(t);
>   }
> }
> private void doRun() throws IOException {
>   LOG.trace("Writer thread starting");
>   while (true) {
>     RegionEntryBuffer buffer = entryBuffers.getChunkToWrite();
>     // ...
>     assert buffer != null;
>     try {
>       writeBuffer(buffer);
>     } finally {
>       entryBuffers.doneWriting(buffer);
>     }
>   }
> }
> {code}
> {code:java}
> public static class PipelineController {
>   AtomicReference<Throwable> thrown = new AtomicReference<>();
>   void writerThreadError(Throwable t) {
>     thrown.compareAndSet(null, t);
>   }
>   void checkForErrors() throws IOException {
>     Throwable thrown = this.thrown.get();
>     if (thrown == null) return;
>     if (thrown instanceof IOException) {
>       throw new IOException(thrown);
>     } else {
>       throw new RuntimeException(thrown);
>     }
>   }
> }
> {code}
> 2. For the replicate() logic in RegionReplicaReplicationEndpoint class, it 
> will repeatedly append the same entry again if it meets any IOException.
> {code:java}
> public boolean replicate(ReplicateContext replicateContext) {
>   while (this.isRunning()) {
>     try {
>       for (Entry entry: replicateContext.getEntries()) {
>         entryBuffers.appendEntry(entry);
>       }
>       // ...
>     } catch (IOException e) {
>       LOG.warn("Received IOException while trying to replicate"
>           + StringUtils.stringifyException(e));
>     }
>   }
> {code}
> 3. While for the appendEntry() logic, it does aim to block the thread when 
> the buffer already used exceed a pre-defined limit, but the buggy part is 
> it's only valid when there is no error in the controller 
> (controller.thrown.get() == null). In other words, once the controller has 
> any error, the blocking logic will lose effect and it will eventually cause 
> OOM.
> {code:java}
> public void appendEntry(Entry entry) throws InterruptedException, IOException 
> {
>   WALKey key = entry.getKey();
>   RegionEntryBuffer buffer;
>   long incrHeap;
>   synchronized (this) {
>     buffer = buffers.get(key.getEncodedRegionName());
>     if (buffer == null) {
>       buffer = new RegionEntryBuffer(key.getTableName(), 
> key.getEncodedRegionName());
>       buffers.put(key.getEncodedRegionName(), buffer);
>     }
>     incrHeap= buffer.appendEntry(entry);
>   }
>   synchronized (controller.dataAvailable) {
>     totalBuffered += incrHeap;
>     while (totalBuffered > maxHeapUsage && controller.thrown.get() == null) {
>       LOG.debug("Used {} bytes of buffered edits, waiting for IO threads", 
> totalBuffered);
>       controller.dataAvailable.wait(2000);
>     }
>     controller.dataAvailable.notifyAll();
>   }
>   controller.checkForErrors();
> }{code}
> 4. The throwable is rooted in the implementation of 
> RegionReplicaSinkWriter.append(). And most of them are region location errors 
> as I see. (again like the one mentioned in 
> https://issues.apache.org/jira/browse/HBASE-20908)
> Below are some of my inmature thoughts on how-to-fix: 
>  # EntryBuffers's appendEntry interface should be a real blocking interface 
> regardless of any errors
>  # a WriterThread is the only consumer of EntryBuffers, it shouldn't exit 
> when seeing any errors, as these errors might be transient and can be later 
> retryed
>  # there is no need to write any error to the controller, except any 
> DoNotRetryExceptions I am not aware of as of now
> just to arouse more discussions
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to