[ 
https://issues.apache.org/jira/browse/HBASE-22293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biju Nair updated HBASE-22293:
------------------------------
    Labels: read-replicas  (was: )

> region server OOM risk when doing replica replication
> -----------------------------------------------------
>
>                 Key: HBASE-22293
>                 URL: https://issues.apache.org/jira/browse/HBASE-22293
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 2.2.0, 2.1.3
>            Reporter: Xinhui Xu
>            Priority: Major
>              Labels: read-replicas
>
> Trying region replica feature recently on version 2.1.3 and 2.2.0. And saw 
> some of region servers were killed due to OOM after changing 
> REGION_REPLICATION from 3 to 2 followed by some put operations. (the 
> operations I performed are kind of similar with 
> https://issues.apache.org/jira/browse/HBASE-20908)
> I looked into the code and had following findings (based on the code of 
> version 2.2.0)
> 1. A WriterThread, which is responsible for writing per region wal entries to 
> target replicas,  would just exit if it meets any exception and write its 
> error to controller. One more thing to note, once an exception is written 
> into the controller it will never be erased.
> {code:java}
> public void run()  {
>   try {
>     doRun();
>   } catch (Throwable t) {
>     LOG.error("Exiting thread", t);
>     controller.writerThreadError(t);
>   }
> }
> private void doRun() throws IOException {
>   LOG.trace("Writer thread starting");
>   while (true) {
>     RegionEntryBuffer buffer = entryBuffers.getChunkToWrite();
>     // ...
>     assert buffer != null;
>     try {
>       writeBuffer(buffer);
>     } finally {
>       entryBuffers.doneWriting(buffer);
>     }
>   }
> }
> {code}
> {code:java}
> public static class PipelineController {
>   AtomicReference<Throwable> thrown = new AtomicReference<>();
>   void writerThreadError(Throwable t) {
>     thrown.compareAndSet(null, t);
>   }
>   void checkForErrors() throws IOException {
>     Throwable thrown = this.thrown.get();
>     if (thrown == null) return;
>     if (thrown instanceof IOException) {
>       throw new IOException(thrown);
>     } else {
>       throw new RuntimeException(thrown);
>     }
>   }
> }
> {code}
> 2. For the replicate() logic in RegionReplicaReplicationEndpoint class, it 
> will repeatedly append the same entry again if it meets any IOException.
> {code:java}
> public boolean replicate(ReplicateContext replicateContext) {
>   while (this.isRunning()) {
>     try {
>       for (Entry entry: replicateContext.getEntries()) {
>         entryBuffers.appendEntry(entry);
>       }
>       // ...
>     } catch (IOException e) {
>       LOG.warn("Received IOException while trying to replicate"
>           + StringUtils.stringifyException(e));
>     }
>   }
> {code}
> 3. While for the appendEntry() logic, it does aim to block the thread when 
> the buffer already used exceed a pre-defined limit, but the buggy part is 
> it's only valid when there is no error in the controller 
> (controller.thrown.get() == null). In other words, once the controller has 
> any error, the blocking logic will lose effect and it will eventually cause 
> OOM.
> {code:java}
> public void appendEntry(Entry entry) throws InterruptedException, IOException 
> {
>   WALKey key = entry.getKey();
>   RegionEntryBuffer buffer;
>   long incrHeap;
>   synchronized (this) {
>     buffer = buffers.get(key.getEncodedRegionName());
>     if (buffer == null) {
>       buffer = new RegionEntryBuffer(key.getTableName(), 
> key.getEncodedRegionName());
>       buffers.put(key.getEncodedRegionName(), buffer);
>     }
>     incrHeap= buffer.appendEntry(entry);
>   }
>   synchronized (controller.dataAvailable) {
>     totalBuffered += incrHeap;
>     while (totalBuffered > maxHeapUsage && controller.thrown.get() == null) {
>       LOG.debug("Used {} bytes of buffered edits, waiting for IO threads", 
> totalBuffered);
>       controller.dataAvailable.wait(2000);
>     }
>     controller.dataAvailable.notifyAll();
>   }
>   controller.checkForErrors();
> }{code}
> 4. The throwable is rooted in the implementation of 
> RegionReplicaSinkWriter.append(). And most of them are region location errors 
> as I see. (again like the one mentioned in 
> https://issues.apache.org/jira/browse/HBASE-20908)
> Below are some of my inmature thoughts on how-to-fix: 
>  # EntryBuffers's appendEntry interface should be a real blocking interface 
> regardless of any errors
>  # a WriterThread is the only consumer of EntryBuffers, it shouldn't exit 
> when seeing any errors, as these errors might be transient and can be later 
> retryed
>  # there is no need to write any error to the controller, except any 
> DoNotRetryExceptions I am not aware of as of now
> just to arouse more discussions
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to