[ https://issues.apache.org/jira/browse/HBASE-21503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694509#comment-16694509 ]

Wellington Chevreuil commented on HBASE-21503:
----------------------------------------------

{quote}
So the problem here is that, we call WALListener before we actually add the WAL 
into the WAL list? Maybe we need a read write lock here...
{quote}
Yeah, in AbstractFSWALProvider.getWAL, when we find the current WAL is null, we
create a new WAL and then call WAL.init(), which in the case of AbstractFSWAL
follows the sequence AbstractFSWAL.init() -> AbstractFSWAL.rollWriter() ->
AbstractFSWAL.tellListenersAboutPostLogRoll(), where the registered wal
listeners are notified. Since ReplicationSourceManager is one of the listeners,
it enqueues the path of the newly created log to the related ReplicationSource,
which triggers the wal reader thread, and that thread may call getWALs while
the wal instance in AbstractFSWALProvider is still null.
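To illustrate the ordering, here is a minimal standalone sketch of that pattern
(the class and method names are simplified stand-ins, not the actual HBase
classes): the listener callback fires inside getWAL(), before the provider
publishes the new instance, so a callback that reads the published field still
sees an empty list:
{noformat}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class InitRaceSketch {
  interface Listener {
    void postLogRoll();
  }

  static class Provider {
    private volatile Object wal; // the published WAL instance
    final List<Listener> listeners = new CopyOnWriteArrayList<>();

    List<Object> getWALs() {
      // mirrors AbstractFSWALProvider.getWALs(): empty while wal == null
      Object w = wal;
      return w == null ? List.of() : List.of(w);
    }

    Object getWAL() {
      Object walCopy = new Object();
      // stand-in for walCopy.init(): listeners are notified here, and a
      // listener (like the replication source) may call getWALs() right
      // away -- at this point it still sees an empty list
      listeners.forEach(Listener::postLogRoll);
      wal = walCopy; // the instance is published only after init()
      return walCopy;
    }
  }

  public static void main(String[] args) {
    Provider p = new Provider();
    p.listeners.add(() ->
        System.out.println("WALs seen by listener: " + p.getWALs().size())); // prints 0
    p.getWAL();
    System.out.println("WALs seen afterwards: " + p.getWALs().size()); // prints 1
  }
}
{noformat}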

{quote}The intention for no exceptions for getWALs method is that, it aims to 
get the WALs currently in use, so we should not try to create a WAL in getWALs 
method.
{quote}
I get that, but the current AbstractFSWALProvider.getWAL implementation already
returns the current wal in use; it only creates a new WAL if the current one is
null, so maybe we don't really need the penalty of introducing more
synchronization points. That's why my original patch simply reused getWAL.

> Replication normal source can get stuck due to potential race conditions 
> between source wal reader and wal provider initialization threads.
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21503
>                 URL: https://issues.apache.org/jira/browse/HBASE-21503
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>             Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
>         Attachments: HBASE-21503-master.001.patch, HBASE-21503.patch
>
>
> Noticed replication sources could get stuck while doing some tests that 
> involved RS restart. In these cases, upon RS restart, the newly created 
> normal source was reaching the wal end without recognising that the wal was 
> open for write, which leads to it being removed from the source queue. Thus, 
> no new OPs get replicated unless this log goes to a recovery queue.
> Checking this further, my understanding is that, during restart, the RS will 
> start replication services, which initialises ReplicationSourceManager and 
> the ReplicationSources for each wal group id, in the sequence below:
> {noformat}
> HRegionServer -> Replication.startReplicationService() -> 
> ReplicationSourceManager.init() -> add ReplicationSource 
> {noformat}
> At this point, ReplicationSources have no paths yet, so the WAL reader thread 
> is not running. ReplicationSourceManager is registered as a WAL listener, in 
> order to get notified whenever a new wal file is available. During creation 
> of the ReplicationSourceManager and ReplicationSource instances, a 
> WALFileLengthProvider instance is obtained from WALProvider and cached by 
> both ReplicationSourceManager and ReplicationSource. The default 
> implementation for this WALFileLengthProvider, on the WALProvider interface, 
> is shown below:
> {noformat}
> default WALFileLengthProvider getWALFileLengthProvider() {
>   return path -> getWALs().stream().map(w -> w.getLogFileSizeIfBeingWritten(path))
>       .filter(o -> o.isPresent()).findAny().orElse(OptionalLong.empty());
> }
> {noformat}
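> If WALProvider.getWALs returns an empty list here, this provider can only 
> ever answer with an empty OptionalLong. A minimal standalone sketch of the 
> same stream pipeline (the class and interface names are hypothetical 
> stand-ins, not HBase code) makes that concrete:
> {noformat}
> import java.util.Collections;
> import java.util.List;
> import java.util.OptionalLong;
>
> public class EmptyWalListDemo {
>   // stand-in for WAL.getLogFileSizeIfBeingWritten(path)
>   interface Wal {
>     OptionalLong getLogFileSizeIfBeingWritten(String path);
>   }
>
>   // same pipeline as the default getWALFileLengthProvider() above
>   static OptionalLong lengthIfBeingWritten(List<Wal> wals, String path) {
>     return wals.stream().map(w -> w.getLogFileSizeIfBeingWritten(path))
>         .filter(OptionalLong::isPresent).findAny().orElse(OptionalLong.empty());
>   }
>
>   public static void main(String[] args) {
>     // with no WALs registered yet, the answer is always "empty", i.e.
>     // "this file is not being written" -- even when it actually is
>     System.out.println(lengthIfBeingWritten(Collections.emptyList(), "wal.1"));
>   }
> }
> {noformat}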
> This matters because when ReplicationSource finally starts the 
> ReplicationSourceWALReader thread, it passes this WALFileLengthProvider 
> along, and WALEntryStream (inside the wal reader) uses it to determine 
> whether the wal is being written (and so should be kept in the queue) here:
> {noformat}
>   private void tryAdvanceEntry() throws IOException {
>     if (checkReader()) {
>       boolean beingWritten = readNextEntryAndRecordReaderPosition();
>       LOG.trace("reading wal file {}. Current open for write: {}", this.currentPath, beingWritten);
>       if (currentEntry == null && !beingWritten) {
>         // no more entries in this log file, and the file is already closed, i.e, rolled
>         // Before dequeueing, we should always get one more attempt at reading.
>         // This is in case more entries came in after we opened the reader, and the log is rolled
>         // while we were reading. See HBASE-6758
>         resetReader();
>         readNextEntryAndRecordReaderPosition();
>         if (currentEntry == null) {
>           if (checkAllBytesParsed()) { // now we're certain we're done with this log file
>             dequeueCurrentLog();
>             if (openNextLog()) {
>               readNextEntryAndRecordReaderPosition();
>             }
>           }
>         }
>       }
> ...
> {noformat}
> Here is the code snippet for the WALEntryStream.readNextEntryAndRecordReaderPosition() 
> method that relies on the WALFileLengthProvider:
> {noformat}
> ...
> #1  OptionalLong fileLength = walFileLengthProvider.getLogFileSizeIfBeingWritten(currentPath);
>     if (fileLength.isPresent() && readerPos > fileLength.getAsLong()) {
>       // see HBASE-14004, for AsyncFSWAL which uses fan-out, it is possible that we read uncommitted
>       // data, so we need to make sure that we do not read beyond the committed file length.
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("The provider tells us the valid length for " + currentPath + " is " +
>             fileLength.getAsLong() + ", but we have advanced to " + readerPos);
>       }
>       resetReader();
>       return true;
>     }
>     if (readEntry != null) {
>       metrics.incrLogEditsRead();
>       metrics.incrLogReadInBytes(readerPos - currentPositionOfEntry);
>     }
>     currentEntry = readEntry; // could be null
>     this.currentPositionOfReader = readerPos;
>     // this return value becomes the "beingWritten" flag in tryAdvanceEntry() above
>     return fileLength.isPresent();
> ...
> {noformat}
> The problem can occur because when the wal file is actually created, in the 
> AbstractFSWALProvider.getWAL() method (snippet below), the line marked as #2 
> triggers notification of the registered WALListeners, including 
> ReplicationSourceManager, which will start the ReplicationSourceWALReader 
> thread. If the ReplicationSourceWALReader thread reaches point #1 from the 
> snippet above before the thread running AbstractFSWALProvider.getWAL() gets 
> to point #3 in the snippet below, then the WALFileLengthProvider will return 
> empty and the wal will not be considered open, causing it to be dequeued:
> {noformat}
>   @Override
>   public T getWAL(RegionInfo region) throws IOException {
>     T walCopy = wal;
>     if (walCopy == null) {
>       // only lock when need to create wal, and need to lock since
>       // creating hlog on fs is time consuming
>       synchronized (walCreateLock) {
>         walCopy = wal;
>         if (walCopy == null) {
>           walCopy = createWAL();
>           boolean succ = false;
>           try {
> #2          walCopy.init();
>             succ = true;
>           } finally {
>             if (!succ) {
>               walCopy.close();
>             }
>           }
> #3        wal = walCopy;
>         }
>       }
>     }
>     return walCopy;
>   }
> {noformat}
> This can be sorted by making AbstractFSWALProvider.getWALs reuse the 
> AbstractFSWALProvider.getWAL method to obtain the WAL instance. Do we really 
> have scenarios where we want to return no WAL instance? Another possibility 
> would be to synchronize getWALs on the same lock currently used by getWAL.
> I am proposing an initial patch with the first solution; after some tests, it 
> does seem to be enough to sort the problem.
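> A rough sketch of that first option (assuming getWALs currently returns an 
> empty list while the wal field is still null; the actual patch may differ, 
> and the exception handling below is a placeholder, since getWALs does not 
> declare IOException today):
> {noformat}
>   @Override
>   public List<WAL> getWALs() {
>     // Reuse getWAL() so a wal gets created (and published) if none exists
>     // yet, instead of returning an empty list during the initialization
>     // window that the wal reader thread can hit.
>     try {
>       return Collections.singletonList(getWAL(null));
>     } catch (IOException e) {
>       // sketch only: the real change has to reconcile the signatures
>       throw new UncheckedIOException(e);
>     }
>   }
> {noformat}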



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
