[
https://issues.apache.org/jira/browse/HBASE-17712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890163#comment-15890163
]
Duo Zhang commented on HBASE-17712:
-----------------------------------
This is not [~tedyu]'s fault. Having skimmed the comments in HBASE-16304, I do
not think he knew why we call dropMemstoreContents() for increment and append
either; he just followed the old behavior. We can call refreshStoreFiles in
handleFileNotFound, and before HBASE-16304 refreshStoreFiles would call
dropMemstoreContents. In append and increment we acquire the write lock, so
this could lead to a deadlock, and he therefore moved dropMemstoreContents out
of the write lock's protection. dropMemstoreContents is part of
refreshStoreFiles; it was split into two pieces to avoid the deadlock.
'refreshStoreFiles' was designed to be used by secondary replicas only, and we
reused it in HBASE-13651 to handle FileNotFoundException. This is the root cause.
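To make the "split into two pieces" point concrete, here is a minimal,
self-contained Java sketch. It is not HBase code: the class, its fields and
its methods are hypothetical, and it assumes a non-reentrant
java.util.concurrent.locks.StampedLock as a stand-in for whatever lock append
and increment actually hold. It uses tryWriteLock() so the problem case
returns false instead of hanging.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical illustration only; names do not correspond to HBase internals.
final class SplitRefreshSketch {
    // Assumption: a non-reentrant lock stands in for the region's write lock.
    private final StampedLock lock = new StampedLock();
    private int memstoreCells = 3;

    // Piece 1: refresh the store file list and report whether a drop is needed.
    private boolean refreshFilesOnly() {
        return memstoreCells > 0;
    }

    // Piece 2: drop memstore contents; needs the write lock for itself.
    private boolean dropMemstore() {
        long stamp = lock.tryWriteLock(); // returns 0 if the lock is unavailable
        if (stamp == 0L) {
            return false; // a blocking writeLock() would wait forever here
        }
        try {
            memstoreCells = 0;
            return true;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Old shape: both pieces run while the caller still holds the write lock.
    boolean dropInsideWriteLock() {
        long stamp = lock.writeLock();
        try {
            refreshFilesOnly();
            return dropMemstore();
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Split shape: refresh under the lock, drop only after releasing it.
    boolean dropAfterWriteLock() {
        boolean needDrop;
        long stamp = lock.writeLock();
        try {
            needDrop = refreshFilesOnly();
        } finally {
            lock.unlockWrite(stamp);
        }
        return needDrop && dropMemstore();
    }

    public static void main(String[] args) {
        SplitRefreshSketch s = new SplitRefreshSketch();
        System.out.println(s.dropInsideWriteLock()); // false
        System.out.println(s.dropAfterWriteLock());  // true
    }
}
```

The split shape only avoids the self-deadlock. It does not address the
concern that the drop step was written for secondary replicas, which do not
take writes.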
> Remove/Simplify the logic of RegionScannerImpl.handleFileNotFound
> -----------------------------------------------------------------
>
> Key: HBASE-17712
> URL: https://issues.apache.org/jira/browse/HBASE-17712
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0, 1.4.0
> Reporter: Duo Zhang
> Fix For: 2.0.0, 1.4.0
>
>
> It was introduced in HBASE-13651, and the logic became much more complicated
> after HBASE-16304 due to a deadlock issue. It is really tough, as sequence
> ids are involved and the method we call was originally meant to serve
> secondary replicas, which do not handle writes.
> In fact, in the 1.x releases, the problem described in HBASE-13651 is gone.
> Now we write a compaction marker to the WAL before deleting the compacted
> files. We can only consider an RS dead after its WAL files are all closed, so
> if the region has already been reassigned, the compaction will fail because
> we cannot write out the compaction marker.
> So theoretically, if we still hit a FileNotFoundException, it should be a
> critical bug, which means we may lose data. I do not think it is a good idea
> to just eat the exception and refresh store files. Even if we want to do
> this, we can just refresh store files without dropping memstore contents.
> This will also simplify the logic a lot.
> Suggestions are welcome.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)