[
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338000#comment-14338000
]
Lars Hofhansl commented on HBASE-13082:
---------------------------------------
We can simply exit the StoreScanner.next() loop after some time limit with an
empty result but returning true (i.e. more rows expected). That would be the
same as what happens when we exhaust the region: the region scanner will continue.
In RegionScanner we could do the same and return a special indicator that the
client just ignores (as described above). I guess what's tricky is coprocessors
that wrap a region scanner (as Phoenix does). They'd have to honor the
protocol and pass the marker results to the client (or at the very least ignore
them).
Let's do that in another jira, though.
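To make the time-limit idea above concrete, here is a minimal toy sketch in plain Java. It is not HBase code: ToyStoreScanner, drain, the String "cells" and the injected clock are all hypothetical names chosen for illustration. It only shows the shape of the protocol: next() gives up after a time budget, returns true with an empty batch, and the caller treats the empty batch as a marker to ignore.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Toy model only; every name in here is hypothetical, none is an HBase API. */
final class TimeLimitedScanSketch {

    /** Skips non-matching cells, but never for longer than a fixed time budget per call. */
    static final class ToyStoreScanner {
        private final Iterator<String> cells;
        private final Predicate<String> filter;
        private final LongSupplier clock;   // injected so the sketch is testable
        private final long budget;          // in the same unit as the clock

        ToyStoreScanner(Iterator<String> cells, Predicate<String> filter,
                        LongSupplier clock, long budget) {
            this.cells = cells;
            this.filter = filter;
            this.clock = clock;
            this.budget = budget;
        }

        /**
         * Appends at most one matching cell to out.
         * Returns true if more cells may follow, including the case where the
         * budget ran out and out is still empty. Returns false only once the
         * underlying cells are exhausted.
         */
        boolean next(List<String> out) {
            long deadline = clock.getAsLong() + budget;
            while (cells.hasNext()) {
                String cell = cells.next();
                if (filter.test(cell)) {
                    out.add(cell);
                    return true;
                }
                if (clock.getAsLong() >= deadline) {
                    return true; // empty result, but more rows expected
                }
            }
            return false;
        }
    }

    /** Caller-side loop: an empty batch with more == true is simply ignored. */
    static List<String> drain(ToyStoreScanner scanner) {
        List<String> all = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        boolean more;
        do {
            batch.clear();
            more = scanner.next(batch);
            all.addAll(batch); // adding an empty marker batch is a no-op
        } while (more);
        return all;
    }
}
```

In this sketch the lock (not modeled here) would be released between next() calls, which is what gives a flush a chance to finish. A wrapper in between would need to pass the empty batch through, or at least not treat it as end of scan.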
This patch will not make things worse in principle. A store scanner can already
be stuck exhausting the entire store in a single next(...) call while holding the
lock, preventing flushes from finishing. The extremely long scan times we've seen
have other reasons too - see HBASE-13109.
The only detriment this patch can cause is that when one store scanner is stuck
this way, it now prevents other stores in the region from flushing/compacting.
(Note that this is only the case when no Cells in the store are returned by the
store scanner.)
> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Attachments: 13082-test.txt, 13082.txt
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking
> contract. For example Phoenix does not lock RegionScanner.nextRaw() as
> required in the documentation (not picking on Phoenix, this one is my fault
> as I told them it's OK)
> * Possible starving of flushes and compactions under heavy read load.
> RegionScanner operations would keep getting the locks and the
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.
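The locking contract from the second bullet of the quoted description can be illustrated with a small toy sketch. This is not the real RegionScanner interface: ToyRegionScanner and scanAll are hypothetical, and the Thread.holdsLock check exists only to make a violation visible in the example. The point is that the raw call does no locking itself, so a wrapper has to hold the scanner's monitor around every call.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the locking contract; these are not the real HBase interfaces. */
final class ScannerLockContractSketch {

    /** Hypothetical scanner whose raw call relies on the caller for locking. */
    static final class ToyRegionScanner {
        private final int rowCount;
        private int nextRow = 0;

        ToyRegionScanner(int rowCount) {
            this.rowCount = rowCount;
        }

        /** No internal locking: the caller must hold this scanner's monitor. */
        boolean nextRaw(List<String> out) {
            // Illustration only: fail loudly when the contract is broken.
            if (!Thread.holdsLock(this)) {
                throw new IllegalStateException("caller must synchronize on the scanner");
            }
            if (nextRow >= rowCount) {
                return false;
            }
            out.add("row-" + nextRow++);
            return nextRow < rowCount;
        }
    }

    /** Hypothetical coprocessor-style wrapper that honors the contract. */
    static List<String> scanAll(ToyRegionScanner scanner) {
        List<String> rows = new ArrayList<>();
        boolean more;
        do {
            synchronized (scanner) { // lock per call, so other work can interleave
                more = scanner.nextRaw(rows);
            }
        } while (more);
        return rows;
    }
}
```

Taking the lock per call rather than around the whole loop is a choice made for this sketch. It leaves gaps in which a flush or compaction could acquire the lock, which relates to the starvation concern in the last bullet.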