[
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339788#comment-14339788
]
Lars Hofhansl commented on HBASE-13082:
---------------------------------------
We have three problems:
# StoreScanner is locked too often
# If StoreScanner.next(List<Cell>) does not find any Cells (for example if they
do not match timerange or filter) it will exhaust the entire store while
holding the lock, preventing flushes/compactions from finishing
# Client can time out even though the server is still working, because the
server does not currently indicate that it is working but just not returning
anything.
This patch is for #1. We can fix #2 in many cases by just returning an empty
result after some number of iterations - but we *only* do that if we have not found
any Cells for the current row, otherwise we need to finish the row, i.e. find
the next row (which of course could then exhaust the region if we're unlucky).
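A rough sketch of the idea for #2, not the actual HBase code. BoundedScanSketch, its Cell stand-in, and the maxIterations parameter are all hypothetical names made up for illustration. It gives up with an empty result only while nothing has been collected for the current row; once a Cell has been found it keeps going until the next row starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these names are real HBase classes.
class BoundedScanSketch {
    /** Stand-in for a Cell: a row key plus whether it passes timerange/filter. */
    static final class Cell {
        final String row;
        final boolean matches;
        Cell(String row, boolean matches) { this.row = row; this.matches = matches; }
    }

    private final List<Cell> cells; // the store's cells, in sorted order
    private int pos = 0;            // index of the next cell to examine

    BoundedScanSketch(List<Cell> cells) { this.cells = cells; }

    boolean hasMore() { return pos < cells.size(); }

    /** Returns the matching cells of one row, or an empty list after maxIterations misses. */
    List<Cell> next(int maxIterations) {
        List<Cell> result = new ArrayList<>();
        String currentRow = null; // set once the first matching Cell is found
        int skipped = 0;          // non-matching cells seen before any match
        while (pos < cells.size()) {
            Cell c = cells.get(pos);
            if (currentRow != null && !c.row.equals(currentRow)) {
                break; // found the next row, so the current row is complete
            }
            pos++;
            if (c.matches) {
                result.add(c);
                currentRow = c.row;
            } else if (result.isEmpty() && ++skipped >= maxIterations) {
                break; // nothing collected yet: safe to return an empty result
            }
        }
        return result;
    }
}
```

A caller would loop on next() while hasMore() is true and treat an empty list as "still working, nothing to return yet". The unlucky case is visible here as well: once currentRow is set there is no bound on how far the loop runs before it sees the next row.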
But note that the solution for #2 would *clash* with this patch. With this
patch it is no longer the lock on StoreScanner that protects it from concurrent
flushes, but the synchronized on RegionScannerImpl, and that we cannot easily
let go of without actually returning something back to the client.
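To make the locking change concrete, here is a minimal sketch under stated assumptions. StoreScannerSketch, RegionScannerSketch, addFlushed and onFlush are hypothetical stand-ins, not the real HBase classes or methods, and routing the flush through the region scanner's monitor is an assumption of this sketch rather than a description of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the real HBase classes.
class CoarseLockSketch {
    /** Store-level scanner with no lock (and no memory fence) of its own. */
    static final class StoreScannerSketch {
        private final List<String> cells = new ArrayList<>();
        private int pos = 0;

        StoreScannerSketch(List<String> initial) { cells.addAll(initial); }

        // Unsynchronized: only safe while the caller holds the region scanner's monitor.
        String next() { return pos < cells.size() ? cells.get(pos++) : null; }

        // Simulated flush that makes more cells visible; also unsynchronized.
        void addFlushed(List<String> flushed) { cells.addAll(flushed); }
    }

    /** Region-level scanner: its monitor is the single, coarse lock. */
    static final class RegionScannerSketch {
        private final StoreScannerSketch store;

        RegionScannerSketch(StoreScannerSketch store) { this.store = store; }

        // Held for the whole call, so it cannot be dropped midway without returning.
        synchronized String next() { return store.next(); }

        // Assumption of this sketch: a flush takes the same monitor as next().
        synchronized void onFlush(List<String> flushed) { store.addFlushed(flushed); }
    }
}
```

In this model any caller that reaches the store scanner without holding the region scanner's monitor has no protection at all, and a flush has to wait for whichever next() call currently holds it.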
#3 would only work with HBASE-11544 since we still need to be able to guarantee
entire rows to the client, but if we break out of the loops because we did not
find any Cell after some time we do not know whether we have done a whole row or not.
So in reality it looks like all these things need to be fixed together. Given that
neither #2 nor #3 can be satisfactorily fixed without HBASE-11544, I propose
doing a bit more testing on this patch, and then committing it here. Then we fix
#3 (which would incidentally also fix #2 after HBASE-11544 is in).
> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Attachments: 13082-test.txt, 13082.txt
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking
> contract. For example Phoenix does not lock RegionScanner.nextRaw() and
> as required in the documentation (not picking on Phoenix, this one is my fault
> as I told them it's OK)
> * possible starving of flushes and compaction with heavy read load.
> RegionScanner operations would keep getting the locks and the
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.