[ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339788#comment-14339788
 ] 

Lars Hofhansl commented on HBASE-13082:
---------------------------------------

We have three problems:
# StoreScanner is locked too often
# If StoreScanner.next(List<Cell>) does not find any Cells (for example if they 
do not match timerange or filter) it will exhaust the entire store while 
holding the lock, preventing flushes/compactions from finishing
# Client can time out even though the server is still working, because the 
server does not currently indicate that it is working but just not returning 
anything.

This patch is for #1. We can fix #2 in many cases by just returning an empty 
result after some number of iterations - but we *only* do that if we have not 
found any Cells for the current row; otherwise we need to finish the row, i.e. find 
the next row (which of course could then exhaust the region if we're unlucky).
But note that the solution for #2 would *clash* with this patch. With this 
patch it is no longer the lock on StoreScanner that protects it from concurrent 
flushes, but the synchronized on RegionScannerImpl, and that we cannot easily 
release without actually returning something back to the client.
#3 would only work with HBASE-11544 since we still need to be able to guarantee 
entire rows to the client, but if we break out of the loops because we did not 
find any Cell after some time we do not know whether we have covered a whole row 
or not.
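
To make the idea for #2 concrete, here is a minimal sketch of the "give up early only if the current row is still empty" rule. It is not HBase code: BoundedScanSketch, its Cell and Result types, and the maxIterations parameter are all hypothetical names made up for illustration, and the store is just an in-memory list sorted by row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch, not HBase code: a scan that may return empty early. */
class BoundedScanSketch {
    /** A cell reduced to a row key and a value (assumption for illustration). */
    static final class Cell {
        public final String row;
        public final String value;
        Cell(String row, String value) { this.row = row; this.value = value; }
    }

    /** Outcome of one next() call. */
    static final class Result {
        public final List<Cell> cells = new ArrayList<>();
        /** True if we returned empty because the iteration budget ran out. */
        public boolean gaveUpEarly;
    }

    private final List<Cell> store; // assumed to be sorted by row
    private int pos = 0;

    BoundedScanSketch(List<Cell> store) { this.store = new ArrayList<>(store); }

    /** Returns the matching cells of at most one row, or empty if the budget ran out. */
    Result next(Predicate<Cell> filter, int maxIterations) {
        Result result = new Result();
        String currentRow = null; // set once the first matching cell is found
        int iterations = 0;
        while (pos < store.size()) {
            Cell c = store.get(pos);
            if (currentRow != null && !c.row.equals(currentRow)) {
                break; // next row reached: the current row is complete
            }
            pos++;
            iterations++;
            if (filter.test(c)) {
                currentRow = c.row;
                result.cells.add(c);
            } else if (currentRow == null && iterations >= maxIterations) {
                // Nothing found for any row yet, so returning empty is safe.
                result.gaveUpEarly = true;
                break;
            }
            // If currentRow is set we must keep going until the row ends,
            // no matter how many iterations that takes.
        }
        return result;
    }
}
```

Once a row has produced a Cell, the budget is ignored and the loop runs to the row boundary, which is exactly the case that can still exhaust the region.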

So in reality it looks like all these things need to be fixed together. Given 
that neither #2 nor #3 can be satisfactorily fixed without HBASE-11544, I propose 
doing a bit more testing on the patch, and then committing this here. Then we fix 
#3 (which would incidentally also fix #2 after HBASE-11544 is in).


> Coarsen StoreScanner locks to RegionScanner
> -------------------------------------------
>
>                 Key: HBASE-13082
>                 URL: https://issues.apache.org/jira/browse/HBASE-13082
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>         Attachments: 13082-test.txt, 13082.txt
>
>
> Continuing where HBASE-10015 left off.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
> required in the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * possible starving of flushes and compaction with heavy read load. 
> RegionScanner operations would keep getting the locks and the 
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.
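
The locking arrangement described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: CoarseLockSketch, StoreScannerSketch, RegionScannerSketch and onFlush are hypothetical names, and the "store" is a plain list of strings. The inner scanner takes no lock of its own; every path that touches it, including the flush notification, goes through the outer scanner's monitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the actual patch: locking lives only on the outer scanner. */
class CoarseLockSketch {
    /** Inner scanner with no locks or fences; it relies on the caller's lock. */
    static final class StoreScannerSketch {
        private List<String> cells;
        private int pos = 0;

        StoreScannerSketch(List<String> cells) { this.cells = new ArrayList<>(cells); }

        /** Not thread-safe on its own. */
        String next() { return pos < cells.size() ? cells.get(pos++) : null; }

        /** Swaps in the file set after a flush; also not thread-safe on its own. */
        void updateReaders(List<String> newCells) { this.cells = new ArrayList<>(newCells); }
    }

    /** Outer scanner: all access to the inner scanner is synchronized here. */
    static final class RegionScannerSketch {
        private final StoreScannerSketch store;

        RegionScannerSketch(List<String> cells) { this.store = new StoreScannerSketch(cells); }

        synchronized String next() { return store.next(); }

        /** A flush must take the same monitor, so it waits for an in-flight next(). */
        synchronized void onFlush(List<String> newCells) { store.updateReaders(newCells); }
    }
}
```

In this arrangement a caller that reaches the inner scanner without holding the outer monitor gets no protection at all, and a flush has to wait for whatever next() call currently holds the monitor, which is where the starvation concern comes from.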



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
