[jira] [Commented] (HBASE-18703) Inconsistent behavior for preBatchMutate in doMiniBatchMutate and processRowsWithLocks

Appy (JIRA) Mon, 09 Oct 2017 19:27:23 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198045#comment-16198045
 ]


Appy commented on HBASE-18703:
------------------------------

Catching up from the top.
There has already been a lot of discussion, some makes sense, while a lot 
doesn't :). Let me start with the main points.

---- 
bq. HRegion.processRowsWithLocks() implementation, doesn't call coprocessor 
hooks but instead calls RowProcessor hooks at appropriate point in execution. 
Many of these hooks/ methods have same names and are called at similar points 
during the course of execution but they are not related!

That's because RowProcessor is not necessarily doing mutations. It's can be 
doing anything- reading, writing, aggregating, etc.  For eg. if i am storing 
analytics data, I may want to write a RowProcessor which can aggregates large 
amount of data on server itself and just returns the result. There's no need of 
coprocessor hooks for this use case. (yeah yeah....better use something else 
for analytics, but you get the point). It's not necessarily for mutations, it's 
much more generic and powerful. 

bq. HRegion.batchMutate() methods call coprocessor hooks but not row 
RowProcessor hooks.
I think hook is bad word to use here. Yeah both CP and RP have overridable 
function, but that's where similarity ends.
CP hooks are events which are invoked when specific code path is travelled. 
This is a promise we make to third party.
That's why on every put, get, delete, we invoke pre\*/post\* functions.
RowProcessor functions are not events. RowProcessor class is like a task - a 
collection of functions - which is given to HRegion. And at different stages, 
HRegion invokes appropriate function of the processor. The act of running a 
RowProcessor has no *direct* relation to Coprocessors (in specific 
implementations, work done by RP might be related to CP).
So batchMutate(), which has nothing to do with RowProcessor and should not be 
invoking RowProcessor hooks anyways.

bq. The major inconsistency here is, one code path uses coprocessors while 
other uses RowProcessor. There are other minor inconsistencies along those two 
code paths.
It's confusing as hell, sure, but my understanding is that there is no 
inconsistency here. 
One path is mutations, and is supposed to invoke coprocessors.
Other path is very high level, generic, row processing mechanism. And is only 
supposed to invoke right RowProcessors steps at right time.

Since we use the second more generic path for mutateRows(), with the help of 
MultiRowMutationProcessor, it's only correct to invoke appropriate CP hooks 
from inside it, otherwise we'll be violating our CP promises.

bq. Comparing these list of steps for 2 methods, we can see the correlation for 
most hooks except for RowProcessor.process()
I think it's only expected.
batchMutate()  already knows what the mutations are. It's coming into it as a 
param. Easy peesy.
But poor processRowsWithLocks() doesn't know what it was called for, so it 
still has to figure it out, and it does so by asking RP for the mutations. 
Let's be sympathetic.
On a serious note, that's by design.

bq. I am proposing that Coprocessors can be used for customized processing 
instead of RowProcessor. Currently this can be done either with RowProcessors 
by calling Region.processRowsWithLocks() or with coprocessors by calling 
Region.batchMutate(). The intended difference between these 2 APIs is that 
Region.batchMutate() will only perform PUT and DELETE operations and 
Region.processRowsWithLocks() can perform any of GET, PUT, DELETE, 
CheckAndMutate etc operations.

So if RowProcessor is more generic, shouldn't we move to it? Why move to an 
option which is the limited one.

----
Note that in batchMutate(), mutations are not related. The batch is not a 
transaction. Some can fail, some can pass. Whereas in processRowsWithLocks(), 
it's transactional. So even if we merge two code paths, we need to keep two 
kinds of locking .

----
The way i see RowProcessor & processRows() combo is -  it will allow us to 
separate out row operations and pack them neatly in classes (outside of 
HRegion, 8000lines!!), and that too in a way such that it is easy to inject 
back.

I think the better design here would be, improve the RowProcessor to provide 
native support for mutations - locks, invoking cp hooks etc. That way we can 
use it for our internal purpose. Advantages are (explicitly listing them again):
1) lot of code sharing : Code around locking, wal edits, pre/postCP hooks can 
be shared. Move batchMutation() to BatchMutationProcessor, and rename 
MultiRowMutationProcessor to denote that unlike BatchMutationProcessor, it's 
transactional.
2) We can split out all mutations related logic from 8000 lines of HRegion 
class.
3) It'll be a better design to expose for use in CPs.

----
Taking a step back, yes, it's a mess. Unification of two paths would be great!

> Inconsistent behavior for preBatchMutate in doMiniBatchMutate and 
> processRowsWithLocks
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-18703
>                 URL: https://issues.apache.org/jira/browse/HBASE-18703
>             Project: HBase
>          Issue Type: Bug
>          Components: Coprocessors
>            Reporter: Duo Zhang
>            Assignee: Umesh Agashe
>            Priority: Critical
>             Fix For: 2.0.0-alpha-4
>
>         Attachments: hbase-18703.master.001.patch, 
> hbase-18703.master.002.patch, hbase-18703.master.003.patch, 
> hbase-18703.master.004.patch, hbase-18703.master.005.patch, 
> hbase-18703.master.005.patch, hbase-18703.master.005.patch, 
> hbase-18703.master.006.patch, hbase-18703.master.007.patch
>
>
> In doMiniBatchMutate, the preBatchMutate is called before building WAL, but 
> in processRowsWithLocks, we suggest the RowProcessor implementation to build 
> WAL in process  method, which is ahead of preBatchMutate.
> If a CP modifies the mutations, especially if it removes some cells from the 
> mutations, then the behavior of processRowsWithLocks is broken. The changes 
> applied to memstore and WAL will be different. And there is no way to remove 
> entries from a WALEdit through CP. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-18703) Inconsistent behavior for preBatchMutate in doMiniBatchMutate and processRowsWithLocks

Reply via email to