[jira] [Commented] (HBASE-10277) refactor AsyncProcess

Sergey Shelukhin (JIRA) Mon, 20 Jan 2014 11:29:58 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876760#comment-13876760
 ]


Sergey Shelukhin commented on HBASE-10277:
------------------------------------------


bq. Could AsyncRequests be done as a 'Future'? Seems to have a bunch in common.
I'll take a look at it... it's more like a multi-future. Maybe even FutureTask 
can be used.

bq. We have something called Set but it does not implement java.util.Set:
Will fix.

bq. We make an ap and a multiap each time?
Once per HTable. The difference is the mode of operation - the "legacy" mode or 
the normal one.

bq. Batch should return an array of objects?
It does? Don't quite understand the comment. Some of the existing interfaces 
accept the array for results that fill it. Backward compat.


bq. 1) Some changes are cosmetics: some protected becomes private, some "this." 
are removed. I'm not against these changes, but it makes the real meat more 
difficult to find.
Removing "this." is not cosmetic (at least in the cases I am aware of) - 
methods moved to a non-static nested class, there's no more "this.".
Changing to private can be removed, although it's a good change to have.

bq. 2) The javadoc has not been updated, so when the code differs from the 
javadoc, the reader has to sort out himself if it's just that the javadoc is 
now out dated or if there is a regression. 
Will update.

bq. AsyncProcess#submit. Why does it take a tableName? Does it mean that an 
AsyncProcess can now be shared between Tables?
Yes.

bq. AsyncRequestSet#waitUntilDone
bq. Same responsibility as AsyncProcess#waitUntilDone, but less features (no 
logs. These logs are useful).
Some logs were preserved. 
Previous waitUntilDone had two wait conditions and loops in an effort not to 
loop often, whereas only one seems to be necessary, so I don't think features 
were lost... I can add more logs.

bq. This part should go in HConnection, as we should manage the load tracking 
globally and not only for a single call. Iit would be a change in bahavior 
compared to the 0.94, but I think we should do it. Would it make you life 
easier here?
It may, actually... but HTable uses AP directly. In fact batch calls from HCM 
currently don't use throttling at all (it calls "submitAll"), so only 
HTable-direct-usage uses this code.

{quote}
bq. Probably this perf difference will not be noticeable on real requests 
(remains to be tested).
Let me be more pessimistic here .
{quote}
Yeah, I'd probably need to test. I was not able to figure out what exactly is 
the cause of slowdown - YourKit gives very low % numbers for AP, CPU or wall 
clock, and almost identical between old and equivalent new methods across runs.

{quote}
bq. Also got rid of callback that was mostly used for tests, tests can check 
results without it.
I'm not a big fan of this part of the change. Callbacks can be reused in 
different context (for example to have a different policy, such as ignoring 
errors as in HTableMultiplexer). As well, we now have a hardRetryLimit, but 
this attribute is used only in tests.
{quote}
Callback was already only used for tests, for practical purposes (also for 
array filling, but that is no longer necessary).
I am not a big fan of having test-oriented code in production code; thus I 
replaced the only necessary test usage (stopping retries) with a field. I 
wanted to get rid of that too, but that would be too convoluted it seems.
When we have a scenario to use some callback, we can add it, under YAGNI 
principle :)

bq. More globally, This patch allows to reuse a single AsyncProcess between 
independant streams of writes. Would that be necessary if it was cheaper to 
create ? The cost is reading the configuration, as when we do a HTable#get and 
create a RegionServerCallable.
The main reason is to have well-defined context for single call for "normal" 
(as opposed to HTable cross-put stuff) usage pattern. So for example replica 
calls could group requests differently and synchronize/populate the same 
result, cancel other calls when they are no longer useful, etc.
It also separates the two patterns better (by ctor argument); see potential 
problems outlined in JIRA description.

bq. The problem is that with this patch, we still create a AsyncProcess in some 
cases, for example on the batchCallback path...
Yeah, that is due to ability to pass a custom execution pool (could be changed 
to accomodate it by making pool per request, but I am not sure it's 
necessary...) and limitations of java generics + current result type handling 
(AP will have to be per result type). IIRC most of these paths are deprecated.



> refactor AsyncProcess
> ---------------------
>
>                 Key: HBASE-10277
>                 URL: https://issues.apache.org/jira/browse/HBASE-10277
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBASE-10277.patch
>
>
> AsyncProcess currently has two patterns of usage, one from HTable flush w/o 
> callback and with reuse, and one from HCM/HTable batch call, with callback 
> and w/o reuse. In the former case (but not the latter), it also does some 
> throttling of actions on initial submit call, limiting the number of 
> outstanding actions per server.
> The latter case is relatively straightforward. The former appears to be error 
> prone due to reuse - if, as javadoc claims should be safe, multiple submit 
> calls are performed without waiting for the async part of the previous call 
> to finish, fields like hasError become ambiguous and can be used for the 
> wrong call; callback for success/failure is called based on "original index" 
> of an action in submitted list, but with only one callback supplied to AP in 
> ctor it's not clear to which submit call the index belongs, if several are 
> outstanding.
> I was going to add support for HBASE-10070 to AP, and found that it might be 
> difficult to do cleanly.
> It would be nice to normalize AP usage patterns; in particular, separate the 
> "global" part (load tracking) from per-submit-call part.
> Per-submit part can more conveniently track stuff like initialActions, 
> mapping of indexes and retry information, that is currently passed around the 
> method calls.
> -I am not sure yet, but maybe sending of the original index to server in 
> "ClientProtos.MultiAction" can also be avoided.- Cannot be avoided because 
> the API to server doesn't have one-to-one correspondence between requests and 
> responses in an individual call to multi (retries/rearrangement have nothing 
> to do with it)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (HBASE-10277) refactor AsyncProcess

Reply via email to