[
https://issues.apache.org/jira/browse/HBASE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876760#comment-13876760
]
Sergey Shelukhin commented on HBASE-10277:
------------------------------------------
bq. Could AsyncRequests be done as a 'Future'? Seems to have a bunch in common.
I'll take a look at it... it's more like a multi-future. Maybe even FutureTask
can be used.
bq. We have something called Set but it does not implement java.util.Set:
Will fix.
bq. We make an ap and a multiap each time?
Once per HTable. The difference is the mode of operation - the "legacy" mode or
the normal one.
bq. Batch should return an array of objects?
It does? Don't quite understand the comment. Some of the existing interfaces
accept the array for results that fill it. Backward compat.
bq. 1) Some changes are cosmetics: some protected becomes private, some "this."
are removed. I'm not against these changes, but it makes the real meat more
difficult to find.
Removing "this." is not cosmetic (at least in the cases I am aware of) -
methods moved to a non-static nested class, there's no more "this.".
Changing to private can be removed, although it's a good change to have.
bq. 2) The javadoc has not been updated, so when the code differs from the
javadoc, the reader has to sort out himself if it's just that the javadoc is
now out dated or if there is a regression.
Will update.
bq. AsyncProcess#submit. Why does it take a tableName? Does it mean that an
AsyncProcess can now be shared between Tables?
Yes.
bq. AsyncRequestSet#waitUntilDone
bq. Same responsibility as AsyncProcess#waitUntilDone, but less features (no
logs. These logs are useful).
Some logs were preserved.
Previous waitUntilDone had two wait conditions and loops in an effort not to
loop often, whereas only one seems to be necessary, so I don't think features
were lost... I can add more logs.
bq. This part should go in HConnection, as we should manage the load tracking
globally and not only for a single call. Iit would be a change in bahavior
compared to the 0.94, but I think we should do it. Would it make you life
easier here?
It may, actually... but HTable uses AP directly. In fact batch calls from HCM
currently don't use throttling at all (it calls "submitAll"), so only
HTable-direct-usage uses this code.
{quote}
bq. Probably this perf difference will not be noticeable on real requests
(remains to be tested).
Let me be more pessimistic here .
{quote}
Yeah, I'd probably need to test. I was not able to figure out what exactly is
the cause of slowdown - YourKit gives very low % numbers for AP, CPU or wall
clock, and almost identical between old and equivalent new methods across runs.
{quote}
bq. Also got rid of callback that was mostly used for tests, tests can check
results without it.
I'm not a big fan of this part of the change. Callbacks can be reused in
different context (for example to have a different policy, such as ignoring
errors as in HTableMultiplexer). As well, we now have a hardRetryLimit, but
this attribute is used only in tests.
{quote}
Callback was already only used for tests, for practical purposes (also for
array filling, but that is no longer necessary).
I am not a big fan of having test-oriented code in production code; thus I
replaced the only necessary test usage (stopping retries) with a field. I
wanted to get rid of that too, but that would be too convoluted it seems.
When we have a scenario to use some callback, we can add it, under YAGNI
principle :)
bq. More globally, This patch allows to reuse a single AsyncProcess between
independant streams of writes. Would that be necessary if it was cheaper to
create ? The cost is reading the configuration, as when we do a HTable#get and
create a RegionServerCallable.
The main reason is to have well-defined context for single call for "normal"
(as opposed to HTable cross-put stuff) usage pattern. So for example replica
calls could group requests differently and synchronize/populate the same
result, cancel other calls when they are no longer useful, etc.
It also separates the two patterns better (by ctor argument); see potential
problems outlined in JIRA description.
bq. The problem is that with this patch, we still create a AsyncProcess in some
cases, for example on the batchCallback path...
Yeah, that is due to ability to pass a custom execution pool (could be changed
to accomodate it by making pool per request, but I am not sure it's
necessary...) and limitations of java generics + current result type handling
(AP will have to be per result type). IIRC most of these paths are deprecated.
> refactor AsyncProcess
> ---------------------
>
> Key: HBASE-10277
> URL: https://issues.apache.org/jira/browse/HBASE-10277
> Project: HBase
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HBASE-10277.patch
>
>
> AsyncProcess currently has two patterns of usage, one from HTable flush w/o
> callback and with reuse, and one from HCM/HTable batch call, with callback
> and w/o reuse. In the former case (but not the latter), it also does some
> throttling of actions on initial submit call, limiting the number of
> outstanding actions per server.
> The latter case is relatively straightforward. The former appears to be error
> prone due to reuse - if, as javadoc claims should be safe, multiple submit
> calls are performed without waiting for the async part of the previous call
> to finish, fields like hasError become ambiguous and can be used for the
> wrong call; callback for success/failure is called based on "original index"
> of an action in submitted list, but with only one callback supplied to AP in
> ctor it's not clear to which submit call the index belongs, if several are
> outstanding.
> I was going to add support for HBASE-10070 to AP, and found that it might be
> difficult to do cleanly.
> It would be nice to normalize AP usage patterns; in particular, separate the
> "global" part (load tracking) from per-submit-call part.
> Per-submit part can more conveniently track stuff like initialActions,
> mapping of indexes and retry information, that is currently passed around the
> method calls.
> -I am not sure yet, but maybe sending of the original index to server in
> "ClientProtos.MultiAction" can also be avoided.- Cannot be avoided because
> the API to server doesn't have one-to-one correspondence between requests and
> responses in an individual call to multi (retries/rearrangement have nothing
> to do with it)
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)