[
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631022#comment-13631022
]
Nicolas Liochon commented on HBASE-6295:
----------------------------------------
It does help. The cases are:
1) The client wants a synchronous (multi)put or (multi)get. We have to wait.
2) The client uses the HTable interface with autoflush set to false, as it can
today. In this case, with this patch, we will be much faster, because we will
continue to accept puts from the client while sending them to the servers.
So today, in the client, we do:
void put(Put put) {
  toSend.add(put);
  if (toSend.isBigEnough()) {
    // Blocking: the caller waits for the send and for all the retries.
    nbRetry = 0;
    while (nbRetry++ < maxRetry && toSend.stillSomethingToSend())
      send(toSend);
    if (toSend.hasError())
      throw new Exception();
  }
}
In the patch, we do:
private BackgroundSendThread backgroundSendThread;
void put(Put put) {
  // An error from an earlier background send is reported on a later put.
  if (backgroundSendThread.hasError())
    throw new Exception();
  toSend.add(put);
  if (toSend.isBigEnough()) {
    // Non blocking, do retries. Set hasError if there are
    backgroundSendThread.send(toSend);
  }
}
This is 100% compatible with the previous contract. I would like the new
behavior to be the default.
But obviously, there is a limit to what the client can keep in its background
list.
There are 3 possible ways to manage this limit:
- the total buffer size
- the number of tasks
- the throughput
Here I implemented the simplest: controlling the number of tasks. We can
implement the others as well.
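To make the contract and the task limit concrete, here is a minimal sketch in
plain Java. It is not the code from the patch: BackgroundBuffer, its
constructor parameters and the sender callback are hypothetical names, and
using a Semaphore to cap the number of in-flight tasks is my assumption about
one simple way to do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

/** Hypothetical sketch of a buffered put with background sends; not the HBase API. */
final class BackgroundBuffer<T> {
    private final int batchSize;
    private final int maxTasks;
    private final int maxRetry;
    private final Predicate<List<T>> sender;   // returns true when the batch was accepted
    private final Semaphore taskSlots;         // caps the number of in-flight tasks
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final AtomicBoolean hasError = new AtomicBoolean(false);
    private List<T> toSend = new ArrayList<>();

    BackgroundBuffer(int batchSize, int maxTasks, int maxRetry, Predicate<List<T>> sender) {
        this.batchSize = batchSize;
        this.maxTasks = maxTasks;
        this.maxRetry = maxRetry;
        this.sender = sender;
        this.taskSlots = new Semaphore(maxTasks);
    }

    /** A failed background send is reported on a later call, as with a buffered put. */
    synchronized void put(T item) {
        if (hasError.get()) {
            throw new IllegalStateException("a previous batch failed");
        }
        toSend.add(item);
        if (toSend.size() >= batchSize) {
            submit();
        }
    }

    private void submit() {
        final List<T> batch = toSend;
        toSend = new ArrayList<>();
        // Blocks the caller only when maxTasks batches are already in flight.
        taskSlots.acquireUninterruptibly();
        pool.execute(() -> {
            try {
                boolean sent = false;
                for (int i = 0; i < maxRetry && !sent; i++) {
                    sent = sender.test(batch);
                }
                if (!sent) {
                    hasError.set(true);
                }
            } catch (RuntimeException e) {
                hasError.set(true);
            } finally {
                taskSlots.release();
            }
        });
    }

    /** Sends what is left, waits for every in-flight task, then reports any error. */
    synchronized void flush() {
        if (!toSend.isEmpty()) {
            submit();
        }
        taskSlots.acquireUninterruptibly(maxTasks);
        taskSlots.release(maxTasks);
        if (hasError.get()) {
            throw new IllegalStateException("a batch failed");
        }
    }

    /** Lets the worker threads terminate once the submitted tasks are done. */
    void close() {
        pool.shutdown();
    }
}
```

With maxTasks set to 1 the client can still buffer puts while one batch is in
flight; a larger value trades memory for throughput.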
What we gain here as well is a way to control the client. I expect it will be
simple to set a different number of tasks per client, so that the map reduce
clients send fewer writes than the others.
The direct case is a region under recovery: the client will be able to go to
all other regions. We may have some herd effects, but I expect the difference
to be just great for many use cases.
What we could do as well is add configurable error management: we could just
drop the puts that failed too many times instead of setting a blocking error.
It's quite easy to add.
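As an illustration of what such a setting could look like, here is a small
sketch. RetryingSender, the OnExhausted policy and the sendOne callback are
hypothetical names chosen for the example; nothing like this exists in the
patch yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of a configurable policy for puts that exhausted their retries. */
final class RetryingSender<T> {
    enum OnExhausted { BLOCKING_ERROR, DROP }

    private final int maxRetry;
    private final OnExhausted policy;
    private final Predicate<T> sendOne;          // returns true when the item was accepted
    private final List<T> dropped = new ArrayList<>();
    private boolean hasError;

    RetryingSender(int maxRetry, OnExhausted policy, Predicate<T> sendOne) {
        this.maxRetry = maxRetry;
        this.policy = policy;
        this.sendOne = sendOne;
    }

    void send(List<T> batch) {
        for (T item : batch) {
            boolean sent = false;
            for (int attempt = 0; attempt < maxRetry && !sent; attempt++) {
                sent = sendOne.test(item);
            }
            if (sent) {
                continue;
            }
            if (policy == OnExhausted.DROP) {
                dropped.add(item);       // give up on this item only
            } else {
                hasError = true;         // the next put would fail, as today
            }
        }
    }

    boolean hasError() {
        return hasError;
    }

    List<T> dropped() {
        return dropped;
    }
}
```

Keeping the list of dropped items lets the caller log or count them rather
than lose them silently.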
That said, the code in HConnection* is quite strange sometimes, and I may have
to propose some incompatible interface changes (for example, when a list of
puts has failed, there is an option to keep them in the write buffer: this is
more difficult / expensive when there are two threads and not one).
The error above seems unrelated to my changes, except that maybe I triggered
another set of flakiness by changing the internal behavior.
> Possible performance improvement in client batch operations: presplit and
> send in background
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
> Issue Type: Improvement
> Components: Client, Performance
> Affects Versions: 0.95.2
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List<Op>{
> add o to todolist
> if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
> get location
> add o to todo location.todolist
> if (location.todolist > maxLocationSize)
> send location.todolist to region server
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes
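For reference, the per-location loop from the description can be sketched as
follows. PerLocationBatcher, the locate function and the send callback are
hypothetical names; here send is called inline, whereas the proposal assumes
it does not block, and the threshold uses >= rather than > for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch: one todo list per location, sent as soon as it is full. */
final class PerLocationBatcher<Op> {
    private final int maxLocationSize;
    private final Function<Op, String> locate;            // operation -> location
    private final BiConsumer<String, List<Op>> send;      // assumed not to block
    private final Map<String, List<Op>> todo = new HashMap<>();

    PerLocationBatcher(int maxLocationSize,
                       Function<Op, String> locate,
                       BiConsumer<String, List<Op>> send) {
        this.maxLocationSize = maxLocationSize;
        this.locate = locate;
        this.send = send;
    }

    void add(Op op) {
        String location = locate.apply(op);
        List<Op> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
        list.add(op);
        if (list.size() >= maxLocationSize) {
            todo.remove(location);        // "clear location.todolist"
            send.accept(location, list);  // don't wait, continue the loop
        }
    }

    /** "send remaining": flushes the lists that never reached the threshold. */
    void flush() {
        todo.forEach(send);
        todo.clear();
    }
}
```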