[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646465#comment-13646465
 ] 

Nicolas Liochon commented on HBASE-6295:
----------------------------------------

bq. Also, what happens to rows that are not added in AsyncProcess::submit? Not clear on that

[~sershe] Thanks for having a look. I wrote a short summary, which I will put in
the javadoc or in the HBase reference guide, to explain what the code is
supposed to do.
{panel} 
The puts are sent asynchronously. The interface is 100% compatible with the 
HTable interface that we had in 0.94 and before.
If autoflush is set to false, writes are buffered in HTable. When the buffer
size goes beyond the value defined in "hbase.client.write.buffer", the buffer
is sent asynchronously to the servers. Retries are also managed
independently. We block only:
- if the user code calls HTable#flushCommits
- if the user code calls HTable#close, because it implies a flushCommits
- if we run out of retries for an operation: in this case we finish all the
writes in progress and raise a single aggregated error.
- if we meet one of the flow control conditions detailed below.

It's possible to control the client stream with two parameters:
- "hbase.client.max.total.tasks": the number of tasks that we can run
simultaneously. If the buffer goes beyond "hbase.client.write.buffer" and the
number of tasks currently in progress is greater than
"hbase.client.max.total.tasks", we block until some of the tasks finish. This
parameter must be set in accordance with the cluster size: if there are 1000
machines in the cluster, it may make sense to have a few thousand concurrent
tasks for some tables.
- "hbase.client.max.perregion.tasks": the number of tasks in progress for the
same region. When doing a background flush, puts for a region that already has
"hbase.client.max.perregion.tasks" or more tasks in progress are skipped and
remain in the HTable write buffer. They will be sent in a later background
flush. If all entries are skipped during a background flush, we block until a
slot becomes available.
{panel} 
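The flow-control rules in the summary above can be modeled in a few lines. This is a minimal, single-threaded sketch under stated assumptions, not the actual AsyncProcess code: `BufferedPut` and `FlowControlledBuffer` are hypothetical names, tasks are plain counters that the caller releases with `taskDone`, each background flush starts at most one task per region, and "we block" is reported as a return value instead of an actual wait. The three constructor arguments stand in for "hbase.client.write.buffer", "hbase.client.max.total.tasks" and "hbase.client.max.perregion.tasks".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical buffered put: only the target region and the size matter for flow control.
final class BufferedPut {
    final String region;
    final long size;

    BufferedPut(String region, long size) {
        this.region = region;
        this.size = size;
    }
}

// Hypothetical, single-threaded model of the flow-control rules; not the AsyncProcess code.
// Tasks are counters; "block" is reported to the caller instead of actually waiting.
final class FlowControlledBuffer {
    private final long writeBufferSize;  // stands in for "hbase.client.write.buffer"
    private final int maxTotalTasks;     // stands in for "hbase.client.max.total.tasks"
    private final int maxPerRegionTasks; // stands in for "hbase.client.max.perregion.tasks"
    private final List<BufferedPut> buffer = new ArrayList<>();
    private final Map<String, Integer> tasksPerRegion = new HashMap<>();
    private long bufferedBytes;
    private int totalTasks;

    FlowControlledBuffer(long writeBufferSize, int maxTotalTasks, int maxPerRegionTasks) {
        this.writeBufferSize = writeBufferSize;
        this.maxTotalTasks = maxTotalTasks;
        this.maxPerRegionTasks = maxPerRegionTasks;
    }

    // Buffers a put; returns true if the caller would have to block.
    boolean put(BufferedPut p) {
        buffer.add(p);
        bufferedBytes += p.size;
        return bufferedBytes > writeBufferSize && !backgroundFlush();
    }

    // Starts at most one task per region; returns false if nothing could be sent.
    boolean backgroundFlush() {
        if (totalTasks >= maxTotalTasks) {
            return false; // task limit reached: block until some tasks finish
        }
        // The per-region lists are what a real implementation would send to the servers.
        Map<String, List<BufferedPut>> batch = new LinkedHashMap<>();
        for (Iterator<BufferedPut> it = buffer.iterator(); it.hasNext(); ) {
            BufferedPut p = it.next();
            if (tasksPerRegion.getOrDefault(p.region, 0) >= maxPerRegionTasks) {
                continue; // region is busy: the put stays in the write buffer for a later flush
            }
            batch.computeIfAbsent(p.region, r -> new ArrayList<>()).add(p);
            bufferedBytes -= p.size;
            it.remove();
        }
        for (String region : batch.keySet()) {
            tasksPerRegion.merge(region, 1, Integer::sum); // one new task per region
            totalTasks++;
        }
        return !batch.isEmpty(); // empty: every put was skipped, so the caller would block
    }

    // Called when a task for 'region' completes.
    void taskDone(String region) {
        tasksPerRegion.merge(region, -1, Integer::sum);
        totalTasks--;
    }

    int bufferedPuts() {
        return buffer.size();
    }

    int tasksInProgress() {
        return totalTasks;
    }
}
```

With a per-region limit of 1, a flush that finds a region still busy leaves that region's puts in the buffer while puts for other regions go out; the caller only blocks when every buffered put is skipped or the global task limit is reached.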

Now that I have written this, I think I have a bug in the way I manage errors
and clearBufferOnFail: maybe the write buffer should contain only the failed
puts. I will check this.
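A minimal sketch of the behavior considered here, assuming that when clearBufferOnFail is false the buffer keeps only the puts that exhausted their retries, and that it is emptied otherwise; in both cases one aggregated error is raised. `FailedOnlyWriteBuffer`, `AggregatedPutError` and the `exhaustsRetries` predicate are hypothetical stand-ins for the real send-and-retry path, and row keys replace Put objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical aggregated error listing every put that ran out of retries.
final class AggregatedPutError extends RuntimeException {
    final List<String> failedRows;

    AggregatedPutError(List<String> failedRows) {
        super(failedRows.size() + " put(s) failed after all retries");
        this.failedRows = failedRows;
    }
}

// Hypothetical write buffer of row keys; after a failed flush it keeps only the failed puts.
final class FailedOnlyWriteBuffer {
    private final List<String> buffer = new ArrayList<>();
    private final boolean clearBufferOnFail;

    FailedOnlyWriteBuffer(boolean clearBufferOnFail) {
        this.clearBufferOnFail = clearBufferOnFail;
    }

    void add(String row) {
        buffer.add(row);
    }

    List<String> contents() {
        return new ArrayList<>(buffer);
    }

    // 'exhaustsRetries' stands in for the real send-and-retry logic.
    void flush(Predicate<String> exhaustsRetries) {
        List<String> failed = new ArrayList<>();
        for (String row : buffer) {
            if (exhaustsRetries.test(row)) {
                failed.add(row);
            }
        }
        buffer.clear(); // successful puts never stay in the buffer
        if (failed.isEmpty()) {
            return;
        }
        if (!clearBufferOnFail) {
            buffer.addAll(failed); // keep only the failed puts, so a later flush retries just those
        }
        throw new AggregatedPutError(failed);
    }
}
```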

bq. Lots of the code seems to be copied from other parts of HCM, and the original is not removed, will it be removed? Otherwise there's duplication.
I really don't know. The problem is that this API is public. So while the
change is transparent in HTable (I change neither the interface nor its
contract), that's not the case for the methods in HConnectionManager. That's
why I added some methods: they let us keep the existing interface of
HConnectionManager while adding the background flush. I thought about
implementing the previous synchronous interface with the new asynchronous
methods, but I feel it could make them more fragile. I don't have a strong
opinion here; the whole existing code could be refactored quite a lot. That's
why the patch is not final, but I can't say whether the final patch will (or
should) remove the duplication.
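For illustration only, implementing a synchronous interface on top of an asynchronous one could look like the following sketch; `AsyncBatchApi` and `SyncBatchFacade` are hypothetical names, not HConnectionManager methods. It also shows one place where such a wrapper can become fragile: `CompletableFuture.join()` wraps failures in a `CompletionException`, so the wrapper has to unwrap them to keep the old exception contract.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical asynchronous entry point: submits rows and completes with one result per row.
interface AsyncBatchApi {
    CompletableFuture<List<String>> submit(List<String> rows);
}

// Hypothetical synchronous facade keeping a blocking contract on top of the async API.
final class SyncBatchFacade {
    private final AsyncBatchApi async;

    SyncBatchFacade(AsyncBatchApi async) {
        this.async = async;
    }

    // Blocks until the whole batch is done, like a synchronous batch method.
    List<String> batch(List<String> rows) {
        try {
            return async.submit(rows).join();
        } catch (CompletionException e) {
            // join() wraps failures; unwrap runtime causes so callers see the original exception.
            if (e.getCause() instanceof RuntimeException) {
                throw (RuntimeException) e.getCause();
            }
            throw e;
        }
    }
}
```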


bq. RegionTooBusyException is a new one on me (I'll work on the ugly pb message in another issue)
Thanks. In my tests, it seems the servers hang at some point. Even if I stop
the client and restart it, the server does not accept any new operations (for
something like 5 minutes). I don't know if it's related to my changes, but
it's fishy. I will do a test with a server without HBASE-6295.
                
> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>         Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>
>
> Today, the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send a list as soon as
> there is enough data for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list
> must be shared with the operations added to the todolist. But it's doable.
> It's interesting mainly for 'big' writes.
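The two loops in the description can be compared with a runnable sketch. `BatchLoops`, the `locate` function and the `send` callback are hypothetical: `send` stands in for the call to a region server and runs on an `ExecutorService`. `currentBatch` waits after every full list, while `perLocationBatch` sends a location's list as soon as it holds `maxLocationSize` operations and waits only once at the end. Error management is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketches of the two batch loops from the issue description; not HBase code.
final class BatchLoops {
    private BatchLoops() {}

    // Starts sending one per-location list on the pool without waiting for it.
    private static <T> CompletableFuture<Void> sendAsync(String location, List<T> list,
            BiConsumer<String, List<T>> send, ExecutorService pool) {
        return CompletableFuture.runAsync(() -> send.accept(location, list), pool);
    }

    private static void waitAll(List<CompletableFuture<Void>> futures) {
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
    }

    // Today: fill one global list, split it per location when it is full, send, then wait.
    static <T> void currentBatch(List<T> ops, Function<T, String> locate, int maxSize,
                                 BiConsumer<String, List<T>> send, ExecutorService pool) {
        List<T> todo = new ArrayList<>();
        for (int i = 0; i < ops.size(); i++) {
            todo.add(ops.get(i));
            if (todo.size() >= maxSize || i == ops.size() - 1) {
                Map<String, List<T>> perLocation = new LinkedHashMap<>();
                for (T o : todo) {
                    perLocation.computeIfAbsent(locate.apply(o), k -> new ArrayList<>()).add(o);
                }
                List<CompletableFuture<Void>> round = new ArrayList<>();
                perLocation.forEach((loc, list) -> round.add(sendAsync(loc, list, send, pool)));
                waitAll(round); // the loop stalls here after every full list
                todo.clear();
            }
        }
    }

    // Proposed: split per location immediately, send a location's list as soon as it is
    // full without waiting, then send the remaining lists and wait only once.
    static <T> void perLocationBatch(List<T> ops, Function<T, String> locate, int maxLocationSize,
                                     BiConsumer<String, List<T>> send, ExecutorService pool) {
        Map<String, List<T>> todo = new LinkedHashMap<>();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        for (T o : ops) {
            String location = locate.apply(o);
            List<T> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
            list.add(o);
            if (list.size() >= maxLocationSize) {
                todo.remove(location); // the next op for this location starts a fresh list
                inFlight.add(sendAsync(location, list, send, pool)); // don't wait
            }
        }
        todo.forEach((loc, list) -> inFlight.add(sendAsync(loc, list, send, pool))); // send remaining
        waitAll(inFlight); // wait once, for everything
    }
}
```

Both loops eventually send every operation exactly once; they differ in when the caller waits and in how the operations are grouped before sending.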
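The error-management point can be sketched too. In this single-threaded sketch (hypothetical names; `send` returns the operations that failed), a failed operation goes back into the same queue as the operations not yet processed, so a retry lands in the same per-location todo list as new operations for that location. Operations that still fail after `maxAttempts` sends are collected and returned together, in the spirit of a single aggregated error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical, single-threaded sketch of per-location batching with retries; not HBase code.
final class RetryingBatch {
    private RetryingBatch() {}

    // One operation plus the number of times it has already been sent.
    private static final class Item<T> {
        final T op;
        final int attempts;

        Item(T op, int attempts) {
            this.op = op;
            this.attempts = attempts;
        }
    }

    // send(location, ops) returns the ops that failed. Failed ops are appended to the queue
    // of not-yet-processed ops, so retries share the per-location todo lists with new ops.
    // Returns the ops that still failed after maxAttempts sends.
    // Ops are matched with equals(), so the sketch assumes the ops are distinct.
    static <T> List<T> batch(List<T> ops, Function<T, String> locate, int maxLocationSize,
                             int maxAttempts, BiFunction<String, List<T>, List<T>> send) {
        Deque<Item<T>> queue = new ArrayDeque<>();
        for (T o : ops) {
            queue.add(new Item<>(o, 0));
        }
        Map<String, List<Item<T>>> todo = new LinkedHashMap<>();
        List<T> givenUp = new ArrayList<>();
        while (!queue.isEmpty() || !todo.isEmpty()) {
            if (queue.isEmpty()) {
                // Nothing new to add: send the first remaining list.
                String first = todo.keySet().iterator().next();
                sendList(first, todo.remove(first), send, maxAttempts, queue, givenUp);
                continue;
            }
            Item<T> item = queue.poll();
            String location = locate.apply(item.op);
            List<Item<T>> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
            list.add(item);
            if (list.size() >= maxLocationSize) {
                todo.remove(location);
                sendList(location, list, send, maxAttempts, queue, givenUp);
            }
        }
        return givenUp;
    }

    private static <T> void sendList(String location, List<Item<T>> items,
                                     BiFunction<String, List<T>, List<T>> send, int maxAttempts,
                                     Deque<Item<T>> queue, List<T> givenUp) {
        List<T> payload = new ArrayList<>();
        for (Item<T> it : items) {
            payload.add(it.op);
        }
        List<T> failed = send.apply(location, payload);
        for (Item<T> it : items) {
            if (failed.contains(it.op)) {
                int attempts = it.attempts + 1;
                if (attempts >= maxAttempts) {
                    givenUp.add(it.op); // out of retries: reported together at the end
                } else {
                    queue.add(new Item<>(it.op, attempts)); // rejoins the shared queue
                }
            }
        }
    }
}
```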

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
