nkeywal created HBASE-6295:
------------------------------

             Summary: Possible performance improvement in client batch 
operations: presplit and send in background
                 Key: HBASE-6295
                 URL: https://issues.apache.org/jira/browse/HBASE-6295
             Project: HBase
          Issue Type: Improvement
          Components: client
    Affects Versions: 0.96.0
            Reporter: nkeywal


Today the batch algorithm is:
{noformat}
for Operation o : List<Op> {
  add o to todolist
  if todolist.size > maxsize or o is last in list {
    split todolist per location
    send split lists to region servers
    clear todolist
    wait
  }
}
{noformat}

We could:
- create the final object immediately instead of an intermediate array
- split per location immediately
- send as soon as there is enough data for a single location, instead of
waiting until the list as a whole is full

The loop would become:
{noformat}
for Operation o : List<Op> {
  get location of o
  add o to location.todolist
  if location.todolist.size > maxLocationSize {
    send location.todolist to region server
    clear location.todolist
    // don't wait, continue the loop
  }
}
send remaining todolists
wait
{noformat}
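
As a rough illustration, the proposed loop could look like the Java sketch
below. Everything in it is hypothetical and not taken from the HBase client:
the class name PerLocationBatcher, the locator and sender callbacks, and the
use of CompletableFuture to model the background send. It also flushes when a
buffer reaches the threshold (>=) rather than exceeding it, and it leaves out
error handling and retries entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in the HBase client API.
final class PerLocationBatcher<T> {
    // Assumption: maps an operation to a key identifying its region server.
    private final Function<T, String> locator;
    // Assumption: starts an asynchronous send and returns immediately.
    private final BiFunction<String, List<T>, CompletableFuture<Void>> sender;
    private final int maxLocationSize;

    PerLocationBatcher(Function<T, String> locator,
                       BiFunction<String, List<T>, CompletableFuture<Void>> sender,
                       int maxLocationSize) {
        this.locator = locator;
        this.sender = sender;
        this.maxLocationSize = maxLocationSize;
    }

    void submitAll(List<T> ops) {
        Map<String, List<T>> pending = new HashMap<>();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();

        for (T op : ops) {
            String location = locator.apply(op);
            List<T> todo = pending.computeIfAbsent(location, k -> new ArrayList<>());
            todo.add(op);
            if (todo.size() >= maxLocationSize) {
                // Hand a copy to the sender, then keep looping without waiting.
                inFlight.add(sender.apply(location, new ArrayList<>(todo)));
                todo.clear();
            }
        }

        // Send whatever is left in each per-location buffer.
        for (Map.Entry<String, List<T>> e : pending.entrySet()) {
            if (!e.getValue().isEmpty()) {
                inFlight.add(sender.apply(e.getKey(), new ArrayList<>(e.getValue())));
            }
        }

        // Single wait at the end, once every send has been started.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
    }
}
```

The point of the sketch is only the control flow: one buffer per location, a
send as soon as one buffer is full, and a single wait after the loop.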

This is not trivial to write once error handling is added: the retry list
must be shared with the operations being added to the todolists. But it is
doable.
It is mainly interesting for 'big' writes.


        
