I have written the new miniBatchDelete() and changed HTable#delete(List<Delete>) to call this new batch delete. So far I have tested only on a one-node cluster, and even there I am getting a very promising performance boost.

Test setup: only one CF and one qualifier; 10K total rows deleted, in batches of 100 deletes; only deletes happening on the table, from one thread.

With the new way the net time taken is reduced by more than 1/10. I will test on a 4-node cluster also. I think this change is worth doing.
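To make the expected saving concrete, here is a rough, self-contained sketch of the log-sync arithmetic for the test setup above (10K rows, batches of 100). It is not the actual patch and does not use any HBase classes: CountingLog, deleteOneByOne and deleteAsMiniBatch are hypothetical names invented for illustration, and the sketch assumes every delete in a batch lands in the same region and that all row locks are available, so a batch is never split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MiniBatchDeleteSketch {

    /** Hypothetical stand-in for the write-ahead log; it only counts appends and syncs. */
    static final class CountingLog {
        private int appended = 0;
        private int syncs = 0;

        void append(List<String> rowKeys) { appended += rowKeys.size(); }
        void sync() { syncs++; }
        int appendedCount() { return appended; }
        int syncCount() { return syncs; }
    }

    /** Behaviour described in the original mail: each delete is logged and synced on its own. */
    static void deleteOneByOne(List<String> rowKeys, CountingLog log) {
        for (String row : rowKeys) {
            log.append(Collections.singletonList(row));
            log.sync();
        }
    }

    /** Proposed behaviour (assumption: the whole batch goes through together): one append, one sync. */
    static void deleteAsMiniBatch(List<String> rowKeys, CountingLog log) {
        if (rowKeys.isEmpty()) {
            return;
        }
        log.append(rowKeys);
        log.sync();
    }

    /** Returns {syncs with one-by-one deletes, syncs with mini-batch deletes}. */
    static int[] run(int totalRows, int batchSize) {
        CountingLog perDelete = new CountingLog();
        CountingLog perBatch = new CountingLog();
        for (int start = 0; start < totalRows; start += batchSize) {
            int end = Math.min(start + batchSize, totalRows);
            List<String> batch = new ArrayList<>();
            for (int i = start; i < end; i++) {
                batch.add("row-" + i);
            }
            deleteOneByOne(batch, perDelete);
            deleteAsMiniBatch(batch, perBatch);
        }
        return new int[] { perDelete.syncCount(), perBatch.syncCount() };
    }

    public static void main(String[] args) {
        int[] syncs = run(10_000, 100);
        System.out.println("one-by-one syncs: " + syncs[0]); // 10000
        System.out.println("mini-batch syncs: " + syncs[1]); // 100
    }
}
```

The sketch counts syncs only; it says nothing about elapsed time, which also depends on locking, memstore updates and the network.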
-Anoop-
________________________________________
From: [email protected] [[email protected]]
Sent: Wednesday, June 20, 2012 6:31 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Can there be a doMiniBatchDelete in HRegion?

I think you can issue a large number of deletes on the same region and observe whether the proposed new method gives us a performance boost.

Thanks

On Jun 20, 2012, at 2:49 AM, Anoop Sam John <[email protected]> wrote:

> Hi Devs
>
> There is batch put support at the HRegion level. When a
> put(List<Put>) happens from the client, Puts corresponding to one region might
> get grouped together and handled as a batch [depending on the availability of
> row locks; see the code in HRegion#doMiniBatchPut]. For this batch there will be
> a single write and sync into the HLog file.
>
> I am not able to see a similar kind of delete operation in HRegion. The
> HTable#delete(List<Delete>) groups the Deletes for the same RS and makes only
> one n/w call. But within the RS, there will be N delete calls on
> the region, one by one. This will include N HLog writes and syncs. If
> these can also be grouped, can we get better performance for the multi-row
> delete? Is there any problem in doing this batch delete? I am not sure whether
> a JIRA is already present for this.
>
> Note: in HRegion#mutateRowsWithLock() we do batch operations of Puts and
> Deletes (also).
>
> -Anoop-
