[
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683434#comment-13683434
]
Jean-Marc Spaggiari commented on HBASE-6295:
--------------------------------------------
Hi Nicolas,
As requested, here are some performance tests for your patch.
First set, with your patch applied on yesterday's trunk:
org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest 765360.3
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test 21109.7
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 126617.6
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest 1046473.4
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 762233.175
Second set, yesterday's trunk without your patch:
org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest 773127.4
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test 22348.3
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 134876.5
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest 115992.9
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 78791.275
The reads and scans are not impacted, but the writes are negatively impacted
with the version I tried.
Just let me know when you are ready with your next version and I will be
very happy to test it again.
> Possible performance improvement in client batch operations: presplit and
> send in background
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
> Issue Type: Improvement
> Components: Client, Performance
> Affects Versions: 0.95.2
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch,
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> Today the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send when there is
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list must be
> shared with the operations added to the todolist. But it's doable.
> It's interesting mainly for 'big' writes.
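
For illustration only, the proposed loop from the description could be sketched in plain Java roughly as follows. This is not the patch and not the HBase client API: the class name PerLocationBatcher, the locator and sender callbacks, and the use of String as a location key are all assumptions made for the example. Error management and retries, which the description calls out as the hard part, are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of per-location batching; names are not from HBase. */
final class PerLocationBatcher<Op> {
    private final Function<Op, String> locator;        // assumed: maps an operation to its location
    private final BiConsumer<String, List<Op>> sender; // assumed: sends one batch to one location
    private final int maxLocationSize;
    private final ExecutorService pool;
    private final Map<String, List<Op>> todo = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(Function<Op, String> locator,
                       BiConsumer<String, List<Op>> sender,
                       int maxLocationSize,
                       ExecutorService pool) {
        this.locator = locator;
        this.sender = sender;
        this.maxLocationSize = maxLocationSize;
        this.pool = pool;
    }

    /** Splits per location as operations arrive and sends full batches in the background. */
    void submitAll(List<Op> ops) {
        for (Op o : ops) {
            String location = locator.apply(o);
            List<Op> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
            list.add(o);
            // This sketch sends as soon as the per-location list reaches the limit.
            if (list.size() >= maxLocationSize) {
                flush(location); // don't wait, continue the loop
            }
        }
        // Send remaining partial batches.
        for (String location : new ArrayList<>(todo.keySet())) {
            flush(location);
        }
        // Wait once, at the end, for everything sent in the background.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    private void flush(String location) {
        List<Op> batch = todo.remove(location); // hands the list over and clears the slot
        if (batch == null || batch.isEmpty()) {
            return;
        }
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(location, batch), pool));
    }
}
```

A caller would build the batcher with its own locator and sender, call submitAll with the list of operations, and shut the executor down afterwards. Because batches for the same location may be sent concurrently here, a real implementation would also have to decide whether per-location ordering matters.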