[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685560#comment-13685560 ]

Jean-Marc Spaggiari commented on HBASE-6295:
--------------------------------------------

Other tests seem to be consistent, even if I don't get exactly the same 
results. Will do some more.

bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 2 1 3000000 /tmp/biglinkedlist 1

Trunk:
2013-06-17 08:37:08,264 INFO  [main] mapred.JobClient: Job complete: 
job_local_0006
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: Counters: 31
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:   
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     REFERENCED=6000000
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     RPC_CALLS=609
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     
NUM_SCANNER_RESTARTS=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     
MILLIS_BETWEEN_NEXTS=41071
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     
BYTES_IN_RESULTS=360000000
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=4
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:   File Output Format 
Counters
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     
FILE_BYTES_READ=5696162333
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     
FILE_BYTES_WRITTEN=6730223455
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:   File Input Format 
Counters
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     Map output 
materialized bytes=414000024
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Map input 
records=6000000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Reduce shuffle 
bytes=0
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Spilled 
Records=39145720
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Map output 
bytes=390000000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Total committed heap 
usage (bytes)=1303552000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=422
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Combine input 
records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Reduce input 
records=12000000
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Reduce input 
groups=6000000
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Combine output 
records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Physical memory 
(bytes) snapshot=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Reduce output 
records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Virtual memory 
(bytes) snapshot=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Map output 
records=12000000
2013-06-17 08:37:08,271 INFO  [main] test.IntegrationTestBigLinkedList$Loop: 
Verify finished with succees. Total nodes=6000000


Nic:
2013-06-17 08:44:47,530 INFO  [main] mapred.JobClient: Job complete: 
job_local_0006
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient: Counters: 31
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:   
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:     REFERENCED=6000000
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     RPC_CALLS=607
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     
NUM_SCANNER_RESTARTS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     
MILLIS_BETWEEN_NEXTS=39871
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     
BYTES_IN_RESULTS=360000000
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=3
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   File Output Format 
Counters
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     
FILE_BYTES_READ=5185648641
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     
FILE_BYTES_WRITTEN=6110147770
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   File Input Format 
Counters
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     Map output 
materialized bytes=414000018
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Map input 
records=6000000
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Reduce shuffle 
bytes=0
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Spilled 
Records=41455689
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Map output 
bytes=390000000
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Total committed heap 
usage (bytes)=1262878720
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=302
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Combine input 
records=0
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Reduce input 
records=12000000
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Reduce input 
groups=6000000
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Combine output 
records=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Physical memory 
(bytes) snapshot=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Reduce output 
records=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Virtual memory 
(bytes) snapshot=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Map output 
records=12000000
2013-06-17 08:44:47,536 INFO  [main] test.IntegrationTestBigLinkedList$Loop: 
Verify finished with succees. Total nodes=6000000

bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify -Dloadmapper.backrefs=10 -Dloadmapper.map.tasks=10 -Dloadmapper.num_to_write=100000 -Dverify.reduce.tasks=1 -Dverify.scannercaching=10000 loadAndVerify



Trunk:
2013-06-17 09:01:45,884 INFO  [main] mapred.JobClient: Job complete: 
job_local_0002
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient: Counters: 32
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     RPC_CALLS=196
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
NUM_SCANNER_RESTARTS=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
MILLIS_BETWEEN_NEXTS=19795
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
BYTES_IN_RESULTS=592892544
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=40
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     ROWS_WRITTEN=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
REFERENCES_CHECKED=9855224
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   File Output Format 
Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
FILE_BYTES_READ=12005531128
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     
FILE_BYTES_WRITTEN=21471152830
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   File Input Format 
Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Map output 
materialized bytes=460630096
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Map input 
records=1000000
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Reduce shuffle 
bytes=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Spilled 
Records=42262109
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Map output 
bytes=438919408
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Total committed heap 
usage (bytes)=15392387072
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=4144
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Combine input 
records=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Reduce input 
records=10855224
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Reduce input 
groups=1000000
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Combine output 
records=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Physical memory 
(bytes) snapshot=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Reduce output 
records=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Virtual memory 
(bytes) snapshot=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Map output 
records=10855224




Nic:
2013-06-17 08:56:38,894 INFO  [main] mapred.JobClient: Job complete: 
job_local_0002
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient: Counters: 32
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     RPC_CALLS=196
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     
NUM_SCANNER_RESTARTS=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     
MILLIS_BETWEEN_NEXTS=19384
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     
BYTES_IN_RESULTS=592944120
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=40
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:   
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     ROWS_WRITTEN=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     
REFERENCES_CHECKED=9856145
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:   File Output Format 
Counters
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     
FILE_BYTES_READ=12006648901
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     
FILE_BYTES_WRITTEN=21472928417
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:   File Input Format 
Counters
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Map output 
materialized bytes=460670620
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Map input 
records=1000000
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Reduce shuffle 
bytes=0
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Spilled 
Records=42265579
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Map output 
bytes=438958090
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Total committed heap 
usage (bytes)=15534960640
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=4144
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Combine input 
records=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Reduce input 
records=10856145
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Reduce input 
groups=1000000
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Combine output 
records=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Physical memory 
(bytes) snapshot=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Reduce output 
records=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Virtual memory 
(bytes) snapshot=0
2013-06-17 08:56:38,899 INFO  [main] mapred.JobClient:     Map output 
records=10856145



                
> Possible performance improvement in client batch operations: presplit and 
> send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0
>
>         Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch, 
> 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 
> 6295.v8.patch, 6295.v9.patch
>
>
> Today, the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send as soon as 
> there is enough data for a single location
> The algorithm would become:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> This is not trivial to write once error management is added: the retry list 
> must be shared with the operations added to the todolist. But it's doable.
> It is mainly of interest for 'big' writes.
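
The proposed loop in the description above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `PerLocationBatcher`, its `locator` and `sender` parameters, and the use of a String as the location key are all hypothetical and are not the HBase client API or the code in the attached patches. Error management and retries, which the description calls out as the hard part, are left out, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Sketch of per-location batching: buffer operations per location and send a
 * location's buffer as soon as it is full, without waiting for the reply.
 * All names are hypothetical; this is not the HBase client API.
 */
final class PerLocationBatcher<O> {
    private final int maxLocationSize;
    // Maps an operation to its location (for example, a region server name).
    private final Function<O, String> locator;
    // Sends one batch to one location asynchronously (assumed to be provided).
    private final BiFunction<String, List<O>, CompletableFuture<Void>> sender;
    private final Map<String, List<O>> todoByLocation = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(int maxLocationSize,
                       Function<O, String> locator,
                       BiFunction<String, List<O>, CompletableFuture<Void>> sender) {
        this.maxLocationSize = maxLocationSize;
        this.locator = locator;
        this.sender = sender;
    }

    /** Buffers one operation; sends that location's buffer once it is full. */
    void add(O op) {
        String location = locator.apply(op);
        List<O> todo = todoByLocation.computeIfAbsent(location, k -> new ArrayList<>());
        todo.add(op);
        // The description writes "> maxLocationSize"; ">=" is used here so the
        // buffer never grows past the limit. The exact threshold is a choice.
        if (todo.size() >= maxLocationSize) {
            send(location); // don't wait, the caller continues its loop
        }
    }

    /** "send remaining" then "wait": flushes every buffer and joins all sends. */
    void flushAndWait() {
        for (String location : new ArrayList<>(todoByLocation.keySet())) {
            send(location);
        }
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0])).join();
        inFlight.clear();
    }

    private void send(String location) {
        // Removing the list hands ownership of the batch to the sender.
        List<O> batch = todoByLocation.remove(location);
        if (batch != null && !batch.isEmpty()) {
            inFlight.add(sender.apply(location, batch));
        }
    }
}
```

A caller would invoke `add` once per operation and `flushAndWait` once at the end, which mirrors the "send remaining / wait" tail of the pseudocode. Keeping one buffer per location is what lets a busy location be flushed early while slower ones keep filling.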

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
