[
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685560#comment-13685560
]
Jean-Marc Spaggiari commented on HBASE-6295:
--------------------------------------------
Other tests seem to be consistent, even though I don't get exactly the same
results. Will do some more.
bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 2 1 3000000 /tmp/biglinkedlist 1
Trunk:
2013-06-17 08:37:08,264 INFO [main] mapred.JobClient: Job complete:
job_local_0006
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient: Counters: 31
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient:
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient: REFERENCED=6000000
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient: HBase Counters
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient: REMOTE_RPC_CALLS=0
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient: RPC_CALLS=609
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient: RPC_RETRIES=0
2013-06-17 08:37:08,265 INFO [main] mapred.JobClient:
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient:
NUM_SCANNER_RESTARTS=0
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient:
MILLIS_BETWEEN_NEXTS=41071
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient:
BYTES_IN_RESULTS=360000000
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient:
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient: REGIONS_SCANNED=4
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient: REMOTE_RPC_RETRIES=0
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient: File Output Format
Counters
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient: Bytes Written=8
2013-06-17 08:37:08,266 INFO [main] mapred.JobClient: FileSystemCounters
2013-06-17 08:37:08,267 INFO [main] mapred.JobClient:
FILE_BYTES_READ=5696162333
2013-06-17 08:37:08,267 INFO [main] mapred.JobClient:
FILE_BYTES_WRITTEN=6730223455
2013-06-17 08:37:08,267 INFO [main] mapred.JobClient: File Input Format
Counters
2013-06-17 08:37:08,267 INFO [main] mapred.JobClient: Bytes Read=0
2013-06-17 08:37:08,267 INFO [main] mapred.JobClient: Map-Reduce Framework
2013-06-17 08:37:08,267 INFO [main] mapred.JobClient: Map output
materialized bytes=414000024
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: Map input
records=6000000
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: Reduce shuffle
bytes=0
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: Spilled
Records=39145720
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: Map output
bytes=390000000
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: Total committed heap
usage (bytes)=1303552000
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: CPU time spent (ms)=0
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: SPLIT_RAW_BYTES=422
2013-06-17 08:37:08,268 INFO [main] mapred.JobClient: Combine input
records=0
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Reduce input
records=12000000
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Reduce input
groups=6000000
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Combine output
records=0
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Physical memory
(bytes) snapshot=0
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Reduce output
records=0
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Virtual memory
(bytes) snapshot=0
2013-06-17 08:37:08,269 INFO [main] mapred.JobClient: Map output
records=12000000
2013-06-17 08:37:08,271 INFO [main] test.IntegrationTestBigLinkedList$Loop:
Verify finished with succees. Total nodes=6000000
Nic:
2013-06-17 08:44:47,530 INFO [main] mapred.JobClient: Job complete:
job_local_0006
2013-06-17 08:44:47,531 INFO [main] mapred.JobClient: Counters: 31
2013-06-17 08:44:47,531 INFO [main] mapred.JobClient:
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:44:47,531 INFO [main] mapred.JobClient: REFERENCED=6000000
2013-06-17 08:44:47,531 INFO [main] mapred.JobClient: HBase Counters
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient: REMOTE_RPC_CALLS=0
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient: RPC_CALLS=607
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient: RPC_RETRIES=0
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient:
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient:
NUM_SCANNER_RESTARTS=0
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient:
MILLIS_BETWEEN_NEXTS=39871
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient:
BYTES_IN_RESULTS=360000000
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient:
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient: REGIONS_SCANNED=3
2013-06-17 08:44:47,532 INFO [main] mapred.JobClient: REMOTE_RPC_RETRIES=0
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: File Output Format
Counters
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: Bytes Written=8
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: FileSystemCounters
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient:
FILE_BYTES_READ=5185648641
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient:
FILE_BYTES_WRITTEN=6110147770
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: File Input Format
Counters
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: Bytes Read=0
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: Map-Reduce Framework
2013-06-17 08:44:47,533 INFO [main] mapred.JobClient: Map output
materialized bytes=414000018
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Map input
records=6000000
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Reduce shuffle
bytes=0
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Spilled
Records=41455689
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Map output
bytes=390000000
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Total committed heap
usage (bytes)=1262878720
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: CPU time spent (ms)=0
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: SPLIT_RAW_BYTES=302
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Combine input
records=0
2013-06-17 08:44:47,534 INFO [main] mapred.JobClient: Reduce input
records=12000000
2013-06-17 08:44:47,535 INFO [main] mapred.JobClient: Reduce input
groups=6000000
2013-06-17 08:44:47,535 INFO [main] mapred.JobClient: Combine output
records=0
2013-06-17 08:44:47,535 INFO [main] mapred.JobClient: Physical memory
(bytes) snapshot=0
2013-06-17 08:44:47,535 INFO [main] mapred.JobClient: Reduce output
records=0
2013-06-17 08:44:47,535 INFO [main] mapred.JobClient: Virtual memory
(bytes) snapshot=0
2013-06-17 08:44:47,535 INFO [main] mapred.JobClient: Map output
records=12000000
2013-06-17 08:44:47,536 INFO [main] test.IntegrationTestBigLinkedList$Loop:
Verify finished with succees. Total nodes=6000000
bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify -Dloadmapper.backrefs=10 -Dloadmapper.map.tasks=10 -Dloadmapper.num_to_write=100000 -Dverify.reduce.tasks=1 -Dverify.scannercaching=10000 loadAndVerify
Trunk:
2013-06-17 09:01:45,884 INFO [main] mapred.JobClient: Job complete:
job_local_0002
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Counters: 32
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: HBase Counters
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: REMOTE_RPC_CALLS=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: RPC_CALLS=196
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: RPC_RETRIES=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
NUM_SCANNER_RESTARTS=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
MILLIS_BETWEEN_NEXTS=19795
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
BYTES_IN_RESULTS=592892544
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: REGIONS_SCANNED=40
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: REMOTE_RPC_RETRIES=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: ROWS_WRITTEN=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
REFERENCES_CHECKED=9855224
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: File Output Format
Counters
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Bytes Written=8
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: FileSystemCounters
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
FILE_BYTES_READ=12005531128
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient:
FILE_BYTES_WRITTEN=21471152830
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: File Input Format
Counters
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Bytes Read=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Map-Reduce Framework
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Map output
materialized bytes=460630096
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Map input
records=1000000
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Reduce shuffle
bytes=0
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Spilled
Records=42262109
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Map output
bytes=438919408
2013-06-17 09:01:45,885 INFO [main] mapred.JobClient: Total committed heap
usage (bytes)=15392387072
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: CPU time spent (ms)=0
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: SPLIT_RAW_BYTES=4144
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Combine input
records=0
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Reduce input
records=10855224
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Reduce input
groups=1000000
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Combine output
records=0
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Physical memory
(bytes) snapshot=0
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Reduce output
records=0
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Virtual memory
(bytes) snapshot=0
2013-06-17 09:01:45,886 INFO [main] mapred.JobClient: Map output
records=10855224
Nic:
2013-06-17 08:56:38,894 INFO [main] mapred.JobClient: Job complete:
job_local_0002
2013-06-17 08:56:38,895 INFO [main] mapred.JobClient: Counters: 32
2013-06-17 08:56:38,895 INFO [main] mapred.JobClient: HBase Counters
2013-06-17 08:56:38,895 INFO [main] mapred.JobClient: REMOTE_RPC_CALLS=0
2013-06-17 08:56:38,895 INFO [main] mapred.JobClient: RPC_CALLS=196
2013-06-17 08:56:38,895 INFO [main] mapred.JobClient: RPC_RETRIES=0
2013-06-17 08:56:38,895 INFO [main] mapred.JobClient:
NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient:
NUM_SCANNER_RESTARTS=0
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient:
MILLIS_BETWEEN_NEXTS=19384
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient:
BYTES_IN_RESULTS=592944120
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient:
BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient: REGIONS_SCANNED=40
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient: REMOTE_RPC_RETRIES=0
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient:
org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient: ROWS_WRITTEN=0
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient:
REFERENCES_CHECKED=9856145
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient: File Output Format
Counters
2013-06-17 08:56:38,896 INFO [main] mapred.JobClient: Bytes Written=8
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: FileSystemCounters
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient:
FILE_BYTES_READ=12006648901
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient:
FILE_BYTES_WRITTEN=21472928417
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: File Input Format
Counters
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Bytes Read=0
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Map-Reduce Framework
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Map output
materialized bytes=460670620
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Map input
records=1000000
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Reduce shuffle
bytes=0
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Spilled
Records=42265579
2013-06-17 08:56:38,897 INFO [main] mapred.JobClient: Map output
bytes=438958090
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Total committed heap
usage (bytes)=15534960640
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: CPU time spent (ms)=0
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: SPLIT_RAW_BYTES=4144
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Combine input
records=0
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Reduce input
records=10856145
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Reduce input
groups=1000000
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Combine output
records=0
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Physical memory
(bytes) snapshot=0
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Reduce output
records=0
2013-06-17 08:56:38,898 INFO [main] mapred.JobClient: Virtual memory
(bytes) snapshot=0
2013-06-17 08:56:38,899 INFO [main] mapred.JobClient: Map output
records=10856145
> Possible performance improvement in client batch operations: presplit and
> send in background
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
> Issue Type: Improvement
> Components: Client, Performance
> Affects Versions: 0.95.2
> Reporter: Nicolas Liochon
> Assignee: Nicolas Liochon
> Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch,
> 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch,
> 6295.v8.patch, 6295.v9.patch
>
>
> Today the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send as soon as there
> is enough data for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list
> must be shared with the operations added to the todolist. But it's doable.
> It's interesting mainly for 'big' writes.
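
For readers who want to see the proposed loop in compilable form, here is a minimal Java sketch of the per-location buffering described in the issue. It is not the code from any of the attached patches: the class name PerLocationBatcher, the locator and sender callbacks, and the use of CompletableFuture are all assumptions made for illustration, and the error management and retry sharing mentioned in the description are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the HBase client. */
final class PerLocationBatcher<Op> {
    private final Function<Op, String> locator;        // assumed: maps an operation to its server
    private final BiConsumer<String, List<Op>> sender; // assumed: sends one list to one server
    private final int maxLocationSize;
    private final Map<String, List<Op>> todo = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(Function<Op, String> locator,
                       BiConsumer<String, List<Op>> sender,
                       int maxLocationSize) {
        this.locator = locator;
        this.sender = sender;
        this.maxLocationSize = maxLocationSize;
    }

    void submitAll(List<Op> ops) {
        for (Op o : ops) {
            // get location, then add o to that location's todolist
            String location = locator.apply(o);
            List<Op> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
            list.add(o);
            if (list.size() > maxLocationSize) {
                // send this location's list and start a fresh one;
                // don't wait, continue the loop
                sendInBackground(location, todo.remove(location));
            }
        }
        // send remaining
        for (Map.Entry<String, List<Op>> e : todo.entrySet()) {
            sendInBackground(e.getKey(), e.getValue());
        }
        todo.clear();
        // wait for every background send to finish (no retry handling here)
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    private void sendInBackground(String location, List<Op> batch) {
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(location, batch)));
    }
}
```

Because the sender callback runs on background threads, whatever it touches has to be thread-safe. A real implementation would also need the shared retry list that the description calls out as the hard part.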