[ 
https://issues.apache.org/jira/browse/HBASE-20786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524086#comment-16524086
 ] 

stack commented on HBASE-20786:
-------------------------------

Here is where the last create table spent its 28 minutes creating 33k regions 
on a 650-node cluster with 270k regions already up on it:


 * It took 20 minutes to create all the regions, doing the below against the 
NN. This is with the region open executor configured at its default of 10 max 
concurrent creates. Should we make this half the cores or 10, whichever is 
larger?
 * It took 10-12 seconds to get to "Added 32400 regions to meta."
 * It took about two minutes to create the 33k subprocedures. (We could speed 
this up: do it incrementally rather than in a lump.)
 * The next six minutes are spent assigning the 33k regions: rpc'ing, updating 
master, etc. This cluster was at INFO level so I can't see how much batching we 
are doing in our RPCs. Could be room for improvement here.
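
A rough sketch of the "half the cores or 10, whichever is larger" sizing 
suggested in the first bullet. The class and method names here are made up for 
illustration; this is not existing HBase code, and how the value would 
actually be wired into the executor configuration is left open.

```java
// Hypothetical helper, not part of HBase: illustrates the proposed sizing rule only.
final class RegionCreatePoolSizing {
    // Current default number of concurrent region creates, per the comment above.
    static final int DEFAULT_THREADS = 10;

    // Proposed rule: half the available cores, but never fewer than the default of 10.
    static int poolSize(int availableCores) {
        return Math.max(DEFAULT_THREADS, availableCores / 2);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + " poolSize=" + poolSize(cores));
    }
}
```

With this rule a machine with 20 or fewer cores keeps the current 10 threads, 
and only larger machines get a bigger pool.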



{code}
2018-06-26 08:18:14,750 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
creating HRegion IntegrationTestBigLinkedList_20180626064758 HTD == 
'IntegrationTestBigLinkedList_20180626064758', {NAME => 'big', VERSIONS => '1', 
EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
'true', BLOCKSIZE => '65536'}, {NAME => 'meta', VERSIONS => '1', 
EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
'true', BLOCKSIZE => '65536'}, {NAME => 'tiny', VERSIONS => '1', 
EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
'true', BLOCKSIZE => '65536'} RootDir = hdfs://ns1/hbase/.tmp Table name == 
IntegrationTestBigLinkedList_20180626064758
{code}



> Table create with thousands of regions takes too long
> -----------------------------------------------------
>
>                 Key: HBASE-20786
>                 URL: https://issues.apache.org/jira/browse/HBASE-20786
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Performance
>            Reporter: stack
>            Priority: Major
>
> Internal testing has create of a table with 33k regions taking 18 minutes. 
> Let me provide more info below. We have an executor with default ten threads 
> handling the creation of the regions in HDFS which helps distribute out the 
> load but it's not enough. This cluster had >600 servers. Let me add detail.
> Need to spend some time on speeding up create/assigns. Made this an umbrella 
> issue so can pick off pieces of the problem as subtasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
