[
https://issues.apache.org/jira/browse/HBASE-20786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524086#comment-16524086
]
stack commented on HBASE-20786:
-------------------------------
Here is where the last create table spent its 28 minutes, creating 33k regions
on a 650-node cluster with 270k regions already up on it:
* It took 20 minutes to create all the regions, doing the below against the
NN. This is with the region open executor configured at its default of 10 max
concurrent creates. Should we make this half the cores or 10, whichever is larger?
* It took 10-12 seconds to get to "Added 32400 regions to meta."
* It took about two minutes to create 33k subprocedures. (We could speed this
up: do it incrementally rather than in a lump.)
* The next six minutes are spent assigning the 33k regions: RPC'ing,
updating master, etc. This cluster was at INFO level, so I can't see how much
batching we are doing in our RPCs. Could be room for improvement here.
{code}
2018-06-26 08:18:14,750 INFO org.apache.hadoop.hbase.regionserver.HRegion:
creating HRegion IntegrationTestBigLinkedList_20180626064758 HTD ==
'IntegrationTestBigLinkedList_20180626064758', {NAME => 'big', VERSIONS => '1',
EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE =>
'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE =>
'true', BLOCKSIZE => '65536'}, {NAME => 'meta', VERSIONS => '1',
EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE =>
'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE =>
'true', BLOCKSIZE => '65536'}, {NAME => 'tiny', VERSIONS => '1',
EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE =>
'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE =>
'true', BLOCKSIZE => '65536'} RootDir = hdfs://ns1/hbase/.tmp Table name ==
IntegrationTestBigLinkedList_20180626064758
{code}
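A rough sketch of the sizing rule floated above (half the cores or 10,
whichever is larger) and what it might buy for the 20-minute create phase.
The class and method names are made up for illustration and are not HBase
APIs. The projection assumes throughput scales linearly with thread count,
which the NN may well not allow, so treat the number as a best case.

```java
// Hypothetical helper, not part of HBase: illustrates the proposed sizing rule.
final class RegionCreateSizing {

    // Proposed rule from the comment: half the cores or 10, whichever is larger.
    public static int proposedThreads(int cores) {
        return Math.max(10, cores / 2);
    }

    // Best-case projection of the create phase, ASSUMING linear scaling with
    // thread count (an assumption; the NN could become the bottleneck first).
    public static double projectedMinutes(double observedMinutes,
                                          int observedThreads,
                                          int newThreads) {
        return observedMinutes * observedThreads / newThreads;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = proposedThreads(cores);
        // 20 minutes at 10 threads are the figures observed in this run.
        double minutes = projectedMinutes(20.0, 10, threads);
        System.out.printf("cores=%d -> threads=%d%n", cores, threads);
        System.out.printf("projected create phase: %.2f minutes%n", minutes);
    }
}
```

On a 64-core master this gives 32 threads and, under the linear-scaling
assumption, a create phase of 6.25 minutes instead of 20.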
> Table create with thousands of regions takes too long
> -----------------------------------------------------
>
> Key: HBASE-20786
> URL: https://issues.apache.org/jira/browse/HBASE-20786
> Project: HBase
> Issue Type: Umbrella
> Components: Performance
> Reporter: stack
> Priority: Major
>
> Internal testing has create of a table with 33k regions taking 18 minutes.
> Let me provide more info below. We have an executor with a default of ten
> threads handling the creation of the regions in HDFS, which helps distribute
> the load, but it's not enough. This cluster had >600 servers. Need to spend
> some time on speeding up create/assigns. Made this an umbrella issue so we
> can pick off pieces of the problem as subtasks.