Lavinia-Stefania Sirbu created HBASE-21286:
----------------------------------------------
Summary: Parallelize computeHDFSBlocksDistribution when getting
splits of a HBaseSnapshot
Key: HBASE-21286
URL: https://issues.apache.org/jira/browse/HBASE-21286
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 1.4.0
Reporter: Lavinia-Stefania Sirbu
Although this step is called computeHDFSBlocksDistribution, it is executed
regardless of the file system on which the snapshot is stored. For example, we
have observed a significant slowdown with a snapshot in S3 (~26k regions, 5
column families, 2 files per column family): getSplits takes ~40 min because of
the S3 calls that list the files in order to determine the best locations.
Parallelizing this operation can reduce the overall setup time. The thread pool
size should be configurable; a good choice for the setting could be
"hbase.snapshot.thread.pool.max", which is also used in RestoreSnapshotHelper.
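
A minimal sketch of the proposed approach, not the actual HBase code. The class
ParallelLocality, the methods poolSize and computeAll, the use of
java.util.Properties in place of the Hadoop Configuration, and the default of 8
threads are all assumptions made for illustration. Only the configuration key
comes from this issue. The per-region function stands in for the
computeHDFSBlocksDistribution call made for each region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelLocality {
    // Key suggested in this issue; the default value is an assumption.
    static final String POOL_KEY = "hbase.snapshot.thread.pool.max";
    static final int DEFAULT_POOL_SIZE = 8;

    // Reads the pool size from a stand-in configuration object.
    static int poolSize(Properties conf) {
        String raw = conf.getProperty(POOL_KEY, String.valueOf(DEFAULT_POOL_SIZE));
        return Math.max(1, Integer.parseInt(raw));
    }

    // Applies perRegion to every region on a fixed-size pool and returns the
    // results in the same order as the input list.
    static <R, D> List<D> computeAll(List<R> regions, Function<R, D> perRegion, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<D>> futures = new ArrayList<>(regions.size());
            for (R region : regions) {
                Callable<D> task = () -> perRegion.apply(region);
                futures.add(pool.submit(task));
            }
            List<D> results = new ArrayList<>(regions.size());
            for (Future<D> future : futures) {
                // get() rethrows a failure from the per-region task as ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the per-region work is dominated by remote listing calls rather than
CPU, a fixed pool lets those calls overlap while the configured limit keeps the
number of concurrent requests bounded.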
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)