Tyler Mi created HBASE-21098:
--------------------------------
Summary: HBase on S3 Snapshot Performance Increase
Key: HBASE-21098
URL: https://issues.apache.org/jira/browse/HBASE-21098
Project: HBase
Issue Type: Improvement
Reporter: Tyler Mi
In Apache HBase, the snapshot feature can be used for point-in-time
recovery. To do this, HBase creates a manifest of all the files in all of
the Regions so that those files can be referenced again when a user restores a
snapshot. With HBase storage mode S3, developers can store their data
off-cluster in Amazon S3. However, using S3 as a FileSystem is inefficient
for some operations, namely renames. Most Hadoop ecosystem applications use an
atomic rename to commit data. With S3, however, a rename is a copy followed by
a delete for every file, which is not atomic and is, in fact, quite costly.
In addition, puts and deletes on S3 have latency issues
that traditional filesystems do not encounter when manipulating the region
snapshots. When HBase on S3 customers have a large number of regions,
puts, deletes, and renames (the final commit stage of the snapshot) become the
bottleneck, causing snapshots to take many minutes or even hours to complete.
The purpose of this patch is to improve the overall performance of snapshots
for HBase on S3 by using a temporary directory for the snapshots that lives on
a traditional filesystem, which circumvents these bottlenecks.
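
As a rough illustration of the idea (not the patch itself), the sketch below
stages per-region manifest files in a working directory on the local
filesystem and then publishes them to the final location in one pass, so the
final store never has to rename anything. It uses only java.nio.file; the
class name SnapshotStagingSketch, the methods stageManifests and commit, and
the ".manifest" file naming are all hypothetical, and the "final" directory
here is just another local directory standing in for S3.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SnapshotStagingSketch {

    /** Writes one small manifest file per region into a fresh local working directory. */
    public static Path stageManifests(List<String> regions) throws IOException {
        // Assumption: the working directory lives on a traditional filesystem,
        // where creating and deleting small files is cheap.
        Path workingDir = Files.createTempDirectory("snapshot-working-");
        for (String region : regions) {
            Path manifest = workingDir.resolve(region + ".manifest");
            Files.write(manifest, ("region=" + region + "\n").getBytes(StandardCharsets.UTF_8));
        }
        return workingDir;
    }

    /**
     * Publishes the staged files to the final directory and removes the
     * working directory. Each file is written to the final store once;
     * no rename is issued against the final store.
     */
    public static int commit(Path workingDir, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        List<Path> staged;
        try (Stream<Path> files = Files.list(workingDir)) {
            staged = files.collect(Collectors.toList());
        }
        for (Path file : staged) {
            Files.copy(file, finalDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            Files.delete(file); // cleanup happens on the local filesystem only
        }
        Files.delete(workingDir);
        return staged.size();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3-backed snapshot location (hypothetical).
        Path finalDir = Files.createTempDirectory("snapshot-final-");
        Path workingDir = stageManifests(Arrays.asList("region-a", "region-b", "region-c"));
        int published = commit(workingDir, finalDir);
        System.out.println("published " + published + " manifest files to " + finalDir);
    }
}
```

In this sketch all intermediate puts and deletes touch only the working
directory, and the final location receives only completed files. How the real
patch wires the working directory into HBase is not shown here.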
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)