[ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596874#comment-16596874
 ] 

Tyler Mi commented on HBASE-21098:
----------------------------------

These are really good points! I have addressed all of them in code changes, 
except for points 4 and 5.

Regarding point 4, I haven't actually changed the default working dir. The 
HDFS working dir that I recommended depends on your specific cluster nodes, 
so I have chosen not to add a default value for it. If that configuration 
value is not provided, the code still behaves as before and derives the 
working dir from whatever you set as your root directory, so again it would 
be hard to provide a default value for it.
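
As a rough illustration of that fallback (a sketch only, not the patch code): 
the key name `hbase.snapshot.working.dir` and the `.hbase-snapshot/.tmp` 
layout below are assumptions made for this example and should be checked 
against the patch, and `resolve_snapshot_working_dir` is a hypothetical 
helper.

```python
# Assumed key name for illustration; check the patch for the real one.
WORKING_DIR_KEY = "hbase.snapshot.working.dir"

# Assumed location of the in-progress snapshot dir under the root directory.
DEFAULT_TMP_SUBDIR = ".hbase-snapshot/.tmp"


def resolve_snapshot_working_dir(conf: dict, root_dir: str) -> str:
    """Return the explicitly configured working dir, if any.

    When the key is absent or empty, fall back to a directory derived
    from the root directory, so behaviour is unchanged for existing users.
    """
    configured = conf.get(WORKING_DIR_KEY)
    if configured:
        return configured
    return root_dir.rstrip("/") + "/" + DEFAULT_TMP_SUBDIR
```

With no value set, a root directory on S3 yields a working dir on S3 as 
well; pointing the key at an HDFS path is what moves the temporary snapshot 
files off S3.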

Regarding point 5, this is something that I can absolutely do once the code 
review is done, time permitting!

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-21098
>                 URL: https://issues.apache.org/jira/browse/HBASE-21098
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.1.1
>            Reporter: Tyler Mi
>            Priority: Major
>         Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch
>
>
> When using Apache HBase, the snapshot feature can be used for point-in-time 
> recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, using S3 as a file system is 
> inefficient for some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. With S3, 
> however, a rename is a copy followed by a delete of every file, which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 incur latency that traditional filesystems do not when HBase 
> manipulates the region snapshots to consolidate them into a single manifest. 
> When HBase on S3 users have a large number of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck, 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
