[ 
https://issues.apache.org/jira/browse/HBASE-15321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172375#comment-15172375
 ] 

churro morales commented on HBASE-15321:
----------------------------------------

Use case: 

Jobs made regionservers slow. Slow regionservers made jobs slow. 

The jobs consumed a large share of regionserver resources (RS heap, handlers, 
etc.).  Some of them ran full table scans over a very large table with many 
regions and store files.  HBase snapshots were slow on our large cluster: even 
with skip flush and manifests, snapshotting this table took around 20 minutes.  
The cluster was also taking a heavy write load and serving random reads, so 
the main goal was to reduce the impact these jobs had on cluster resources.

HDFS snapshots are O(1) operations.  So for our jobs, we took a snapshot in 
setup, ran the job over the HDFS snapshot, and deleted the snapshot after the 
job completed.
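
The snapshot lifecycle described above (create in setup, run the job, always 
delete afterwards) could be sketched roughly as follows. This is only an 
illustration, not the code from the patch: it shells out to the standard 
`hdfs dfs -createSnapshot` / `-deleteSnapshot` commands, assumes the directory 
was already made snapshottable with `hdfs dfsadmin -allowSnapshot`, and the 
helper names `run_hdfs` and `hdfs_snapshot` are hypothetical.

```python
import subprocess
from contextlib import contextmanager


def run_hdfs(args):
    """Run an `hdfs` CLI command and raise if it exits non-zero."""
    subprocess.run(["hdfs", *args], check=True)


@contextmanager
def hdfs_snapshot(root_dir, name, run=run_hdfs):
    """Create an HDFS snapshot of root_dir, yield its path, then delete it.

    Hypothetical helper: root_dir must already be snapshottable
    (hdfs dfsadmin -allowSnapshot <root_dir>).
    """
    run(["dfs", "-createSnapshot", root_dir, name])
    try:
        # HDFS exposes snapshots under <root_dir>/.snapshot/<name>.
        yield f"{root_dir.rstrip('/')}/.snapshot/{name}"
    finally:
        # Delete the snapshot even if the job failed.
        run(["dfs", "-deleteSnapshot", root_dir, name])
```

A job driver would wrap its submission in `with hdfs_snapshot(...) as 
snapshot_path:` and point the job's input at `snapshot_path`. The `run` 
parameter exists only so the command execution can be swapped out, for example 
in a dry run.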

If your job can tolerate data that is only current as of roughly (Now - 
hbase.regionserver.optionalcacheflushinterval), since edits not yet flushed to 
store files are not visible in the snapshot, M/R over HDFS snapshots is a good 
option.
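
As a rough illustration of that staleness bound, the sketch below computes the 
approximate point in time up to which a snapshot can be expected to contain 
flushed data. It assumes the stock default of 3600000 ms (one hour) for 
hbase.regionserver.optionalcacheflushinterval; check your own hbase-site.xml. 
The function name `freshness_cutoff` is hypothetical, and the result is an 
estimate rather than a guarantee, since actual flush timing varies.

```python
from datetime import datetime, timedelta, timezone

# Assumed default for hbase.regionserver.optionalcacheflushinterval (1 hour).
DEFAULT_FLUSH_INTERVAL_MS = 3_600_000


def freshness_cutoff(snapshot_time, flush_interval_ms=DEFAULT_FLUSH_INTERVAL_MS):
    """Return the approximate time before which edits should be in store files.

    Edits newer than this may still be unflushed and therefore missing
    from an HDFS snapshot taken at snapshot_time.
    """
    return snapshot_time - timedelta(milliseconds=flush_interval_ms)


if __name__ == "__main__":
    taken_at = datetime.now(timezone.utc)
    print("snapshot taken at:  ", taken_at.isoformat())
    print("data current as of ~", freshness_cutoff(taken_at).isoformat())
```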

This made the jobs complete faster and reduced the HBase resources they 
consumed on our cluster.


> Ability to open a HRegion from hdfs snapshot.
> ---------------------------------------------
>
>                 Key: HBASE-15321
>                 URL: https://issues.apache.org/jira/browse/HBASE-15321
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: churro morales
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15321-v1.patch, HBASE-15321-v2.patch, 
> HBASE-15321-v3.patch, HBASE-15321.patch
>
>
> Now that hdfs snapshots are here, we started to run our mapreduce jobs over 
> hdfs snapshots.  The thing is, hdfs snapshots are read-only point-in-time 
> copies of the file system.  Thus we had to modify the section of code that 
> initialized the region internals in HRegion.   We have to skip cleanup of 
> certain directories if the HRegion is backed by a hdfs snapshot.  I have a 
> patch for trunk with some basic tests if folks are interested.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
