Pavel created HBASE-22072:
-----------------------------

             Summary: High read/write intensive regions may cause long crash 
recovery
                 Key: HBASE-22072
                 URL: https://issues.apache.org/jira/browse/HBASE-22072
             Project: HBase
          Issue Type: Bug
          Components: Performance, Recovery
    Affects Versions: 2.1.2
            Reporter: Pavel


Compaction of a region under heavy read load may leave compacted files 
undeleted because of existing scan references:

INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted file 
hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file has 
reference, isReferencedInReads=true, refCount=1, skipping for now

If the region is also under heavy write load this happens quite often, and the 
region may end up with only a few storefiles but thousands of undeleted 
compacted HDFS files.

The region keeps all those files (in my case thousands) until the graceful 
region closing procedure, which ignores existing references and drops the 
obsolete files. This works fine, apart from consuming some extra HDFS space, 
but only in the case of a normal region close. If the region server crashes, 
the new region server responsible for that overfilled region reads the HDFS 
folder and tries to deal with all the undeleted files, producing tons of 
storefiles and compaction tasks and consuming an abnormal amount of memory, 
which may lead to an OutOfMemoryError and further region server crashes. 
Writes to the region stop because the number of storefiles reaches the 
*hbase.hstore.blockingStoreFiles* limit, GC load stays high, and it may take 
hours to compact all files back into a working set.

A workaround is to periodically check the file count of the region HDFS 
folders and force region reassignment for those with too many files; a rough 
sketch of such a check follows below.
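
A minimal sketch of that workaround, assuming the default HDFS layout 
<hbase.rootdir>/data/<namespace>/<table>/<encodedRegion>/<family>; the class 
name and the threshold value are illustrative only, not part of HBase:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;

public class OverfilledRegionChecker {
  // Illustrative threshold; pick something well above the normal storefile count.
  private static final int MAX_FILES_PER_REGION = 1000;

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    Path rootDir = new Path(conf.get("hbase.rootdir"));
    TableName table = TableName.valueOf(args[0]);

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      FileSystem fs = rootDir.getFileSystem(conf);
      for (RegionInfo region : admin.getRegions(table)) {
        Path regionDir = new Path(rootDir, "data/" + table.getNamespaceAsString()
            + "/" + table.getQualifierAsString() + "/" + region.getEncodedName());
        int files = 0;
        // Count the files in each directory of the region (column families,
        // plus possibly .tmp or recovered.edits, which is good enough here).
        for (FileStatus dir : fs.listStatus(regionDir)) {
          if (dir.isDirectory()) {
            files += fs.listStatus(dir.getPath()).length;
          }
        }
        if (files > MAX_FILES_PER_REGION) {
          // Unassigning makes the master reopen the region; the graceful close
          // drops the obsolete compacted files as described above.
          admin.unassign(region.getRegionName(), false);
        }
      }
    }
  }
}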

It would be nice if the region server had a setting similar to 
hbase.hstore.blockingStoreFiles and attempted to drop undeleted compacted 
files once the number of files reaches that threshold.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
