Pavel created HBASE-22072:
-----------------------------
Summary: High read/write intensive regions may cause long crash
recovery
Key: HBASE-22072
URL: https://issues.apache.org/jira/browse/HBASE-22072
Project: HBase
Issue Type: Bug
Components: Performance, Recovery
Affects Versions: 2.1.2
Reporter: Pavel
Compaction of a region under high read load may leave compacted files undeleted
because of existing scan references:
INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted file
hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file has
reference, isReferencedInReads=true, refCount=1, skipping for now
If the region is also under high write load, this happens quite often, and the region
may end up with only a few store files but tons of undeleted compacted HDFS files.
The region keeps all those files (in my case thousands) until the graceful region
closing procedure, which ignores existing references and drops the obsolete files.
This works fine, apart from consuming some extra HDFS space, but only in the case of
a normal region close. If the region server crashes, the new region server
responsible for that overfilled region reads the HDFS folder and tries to deal with
all the undeleted files, producing tons of store files and compaction tasks and
consuming an abnormal amount of memory, which may lead to an OutOfMemory exception and
further region server crashes. Writes to the region stop because the number of
store files reaches the *hbase.hstore.blockingStoreFiles* limit, GC load gets heavy,
and it may take hours to compact all the files back into a working set.
A workaround is to periodically check the file count in the regions' HDFS folders and
force a region reassignment for the ones with too many files, as sketched below.
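A minimal sketch of that workaround, assuming the HBase 2.x client Admin API and the
Hadoop FileSystem API; the table name, the file-count threshold and the use of a forced
unassign (after which the master reassigns the region and the close path archives the
compacted-away files) are illustrative choices, not values from the actual cluster:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;

public class ReassignOverfiledRegions {

  // Example threshold; in practice set it well above the normal store file count.
  private static final int FILE_COUNT_THRESHOLD = 1000;

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf("my_table"); // example table name
    Path rootDir = new Path(conf.get(HConstants.HBASE_DIR));
    FileSystem fs = rootDir.getFileSystem(conf);

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      for (RegionInfo region : admin.getRegions(table)) {
        // Region layout on HDFS: <rootdir>/data/<namespace>/<table>/<encoded region>/<cf>/
        Path regionDir = new Path(rootDir, "data/" + table.getNamespaceAsString() + "/"
            + table.getQualifierAsString() + "/" + region.getEncodedName());
        int files = countFiles(fs, regionDir);
        if (files > FILE_COUNT_THRESHOLD) {
          System.out.println("Region " + region.getEncodedName() + " has " + files
              + " files, forcing reassignment");
          // Force-close the region; the master reassigns it, and the graceful close
          // archives the compacted-away store files.
          admin.unassign(region.getRegionName(), true);
        }
      }
    }
  }

  // Count files under every subdirectory of the region (column families and temp dirs).
  private static int countFiles(FileSystem fs, Path regionDir) throws IOException {
    int count = 0;
    for (FileStatus child : fs.listStatus(regionDir)) {
      if (!child.isDirectory()) {
        continue; // skip .regioninfo and other plain files
      }
      count += fs.listStatus(child.getPath()).length;
    }
    return count;
  }
}
{code}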
It would be nice if the region server had a setting similar to
*hbase.hstore.blockingStoreFiles* and attempted to drop undeleted compacted
files once the number of files reaches that threshold.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)