[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

Pavel (JIRA) Wed, 17 Apr 2019 02:54:22 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819903#comment-16819903
 ]


Pavel commented on HBASE-22072:
-------------------------------

[~ram_krish] thanks for the work; the patch looks good. It was included in our 
build and deployed to production. With the debug log level turned on, I see both cases:
{noformat}
11:44:32.727 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:46:03.882 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:46:03.882 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already has the close lock. There is no need to updateReaders
11:46:32.990 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:48:03.872 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:48:32.878 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:48:32.880 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:50:33.286 [MemStoreFlusher.0] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:52:32.487 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:52:32.487 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:52:32.492 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:54:18.280 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already has the close lock. There is no need to updateReaders
11:55:32.467 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
11:55:32.471 [MemStoreFlusher.1] DEBUG org.apache.hadoop.hbase.regionserver.StoreScanner - StoreScanner already closing. There is no need to updateReaders
{noformat}
The more common case is that the flusher tries to updateReaders for a StoreScanner 
that is already closed ("already closing").
 The less common case is that closing is still in progress ("already has the close lock").

Finally, the region servers got rid of the obsolete compacted store files. Victory!

Could you please clarify whether *StoreScanner private boolean closing = false;* has 
to be volatile for the first case?
 Is it possible that another thread, performing updateReaders, sees the *closing* flag 
still false after StoreScanner#close has completed?
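For context on the visibility question: under the Java Memory Model, a write to a plain (non-volatile) field by one thread is only guaranteed to be visible to another thread if there is a happens-before edge between the write and the read, for example a volatile write and read of that field, or one thread releasing a lock that the other thread then acquires. Whether HBase's own StoreScanner needs `volatile` therefore depends on where `closing` is read relative to the close lock, which this comment does not show. The sketch below is a hypothetical, simplified model (`ScannerModel` and `Outcome` are illustrative names, not HBase API) showing two ways to avoid acting on a stale flag: declaring it `volatile`, and re-checking it after acquiring the lock. Its two early-return outcomes roughly correspond to the two debug messages in the log above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the pattern discussed above; NOT the
// HBase StoreScanner source. ScannerModel and Outcome are illustrative names.
class ScannerModel {

    enum Outcome { ALREADY_CLOSING, CLOSE_LOCK_HELD, UPDATED }

    // volatile: a write by the closing thread happens-before every later read
    // of this field by a flusher thread, so the fast-path check below cannot
    // observe a stale false after close() has finished.
    private volatile boolean closing = false;
    private final ReentrantLock closeLock = new ReentrantLock();

    // Called by the thread that owns the scanner.
    void close() {
        closeLock.lock();
        try {
            if (closing) {
                return; // already closed, nothing to do
            }
            closing = true;
            // ... release heaps and store file references here (omitted) ...
        } finally {
            closeLock.unlock();
        }
    }

    // Called by a flusher thread after it has written new store files.
    Outcome updateReaders() {
        if (closing) {
            return Outcome.ALREADY_CLOSING; // cheap fast path, no locking
        }
        if (!closeLock.tryLock()) {
            return Outcome.CLOSE_LOCK_HELD; // another thread is inside close()
        }
        try {
            // Re-check under the lock: if close() completed between the fast
            // path and tryLock(), acquiring the lock makes its write visible.
            if (closing) {
                return Outcome.ALREADY_CLOSING;
            }
            // ... swap in readers for the new store files (omitted) ...
            return Outcome.UPDATED;
        } finally {
            closeLock.unlock();
        }
    }
}
```

Calling `updateReaders()` on a fresh model returns `UPDATED`; after `close()` it returns `ALREADY_CLOSING`. `CLOSE_LOCK_HELD` is only returned when a different thread is inside `close()` at that moment, because `ReentrantLock.tryLock()` succeeds for a thread that already holds the lock.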

> High read/write intensive regions may cause long crash recovery
> ---------------------------------------------------------------
>
>                 Key: HBASE-22072
>                 URL: https://issues.apache.org/jira/browse/HBASE-22072
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance, Recovery
>    Affects Versions: 2.1.2
>            Reporter: Pavel
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>         Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a read-heavy region may leave compacted files undeleted 
> because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also write-heavy, this happens quite often, and the region may 
> have only a few store files but tons of undeleted compacted HDFS files.
> The region keeps all those files (in my case thousands) until the graceful region 
> closing procedure, which ignores existing references and drops the obsolete files. 
> This works fine, apart from consuming some extra HDFS space, but only in the case of 
> normal region closing. If the region server crashes, the new region server 
> responsible for that overfilled region reads the HDFS folder and tries to deal 
> with all the undeleted files, producing tons of store files and compaction tasks and 
> consuming an abnormal amount of memory, which may lead to an OutOfMemoryError 
> and further region server crashes. This blocks writes to the region because the number 
> of store files reaches the *hbase.hstore.blockingStoreFiles* limit, causes heavy GC 
> load, and may take hours to compact all files into a working set of files.
> A workaround is to periodically check the file count of the HDFS folders and force 
> region assignment for those with too many files.
> It would be nice if the region server had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files when the number of files reaches this setting.
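The proposal above could look roughly like the following. This is a hypothetical sketch, not HBase code or an existing HBase setting: `CompactedFileCleaner`, `CompactedFile` and the `threshold` parameter are illustrative names. It assumes, as the quoted log line suggests, that a compacted file may only be archived once its reader reference count drops to zero, and that no new readers are opened on a file after it has been compacted away. Files that are still referenced are skipped and retried on a later call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a threshold-triggered cleanup of compacted files.
// Names are illustrative; this is not an HBase class or configuration key.
class CompactedFileCleaner {

    // A compacted-away store file plus the number of readers still using it.
    static final class CompactedFile {
        final String path;
        final AtomicInteger refCount = new AtomicInteger();

        CompactedFile(String path) {
            this.path = path;
        }
    }

    // Similar in spirit to hbase.hstore.blockingStoreFiles, but counting
    // compacted files that are waiting to be archived.
    private final int threshold;
    private final List<CompactedFile> pending = new ArrayList<>();

    CompactedFileCleaner(int threshold) {
        this.threshold = threshold;
    }

    synchronized void addCompacted(CompactedFile file) {
        pending.add(file);
    }

    // Returns the paths archived by this call (empty if below the threshold).
    synchronized List<String> maybeArchive() {
        List<String> archived = new ArrayList<>();
        if (pending.size() < threshold) {
            return archived; // below threshold: leave it to the regular cleanup
        }
        pending.removeIf(file -> {
            if (file.refCount.get() == 0) {
                archived.add(file.path); // real code would move it to the archive dir
                return true;
            }
            return false; // still referenced by a read: "skipping for now"
        });
        return archived;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

With a threshold of 3, two pending files (one of them still referenced) trigger nothing; adding a third, unreferenced file makes `maybeArchive()` return the two unreferenced paths and leave the referenced one pending for a later attempt.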



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
