[
https://issues.apache.org/jira/browse/HBASE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990068#comment-16990068
]
Viraj Jasani edited comment on HBASE-23349 at 12/8/19 2:07 PM:
---------------------------------------------------------------
Thanks [~ram_krish]
What you are suggesting is to have the discharger thread itself notify scanners
to reset (scanner reset used to happen only before HBASE-13082, right?). Also,
as of now notifyChangedReadersObservers() is called only during store flush, so
if the discharger thread starts calling it, the thread should reset the heap
for all scanners and not for a specific one, right? The thread might not have
the context of a specific scanner. And if so, can we directly reset refCount
to 0 for all compacted away store files of a given store? Once we reset heap
and lastTop, maybe we don't need to worry about refCount?
By default, if store files still can't be archived because refCount > 0 after
2 min of discharger thread runs, the thread should immediately call
notifyChangedReadersObservers(), which should reset the heap for all existing
scanners. Please let me know if my understanding is correct.
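To make the timeout idea concrete, here is a rough sketch in plain Java. It is
not HBase code: the class ArchiveFallbackPolicy, its method, and the idea of
keying on a file path string are all hypothetical, and the 2 min threshold is
just passed in by the caller. It only shows the bookkeeping: remember when a
compacted file was first seen with refCount > 0, and report when it has stayed
blocked long enough that the discharger should fall back to notifying scanners.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HBase code: decides when the discharger should
// fall back to notifying scanners instead of skipping a referenced file.
public class ArchiveFallbackPolicy {
    private final long maxWaitMillis;
    // Time (ms) at which each compacted file was first seen with refCount > 0.
    private final Map<String, Long> firstBlockedAt = new HashMap<>();

    public ArchiveFallbackPolicy(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if scanners should be notified to reset for this file. */
    public boolean shouldNotifyScanners(String file, int refCount, long nowMillis) {
        if (refCount <= 0) {
            // File can be archived normally, so stop tracking it.
            firstBlockedAt.remove(file);
            return false;
        }
        // Record the first blocked observation, or reuse the earlier one.
        long since = firstBlockedAt.computeIfAbsent(file, f -> nowMillis);
        return nowMillis - since >= maxWaitMillis;
    }
}
```

In this sketch the discharger would call shouldNotifyScanners() on each run for
every compacted file it could not archive, and only trigger the notify path
when it returns true.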
Edit: If we notify all open scanners to reset themselves, even scanners reading
non-compacted store files would be impacted, right? At that moment, all open
scanners would take a little longer than usual, regardless of whether they are
scanning compacted away files. Would this be fine?
Also, I was checking the notify code, and it seems difficult to know which
StoreScanner is currently holding a lock on the impacted (compacted away)
store files. If we knew this, we could probably have a better implementation
that resets the heap only in those StoreScanners that are using compacted away
store files, rather than in all StoreScanners.
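A minimal sketch of what such tracking could look like, again in plain Java
and not based on the actual HBase classes: ScannerFileRegistry, the string
scanner ids, and the file names are all hypothetical. The assumption is that
each scanner registers the files it reads when it opens and unregisters when
it closes, so the set of scanners to reset can be looked up per compacted file.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not HBase code: tracks which scanner reads which file
// so that only scanners on compacted away files are asked to reset.
public class ScannerFileRegistry {
    private final Map<String, Set<String>> scannersByFile = new HashMap<>();

    /** Called when a scanner opens; records every file it reads. */
    public synchronized void register(String scannerId, List<String> files) {
        for (String f : files) {
            scannersByFile.computeIfAbsent(f, k -> new HashSet<>()).add(scannerId);
        }
    }

    /** Called when a scanner closes; drops it from every file entry. */
    public synchronized void unregister(String scannerId) {
        scannersByFile.values().forEach(s -> s.remove(scannerId));
        scannersByFile.values().removeIf(Set::isEmpty);
    }

    /** Returns only the scanners that read at least one compacted file. */
    public synchronized Set<String> scannersToReset(List<String> compactedFiles) {
        Set<String> result = new HashSet<>();
        for (String f : compactedFiles) {
            result.addAll(scannersByFile.getOrDefault(f, Collections.emptySet()));
        }
        return result;
    }
}
```

With something like this, scanners that only read non-compacted files would
not be touched by the reset.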
was (Author: vjasani):
Thanks [~ram_krish]
What you are suggesting is to have the discharger thread itself notify scanners
to reset (scanner reset used to happen only before HBASE-13082, right?). Also,
as of now notifyChangedReadersObservers() is called only during store flush, so
if the discharger thread starts calling it, the thread should reset the heap
for all scanners and not for a specific one, right? The thread might not have
the context of a specific scanner. And if so, can we directly reset refCount
to 0 for all compacted away store files of a given store? Once we reset heap
and lastTop, maybe we don't need to worry about refCount?
To sum up, by default, if store files still can't be archived because refCount
> 0 after 2 min of discharger thread runs, the thread should immediately call
notifyChangedReadersObservers(), which should reset the heap for all existing
scanners. Please let me know if my understanding is not correct.
In a way, this could take care of open scanners gracefully.
Edit: If we notify all open scanners to reset themselves, even scanners reading
non-compacted store files would be impacted, right? At that moment, all open
scanners would take a little longer than usual, regardless of whether they are
scanning compacted away files. Would this be fine?
Also, I was checking the notify code, and it seems difficult to know which
StoreScanner is currently holding a lock on the impacted (compacted away)
store files. If we knew this, we could probably have a better implementation
that resets the heap only in those StoreScanners that are using compacted away
store files, rather than in all StoreScanners.
> Reader lock on compacted store files preventing archival of compacted files
> ---------------------------------------------------------------------------
>
> Key: HBASE-23349
> URL: https://issues.apache.org/jira/browse/HBASE-23349
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0, 2.3.0, 1.6.0
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
>
> A refCount as low as 1 on a compacted away store file can also prevent archival.
> {code:java}
> regionserver.HStore - Can't archive compacted file
> hdfs://{{root-dir}}/hbase/data/default/t1/12a9e1112e0371955b3db8d3ebb2d298/cf1/73b72f5ddfce4a34a9e01afe7b83c1f9
> because of either isCompactedAway=true or file has reference,
> isReferencedInReads=true, refCount=1, skipping for now.
> {code}
> We should come up with core code that blocks the reader lock if a client or
> coprocessor has held the lock for a significantly long time (configurable,
> mostly the same as the discharger thread interval), or gracefully resolve
> the reader lock issue.