yzang commented on a change in pull request #913: Refactor FileInfo locking and
refcounting out IndexPersistenceMgr
URL: https://github.com/apache/bookkeeper/pull/913#discussion_r158849128
##########
File path:
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/IndexPersistenceMgr.java
##########
@@ -174,90 +170,57 @@ public Number getSample() {
return builder.build();
}
+ private File createFileInfoBackingFile(long ledger, boolean createIfMissing) throws IOException {
+ File lf = findIndexFile(ledger);
+ if (null == lf) {
+ if (!createIfMissing) {
+ throw new Bookie.NoLedgerException(ledger);
+ }
+ // We don't have a ledger index file on disk or in cache, so create it.
+ lf = getNewLedgerIndexFile(ledger, null);
+ }
+ return lf;
+ }
+
/**
* When a ledger is evicted, we need to make sure there's no other thread
* trying to get FileInfo for that ledger at the same time when we close
* the FileInfo.
*/
- private void handleLedgerEviction(RemovalNotification<Long, FileInfo> notification) {
- FileInfo fileInfo = notification.getValue();
+ private void handleLedgerEviction(RemovalNotification<Long, CachedFileInfo> notification) {
+ CachedFileInfo fileInfo = notification.getValue();
Long ledgerId = notification.getKey();
if (null == fileInfo || null == notification.getKey()) {
return;
}
if (notification.wasEvicted()) {
evictedLedgersCounter.inc();
- // we need to acquire the write lock in another thread,
- // otherwise there could be dead lock happening.
- evictionThreadPool.execute(() -> {
- fileInfoLock.writeLock().lock();
- try {
- // We only close the fileInfo when we evict the FileInfo from both cache
- if (!readFileInfoCache.asMap().containsKey(ledgerId)
- && !writeFileInfoCache.asMap().containsKey(ledgerId)) {
- fileInfo.close(true);
- }
- } catch (IOException e) {
- LOG.error("Exception closing file info when ledger {} is evicted from file info cache.",
- ledgerId, e);
- } finally {
- fileInfoLock.writeLock().unlock();
- }
- });
}
fileInfo.release();
Review comment:
Here handleLedgerEviction is not synchronized with getFileInfo, which can cause
a race condition.
Race condition:
1. Thread A has just successfully fetched a FileInfo from the cache at line 221,
"fi = readFileInfoCache.get(ledger, loader);", but has not incremented the
refCount yet.
2. Thread B tries to get another FileInfo for read while the cache is full, so
it triggers eviction of the FileInfo that Thread A is about to use.
3. Thread B proceeds with the eviction, finds refCount=0, and closes the
FileInfo.
4. Thread A now returns the FileInfo, which Thread B has already closed.
5. java.nio.channels.ClosedChannelException will be thrown when that FileInfo
is used, which can cause serious issues.
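One possible way to close this window, sketched under assumptions rather than
taken from this PR: let the cache hold its own reference, mark the count dead
when the last reference is released, and have readers use a retain call that
can fail. The class RefCountedHandle and its methods tryRetain, release and
isClosed are hypothetical names for illustration only; they are not the
FileInfo or CachedFileInfo API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted cache entry (not the real FileInfo).
class RefCountedHandle {
    private static final int DEAD = -1;
    // Starts at 1: the reference owned by the cache itself.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Returns false if the handle was already released for the last time.
    boolean tryRetain() {
        while (true) {
            int current = refCount.get();
            if (current == DEAD) {
                return false; // caller must reload instead of using this handle
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Drops one reference; the last release marks the handle dead and closes it.
    void release() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("release without matching retain");
            }
            int next = (current == 1) ? DEAD : current - 1;
            if (refCount.compareAndSet(current, next)) {
                if (next == DEAD) {
                    closed = true; // the real code would close the file channel here
                }
                return;
            }
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

With this shape, Thread A in step 1 would call tryRetain() after the cache
lookup and repeat the lookup if it returns false, so a handle that Thread B
evicted in between is never handed out.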