jvarenina commented on a change in pull request #7145:
URL: https://github.com/apache/geode/pull/7145#discussion_r763045860



##########
File path: geode-core/src/main/java/org/apache/geode/internal/cache/Oplog.java
##########
@@ -1303,16 +1305,18 @@ void close(DiskRegion dr) {
     // while a krf is being created can not close a region
     lockCompactor();
     try {
-      addUnrecoveredRegion(dr.getId());
-      DiskRegionInfo dri = getDRI(dr);
-      if (dri != null) {
-        long clearCount = dri.clear(null);
-        if (clearCount != 0) {
-          totalLiveCount.addAndGet(-clearCount);
-          // no need to call handleNoLiveValues because we now have an
-          // unrecovered region.
+      if (!isDrfOnly()) {

Review comment:
   We have put the `isDrfOnly()` check after `lockCompactor()` on purpose, because of the following case:
   
   Suppose the Oplog files (.crf, .drf and .krf) are in the process of compaction and `lockCompactor()` is held. If we checked `isDrfOnly()` before trying to acquire the lock, it would most probably return false, because all Oplog files (.crf, .drf and .krf) exist during compaction (the files are deleted as the last step of compaction, so the check can return true only after that point). After the check, the thread would continue and block (on `lockCompactor()`) until compaction is done. Compaction could then produce the .drf-only file and release the lock, and in that case the thread would resume and execute the unnecessary code.
   
   Because the `isDrfOnly()` check comes after `lockCompactor()`, the thread first blocks until compaction is done. If compaction produces only a .drf file, then `isDrfOnly()` is evaluated after acquiring the lock and the unnecessary code is skipped.
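   The ordering argued above can be sketched with a toy model (the lock and flags mirror `lockCompactor()`/`isDrfOnly()`/`addUnrecoveredRegion()` from `Oplog.java` in name only; the class itself is our illustration, not Geode code):

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the lock-then-check ordering from the patch. Only the names
// mirror Oplog.java; the logic is a simplified illustration.
class CloseOrdering {
    private final ReentrantLock compactorLock = new ReentrantLock();
    private volatile boolean drfOnly;   // set once compaction leaves only the .drf
    boolean unrecoveredRegionAdded;     // stands in for addUnrecoveredRegion()

    // Last step of compaction: delete .crf/.krf while holding the lock.
    void compact() {
        compactorLock.lock();
        try {
            drfOnly = true;
        } finally {
            compactorLock.unlock();
        }
    }

    // close() blocks on the lock first, so the drf-only check always sees
    // the post-compaction state and can skip the unnecessary work.
    void close() {
        compactorLock.lock();
        try {
            if (!drfOnly) {
                unrecoveredRegionAdded = true;
            }
        } finally {
            compactorLock.unlock();
        }
    }
}
```

   Checking the flag before taking the lock would instead capture the pre-compaction value and run the skipped code anyway.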
   
   We think that it is safe to skip `addUnrecoveredRegion()` when the Oplog is only a .drf file. This is because a .drf-only Oplog cannot contain live entries, and therefore cannot contain unrecoverable live region entries. We would expect that having unrecoverable live region entries is only possible while a .crf file is available (please keep in mind that the .crf file has already been deleted by compaction when only the .drf remains). And that would only happen when one of the multiple regions that have been using this disk-store Oplog is not yet recovered. In that case it is not possible to perform compaction, because Geode did not load the unrecoverable region's data from the .crf file into memory and it has no live-entry references to drive the compaction (please check the linked list `DiskRegionInfoWithList.liveEntries`). We are not sure if we got this case right, or whether there is something we misunderstood?
   
   We also think that it is not necessary to place an additional `isDrfOnly()` check in the places where `unrecoveredRegionCount > 0` is checked (e.g. `hasNoLiveValues()` and `needsCompaction()`). If compaction produced only a .drf file, then both `totalLiveCount` and `unrecoveredRegionCount` will be 0 for that Oplog object. If `unrecoveredRegionCount` is greater than 0, then compaction will not be done in the first place, and the .crf would not be deleted.
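   A minimal sketch of that invariant (the field names are borrowed from `Oplog.java`, but the class and its simplified `compact()` are our assumption, not the real code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the two counters discussed above. Only the field names
// mirror Oplog.java; the behavior is a simplified assumption.
class DrfOnlyInvariant {
    final AtomicLong totalLiveCount = new AtomicLong();
    long unrecoveredRegionCount;
    boolean drfOnly;

    // Compaction runs only when every region is recovered; it copies the
    // remaining live entries forward and then deletes the .crf/.krf.
    void compact() {
        if (unrecoveredRegionCount > 0) {
            return; // cannot compact: that region's live entries are not in memory
        }
        totalLiveCount.set(0); // live entries moved to the new oplog
        drfOnly = true;        // only the .drf remains
    }

    boolean hasNoLiveValues() {
        return totalLiveCount.get() <= 0 && unrecoveredRegionCount == 0;
    }
}
```

   So whenever `drfOnly` becomes true in this model, `hasNoLiveValues()` already returns true without any extra `isDrfOnly()` check.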
   
   1. Function `hasNoLiveValues()` -> This function already checks whether `totalLiveCount.get() <= 0`. That condition will always be **true** for the .drf-only files.
   
   2. Function `needsCompaction()` -> I'm not sure in which case Geode would try to mark a .drf-only file as ready for compaction. Even if this happens, the compactor thread will check whether `hasNoLiveValues()` is true and then call `handleNoLiveValues()`. If this is the oldest .drf-only Oplog, that function deletes all .drf-only files that don't have a .crf file with an older id. Otherwise (if it is not the oldest .drf-only file) it just calls `handleEmpty()`, which cancels creation of the .krf and simply returns, as a .drf-only Oplog is already closed. I'm not sure whether adding this additional check in `needsCompaction()` would break something. Please check the code below:
   
   ```
   void handleNoLiveValues() {
     if (!doneAppending) {
       return;
     }
     if (hasNoLiveValues()) {
       if (LocalRegion.ISSUE_CALLBACKS_TO_CACHE_OBSERVER) {
         if (calledByCompactorThread()) {
           // after compaction, remove the oplog from the list & destroy it
           CacheObserverHolder.getInstance().beforeDeletingCompactedOplog(this);
         } else {
           CacheObserverHolder.getInstance().beforeDeletingEmptyOplog(this);
         }
       }
       if (isOldest()) {
         if (calledByCompactorThread()) {
           // do it in this compactor thread
           handleEmptyAndOldest(true);
         } else {
           // schedule another thread to do it
           getParent().executeDiskStoreTask(() -> handleEmptyAndOldest(false));
         }
       } else {
         if (calledByCompactorThread()) {
           // do it in this compactor thread
           handleEmpty(true);
         } else {
           // schedule another thread to do it
           getParent().executeDiskStoreTask(() -> handleEmpty(false));
         }
       }
     }
   }
   ```
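   If the extra guard were added anyway, it might look like this toy sketch (only the names `isDrfOnly()` and `needsCompaction()` mirror `Oplog.java`; the class, its fields, and the stand-in liveness check are ours, and whether such a guard is safe in the real code is exactly the open question above):

```java
// Hypothetical sketch of the additional needsCompaction() guard; the class
// and its fields are illustrative, only the two method names mirror Oplog.java.
class CompactionGuard {
    boolean drfOnly;
    long liveCount;

    boolean isDrfOnly() {
        return drfOnly;
    }

    boolean needsCompaction() {
        if (isDrfOnly()) {
            return false; // a .drf-only oplog has no live entries to copy forward
        }
        return liveCount > 0; // stand-in for the real liveness/garbage-ratio check
    }
}
```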
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

