jadami10 commented on a change in pull request #7961:
URL: https://github.com/apache/pinot/pull/7961#discussion_r779746410



##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java
##########
@@ -465,9 +465,7 @@ private void downloadSegmentFromDeepStore(String 
segmentName, IndexLoadingConfig
    */
   private void untarAndMoveSegment(String segmentName, IndexLoadingConfig 
indexLoadingConfig, File segmentTarFile)
       throws IOException {
-    // TODO: This could leave temporary directories in _indexDir if JVM shuts 
down before the temporary directory is

Review comment:
       added

##########
File path: 
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/ConvertToRawIndexMinionClusterIntegrationTest.java
##########
@@ -102,6 +102,10 @@ public void testConvertToRawIndexTask()
     File[] indexDirs = tableDataDir.listFiles();
     Assert.assertNotNull(indexDirs);
     for (File indexDir : indexDirs) {
+      // Skip the tmp directory since these are only temporary segments
+      if (indexDir.getName().equals("tmp")) {

Review comment:
       This should be the only thing looking at the `tableDataDir` directly 
like this. But it does worry me that there was at least 1 assumption that every 
directory in `tableDataDir` is a segment directory

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig 
tableDataManagerConfig, String instanceI
     if (!_indexDir.exists()) {
       Preconditions.checkState(_indexDir.mkdirs());
     }
+    _resourceTmpDir = new File(_indexDir, "tmp");

Review comment:
       actually, i'm going to switch the HL and LL segment managers back to 
`_tmp` and leave them as is for now to limit the scope of this PR.
   
   1) I believe our issues are coming from TableDataManager, so this should 
cover us for now.
   2) I have some concerns with the TableDataManager deleting the 
SegmentDataManager's temporary resources. I feel like we should implement this 
same logic in the SegmentDataManager or implement some sort of interface with 
"registerTemporaryResource", "cleanTemporaryResources", etc. 
   
   Either way, I think it will be best to limit this change to TableDataManager 
for now.
   
   As for `_tmp` vs `tmp`. I feel like `tmp` is better since that's the typical 
name you expect for temporary directories in software. But happy to change it 
if you feel `_tmp` works better here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to