jadami10 commented on a change in pull request #7961:
URL: https://github.com/apache/pinot/pull/7961#discussion_r779746410
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java
##########
@@ -465,9 +465,7 @@ private void downloadSegmentFromDeepStore(String
segmentName, IndexLoadingConfig
*/
private void untarAndMoveSegment(String segmentName, IndexLoadingConfig
indexLoadingConfig, File segmentTarFile)
throws IOException {
- // TODO: This could leave temporary directories in _indexDir if JVM shuts
down before the temporary directory is
Review comment:
added
##########
File path:
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/ConvertToRawIndexMinionClusterIntegrationTest.java
##########
@@ -102,6 +102,10 @@ public void testConvertToRawIndexTask()
File[] indexDirs = tableDataDir.listFiles();
Assert.assertNotNull(indexDirs);
for (File indexDir : indexDirs) {
+ // Skip the tmp directory since these are only temporary segments
+ if (indexDir.getName().equals("tmp")) {
Review comment:
This should be the only thing looking at the `tableDataDir` directly
like this. But it does worry me that there was at least 1 assumption that every
directory in `tableDataDir` is a segment directory
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
##########
@@ -102,6 +103,10 @@ public void init(TableDataManagerConfig
tableDataManagerConfig, String instanceI
if (!_indexDir.exists()) {
Preconditions.checkState(_indexDir.mkdirs());
}
+ _resourceTmpDir = new File(_indexDir, "tmp");
Review comment:
actually, i'm going to switch the HL and LL segment managers back to
`_tmp` and leave them as is for now to limit the scope of this PR.
1) I believe our issues are coming from TableDataManager, so this should
cover us for now.
2) I have some concerns with the TableDataManager deleting the
SegmentDataManager's temporary resources. I feel like we should implement this
same logic in the SegmentDataManager or implement some sort of interface with
"registerTemporaryResource", "cleanTemporaryResources", etc.
Either way, I think it will be best to limit this change to TableDataManager
for now.
As for `_tmp` vs `tmp`. I feel like `tmp` is better since that's the typical
name you expect for temporary directories in software. But happy to change it
if you feel `_tmp` works better here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]