jadami10 commented on a change in pull request #7961:
URL: https://github.com/apache/pinot/pull/7961#discussion_r778582685
##########
File path:
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java
##########
@@ -363,6 +377,41 @@ private void startupServiceStatusCheck(long endTimeMs) {
ServiceStatus.getStatusDescription());
}
+ /**
+ * Recursively deletes all data directories starting with tmp- last modified
before the startTime.
+ * @param startTime start time of the application
+ * @param dataDir data directory to start from
+ */
+ @VisibleForTesting
+ public static void deleteTempFilesSinceCutoffTime(long startTime, @Nonnull
File dataDir) {
+ if (!dataDir.exists() || !dataDir.isDirectory()) {
+ LOGGER.warn("Data directory {} does not exist or is not a directory",
dataDir);
+ return;
+ }
+ IOFileFilter beforeStartTimeFilter = new AgeFileFilter(startTime, true);
+ IOFileFilter tmpPrefixFilter = new PrefixFileFilter("tmp-");
Review comment:
> We may extract tmp dir creation and cleanup logic to a util class to
manage its naming convention and location in one place, in case later on, folks
use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic
less effective.
This is all fair feedback. I thought about this and figured it was unlikely
people were creating `tmp-` tables, but let's not leave it to chance. It seems
we have 2 options:
1) encode the tmp segment name with something that cannot be used for real
table names. The only thing disallowed in code is double underscore: `// Double
underscore is reserved for real-time segment name delimiter`
2) put tmp segments in a different directory
2 seems like the better option given that's how most uses of tmp things work
in typical software. I believe our issue does stem from `BaseTableDataManager`,
so how about we add a new `getTmpSegmentDataDir` function that puts things in
`File(_indexDir, "tmp", segmentName);`? At that point, can we just clear that
directory every time the process comes up regardless of age?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]