jadami10 commented on a change in pull request #7961:
URL: https://github.com/apache/pinot/pull/7961#discussion_r778582685



##########
File path: 
pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java
##########
@@ -363,6 +377,41 @@ private void startupServiceStatusCheck(long endTimeMs) {
         ServiceStatus.getStatusDescription());
   }
 
+  /**
+   * Recursively deletes all data directories starting with tmp- last modified 
before the startTime.
+   * @param startTime start time of the application
+   * @param dataDir data directory to start from
+   */
+  @VisibleForTesting
+  public static void deleteTempFilesSinceCutoffTime(long startTime, @Nonnull 
File dataDir) {
+    if (!dataDir.exists() || !dataDir.isDirectory()) {
+      LOGGER.warn("Data directory {} does not exist or is not a directory", 
dataDir);
+      return;
+    }
+    IOFileFilter beforeStartTimeFilter = new AgeFileFilter(startTime, true);
+    IOFileFilter tmpPrefixFilter = new PrefixFileFilter("tmp-");

Review comment:
       > We may extract tmp dir creation and cleanup logic to a util class to 
manage its naming convention and location in one place, in case later on, folks 
use tmp. or temp- prefix or put them outside dataDir, making your cleanup logic 
less effective.
   
   This is all fair feedback. I thought about this and figured it was unlikely 
people were creating `tmp-` tables, but let's not leave it to chance. It seems 
we have 2 options:
   1) encode the tmp segment name with something that cannot be used for real 
table names. The only thing disallowed in code is double underscore: `// Double 
underscore is reserved for real-time segment name delimiter`
   2) put tmp segments in a different directory
   
   2 seems like the better option given that's how most uses of tmp things work 
in typical software. I believe our issue does stem from `BaseTableDataManager`, 
so how about we add a new `getTmpSegmentDataDir` function that puts things in 
`File(_indexDir, "tmp", segmentName);`? At that point, can we just clear that 
directory every time the process comes up regardless of age?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to