dangshazi commented on code in PR #21185:
URL: https://github.com/apache/flink/pull/21185#discussion_r1099972976


##########
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServerArchiveFetcher.java:
##########
@@ -152,10 +202,68 @@ public ArchiveEventType getType() {
         }
     }
 
+    private void initJobCache() {
+        initArchivedJobCache();
+        initUnzippedJobCache();
+    }
+
+    private void initArchivedJobCache() {
+        if (this.webArchivedDir.list() == null) {
+            LOG.info("No legacy archived jobs");
+            return;
+        }
+        Set<String> jobInLocal =
+                Arrays.stream(this.webArchivedDir.list()).collect(Collectors.toSet());
+        LOG.info("Reload left archived jobs : [{}]", String.join(",", jobInLocal));
+
+        for (HistoryServer.RefreshLocation refreshLocation : refreshDirs) {
+            Path refreshDir = refreshLocation.getPath();
+            try {
+                FileStatus[] jobArchives = listArchives(refreshLocation.getFs(), refreshDir);
+                Set<String> jobInRefreshLocation =
+                        Arrays.stream(jobArchives)
+                                .map(FileStatus::getPath)
+                                .map(Path::getName)
+                                .collect(Collectors.toSet());
+                jobInRefreshLocation.retainAll(jobInLocal);
+                this.cachedArchivesPerRefreshDirectory.get(refreshDir).addAll(jobInRefreshLocation);
+            } catch (IOException e) {
+                LOG.error("Failed to reload archivedJobs in {}.", refreshDir, e);
+            }
+        }
+
+        for (String jobId : Objects.requireNonNull(this.webArchivedDir.list())) {
+            this.cachedArchivesPerRefreshDirectory.forEach((path, archives) -> archives.add(jobId));
+        }

Review Comment:
   > Why do we want to add all local archives to the caches of all refresh directories?
   
   According to the design doc, the HistoryServer should reload the job files left behind by the previous HistoryServer instance when it starts.
   
   `cachedArchivesPerRefreshDirectory` tracks the job archives that have already been downloaded into `HistoryServerArchiveProcessor#webArchivedDir`. So on startup, the `HistoryServer` should add all local archives to the caches of all refresh directories.
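   A minimal sketch of that intent, under simplifying assumptions: refresh directories and job IDs are plain strings (the real fetcher uses Flink `Path`/`FileStatus`), and a fetch round is assumed to skip any job ID already present in that directory's cache. The names `ArchiveCacheReloadSketch`, `reloadLocalArchives` and `archivesToDownload` are hypothetical and not part of Flink.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of reloading local archives into the per-refresh-dir caches. */
class ArchiveCacheReloadSketch {

    // Stand-in for cachedArchivesPerRefreshDirectory: refresh directory -> job IDs treated as fetched.
    private final Map<String, Set<String>> cachedArchivesPerRefreshDirectory = new HashMap<>();

    ArchiveCacheReloadSketch(List<String> refreshDirs) {
        for (String dir : refreshDirs) {
            cachedArchivesPerRefreshDirectory.put(dir, new HashSet<>());
        }
    }

    /**
     * Marks every job archive left in the local web directory by the previous run as cached for
     * every refresh directory. {@code localJobIds} mirrors File#list(), which returns null when
     * the directory does not exist or cannot be read.
     */
    void reloadLocalArchives(String[] localJobIds) {
        if (localJobIds == null) {
            return;
        }
        for (Set<String> archives : cachedArchivesPerRefreshDirectory.values()) {
            archives.addAll(Arrays.asList(localJobIds));
        }
    }

    /** Returns the remote job IDs that are not yet cached for the given refresh directory. */
    Set<String> archivesToDownload(String refreshDir, Set<String> remoteJobIds) {
        Set<String> toDownload = new HashSet<>(remoteJobIds);
        toDownload.removeAll(cachedArchivesPerRefreshDirectory.get(refreshDir));
        return toDownload;
    }

    public static void main(String[] args) {
        ArchiveCacheReloadSketch sketch =
                new ArchiveCacheReloadSketch(
                        Arrays.asList("hdfs:///archives-a", "hdfs:///archives-b"));
        // Simulate job1 and job2 surviving in the local directory from the last run.
        sketch.reloadLocalArchives(new String[] {"job1", "job2"});

        Set<String> remoteA = new HashSet<>(Arrays.asList("job1", "job3"));
        // Only job3 still needs to be fetched; job1 was reloaded from local storage.
        System.out.println(sketch.archivesToDownload("hdfs:///archives-a", remoteA)); // prints [job3]
    }
}
```

   Note that with this approach a local archive counts as cached under every refresh directory, including ones that never contained it; that is exactly the behavior the question above is asking about.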


