dangshazi commented on code in PR #21185:
URL: https://github.com/apache/flink/pull/21185#discussion_r1099965280


##########
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServerArchiveFetcher.java:
##########
@@ -152,10 +202,68 @@ public ArchiveEventType getType() {
         }
     }
 
+    private void initJobCache() {
+        initArchivedJobCache();
+        initUnzippedJobCache();
+    }
+
+    private void initArchivedJobCache() {
+        if (this.webArchivedDir.list() == null) {
+            LOG.info("No legacy archived jobs");
+            return;
+        }
+        Set<String> jobInLocal =
+                Arrays.stream(this.webArchivedDir.list()).collect(Collectors.toSet());
+        LOG.info("Reload left archived jobs : [{}]", String.join(",", jobInLocal));
+
+        for (HistoryServer.RefreshLocation refreshLocation : refreshDirs) {
+            Path refreshDir = refreshLocation.getPath();
+            try {
+                FileStatus[] jobArchives = listArchives(refreshLocation.getFs(), refreshDir);
+                Set<String> jobInRefreshLocation =
+                        Arrays.stream(jobArchives)
+                                .map(FileStatus::getPath)
+                                .map(Path::getName)
+                                .collect(Collectors.toSet());
+                jobInRefreshLocation.retainAll(jobInLocal);
+                this.cachedArchivesPerRefreshDirectory.get(refreshDir).addAll(jobInRefreshLocation);
+            } catch (IOException e) {
+                LOG.error("Failed to reload archivedJobs in {}.", refreshDir, e);
+            }
+        }
+
+        for (String jobId : Objects.requireNonNull(this.webArchivedDir.list())) {
+            this.cachedArchivesPerRefreshDirectory.forEach((path, archives) -> archives.add(jobId));
+        }

Review Comment:
   > Thanks for opening this PR, @dangshazi. I apologize for keeping you waiting so long before reviewing this.
   > 
   > I have left some comments. I think the biggest problem of the current PR is the readability. Some key fields / terminologies are declared / used without explanations, making the code hard to understand. I'm not sure whether I have fully understood the logic, and thus cannot decide whether the changes are correct.
   
   I have updated the related comments and added a design doc to this PR: [History Server support lazy unzip](https://docs.google.com/document/d/1o7YgXhHJxsObkduHLsr4YSwS8T-mo-tzLWwpsRceMNc/edit?usp=sharing).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

