advancedxy commented on code in PR #424:
URL: https://github.com/apache/incubator-uniffle/pull/424#discussion_r1054283884


##########
server/src/main/java/org/apache/uniffle/server/storage/LocalStorageManager.java:
##########
@@ -139,33 +143,53 @@ public class LocalStorageManager extends SingleStorageManager {
 
   @Override
   public Storage selectStorage(ShuffleDataFlushEvent event) {
-    LocalStorage storage = localStorages.get(ShuffleStorageUtils.getStorageIndex(
-        localStorages.size(),
-        event.getAppId(),
-        event.getShuffleId(),
-        event.getStartPartition()));
-    if (storage.containsWriteHandler(event.getAppId(), event.getShuffleId(), event.getStartPartition())
-        && storage.isCorrupted()) {
-      LOG.error("storage " + storage.getBasePath() + " is corrupted");
-    }
-    if (storage.isCorrupted()) {
-      storage = getRepairedStorage(event.getAppId(), event.getShuffleId(), event.getStartPartition());
+    String appId = event.getAppId();
+    int shuffleId = event.getShuffleId();
+    int partitionId = event.getStartPartition();
+
+    LocalStorage storage = partitionsOfStorage.get(UnionKey.toKey(appId, shuffleId, partitionId));
+    if (storage != null) {
+      if (storage.isCorrupted()) {
+        if (storage.containsWriteHandler(appId, shuffleId, partitionId)) {
+          throw new RuntimeException("LocalStorage: " + storage.getBasePath() + " is corrupted.");
+        }
+      } else {
+        return storage;
+      }
     }
-    event.setUnderStorage(storage);
-    return storage;
+
+    List<LocalStorage> candidates = localStorages
+        .stream()
+        .filter(x -> x.canWrite() && !x.isCorrupted())
+        .collect(Collectors.toList());
+    final LocalStorage selectedStorage = candidates.get(
+        ShuffleStorageUtils.getStorageIndex(
+            candidates.size(),
+            appId,
+            shuffleId,
+            partitionId
+        )
+    );
+    return partitionsOfStorage.compute(
+        UnionKey.toKey(appId, shuffleId, partitionId),
+        (key, localStorage) -> {
+          // If this is the first time to select storage or existing storage is corrupted,
+          // we should refresh the cache.
+          if (localStorage == null || localStorage.isCorrupted()) {

Review Comment:
   > For an event, if the storage is selected but the event hasn't written any data to this storage (maybe the event is in the pending queue), that means we could replace it with a new storage, which won't cause data loss.
   
   This only makes sense when [L154-L156](https://github.com/apache/incubator-uniffle/pull/424/files#diff-74b29a246646d7b44aa81c4436760fda3edc5f645a3c576dd1b50405848db2a6R154-R156) doesn't throw an exception, right?
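
To make the question concrete, here is a minimal sketch of the three cases as I read them from the diff. `StorageSelectionSketch`, `Disk`, and `writtenKeys` are hypothetical stand-ins (for `LocalStorageManager`, `LocalStorage`, and `containsWriteHandler`), and the hash-based pick is a simplification, not `ShuffleStorageUtils.getStorageIndex`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class StorageSelectionSketch {

  // Hypothetical stand-in for LocalStorage: only the state this discussion depends on.
  static final class Disk {
    final String path;
    volatile boolean corrupted;
    // Keys that already wrote data here; stands in for containsWriteHandler(...).
    final Set<String> writtenKeys = ConcurrentHashMap.newKeySet();

    Disk(String path) {
      this.path = path;
    }
  }

  private final List<Disk> disks;
  private final Map<String, Disk> cache = new ConcurrentHashMap<>();

  StorageSelectionSketch(List<Disk> disks) {
    this.disks = disks;
  }

  Disk select(String key) {
    Disk cached = cache.get(key);
    if (cached != null) {
      if (!cached.corrupted) {
        // Case 1: healthy cached storage, keep using it.
        return cached;
      }
      if (cached.writtenKeys.contains(key)) {
        // Case 2: corrupted and data was already written, so reassigning would lose data.
        throw new IllegalStateException("LocalStorage: " + cached.path + " is corrupted.");
      }
      // Case 3: corrupted but nothing written yet, fall through and reassign.
    }

    List<Disk> candidates = disks.stream()
        .filter(d -> !d.corrupted)
        .collect(Collectors.toList());
    if (candidates.isEmpty()) {
      throw new IllegalStateException("No healthy storage available.");
    }
    // Simplified pick; the real code delegates to ShuffleStorageUtils.getStorageIndex.
    final Disk picked = candidates.get(Math.floorMod(key.hashCode(), candidates.size()));

    // Refresh the cache only on first selection or when the cached entry is corrupted.
    return cache.compute(key, (k, existing) ->
        (existing == null || existing.corrupted) ? picked : existing);
  }
}
```

In this shape, the `existing.corrupted` branch inside `compute` is only reachable when the cached storage is corrupted and nothing was written for the key; if data was written, `select` throws before getting there.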
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

