advancedxy commented on code in PR #424:
URL: https://github.com/apache/incubator-uniffle/pull/424#discussion_r1050668564


##########
server/src/main/java/org/apache/uniffle/server/storage/LocalStorageManager.java:
##########
@@ -217,6 +244,33 @@ public void removeResources(PurgeEvent event) {
     deleteHandler.delete(deletePaths.toArray(new String[deletePaths.size()]), 
appId, user);
   }
 
+  private void cleanupStorageSelectionCache(PurgeEvent event) {
+    Function<PartitionUnionKey, Boolean> deleteConditionFunc = null;
+    if (event instanceof AppPurgeEvent) {
+      deleteConditionFunc = partitionUnionKey -> 
partitionUnionKey.carryWithAppId(event.getAppId());
+    } else if (event instanceof ShufflePurgeEvent) {
+      deleteConditionFunc =
+          partitionUnionKey -> partitionUnionKey.carryWithAppIdAndShuffleIds(
+              event.getAppId(),
+              new HashSet<>(event.getShuffleIds())

Review Comment:
   For this case, it may not be necessary to convert the list into a set.
   
   The number of shuffleIds in one app is limited, most likely under 10, so a 
linear for-each search should be as performant as a hash lookup.
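
A minimal sketch of the linear search suggested here, assuming the shuffle IDs arrive as a small `List<Integer>`. The class and method names are hypothetical and not part of the PR:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the PR: checks membership with a plain
// for-each scan instead of copying the list into a HashSet first.
final class ShuffleIdMatchSketch {

  // Returns true if shuffleId occurs in shuffleIds. The scan is O(n), but for
  // a handful of IDs it avoids allocating and hashing a temporary set.
  static boolean matches(int shuffleId, List<Integer> shuffleIds) {
    for (int id : shuffleIds) {
      if (id == shuffleId) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Integer> ids = Arrays.asList(1, 2, 3);
    System.out.println(matches(3, ids));
    System.out.println(matches(4, ids));
  }
}
```

Whether this is actually faster than a `HashSet` lookup depends on the list size and call frequency, so it would need measuring on a real workload.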



##########
server/src/main/java/org/apache/uniffle/server/ShuffleServerGrpcService.java:
##########
@@ -511,9 +513,15 @@ public void getLocalShuffleData(GetLocalShuffleDataRequest 
request,
     String requestInfo = "appId[" + appId + "], shuffleId[" + shuffleId + "], 
partitionId["
         + partitionId + "]" + "offset[" + offset + "]" + "length[" + length + 
"]";
 
-    shuffleServer.getStorageManager()
-        .selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId))
-        .updateReadMetrics(new StorageReadMetrics(appId, shuffleId));
+    int[] range = ShuffleStorageUtils.getPartitionRange(partitionId, 
partitionNumPerRange, partitionNum);

Review Comment:
   `ShuffleStorageUtils.getPartitionRange` is not performant.
   
   The range could be calculated directly in O(1). However, the current 
implementation is O(n), where n is the number of partition ranges. When the 
partition range size is 1, n equals the number of partitions.
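
A minimal sketch of the O(1) calculation, assuming partition ranges are contiguous fixed-size blocks that start at partition 0 and are reported as an inclusive `[start, end]` pair. The class name and the argument checks are hypothetical, and the actual `getPartitionRange` may treat edge cases differently:

```java
import java.util.Arrays;

// Hypothetical sketch, not the project's implementation.
final class PartitionRangeSketch {

  // Computes the inclusive [start, end] range containing partitionId with
  // integer division instead of iterating over every range.
  static int[] rangeOf(int partitionId, int partitionNumPerRange, int partitionNum) {
    if (partitionNumPerRange <= 0 || partitionId < 0 || partitionId >= partitionNum) {
      throw new IllegalArgumentException("invalid partition arguments");
    }
    // Integer division rounds down to the index of the enclosing range.
    int start = (partitionId / partitionNumPerRange) * partitionNumPerRange;
    // Assumption: the last range is not clipped to partitionNum - 1.
    int end = start + partitionNumPerRange - 1;
    return new int[] {start, end};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(rangeOf(5, 3, 10)));
  }
}
```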



##########
server/src/main/java/org/apache/uniffle/server/storage/LocalStorageManager.java:
##########
@@ -246,38 +300,70 @@ public void 
checkAndClearLeakedShuffleData(Collection<String> appIds) {
     }
   }
 
-  void repair() {
-    boolean hasNewCorruptedStorage = false;
-    for (LocalStorage storage : localStorages) {
-      if (storage.isCorrupted() && 
!corruptedStorages.contains(storage.getBasePath())) {
-        hasNewCorruptedStorage = true;
-        corruptedStorages.add(storage.getBasePath());
-      }
+  public List<LocalStorage> getStorages() {
+    return localStorages;
+  }
+
+  static class PartitionUnionKey {

Review Comment:
   Let's move this into a new file?
   
   I'm not sure which is more memory efficient: a `PartitionUnionKey` class or 
simply a String such as "appId_shuffleId_partitionId".



##########
server/src/main/java/org/apache/uniffle/server/ShuffleServerGrpcService.java:
##########
@@ -511,9 +513,15 @@ public void getLocalShuffleData(GetLocalShuffleDataRequest 
request,
     String requestInfo = "appId[" + appId + "], shuffleId[" + shuffleId + "], 
partitionId["
         + partitionId + "]" + "offset[" + offset + "]" + "length[" + length + 
"]";
 
-    shuffleServer.getStorageManager()
-        .selectStorage(new ShuffleDataReadEvent(appId, shuffleId, partitionId))
-        .updateReadMetrics(new StorageReadMetrics(appId, shuffleId));
+    int[] range = ShuffleStorageUtils.getPartitionRange(partitionId, 
partitionNumPerRange, partitionNum);

Review Comment:
   This should be addressed in another PR.



##########
server/src/main/java/org/apache/uniffle/server/ShuffleDataReadEvent.java:
##########
@@ -17,16 +17,20 @@
 
 package org.apache.uniffle.server;
 
+import org.apache.uniffle.common.PartitionRange;
+
 public class ShuffleDataReadEvent {
 
   private String appId;
   private int shuffleId;
-  private int startPartition;
+  private int partitionId;
+  private PartitionRange partitionRange;
 
-  public ShuffleDataReadEvent(String appId, int shuffleId, int startPartition) 
{
+  public ShuffleDataReadEvent(String appId, int shuffleId, int partitionId, 
int[] range) {

Review Comment:
   Then an int `startPartition` should be sufficient, just like in 
`ShuffleDataFlushEvent`.
   
   An object in the JVM is quite expensive compared to native languages: the 
partition range class would occupy ~40 bytes per instance, and it would put 
more pressure on the GC.



##########
server/src/main/java/org/apache/uniffle/server/storage/LocalStorageManager.java:
##########
@@ -139,32 +140,55 @@ public class LocalStorageManager extends 
SingleStorageManager {
 
   @Override
   public Storage selectStorage(ShuffleDataFlushEvent event) {
-    LocalStorage storage = 
localStorages.get(ShuffleStorageUtils.getStorageIndex(
-        localStorages.size(),
-        event.getAppId(),
-        event.getShuffleId(),
-        event.getStartPartition()));
-    if (storage.containsWriteHandler(event.getAppId(), event.getShuffleId(), 
event.getStartPartition())
-        && storage.isCorrupted()) {
-      LOG.error("storage " + storage.getBasePath() + " is corrupted");
-    }
-    if (storage.isCorrupted()) {
-      storage = getRepairedStorage(event.getAppId(), event.getShuffleId(), 
event.getStartPartition());
+    String appId = event.getAppId();
+    int shuffleId = event.getShuffleId();
+    int partitionId = event.getStartPartition();
+
+    try {
+      LocalStorage storage = 
partitionsOfStorage.get(appId).get(shuffleId).get(partitionId);
+      if (storage.isCorrupted()) {
+        throw new RuntimeException("LocalStorage: " + storage.getBasePath() + 
" is corrupted.");
+      }
+      return storage;
+    } catch (NullPointerException npe) {
+      // Ignore
     }
+
+    // Firstly getting the storage based on its (appId, shuffleId, 
partitionId) hash value
+    LocalStorage storage =
+        localStorages
+            .stream()
+            .filter(x -> x.canWrite() && !x.isCorrupted())
+            .collect(Collectors.toList())
+            .get(
+                ShuffleStorageUtils.getStorageIndex(
+                    localStorages.size(),
+                    appId,
+                    shuffleId,
+                    partitionId
+                )
+            );
     event.setUnderStorage(storage);
+
+    // store it to cache.
+    partitionsOfStorage.putIfAbsent(appId, Maps.newConcurrentMap());
+    partitionsOfStorage.get(appId).putIfAbsent(shuffleId, 
Maps.newConcurrentMap());
+    partitionsOfStorage.get(appId).get(shuffleId).put(partitionId, storage);
     return storage;
   }
 
   @Override
   public Storage selectStorage(ShuffleDataReadEvent event) {
+    String appId = event.getAppId();
+    int shuffleId = event.getShuffleId();
+    int partitionId = event.getStartPartition();
 
-    LocalStorage storage = 
localStorages.get(ShuffleStorageUtils.getStorageIndex(
-        localStorages.size(),
-        event.getAppId(),
-        event.getShuffleId(),
-        event.getStartPartition()));
-    if (storage.isCorrupted()) {
-      storage = getRepairedStorage(event.getAppId(), event.getShuffleId(), 
event.getStartPartition());
+    LocalStorage storage = null;
+    try {
+      storage = partitionsOfStorage.get(appId).get(shuffleId).get(partitionId);
+    } catch (NullPointerException npe) {

Review Comment:
   Looks like this code is still wrong. By using `partitionsOfStorage`, there 
would be no `NPE`.
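
For reference, a nested cache lookup like the one in this hunk can be written without catching `NullPointerException`. This is a minimal sketch, assuming the cache is a three-level map keyed by appId, shuffleId and partitionId. The class and method names are hypothetical, and `String` stands in for `LocalStorage`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the PR's code: a three-level cache with explicit
// null checks on read and computeIfAbsent on write.
final class StorageCacheSketch {

  private final Map<String, Map<Integer, Map<Integer, String>>> cache =
      new ConcurrentHashMap<>();

  // Returns the cached storage, or null when any level of the map is missing.
  String get(String appId, int shuffleId, int partitionId) {
    Map<Integer, Map<Integer, String>> shuffles = cache.get(appId);
    if (shuffles == null) {
      return null;
    }
    Map<Integer, String> partitions = shuffles.get(shuffleId);
    return partitions == null ? null : partitions.get(partitionId);
  }

  // Creates missing intermediate maps atomically, then stores the value.
  void put(String appId, int shuffleId, int partitionId, String storage) {
    cache.computeIfAbsent(appId, k -> new ConcurrentHashMap<>())
        .computeIfAbsent(shuffleId, k -> new ConcurrentHashMap<>())
        .put(partitionId, storage);
  }
}
```

With this shape a cache miss is an ordinary `null` return, so the caller can fall back to hash-based selection without a `try`/`catch`.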



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

