the-other-tim-brown commented on code in PR #12982:
URL: https://github.com/apache/hudi/pull/12982#discussion_r2028007767


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -177,31 +178,39 @@ public List<HoodieFileGroup> addFilesToView(List<StoragePathInfo> statuses) {
    * Adds the provided statuses into the file system view for a single partition, and also caches it inside this object.
    */
   public List<HoodieFileGroup> addFilesToView(String partitionPath, List<StoragePathInfo> statuses) {
-    HoodieTimer timer = HoodieTimer.start();
-    List<HoodieFileGroup> fileGroups = buildFileGroups(partitionPath, statuses, visibleCommitsAndCompactionTimeline, true);
-    long fgBuildTimeTakenMs = timer.endTimer();
-    timer.startTimer();
-    // Group by partition for efficient updates for both InMemory and DiskBased structures.
-    fileGroups.stream().collect(Collectors.groupingBy(HoodieFileGroup::getPartitionPath))
-        .forEach((partition, value) -> {
-          if (!isPartitionAvailableInStore(partition)) {
-            if (bootstrapIndex.useIndex()) {
-              try (BootstrapIndex.IndexReader reader = bootstrapIndex.createReader()) {
-                LOG.info("Bootstrap Index available for partition {}", partition);
-                List<BootstrapFileMapping> sourceFileMappings =
-                    reader.getSourceFileMappingForPartition(partition);
-                addBootstrapBaseFileMapping(sourceFileMappings.stream()
-                    .map(s -> new BootstrapBaseFileMapping(new HoodieFileGroupId(s.getPartitionPath(),
-                        s.getFileId()), s.getBootstrapFileStatus())));
+    try {
+      writeLock.lock();

Review Comment:
   1. Clarifications: The write-lock is taken only when updating the state. If 
there are no updates because the partition is already loaded, the write-lock 
is never taken. In the flows where the partitions are currently loaded, there 
is already synchronization. Changing this to use a read-lock allows concurrent 
access for reads. Are you referring to the actual `sync` method when you 
mention `#sync`? I can find only two references to it, so I am confused about 
what you mean by `most of the query APIs trigger at the very first place`.
   2. This diagram does not make sense to me, because the read-lock cannot be 
acquired at the same time as the write-lock.
   3. +1 to this. The more I dig into it, the more I think that this locking 
and other logic should be handled in this class.
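
   To make points 1 and 2 concrete, here is a minimal sketch of the locking 
pattern I have in mind. It assumes a `java.util.concurrent` 
`ReentrantReadWriteLock`; the class `PartitionViewSketch` and its methods are 
hypothetical stand-ins for illustration, not the actual code in 
`AbstractTableFileSystemView`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a file system view; names are illustrative only.
class PartitionViewSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, List<String>> partitionToFiles = new HashMap<>();
  private int loadCount = 0;

  /** Returns the cached files, loading the partition only if it is missing. */
  List<String> getOrLoad(String partition) {
    // Fast path: shared read lock, so concurrent readers do not block each other.
    lock.readLock().lock();
    try {
      List<String> cached = partitionToFiles.get(partition);
      if (cached != null) {
        return cached;
      }
    } finally {
      lock.readLock().unlock();
    }
    // Slow path: exclusive write lock, taken only when state has to change.
    lock.writeLock().lock();
    try {
      // Re-check, since another thread may have loaded it between the two locks.
      return partitionToFiles.computeIfAbsent(partition, p -> {
        loadCount++;
        return Arrays.asList(p + "/file-1", p + "/file-2");
      });
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Number of times a partition was actually loaded under the write lock. */
  int loadCount() {
    lock.readLock().lock();
    try {
      return loadCount;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Holds the write lock and reports whether another thread can take the read lock. */
  boolean otherThreadCanReadWhileWriting() {
    lock.writeLock().lock();
    try {
      return CompletableFuture.supplyAsync(() -> {
        boolean acquired = lock.readLock().tryLock();
        if (acquired) {
          lock.readLock().unlock();
        }
        return acquired;
      }).join();
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

   In this sketch a second `getOrLoad` call for the same partition returns 
from the read-locked fast path without touching the write lock, and 
`otherThreadCanReadWhileWriting()` returns `false` because a different thread 
cannot acquire the read lock while the write lock is held.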



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.