danny0405 commented on code in PR #12982:
URL: https://github.com/apache/hudi/pull/12982#discussion_r2029658828
##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -177,31 +178,39 @@ public List<HoodieFileGroup> addFilesToView(List<StoragePathInfo> statuses) {
* Adds the provided statuses into the file system view for a single partition, and also caches it inside this object.
*/
public List<HoodieFileGroup> addFilesToView(String partitionPath, List<StoragePathInfo> statuses) {
- HoodieTimer timer = HoodieTimer.start();
- List<HoodieFileGroup> fileGroups = buildFileGroups(partitionPath, statuses, visibleCommitsAndCompactionTimeline, true);
- long fgBuildTimeTakenMs = timer.endTimer();
- timer.startTimer();
- // Group by partition for efficient updates for both InMemory and DiskBased structures.
- fileGroups.stream().collect(Collectors.groupingBy(HoodieFileGroup::getPartitionPath))
- .forEach((partition, value) -> {
- if (!isPartitionAvailableInStore(partition)) {
- if (bootstrapIndex.useIndex()) {
- try (BootstrapIndex.IndexReader reader = bootstrapIndex.createReader()) {
- LOG.info("Bootstrap Index available for partition {}", partition);
- List<BootstrapFileMapping> sourceFileMappings =
- reader.getSourceFileMappingForPartition(partition);
- addBootstrapBaseFileMapping(sourceFileMappings.stream()
- .map(s -> new BootstrapBaseFileMapping(new HoodieFileGroupId(s.getPartitionPath(),
- s.getFileId()), s.getBootstrapFileStatus())));
+ try {
+ writeLock.lock();
Review Comment:
Then how about this diagram:
```diff
// The query from thread2 could clean the state updated by thread1.
// wl-s: write lock start
// wl-e: write lock end
// rl-s: read lock start
// rl-e: read lock end
thread1: --- wl-s --- update --- wl-e ------------------------------ rl-s--- query --- rl-e -
thread2: -------------------------------- wl-s--- sync() --- wl-e----------------------------
```
Before the fix, thread1's update and query ran inside a single read lock
scope, so the integrity of the query was ensured. Now the update and query are
split into two lock transactions, so another thread (thread2) can sneak in and
update the state before the query runs.
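
To make the interleaving concrete, here is a minimal, self-contained sketch of
the scenario in the diagram. `ViewSketch` and all of its methods are
hypothetical stand-ins, not Hudi's actual classes. `splitScopes()` replays the
diagram (update and query in two separate lock scopes with a `sync()` in
between), while `singleScope()` shows one possible way to keep both steps in a
single critical section, using the lock downgrading that
`ReentrantReadWriteLock` supports. It is only an illustration, not a proposal
for how this PR should be changed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a cached file system view; not Hudi's real class.
class ViewSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Integer> fileGroupCounts = new HashMap<>();

  // "wl-s --- update --- wl-e"
  void update(String partition, int count) {
    lock.writeLock().lock();
    try {
      fileGroupCounts.put(partition, count);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // "rl-s --- query --- rl-e"
  int query(String partition) {
    lock.readLock().lock();
    try {
      return fileGroupCounts.getOrDefault(partition, 0);
    } finally {
      lock.readLock().unlock();
    }
  }

  // "wl-s --- sync() --- wl-e": resets the cached state.
  void sync() {
    lock.writeLock().lock();
    try {
      fileGroupCounts.clear();
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Update and query in one critical section: take the read lock before
  // releasing the write lock (downgrade), so no writer can run in between.
  int updateThenQuery(String partition, int count, Runnable onWriteLockHeld) {
    lock.writeLock().lock();
    try {
      onWriteLockHeld.run();
      fileGroupCounts.put(partition, count);
      lock.readLock().lock();
    } finally {
      lock.writeLock().unlock();
    }
    try {
      return fileGroupCounts.getOrDefault(partition, 0);
    } finally {
      lock.readLock().unlock();
    }
  }

  private static void join(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  // Replays the diagram: thread2's sync() lands between the two lock scopes.
  static int splitScopes() {
    ViewSketch view = new ViewSketch();
    view.update("p1", 3);
    Thread thread2 = new Thread(view::sync);
    thread2.start();
    join(thread2);
    return view.query("p1");
  }

  // thread2 tries to sync() while thread1 holds the lock; it has to wait
  // until thread1 has finished both the update and the query.
  static int singleScope() {
    ViewSketch view = new ViewSketch();
    CountDownLatch writeLockHeld = new CountDownLatch(1);
    Thread thread2 = new Thread(() -> {
      try {
        writeLockHeld.await();
        view.sync();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    thread2.start();
    int seen = view.updateThenQuery("p1", 3, writeLockHeld::countDown);
    join(thread2);
    return seen;
  }

  public static void main(String[] args) {
    System.out.println("split scopes:  " + splitScopes());  // prints 0
    System.out.println("single scope:  " + singleScope());  // prints 3
  }
}
```

In the split version, thread1's query returns 0 because thread2's `sync()`
cleared the state in the gap between `wl-e` and `rl-s`. In the single-scope
version the query still sees the value thread1 just wrote.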