GitHub user kkhatua commented on the issue:

https://github.com/apache/drill/pull/755

I chose a `TreeSet` because it is a tree structure that keeps the profiles (up to the maximum permitted number) sorted and in memory. When Drill detects changes (see https://github.com/kkhatua/drill/blob/f7ad29b9a322bb215d16b3c3b9a2bfc40abfc1ed/exec/java-exec/src/main/java/org/apache/drill/exec/store/sys/store/LocalPersistentStore.java#L146), it fetches all the available profiles in the PStore and reconstructs the tree, since the order of the profiles returned by the `FileSystem` is not guaranteed. I tried using a `PathFilter` to fetch only new profiles, but the `FileSystem` cost of fetching only the new profiles is the same as the cost of fetching the entire list! Also, some profiles might have been deleted as new ones were added, so a full reconstruction takes care of that scenario as well.

To evict, as I construct the `TreeSet`, I simply pop the oldest (by filename) entry. The Guava cache options don't seem to provide a way to define the basis on which to evict entries.

I believe @vrozov's work on DRILL-6053 specifically addresses locking during writes. The lock I used (and need) is for reads, to ensure that multiple requests don't trigger an expensive `FileSystem` call for the same state of the PStore.

e.g. consider T# as timestamps:
* `currBasePathModified` = T0
* _ThreadA_ requests at t=T1 and acquires the read-lock
* _ThreadB_ requests at t=T2 but is waiting for the read-lock

If the tree exists and no change is detected, _ThreadA_ will use the `TreeSet` contents and then release the lock.
If the `TreeSet` exists and a change is detected, _ThreadA_ will reconstruct the `TreeSet` before using its contents, and it will update `lastBasePathModified` before releasing the lock.

When _ThreadB_ gets the read-lock, it discovers that during the wait, the `TreeSet` was already updated.
As of t=T2, this is the most recent snapshot, so _ThreadB_ proceeds to use the `TreeSet`'s contents rather than reconstruct it. Reconstruction is deferred to the next request. We're using `lastBasePathModified` to provide pseudo-versioned access to the list. That means if more profiles are added *after* _ThreadB_ started waiting for the read-lock, that will not trigger the `FileSystem` call right away.
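
For readers following along, the scheme described above can be sketched roughly as below. This is not the code from the PR: the class `ProfileCacheSketch`, the `dirModified` and `listAll` callbacks (standing in for the `FileSystem` calls), and the use of a plain `ReentrantLock` are all assumptions made for illustration. It also assumes that profile filenames sort oldest-first, so that `pollFirst()` evicts the oldest entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative sketch only; names and structure are hypothetical, not Drill's.
class ProfileCacheSketch {
  private final int maxProfiles;
  private final LongSupplier dirModified;        // hypothetical: modification time of the PStore base path
  private final Supplier<List<String>> listAll;  // hypothetical: the expensive full FileSystem listing
  private final ReentrantLock lock = new ReentrantLock();

  private TreeSet<String> profiles = new TreeSet<>();
  private long lastBasePathModified = -1L;
  int listingCalls = 0;                          // counts how often the expensive listing ran

  ProfileCacheSketch(int maxProfiles, LongSupplier dirModified, Supplier<List<String>> listAll) {
    this.maxProfiles = maxProfiles;
    this.dirModified = dirModified;
    this.listAll = listAll;
  }

  // Returns the cached profile names in sorted order, rebuilding the tree
  // only if the base path changed since the last rebuild.
  List<String> getProfiles() {
    lock.lock();
    try {
      long currBasePathModified = dirModified.getAsLong();
      if (currBasePathModified != lastBasePathModified) {
        profiles = rebuild(listAll.get());
        listingCalls++;
        lastBasePathModified = currBasePathModified;
      }
      // A thread that waited on the lock lands here without listing again
      // if the tree was refreshed for the same state while it waited.
      return new ArrayList<>(profiles);
    } finally {
      lock.unlock();
    }
  }

  // Full reconstruction: insert every name, evicting the smallest
  // (assumed oldest) entry whenever the cap is exceeded.
  private TreeSet<String> rebuild(List<String> names) {
    TreeSet<String> tree = new TreeSet<>();
    for (String name : names) {
      tree.add(name);
      if (tree.size() > maxProfiles) {
        tree.pollFirst();
      }
    }
    return tree;
  }
}
```

With a cap of 3 and an unordered listing of `p3, p1, p4, p2`, the sketch keeps `p2, p3, p4`. Repeated calls reuse the tree until the modification time reported by `dirModified` changes, so a newly added profile is only picked up once the change is detected.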
---