[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

via GitHub Sat, 01 Jul 2023 10:48:30 -0700


virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1248909177



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -103,13 +114,17 @@ private enum LockType {
       READ,
       WRITE
     }
+    private Entry previous;

Review Comment:
   we could have also used LinkedList that Java provides, which internally does 
use doubly linked list, however the catch is, the only way we can remove an 
element from the middle of the list is by using `remove(Object o)` method and 
that does traverse the list from the head, and hence has time complexity of 
O(n), whereas by maintaining pointers ourselves, we can remove any element from 
anywhere in the list with O(1) time complexity.
   
   LinkedList 
https://docs.oracle.com/javase/8/docs/api/java/util/LinkedList.html
   ```
       /**
        * Removes the first occurrence of the specified element from this list,
        * if it is present.  If this list does not contain the element, it is
        * unchanged.  More formally, removes the element with the lowest index
        * {@code i} such that
        * 
<tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>
        * (if such an element exists).  Returns {@code true} if this list
        * contained the specified element (or equivalently, if this list
        * changed as a result of the call).
        *
        * @param o element to be removed from this list, if present
        * @return {@code true} if this list contained the specified element
        */
       public boolean remove(Object o) {
           if (o == null) {
               for (Node<E> x = first; x != null; x = x.next) {
                   if (x.item == null) {
                       unlink(x);
                       return true;
                   }
               }
           } else {
               for (Node<E> x = first; x != null; x = x.next) {
                   if (o.equals(x.item)) {
                       unlink(x);
                       return true;
                   }
               }
           }
           return false;
       }
   ```
   
   we could use `remove(int index)` but then we will end up maintaining indexes 
which is even more complicated because as any element gets added or removed to 
the list, we will have to update our entire map with the indexes of all 
affected elements and hence it ends up being O(n) operation anyways.
   
   hence, i decided to maintain doubly linked list ourselves, the algo is quite 
less error-prone (based on good amount of concurrency testing that i did).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Reply via email to