[
https://issues.apache.org/jira/browse/HADOOP-18291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736687#comment-17736687
]
ASF GitHub Bot commented on HADOOP-18291:
-----------------------------------------
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240673021
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
throw new IllegalStateException(String.format("block %d not found in
cache", blockNumber));
}
numGets++;
+ addToHeadOfLinkedList(entry);
return entry;
}
+ /**
+ * Add the given entry to the head of the linked list.
+ *
+ * @param entry Block entry to add.
+ */
+ private void addToHeadOfLinkedList(Entry entry) {
+ blocksLock.writeLock().lock();
+ try {
+ if (head == null) {
+ head = entry;
+ tail = entry;
+ }
+ if (entry != head) {
+ Entry prev = entry.getPrevious();
+ Entry nxt = entry.getNext();
+ if (prev != null) {
+ prev.setNext(nxt);
+ }
+ if (nxt != null) {
+ nxt.setPrevious(prev);
+ }
+ entry.setPrevious(null);
+ entry.setNext(head);
+ head.setPrevious(entry);
+ head = entry;
+ }
+ if (tail != null) {
+ while (tail.getNext() != null) {
+ tail = tail.getNext();
+ }
+ }
Review Comment:
Okay, but when we are adding the node to the head, doesn't it make more
sense to check if the current node is tail, get the previous of this, and set
that to tail? This would work, was just interested if we can avoid traversing
the list 🤔
> S3A prefetch - Implement LRU cache for SingleFilePerBlockCache
> --------------------------------------------------------------
>
> Key: HADOOP-18291
> URL: https://issues.apache.org/jira/browse/HADOOP-18291
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.4.0
> Reporter: Ahmar Suhail
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
>
> Currently there is no limit on the size of disk cache. This means we could
> have a large number of files on files, especially for access patterns that
> are very random and do not always read the block fully.Â
> Â
> eg:
> in.seek(5);
> in.read();Â
> in.seek(blockSize + 10) // block 0 gets saved to disk as it's not fully read
> in.read();
> in.seek(2 * blockSize + 10) // block 1 gets saved to disk
> .. and so on
> Â
> The in memory cache is bounded, and by default has a limit of 72MB (9
> blocks). When a block is fully read, and a seek is issued it's released
> [here|https://github.com/apache/hadoop/blob/feature-HADOOP-18028-s3a-prefetch/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/S3CachingInputStream.java#L109].
> We can also delete the on disk file for the block here if it exists.Â
> Â
> Also maybe add an upper limit on disk space, and delete the file which stores
> data of the block furthest from the current block (similar to the in memory
> cache) when this limit is reached.Â
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]