Glavo commented on code in PR #378:
URL: https://github.com/apache/commons-compress/pull/378#discussion_r1188988481


##########
src/main/java/org/apache/commons/compress/archivers/zip/ZipFile.java:
##########
@@ -792,11 +798,24 @@ private BoundedArchiveInputStream 
createBoundedInputStream(final long start, fin
 
     private void fillNameMap() {
         entries.forEach(ze -> {
-            // entries is filled in populateFromCentralDirectory and
+            // entries are filled in populateFromCentralDirectory and
             // never modified
             final String name = ze.getName();
-            final LinkedList<ZipArchiveEntry> entriesOfThatName = 
nameMap.computeIfAbsent(name, k -> new LinkedList<>());
-            entriesOfThatName.addLast(ze);
+            ZipArchiveEntry firstEntry = nameMap.putIfAbsent(name, ze);
+
+            if (firstEntry != null) {
+                if (duplicateNameMap == null) {
+                    duplicateNameMap = new HashMap<>();
+                }
+
+                List<ZipArchiveEntry> entriesOfThatName = 
duplicateNameMap.computeIfAbsent(name, k -> {
+                    ArrayList<ZipArchiveEntry> list = new ArrayList<>(2);

Review Comment:
   The reason for using `ArrayList` instead of `LinkedList` here is that 
`ArrayList` is almost always better than `LinkedList`.
   
   In general, we can calculate the memory footprint of a lists with `N` 
elements like this:
   
   * LinkedList: `24 + 24*N`
   * ArrayList: 
     * Minimal (internal array is fully used, no wasted space): 
       `24 (ArrayList) + 16 (array object header) + 4*N`
     * Maximum (internal array has just grown, the reserved space is the 
largest):
       `24 (ArrayList) + 16 (array object header) + 6*N`
     
   If the list has only two elements, `ArrayList` with 2 as the initial 
capacity saves 24 bytes of memory compared to `LinkedList`.
   
   As the list grows, the memory footprint of `ArrayList` is usually only 
1/6~1/4 of that of `LinkedList`. 
   
   The combined memory footprint of all objects allocated during `ArrayList` 
construction (including those small temporary arrays) is smaller than the 
long-term memory footprint of `LinkedList`, so using `ArrayList` is more memory 
efficient.
   
   For smaller lists, iterating over a `LinkedList` might be faster, but 
`ArrayList` has better data locality. So usually we have no reason to use 
`LinkedList`, `ArrayList`/`ArrayDeque` are better choices.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to