Hi,

I'm not sure where you ended up in this succession of messages. I do think there are some things going on that are worthy of discussion and possibly fixing. Let me try to break them down.

1) With the default load factor of 0.75, it's possible to have 12 entries in a map whose table length is 16. This works as expected if the entries are added individually using the put() method. However, adding 12 entries into a map via the copy constructor or via putAll() results in a table length of 32. That seems wrong. Well, if not exactly wrong, it's suboptimal, in that it uses more space than necessary.

2) The root cause of the problem is with expressions such as these:

    (int) (expected / 0.75f + 1.0f)
or
    (int) (expected / 0.75f) + 1

(where "expected" is the expected number of entries). They attempt to round the division result up to the nearest int by adding 1 or 1.0f. This results in a value that's one too large if the result of the division exactly equals an integer. So, when "expected" is 12, this results in 17 instead of 16. This in turn is inflated to the next power-of-two, which is 32.

3) This was discussed previously in a different context. [1] I agree that it would be preferable for the expression to be something like this:

    (int) Math.ceil(expected / 0.75)

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066258.html

This uses doubles instead of floats. But I think this is a fairly low-frequency operation, so I don't think using doubles is a problem. Thus I don't think we need to add a Math.ceil(float) overload.

**

So what should be done about this? Possibly, the computations in HashMap and related classes should be adjusted to use Math.ceil() instead. Some tests use the suboptimal calculation as well; those might need to be adjusted also.

There are a variety of places around the JDK that use a similar expression in order to pre-size HashMaps and HashSets. We could go around and fix all them, but it might be worth considering adding an API that creates a HashMap (or LinkedHashMap) that is pre-sized for the expected number of elements. Guava has such an API, Maps.newHashMapWithExpectedSize [2]. But note that it tries (but does not guarantee) to size the map such that it can accommodate the expected number of elements without resizing, but it doesn't promise that the map isn't oversized.

[2] https://guava.dev/releases/31.0.1-jre/api/docs/com/google/common/collect/Maps.html#newHashMapWithExpectedSize(int)

(Also note that the ConcurrentHashMap(int) constructor takes the expected number of elements, not the initial table length like HashMap(int). CHM does what probably most users want, but it's confusing that its behavior differs from HashMap in this regard.)

If we don't add a Java SE utility like "newHashMapWithExpectedSize", fallbacks would be to introduce an internal JDK utility method that is called from various places, or just to update the individual call sites to use the more precise calculation.

s'marks


On 2/4/22 10:48 AM, Xeno Amess wrote:
wait, things get interesting now.
We do have such a test named ConstantDirectoryOptimalCapacity to make sure
this does not happen, but my tests still fill under the idea debugger.
Is there any specialist for help? I would dig in but it is a little
complicated.
[image: image.png]

Xeno Amess <xenoam...@gmail.com> 于2022年2月5日周六 02:39写道:

Sorry for my mistake.
After a more throughout dig in, this thing has already fixed several years
ago, at year 2015.
Issue close.

Xeno Amess <xenoam...@gmail.com> 于2022年2月4日周五 21:40写道:

Besides, is it worthy to develop a public float Math.ceil(float) function?

Xeno Amess <xenoam...@gmail.com> 于2022年2月4日周五 21:39写道:

patch applied.

Index: src/java.base/share/classes/java/lang/Module.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/lang/Module.java
b/src/java.base/share/classes/java/lang/Module.java
--- a/src/java.base/share/classes/java/lang/Module.java (revision
3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
+++ b/src/java.base/share/classes/java/lang/Module.java (revision
deeba25d15398fea8bc971ac915e348878b2c27a)
@@ -1133,7 +1133,7 @@
          boolean isBootLayer = (ModuleLayer.boot() == null);

          int numModules = cf.modules().size();
-        int cap = (int)(numModules / 0.75f + 1.0f);
+        int cap = (int)Math.ceil(numModules / 0.75f);
          Map<String, Module> nameToModule = new HashMap<>(cap);

          // to avoid repeated lookups and reduce iteration overhead, we
create
Index: src/java.base/share/classes/java/util/HashMap.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/util/HashMap.java
b/src/java.base/share/classes/java/util/HashMap.java
--- a/src/java.base/share/classes/java/util/HashMap.java (revision
3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
+++ b/src/java.base/share/classes/java/util/HashMap.java (revision
deeba25d15398fea8bc971ac915e348878b2c27a)
@@ -495,9 +495,9 @@
          int s = m.size();
          if (s > 0) {
              if (table == null) { // pre-size
-                float ft = ((float)s / loadFactor) + 1.0F;
-                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
-                         (int)ft : MAXIMUM_CAPACITY);
+                double dt = Math.ceil(s / loadFactor);
+                int t = ((dt < (double)MAXIMUM_CAPACITY) ?
+                         (int)dt : MAXIMUM_CAPACITY);
                  if (t > threshold)
                      threshold = tableSizeFor(t);
              } else {
@@ -1527,12 +1527,12 @@
          } else if (mappings == 0) {
              // use defaults
          } else if (mappings > 0) {
-            float fc = (float)mappings / lf + 1.0f;
-            int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
+            double dc = Math.ceil(mappings / lf);
+            int cap = ((dc < DEFAULT_INITIAL_CAPACITY) ?
                         DEFAULT_INITIAL_CAPACITY :
-                       (fc >= MAXIMUM_CAPACITY) ?
+                       (dc >= MAXIMUM_CAPACITY) ?
                         MAXIMUM_CAPACITY :
-                       tableSizeFor((int)fc));
+                       tableSizeFor((int)dc));
              float ft = (float)cap * lf;
              threshold = ((cap < MAXIMUM_CAPACITY && ft <
MAXIMUM_CAPACITY) ?
                           (int)ft : Integer.MAX_VALUE);
Index: src/java.base/share/classes/java/util/WeakHashMap.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/util/WeakHashMap.java
b/src/java.base/share/classes/java/util/WeakHashMap.java
--- a/src/java.base/share/classes/java/util/WeakHashMap.java (revision
3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
+++ b/src/java.base/share/classes/java/util/WeakHashMap.java (revision
deeba25d15398fea8bc971ac915e348878b2c27a)
@@ -251,7 +251,7 @@
       * @since   1.3
       */
      public WeakHashMap(Map<? extends K, ? extends V> m) {
-        this(Math.max((int) ((float)m.size() / DEFAULT_LOAD_FACTOR +
1.0F),
+        this(Math.max((int) Math.ceil(m.size() / DEFAULT_LOAD_FACTOR),
                  DEFAULT_INITIAL_CAPACITY),
               DEFAULT_LOAD_FACTOR);
          putAll(m);

Xeno Amess <xenoam...@gmail.com> 于2022年2月4日周五 18:45写道:

also find some other places have same problem.
If some of your already-in people aggree, I would create a pr, but
according to the rules seems I should wait for now.

Xeno Amess <xenoam...@gmail.com> 于2022年2月4日周五 18:42写道:

import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public class TestMap {

     public static void main(String[] args) throws NoSuchFieldException, 
IllegalAccessException {
         HashMap<Object, Object> a = new HashMap<>();
         fill12(a);
         HashMap<Object, Object> b = new HashMap<>(12);
         fill12(b);
         HashMap<Object, Object> c = new HashMap<>(a);
         HashMap<Object, Object> d = new HashMap<>();
         d.putAll(a);
         System.out.println("a : " + getArrayLength(a));
         System.out.println("b : " + getArrayLength(b));
         System.out.println("c : " + getArrayLength(c));
         System.out.println("d : " + getArrayLength(d));
     }

     public static void fill12(Map<Object, Object> map) {
         for (int i = 0; i < 12; i++) {
             map.put(i, i);
         }
     }

     public static int getArrayLength(Map<Object, Object> map) throws 
NoSuchFieldException, IllegalAccessException {
         Field field = HashMap.class.getDeclaredField("table");
         field.setAccessible(true);
         Object table = field.get(map);
         return Array.getLength(table);
     }

}

run this and we get the output:

a : 16
b : 16
c : 32
d : 32

So I go see the codes.

/**
  * Implements Map.putAll and Map constructor.
  *
  * @param m the map
  * @param evict false when initially constructing this map, else
  * true (relayed to method afterNodeInsertion).
  */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
     int s = m.size();
     if (s > 0) {
         if (table == null) { // pre-size
             float ft = ((float)s / loadFactor) + 1.0F;
             int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                      (int)ft : MAXIMUM_CAPACITY);
             if (t > threshold)
                 threshold = tableSizeFor(t);
         } else {
             // Because of linked-list bucket constraints, we cannot
             // expand all at once, but can reduce total resize
             // effort by repeated doubling now vs later
             while (s > threshold && table.length < MAXIMUM_CAPACITY)
                 resize();
         }

         for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
             K key = e.getKey();
             V value = e.getValue();
             putVal(hash(key), key, value, false, evict);
         }
     }
}

yep I do think *((float)s / loadFactor) + 1.0F* here is wrong.

It should be *Math.ceil((float)s / loadFactor)*

So I wish to generate a pull request.

Anyone interested?


Reply via email to