leerho commented on code in PR #559:
URL: https://github.com/apache/datasketches-java/pull/559#discussion_r1600795862


##########
src/main/java/org/apache/datasketches/theta/Sketch.java:
##########
@@ -297,13 +297,27 @@ public double getLowerBound(final int numStdDev) {
    * @param numberOfEntries the actual number of entries stored with the 
CompactSketch.
    * @return the maximum number of storage bytes required for a CompactSketch 
with the given number
    * of entries.
+   * @deprecated as a public method. Use {@link #getCompactSketchMaxBytes(int) 
instead}
    */
+  @Deprecated
   public static int getMaxCompactSketchBytes(final int numberOfEntries) {
     if (numberOfEntries == 0) { return 8; }
     if (numberOfEntries == 1) { return 16; }
     return (numberOfEntries << 3) + 24;
   }
 
+  /**
+   * Returns the maximum number of storage bytes required for a CompactSketch 
given the configured
+   * number of nominal entries (power of 2).
+   * @param nomEntries <a 
href="{@docRoot}/resources/dictionary.html#nomEntries">Nominal Entries</a>
+   * @return the maximum number of storage bytes required for a CompactSketch 
with the given
+   * nomEntries.
+   */
+  public static int getCompactSketchMaxBytes(final int nomEntries) {

Review Comment:
   - I changed the name of the `HeapQuickSelectSketch` function to 
`getHashTableThreshold(..)`
   - However the `DirectQuickSelectSketchR` function with the same name is 
actually different and uses a different constant.  The Direct version is tuned 
to delay having to resize less often because the cost of resizing in off-heap 
is much more expensive than on-heap.  Nonetheless, I changed its name to 
`getOffHeapHashTableThreshold(...)`.
   
   As for the internal computation of getCompactSketchMaxBytes(...) I changed 
it to be a little different from what you proposed.  
   
   You proposed using the HeapQuickSelectSketch.getHashTableThreshold(lgK, lgK 
+ 1).  
   
   Unfortunately, the function input was nomEntries and not lgNomEntries, which 
meant I would have to convert to LgK before using the proposed function.  This 
is not a single CPU cycle operation. If you go look at the JVM source code for 
`Long.numberOfTrailingZeros(long i)` you'll see that it is not trivial.
   
   Meanwhile, we have for some time also been using `lgNomEntries` in Theta.  
(Go look at the Theta builder.) So I decided to switch the input to 
`lgNomEntries`.  
   - The intent of the functions `getHashTableThreshold(..)` and 
`getOffHeapHashTableThreshold(...)` is to allow fine tuning of the resizing 
operation the sketch, which we actually do because we tune the resizing of the 
heap sketch differently from the off-heap sketch. Both of these were 
package-private, and neither was used in test.  So I decided to make the heap 
function `private` and the off-heap version `protected` because it is used by 
the child sketch `DirectQuickSelectSketch`.
   
   I really didn't want to use these special functions for this 
`getCompactSketchMaxBytes(...)`, so I rewrote it with one of your suggestions 
and leveraging the REBUILD_THRESHOLD constant already defined.   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to