leerho commented on code in PR #706:
URL: https://github.com/apache/datasketches-java/pull/706#discussion_r2656120838


##########
src/main/javadoc/resources/dictionary.html:
##########
@@ -143,95 +143,95 @@ <h3><a name="numStdDev">Number of Standard Deviations</a></h3>
 getLowerBound(3) returns the estimated quantile(0.0013) of the distribution.<br>
 </p>
 
-<p>However, for sketches with small configured values of <i>Nominal Entries &lt; 4096</i> for Theta or <i>lgConfigK &lt; 12</i> for HLL, 
-the error distribution of the sketch becomes quite asymmetric and cannot be approximated with a Gaussian. In these cases the interpretation of 
-<i>numStdDev</i> is that of an index that returns the quantile of the sketch error distribution that corresponds to fractional normalized rank 
+<p>However, for sketches with small configured values of <i>Nominal Entries &lt; 4096</i> for Theta or <i>lgConfigK &lt; 12</i> for HLL,
+the error distribution of the sketch becomes quite asymmetric and cannot be approximated with a Gaussian. In these cases the interpretation of
+<i>numStdDev</i> is that of an index that returns the quantile of the sketch error distribution that corresponds to fractional normalized rank
 of the standard normal distribution at the specified <i>numStdDev</i>.
 
-<p>Thus, getUpperBound(1) and getLowerBound(2) represent the 68.3% confidence bounds, 
+<p>Thus, getUpperBound(1) and getLowerBound(2) represent the 68.3% confidence bounds,
 getUpperBound(2) and getLowerBound(2) represent the 95.4% confidence bounds, and
 getUpperBound(3) and getLowerBound(3) represent the 99.7% confidence bounds.
 <br>
 
-<p>For some sketches where the error distribution is not Gaussian, special mathematical approximation methods are used. 
+<p>For some sketches where the error distribution is not Gaussian, special mathematical approximation methods are used.
 See <a href="#accuracy">Sketch Accuracy</a>.</p>
 
 
 
 <h3><a name="quickSelectTCF">Quick Select TCF</a></h3>
 The fundamental Theta Sketch QuickSelect algorithm is described in classic algorithm texts by Sedgewick and
 is the Theta Choosing Function (<a href="#tcf">TCF</a>) for the QuickSelect Sketches.
-When the internal hash table of the sketch reaches its internal 
-<i>refresh threshold</i>, 
-the quick select algorithm is used to select the <code>(k+1)th order statistic</code> 
-from the hash table with a complexity of <i>O(n)</i>.  
-The value of the selected hash becomes the new 
-<a href="#thetaLong">Theta Long</a> 
-and immediately makes some number of entries in the table 
+When the internal hash table of the sketch reaches its internal
+<i>refresh threshold</i>,
+the quick select algorithm is used to select the <code>(k+1)th order statistic</code>
+from the hash table with a complexity of <i>O(n)</i>.
+The value of the selected hash becomes the new
+<a href="#thetaLong">Theta Long</a>
+and immediately makes some number of entries in the table
 <a href="#dirtyHash">dirty</a>.
-The <i>rebuild()</i> method is called that rebuilds the hash table removing the 
+The <i>rebuild()</i> method is called that rebuilds the hash table removing the
 <a href="#dirtyHash">dirty</a> values.
 Since the value of <a href="#thetaLong">Theta Long</a>
-is only changed when the hash table needs to be rebuilt, 
-the values in the hash table are only ever <a href="#dirtyHash">dirty</a> 
-briefly during the rebuild process. 
+is only changed when the hash table needs to be rebuilt,
+the values in the hash table are only ever <a href="#dirtyHash">dirty</a>
+briefly during the rebuild process.
 Thus, all the values in the hash table are always
 <a href="#validHash">valid</a> during normal updating of the sketch.
 <p>One of the benefits of using the QuickSelect algorithm for the cache management of the sketch is
-that the number of <a href="#validHash">valid</a> hashes ranges from 
-<a href="#nomEntries">nominal entries</a> 
-to the current <i>REBUILD_THRESHOLD</i></a>, which is nominally 15/16 * <i>cacheSize</i>.  
-This means that without the user forcing 
-a <i>rebuild()</i>, the sketch, on average, may be about 50% larger than 
+that the number of <a href="#validHash">valid</a> hashes ranges from
+<a href="#nomEntries">nominal entries</a>
+to the current <i>REBUILD_THRESHOLD</i></a>, which is nominally 15/16 * <i>cacheSize</i>.
+This means that without the user forcing
+a <i>rebuild()</i>, the sketch, on average, may be about 50% larger than
 <a href="#nomEntries">nominal entries</a>, about 19% more accurate, and faster.</p>
 
 <h3><a name="resizeFactor">Resize Factor</a></h3>
 For Theta Sketches, the Resize Factor is a dynamic, speed performance vs. memory size tradeoff.
 The sketches created on-heap and configured with a Resize Factor of &gt; X1 start out with
-an internal hash table size that is the smallest submultiple of the the target 
-<a href="#nomEntries">Nominal Entries</a> 
-and larger than the minimum required hash table size for that sketch.  
+an internal hash table size that is the smallest submultiple of the the target
+<a href="#nomEntries">Nominal Entries</a>
+and larger than the minimum required hash table size for that sketch.
 When the sketch needs to be resized larger, then the Resize Factor is used as a multiplier of
-the current sketch cache array size. <br> 
-"X1" means no resizing is allowed and the sketch will be intialized at full size.<br>
+the current sketch cache array size. <br>
+"X1" means no resizing is allowed and the sketch will be initialized at full size.<br>
 "X2" means the internal cache will start very small and double in size until the target size is reached.<br>
-Similarly, "X4" is a factor of 4 and "X8 is a factor of 8.
+Similarly, "X4" is a factor of 4 and "X8" is a factor of 8.
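
Illustrative note on the Resize Factor text in the hunk above: the growth rule it describes can be sketched in log2 terms. This is only a rough sketch under stated assumptions, not the library's implementation. The class name `ResizeGrowthSketch`, both method names, and the example minimum size of 32 are hypothetical, and "larger than the minimum" is read here as "at least the minimum".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of datasketches-java.
public class ResizeGrowthSketch {

    // Smallest log2 size that is >= lgMin and reaches lgTarget in whole
    // steps of lgResizeFactor (0 = X1, 1 = X2, 2 = X4, 3 = X8).
    static int startingLgSize(int lgTarget, int lgResizeFactor, int lgMin) {
        if (lgTarget <= lgMin) {
            return lgMin;
        }
        if (lgResizeFactor == 0) {
            return lgTarget; // X1: no resizing, start at full size
        }
        return ((lgTarget - lgMin) % lgResizeFactor) + lgMin;
    }

    // Sequence of table sizes from the starting size up to the target size.
    static List<Integer> growthPath(int lgTarget, int lgResizeFactor, int lgMin) {
        List<Integer> sizes = new ArrayList<>();
        int lg = startingLgSize(lgTarget, lgResizeFactor, lgMin);
        sizes.add(1 << lg);
        while (lg < lgTarget) {
            // Each resize multiplies the current size by the Resize Factor.
            lg = Math.min(lg + lgResizeFactor, lgTarget);
            sizes.add(1 << lg);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Assumed example: target size 8192 (lg 13), minimum size 32 (lg 5).
        System.out.println("X1: " + growthPath(13, 0, 5));
        System.out.println("X2: " + growthPath(13, 1, 5));
        System.out.println("X8: " + growthPath(13, 3, 5));
    }
}
```

Under these assumptions, X8 with a target of 8192 starts at 128 and needs two resizes, while X2 starts at the assumed minimum and doubles until the target is reached.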

Review Comment:
   Thank you!
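
Illustrative note on the <i>numStdDev</i> passage quoted above: the confidence levels it cites (68.3%, 95.4%, 99.7%) and the quantile(0.0013) for getLowerBound(3) follow from the standard normal CDF. The sketch below only reproduces that arithmetic; it does not call the library. The class name `NumStdDevRank` and its methods are hypothetical, and the erf approximation (Abramowitz and Stegun 7.1.26) is a stand-in chosen because the Java standard library has no erf.

```java
// Hypothetical helper, not part of datasketches-java.
public class NumStdDevRank {

    // Abramowitz-Stegun 7.1.26 approximation of erf(x);
    // absolute error is roughly 1.5e-7.
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // Fractional normalized rank of the standard normal at numStdDev.
    static double normalRank(double numStdDev) {
        return 0.5 * (1.0 + erf(numStdDev / Math.sqrt(2.0)));
    }

    // Probability mass between the ranks at -n and +n standard deviations.
    static double coverage(int n) {
        return normalRank(n) - normalRank(-n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.printf("numStdDev=%d: lower rank=%.4f, upper rank=%.4f, coverage=%.1f%%%n",
                    n, normalRank(-n), normalRank(n), 100.0 * coverage(n));
        }
    }
}
```

For small sketches the same ranks are applied to the sketch's own asymmetric error distribution rather than a Gaussian, so the ranks carry over but symmetric bounds around the estimate do not.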



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

