This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new cb01d2e Automatic Site Publish by Buildbot
cb01d2e is described below
commit cb01d2e848a0147ad46f874b5831bbc8a65c9223
Author: buildbot <[email protected]>
AuthorDate: Mon Nov 1 04:21:37 2021 +0000
Automatic Site Publish by Buildbot
---
output/docs/Theta/ThetaSetOpsCornerCases.html | 60 ++++++++++++++++++++-------
1 file changed, 45 insertions(+), 15 deletions(-)
diff --git a/output/docs/Theta/ThetaSetOpsCornerCases.html
b/output/docs/Theta/ThetaSetOpsCornerCases.html
index ea72500..de7d173 100644
--- a/output/docs/Theta/ThetaSetOpsCornerCases.html
+++ b/output/docs/Theta/ThetaSetOpsCornerCases.html
@@ -562,7 +562,11 @@
<p>This is a new sketch where the user has set the sampling probability, <em>p
< 1.0</em>, and the sketch has not been presented with any data. Internally at
initialization, <em>theta</em> is set to <em>p</em>, so if <em>p = 0.5</em>,
<em>theta</em> will be set to <em>0.5</em>. Since the sketch has not seen any
data, <em>retained entries = 0</em> and <em>empty = T</em>. This is a
degenerative form of a new sketch, thus its name.</p>
<h3 id="resultdegen10-0-f">ResultDegen{<1.0, 0, F}</h3>
-<p>This requires some explanation. Imagine the intersection of two estimating
sketches where the values retained in the two sketches are disjoint (i.e, no
overlap). Since the two sketches chose their internal values at random, there
remains some probability that there could be common values in an exactly
computed intersection, but it just so happens that one of the two sketches did
not select any of them in the random sampling process. Therefore, the
<em>retained entries = 0</em>. The [...]
+<p>This requires some explanation. Imagine the intersection of two estimating
sketches where the values retained in the two sketches are disjoint (i.e, no
overlap). Since the two sketches chose their internal values at random, there
remains some probability that there could be common values in an exactly
computed intersection, but it just so happens that one of the two sketches did
not select any of them in the random sampling process. Therefore, the
<em>retained entries = 0</em>.</p>
+
+<p>Even though <em>retained entries = 0</em>, an upper bound on the
estimated number of unique values that exist in the input domain but were
missed by the sketch can be computed statistically. The derivation is too
complex to discuss here, but the sketch code actually performs this estimation.</p>
+
+<p>Since there is a positive probability of an intersection, <em>empty =
F</em>. This is also a degenerative case in the sense that <em>theta <
1.0</em> and <em>empty = F</em> like an estimating sketch, except that no
actual values were found in the operation, so <em>retained entries = 0</em>.</p>
<h3 id="summary-table-of-the-valid-states-of-a-sketch">Summary Table of the
Valid States of a Sketch</h3>
<p>The <em>Has Seen Data</em> column is not an independent variable, but helps
with the interpretation of the state.</p>
@@ -576,6 +580,16 @@
</ul>
<table>
+ <tbody>
+ <tr>
+ <td>The octal digit ID = ((theta == 1.0) ? 4 : 0)</td>
+ <td>((retainedEntries > 0) ? 2 : 0)</td>
+ <td>(empty ? 1 : 0);</td>
+ </tr>
+ </tbody>
+</table>
+
+<table>
<thead>
<tr>
<th style="text-align: center">Shorthand Notation</th>
@@ -666,14 +680,14 @@ The <em>Has Seen Data</em> column is not an independent
variable, but helps with
<td style="text-align: center">>0</td>
<td style="text-align: center">F</td>
<td style="text-align: center">F</td>
- <td style="text-align: center">If it has not seen data, Entries !>
0.</td>
+ <td style="text-align: center">If it has not seen data, Entries ! >
0.</td>
</tr>
<tr>
<td style="text-align: center"><1.0</td>
<td style="text-align: center">>0</td>
<td style="text-align: center">F</td>
<td style="text-align: center">F</td>
- <td style="text-align: center">If it has not seen data, Entries !>
0.</td>
+ <td style="text-align: center">If it has not seen data, Entries ! >
0.</td>
</tr>
</tbody>
</table>
@@ -912,64 +926,80 @@ The <em>Has Seen Data</em> column is not an independent
variable, but helps with
<tr>
<th style="text-align: center">Result Action</th>
<th style="text-align: center">Result Code</th>
- <th style="text-align: left">Description</th>
+ <th style="text-align: center">Used by Intersection</th>
+ <th style="text-align: center">Used by AnotB</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">New{1.0,0,T}</td>
<td style="text-align: center">1</td>
- <td style="text-align: left">New empty sketch</td>
+ <td style="text-align: center">Yes</td>
+ <td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: center">New{min,0,F}</td>
<td style="text-align: center">2</td>
- <td style="text-align: left">Min=min(thetaA,thetaB)</td>
+ <td style="text-align: center">Yes</td>
+ <td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: center">New{thA,0,F}</td>
<td style="text-align: center">3</td>
- <td style="text-align: left">thA=theta of A</td>
+ <td style="text-align: center"> </td>
+ <td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: center">SkA Min</td>
<td style="text-align: center">4</td>
- <td style="text-align: left">Trim A by minTheta</td>
+ <td style="text-align: center"> </td>
+ <td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: center">Sketch A</td>
<td style="text-align: center">5</td>
- <td style="text-align: left">Sketch A exactly</td>
+ <td style="text-align: center"> </td>
+ <td style="text-align: center">Yes</td>
</tr>
<tr>
<td style="text-align: center">Full Inter</td>
<td style="text-align: center">6</td>
- <td style="text-align: left">Full intersect</td>
+ <td style="text-align: center">Yes</td>
+ <td style="text-align: center"> </td>
</tr>
<tr>
<td style="text-align: center">Full AnotB</td>
<td style="text-align: center">7</td>
- <td style="text-align: left">Full AnotB</td>
+ <td style="text-align: center"> </td>
+ <td style="text-align: center">Yes</td>
</tr>
</tbody>
</table>
+<p>Abbreviations:<br /></p>
+
+<ul>
+ <li>min : min(thetaA,thetaB)</li>
+ <li>thA : theta of A</li>
+ <li>SkA Min : Trim Sketch A by minTheta</li>
+</ul>
+
<p>Note that the results of a <em>Full Intersect</em> or a <em>Full AnotB</em>
will require further interpretation of the resulting state.
For example, if the resulting sketch is <em>{1.0,0,?}</em>, then a
<em>New{1.0,0,T}</em> is returned.
If the resulting sketch is <em>{<1.0,0,?}</em> then a
<em>ResultDegen{<1.0,0,F}</em> is returned.<br />
Otherwise, the sketch returned will be an estimating or exact <em>{theta,
>0, F}</em>.</p>
<h2 id="testing">Testing</h2>
-<p>The above information is encoded as a model into the special class
<em>org.apache.datasketches.SetOperationCornerCases.java</em>. This class is
made up of enums and static methods to quickly determine for a sketch what
actions to take based on the state of the input arguments. This model is
independent of the implementation of the Theta Sketch, whether the set
operation is performed as a Theta Sketch, or a Tuple Sketch and when translated
can be used in other languages as well.</p>
+<p>The above information is encoded as a model into the special class <em><a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.SetOperationCornerCases.java">org.apache.datasketches.SetOperationCornerCases</a></em>.
This class is made up of enums and static methods to quickly determine for a
sketch what actions to take based on the state of the input arguments. This
model is independent of the implementation of the Theta Sketch, whether [...]
<p>Before this model was put to use, an extensive set of tests was designed to
test any potential implementation against it. These tests are slightly
different for the Tuple Sketch than for the Theta Sketch because the Tuple
Sketch has more combinations to test, but the model is the same.</p>
<ul>
- <li>The tests for the Theta Sketch can be found in the class
<em>org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest.java</em></li>
- <li>The tests for the Tuple Sketch can be found in the class
<em>org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest.java</em></li>
+ <li>The tests for the Theta Sketch can be found in the class <em><a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest.java">org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest</a></em></li>
+ <li>The tests for the Tuple Sketch can be found in the class <em><a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest.java">org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest</a></em></li>
</ul>
-<p>The details of how this mode is used in run-time code can be found in the
class <em>org.apache.datasketches.tuple.AnotB.java</em>.</p>
+<p>The details of how this model is used in run-time code can be found in the
class <em><a
href="https://github.com/apache/datasketches-java/blob/master/src/main/java/org.apache.datasketches.tuple.AnotB.java">org.apache.datasketches.tuple.AnotB.java</a></em>.</p>
</div> <!-- End content -->
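For readers of this diff, the octal digit ID added in the second hunk and the result interpretation described in the closing note can be sketched in Java as shown below. This is an illustration only, not the code in SetOperationCornerCases: the class name SketchStateModel and both method names are hypothetical. It also assumes that the three table cells of the ID expression are meant to be combined into one value. Because each term contributes a distinct bit, bitwise OR and addition give the same result.

```java
/** Hypothetical illustration class; not part of the datasketches-java library. */
final class SketchStateModel {

  /**
   * Octal digit ID of a sketch state {theta, retainedEntries, empty}.
   * Assumption: the three terms from the page are OR-ed together.
   */
  public static int stateId(double theta, int retainedEntries, boolean empty) {
    return ((theta == 1.0) ? 4 : 0)
         | ((retainedEntries > 0) ? 2 : 0)
         | (empty ? 1 : 0);
  }

  /**
   * Interpretation of the raw result of a Full Intersect or Full AnotB,
   * following the note at the end of the result table.
   */
  public static String interpretResult(double theta, int retainedEntries) {
    if (retainedEntries == 0) {
      // {1.0,0,?} becomes a new empty sketch; {<1.0,0,?} becomes the degenerate result.
      return (theta == 1.0) ? "New{1.0,0,T}" : "ResultDegen{<1.0,0,F}";
    }
    // Otherwise the result is an exact or estimating sketch.
    return "{theta,>0,F}";
  }

  public static void main(String[] args) {
    System.out.println(stateId(1.0, 0, true));    // New{1.0,0,T}          -> 5
    System.out.println(stateId(0.5, 0, true));    // {<1.0,0,T}            -> 1
    System.out.println(stateId(0.5, 0, false));   // ResultDegen{<1.0,0,F} -> 0
    System.out.println(stateId(1.0, 10, false));  // exact {1.0,>0,F}      -> 6
    System.out.println(stateId(0.5, 10, false));  // estimating {<1.0,>0,F} -> 2
    System.out.println(interpretResult(0.5, 0));  // ResultDegen{<1.0,0,F}
  }
}
```

Note that these state IDs are distinct from the Result Code column (1 through 7) in the result action table, which numbers the actions rather than the sketch states.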