This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2836280 Automatic Site Publish by Buildbot
2836280 is described below
commit 283628005bc43dbaa63c569140e19cfa0d1bbc4e
Author: buildbot <[email protected]>
AuthorDate: Sat Oct 30 06:04:38 2021 +0000
Automatic Site Publish by Buildbot
---
output/docs/Theta/ThetaSetOpsCornerCases.html | 998 ++++++++++++++++++++++++++
output/docs/Theta/ThetaSetOpsCornerCases.md | 143 ----
2 files changed, 998 insertions(+), 143 deletions(-)
diff --git a/output/docs/Theta/ThetaSetOpsCornerCases.html
b/output/docs/Theta/ThetaSetOpsCornerCases.html
new file mode 100644
index 0000000..ea72500
--- /dev/null
+++ b/output/docs/Theta/ThetaSetOpsCornerCases.html
@@ -0,0 +1,998 @@
+<!DOCTYPE html>
+<!-- Start _layouts/doc_page.html-->
+<html lang="en">
+
+<head>
+<!-- Start _include/site_head.html -->
+<meta charset="UTF-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<meta name="description" content="">
+<meta name="author" content="datasketches">
+
+<title>DataSketches | </title>
+
+<link rel="shortcut icon" href="/img/favicon.png">
+
+<link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css">
+<link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css">
+
+<link
href='https://fonts.googleapis.com/css?family=Open+Sans+Condensed:300,700,300italic|Open+Sans:300italic,400italic,600italic,400,300,600'
+ rel='stylesheet' type='text/css'>
+
+<link rel="stylesheet" href="/css/main.css">
+<link rel="stylesheet" href="/css/header.css">
+<link rel="stylesheet" href="/css/footer.css">
+<link rel="stylesheet" href="/css/syntax.css">
+<link rel="stylesheet" href="/css/docs.css">
+
+
+<script type="text/javascript"
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML-full">
+</script>
+<script src="https://code.jquery.com/jquery.min.js"></script>
+<script
src="https://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/js/bootstrap.min.js"></script>
+<!-- End _include/site_head.html -->
+</head>
+
+<body>
+<!-- Start _include/nav_bar.html -->
+<div class="navbar navbar-inverse navbar-static-top ds-nav">
+ <div class="container">
+ <div class="navbar-header">
+ <button type="button" class="navbar-toggle" data-toggle="collapse"
data-target=".navbar-collapse">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <a href="/" style="padding-top: 0px; padding-bottom: 0px;">
+ <span class="ds-small-h-logo"></span></a>
+ </div>
+ <div class="navbar-collapse collapse">
+ <ul class="nav navbar-nav navbar-right">
+ <li>
+ <a href="/docs/Background/TheChallenge.html">
+ <span class="fa fa-info-circle"></span> DOCUMENTATION</a>
+ </li>
+ <li>
+ <a href="/docs/Community/Downloads.html">
+ <span class="fa fa-download"></span> DOWNLOAD</a>
+ </li>
+ <!--
+ <li>
+ <a href="/docs/Architecture/Components.html">
+ <span class="fa fa-github"></span> GITHUB</a>
+ </li>
+ -->
+ <li>
+ <a href="/docs/Community/Research.html">
+ <span class="fa fa-paper-plane"></span> RESEARCH</a>
+ </li>
+ <li>
+ <a href="/docs/Community/index.html" style="padding-top: 0;
padding-bottom: 0;">
+ <img class="ds-small-man"
src="/img/datasketches-ManWhite.svg"/>COMMUNITY</a>
+ </li>
+ <li>
+ <ul class="nav navbar-nav navbar-right ds-nav">
+ <li class="dropdown ds-nav" >
+ <a href="#" class="dropdown-toggle" data-toggle="dropdown"
role="button" aria-haspopup="true" aria-expanded="false" style="padding-top: 0;
padding-bottom: 0;"><img class="apache-logo" src="/img/feather.svg"/>Apache
<span class="caret"></span></a>
+ <ul class="dropdown-menu ds-nav">
+ <li><a href="https://www.apache.org/"
target="_blank">Foundation</a></li>
+ <li><a href="https://www.apache.org/events/current-event"
target="_blank">Events</a></li>
+ <li><a href="https://www.apache.org/licenses/"
target="_blank">License</a></li>
+ <li><a href="https://www.apache.org/foundation/thanks.html"
target="_blank">Thanks</a></li>
+ <li><a href="https://www.apache.org/security/"
target="_blank">Security</a></li>
+ <li><a
href="https://www.apache.org/foundation/sponsorship.html"
target="_blank">Sponsorship</a></li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+ </div>
+</div>
+<!-- End _include/nav_bar.html -->
+
+<!-- Start _include/javadocs.html -->
+<div class="ds-header">
+ <div class="container">
+ <h4>API Snapshots:
+ <a href="/api/java/snapshot/apidocs/index.html">Java Core</a>,
+ <a href="/api/memory/snapshot/apidocs/index.html">Memory</a>,
+ <a href="/api/pig/snapshot/apidocs/index.html">Pig</a>,
+ <a href="/api/hive/snapshot/apidocs/index.html">Hive</a>,
+ </h4>
+ </div>
+</div>
+<!-- End _include/javadocs.html -->
+
+ <div class="container">
+ <div class="row">
+ <!-- Start ToC Block -->
+ <div class="col-md-3">
+ <div class="searchbox" style="position:relative">
+ <gcse:searchbox-only></gcse:searchbox-only>
+ </div>
+<!-- Start _includes/toc.html -->
+<!-- Computer Generated File, Do Not Edit! -->
+<link rel="stylesheet" href="/css/toc.css">
+<div id="toc" class="nav toc hidden-print">
+
+ <p id="background">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_background">Background</a>
+ </p>
+ <div class="collapse" id="collapse_background">
+ <li><a href="/docs/Background/TheChallenge.html">•The Challenge</a></li>
+ <li><a href="/docs/Background/SketchOrigins.html">•Sketch Origins</a></li>
+ <li><a href="/docs/Background/SketchElements.html">•Sketch
Elements</a></li>
+ <li><a href="/docs/Background/Presentations.html">•Presentations</a></li>
+ <li><a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/DataSketches_deck.pdf">•Overview
Slide Deck</a></li>
+ </div>
+
+ <p id="architecture-and-design">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_architecture_and_design">Architecture And Design</a>
+ </p>
+ <div class="collapse" id="collapse_architecture_and_design">
+ <li><a href="/docs/Architecture/MajorSketchFamilies.html">•The Major
Sketch Families</a></li>
+ <li><a href="/docs/Architecture/LargeScale.html">•Large Scale
Computing</a></li>
+ <li><a href="/docs/Architecture/KeyFeatures.html">•Key Features</a></li>
+ <li><a href="/docs/Architecture/SketchFeaturesMatrix.html">•Sketch
Features Matrix</a></li>
+ <li><a href="/docs/Architecture/Components.html">•Components</a></li>
+ <li><a href="/docs/Architecture/SketchesByComponent.html">•Sketches by
Component</a></li>
+ <li><a href="/docs/Architecture/SketchCriteria.html">•Sketch
Criteria</a></li>
+
+ <p id="memory-component">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_memory_component">Memory Component</a>
+ </p>
+ <div class="collapse" id="collapse_memory_component">
+ <li><a href="/docs/Memory/MemoryComponent.html">•Memory Component</a></li>
+ <li><a href="/docs/Memory/MemoryPerformance.html">•Memory Component
Performance</a></li>
+ </div>
+ <li><a href="/docs/Architecture/OrderSensitivity.html">•Notes on Order
Sensitivity</a></li>
+ <li><a href="/docs/Architecture/Concurrency.html">•Notes on
Concurrency</a></li>
+ </div>
+
+ <p id="sketch-families">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_sketch_families">Sketch Families</a>
+ </p>
+ <div class="collapse" id="collapse_sketch_families">
+
+ <p id="distinct-counting">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_distinct_counting">Distinct Counting</a>
+ </p>
+ <div class="collapse" id="collapse_distinct_counting">
+ <li><a href="/docs/DistinctCountFeaturesMatrix.html">•Features
Matrix</a></li>
+ <li><a href="/docs/DistinctCountMeritComparisons.html">•Figures-of-Merit
Comparison</a></li>
+
+ <p id="cpc-sketches">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_cpc_sketches">CPC Sketches</a>
+ </p>
+ <div class="collapse" id="collapse_cpc_sketches">
+ <li><a href="/docs/CPC/CPC.html">•CPC Sketch</a></li>
+ <li><a href="/docs/CPC/CpcPerformance.html">•CPC Sketch
Performance</a></li>
+
+ <p id="cpc-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_cpc_examples">CPC Examples</a>
+ </p>
+ <div class="collapse" id="collapse_cpc_examples">
+ <li><a href="/docs/CPC/CpcJavaExample.html">•CPC Sketch Java
Example</a></li>
+ <li><a href="/docs/CPC/CpcCppExample.html">•CPC Sketch C++
Example</a></li>
+ <li><a href="/docs/CPC/CpcPigExample.html">•CPC Sketch Pig
UDFs</a></li>
+ <li><a href="/docs/CPC/CpcHiveExample.html">•CPC Sketch Hive
UDFs</a></li>
+ </div>
+ </div>
+
+ <p id="hyperloglog-sketches">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_hyperloglog_sketches">HyperLogLog Sketches</a>
+ </p>
+ <div class="collapse" id="collapse_hyperloglog_sketches">
+ <li><a href="/docs/HLL/HLL.html">•HLL Sketch</a></li>
+ <li><a href="/docs/HLL/HllMap.html">•HLL Map Sketch</a></li>
+
+ <p id="hll-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_hll_examples">HLL Examples</a>
+ </p>
+ <div class="collapse" id="collapse_hll_examples">
+ <li><a href="/docs/HLL/HllJavaExample.html">•HLL Sketch Java
Example</a></li>
+ <li><a href="/docs/HLL/HllCppExample.html">•HLL Sketch C++
Example</a></li>
+ <li><a href="/docs/HLL/HllPigUDFs.html">•HLL Sketch Pig UDFs</a></li>
+ <li><a href="/docs/HLL/HllHiveUDFs.html">•HLL Sketch Hive
UDFs</a></li>
+ </div>
+
+ <p id="hll-studies">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_hll_studies">HLL Studies</a>
+ </p>
+ <div class="collapse" id="collapse_hll_studies">
+ <li><a href="/docs/HLL/HllPerformance.html">•HLL Sketch
Performance</a></li>
+ <li><a href="/docs/HLL/Hll_vs_CS_Hllpp.html">•HLL vs Clearspring
HLL++</a></li>
+ <li><a
href="/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html">•HLL Sketch vs Druid
HyperLogLogCollector</a></li>
+ </div>
+ </div>
+
+ <p id="theta-sketches">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_theta_sketches">Theta Sketches</a>
+ </p>
+ <div class="collapse" id="collapse_theta_sketches">
+ <li><a href="/docs/Theta/ThetaSketchFramework.html">•Theta Sketch
Framework</a></li>
+
+ <p id="theta-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_theta_examples">Theta Examples</a>
+ </p>
+ <div class="collapse" id="collapse_theta_examples">
+ <li><a href="/docs/Theta/ConcurrentThetaSketch.html">•Concurrent
Theta Sketch</a></li>
+ <li><a href="/docs/Theta/ThetaJavaExample.html">•Theta Sketch Java
Example</a></li>
+ <li><a href="/docs/Theta/ThetaSparkExample.html">•Theta Sketch Spark
Example</a></li>
+ <li><a href="/docs/Theta/ThetaPigUDFs.html">•Theta Sketch Pig
UDFs</a></li>
+ <li><a href="/docs/Theta/ThetaHiveUDFs.html">•Theta Sketch Hive
UDFs</a></li>
+ </div>
+
+ <p id="kmv-tutorial">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_kmv_tutorial">KMV Tutorial</a>
+ </p>
+ <div class="collapse" id="collapse_kmv_tutorial">
+ <li><a href="/docs/Theta/InverseEstimate.html">•The Inverse
Estimate</a></li>
+ <li><a href="/docs/Theta/KMVempty.html">•Empty Sketch</a></li>
+ <li><a href="/docs/Theta/KMVfirstEst.html">•First Estimator</a></li>
+ <li><a href="/docs/Theta/KMVbetterEst.html">•Better
Estimator</a></li>
+ <li><a href="/docs/Theta/KMVrejection.html">•Rejection Rules</a></li>
+ <li><a href="/docs/Theta/KMVupdateVkth.html">•Update V(kth)
Rule</a></li>
+ </div>
+
+ <p id="set-operations-and-p-sampling">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_set_operations_and_p-sampling">Set Operations and P-sampling</a>
+ </p>
+ <div class="collapse" id="collapse_set_operations_and_p-sampling">
+ <li><a href="/docs/Theta/ThetaSketchSetOps.html">•Set
Operations</a></li>
+ <li><a href="/docs/Theta/ThetaSetOpsCornerCases.html">•Model & Test
Set Operations</a></li>
+ <li><a
href="/docs/Theta/ThetaPSampling.html">•<i>p</i>-Sampling</a></li>
+ </div>
+
+ <p id="accuracy">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_accuracy">Accuracy</a>
+ </p>
+ <div class="collapse" id="collapse_accuracy">
+ <li><a href="/docs/Theta/ThetaAccuracy.html">•Basic Accuracy</a></li>
+ <li><a href="/docs/Theta/ThetaAccuracyPlots.html">•Accuracy
Plots</a></li>
+ <li><a href="/docs/Theta/ThetaErrorTable.html">•Relative Error
Table</a></li>
+ <li><a href="/docs/Theta/ThetaSketchSetOpsAccuracy.html">•SetOp
Accuracy</a></li>
+ <li><a href="/docs/Theta/AccuracyOfDifferentKUnions.html">•Unions
With Different k</a></li>
+ </div>
+
+ <p id="size">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_size">Size</a>
+ </p>
+ <div class="collapse" id="collapse_size">
+ <li><a href="/docs/Theta/ThetaSize.html">•Theta Sketch Size</a></li>
+ </div>
+
+ <p id="speed">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_speed">Speed</a>
+ </p>
+ <div class="collapse" id="collapse_speed">
+ <li><a href="/docs/Theta/ThetaUpdateSpeed.html">•Update
Speed</a></li>
+ <li><a href="/docs/Theta/ThetaMergeSpeed.html">•Merge Speed</a></li>
+ </div>
+
+ <p id="theta-sketch-theory">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_theta_sketch_theory">Theta Sketch Theory</a>
+ </p>
+ <div class="collapse" id="collapse_theta_sketch_theory">
+ <li><a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchFramework.pdf">•Theta
Sketch Framework (PDF)</a></li>
+ <li><a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/ThetaSketchEquations.pdf">•Theta
Sketch Equations (PDF)</a></li>
+ <li><a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/DataSketches.pdf">•DataSketches
(PDF)</a></li>
+ <li><a href="/docs/Theta/ThetaConfidenceIntervals.html">•Confidence
Intervals Notes</a></li>
+ <li><a href="/docs/Theta/ThetaMergingAlgorithm.html">•Merging
Algorithm Notes</a></li>
+ <li><a href="/docs/Theta/ThetaReferences.html">•Theta
References</a></li>
+ </div>
+ </div>
+
+ <p id="tuple-sketches">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_tuple_sketches">Tuple Sketches</a>
+ </p>
+ <div class="collapse" id="collapse_tuple_sketches">
+ <li><a href="/docs/Tuple/TupleOverview.html">•Tuple Overview</a></li>
+
+ <p id="tuple-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_tuple_examples">Tuple Examples</a>
+ </p>
+ <div class="collapse" id="collapse_tuple_examples">
+ <li><a href="/docs/Tuple/TupleJavaExample.html">•Tuple Java
Example</a></li>
+ <li><a href="/docs/Tuple/TupleEngagementExample.html">•Tuple
Engagement Example</a></li>
+ <li><a href="/docs/Tuple/TuplePigUDFs.html">•Tuple Pig UDFs</a></li>
+ <li><a href="/docs/Tuple/TupleHiveUDFs.html">•Tuple Hive
UDFs</a></li>
+ </div>
+ </div>
+ </div>
+
+ <p id="most-frequent">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_most_frequent">Most Frequent</a>
+ </p>
+ <div class="collapse" id="collapse_most_frequent">
+ <li><a href="/docs/Frequency/FrequencySketchesOverview.html">•Frequency
Sketches Overview</a></li>
+
+ <p id="frequent-item-sketches">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_frequent_item_sketches">Frequent Item Sketches</a>
+ </p>
+ <div class="collapse" id="collapse_frequent_item_sketches">
+ <li><a href="/docs/Frequency/FrequentItemsOverview.html">•Frequent
Items Overview</a></li>
+ <li><a href="/docs/Frequency/FrequentItemsErrorTable.html">•Frequent
Items Error Table</a></li>
+ <li><a href="/docs/Frequency/FrequentItemsReferences.html">•Frequent
Items References</a></li>
+ <li><a href="/docs/Frequency/FrequentItemsPerformance.html">•Frequent
Items Performance</a></li>
+
+ <p id="most-frequent-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_most_frequent_examples">Most Frequent Examples</a>
+ </p>
+ <div class="collapse" id="collapse_most_frequent_examples">
+ <li><a
href="/docs/Frequency/FrequentItemsJavaExample.html">•Frequent Items Java
Example</a></li>
+ <li><a href="/docs/Frequency/FrequentItemsCppExample.html">•Frequent
Items C++ Example</a></li>
+ <li><a href="/docs/Frequency/FrequentItemsPigUDFs.html">•Frequent
Items Pig UDFs</a></li>
+ <li><a href="/docs/Frequency/FrequentItemsHiveUDFs.html">•Frequent
Items Hive UDFs</a></li>
+ </div>
+ </div>
+
+ <p id="frequent-distinct-sketches">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_frequent_distinct_sketches">Frequent Distinct Sketches</a>
+ </p>
+ <div class="collapse" id="collapse_frequent_distinct_sketches">
+ <li><a
href="/docs/Frequency/FrequentDistinctTuplesSketch.html">•Frequent Distinct
Tuples Sketch</a></li>
+ </div>
+ </div>
+
+ <p id="quantiles-and-histograms">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_quantiles_and_histograms">Quantiles And Histograms</a>
+ </p>
+ <div class="collapse" id="collapse_quantiles_and_histograms">
+ <li><a href="/docs/Quantiles/Definitions.html">•Quantiles
Definitions</a></li>
+ <li><a href="/docs/Quantiles/QuantilesOverview.html">•Quantiles
Overview</a></li>
+ <li><a href="/docs/KLL/KLLSketch.html">•KLL Floats sketch</a></li>
+ <li><a href="/docs/REQ/ReqSketch.html">•REQ Floats sketch</a></li>
+ <li><a href="/docs/Quantiles/OrigQuantilesSketch.html">•Original
QuantilesSketch</a></li>
+
+ <p id="quantiles-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_quantiles_examples">Quantiles Examples</a>
+ </p>
+ <div class="collapse" id="collapse_quantiles_examples">
+ <li><a href="/docs/Quantiles/QuantilesJavaExample.html">•Quantiles
Sketch Java Example</a></li>
+ <li><a href="/docs/KLL/KLLCppExample.html">•KLL Quantiles Sketch C++
Example</a></li>
+ <li><a href="/docs/Quantiles/QuantilesPigUDFs.html">•Quantiles Sketch
Pig UDFs</a></li>
+ <li><a href="/docs/Quantiles/QuantilesHiveUDFs.html">•Quantiles Sketch
Hive UDFs</a></li>
+ </div>
+
+ <p id="quantiles-studies">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_quantiles_studies">Quantiles Studies</a>
+ </p>
+ <div class="collapse" id="collapse_quantiles_studies">
+ <li><a href="/docs/QuantilesStudies/KllSketchVsTDigest.html">•KLL
sketch vs t-digest</a></li>
+ <li><a
href="/docs/QuantilesStudies/DruidApproxHistogramStudy.html">•Druid Approximate
Histogram</a></li>
+ <li><a href="/docs/QuantilesStudies/MomentsSketchStudy.html">•Moments
Sketch Study</a></li>
+ <li><a
href="/docs/QuantilesStudies/QuantilesStreamAStudy.html">•Quantiles StreamA
Study</a></li>
+ <li><a href="/docs/QuantilesStudies/ExactQuantiles.html">•Exact
Quantiles for Studies</a></li>
+ </div>
+
+ <p id="quantiles-sketch-theory">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_quantiles_sketch_theory">Quantiles Sketch Theory</a>
+ </p>
+ <div class="collapse" id="collapse_quantiles_sketch_theory">
+ <li><a
href="https://github.com/apache/datasketches-website/tree/master/docs/pdf/Quantiles_KLL.pdf">•Optimal
Quantile Approximation in Streams</a></li>
+ <li><a href="/docs/Quantiles/QuantilesReferences.html">•Quantiles
References</a></li>
+ </div>
+ </div>
+
+ <p id="sampling">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_sampling">Sampling</a>
+ </p>
+ <div class="collapse" id="collapse_sampling">
+ <li><a href="/docs/Sampling/ReservoirSampling.html">•Reservoir
Sampling</a></li>
+ <li><a
href="/docs/Sampling/ReservoirSamplingPerformance.html">•Reservoir Sampling
Performance</a></li>
+ <li><a href="/docs/Sampling/VarOptSampling.html">•VarOpt
Sampling</a></li>
+
+ <p id="sampling-examples">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_sampling_examples">Sampling Examples</a>
+ </p>
+ <div class="collapse" id="collapse_sampling_examples">
+ <li><a href="/docs/Sampling/ReservoirSamplingJava.html">•Reservoir
Sampling Java Example</a></li>
+ <li><a href="/docs/Sampling/ReservoirSamplingPigUDFs.html">•Reservoir
Sampling Pig UDFs</a></li>
+ <li><a href="/docs/Sampling/VarOptSamplingJava.html">•VarOpt Sampling
Java Example</a></li>
+ <li><a href="/docs/Sampling/VarOptPigUDFs.html">•VarOpt Sampling Pig
UDFs</a></li>
+ </div>
+ </div>
+ </div>
+
+ <p id="system-integrations">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_system_integrations">System Integrations</a>
+ </p>
+ <div class="collapse" id="collapse_system_integrations">
+ <li><a href="/docs/SystemIntegrations/ApacheDruidIntegration.html">•Using
Sketches in Apache Druid</a></li>
+ <li><a href="/docs/SystemIntegrations/ApacheHiveIntegration.html">•Using
Sketches in Apache Hive</a></li>
+ <li><a href="/docs/SystemIntegrations/ApachePigIntegration.html">•Using
Sketches in Apache Pig</a></li>
+ <li><a href="/docs/SystemIntegrations/PostgreSQLIntegration.html">•Using
Sketches in PostgreSQL</a></li>
+ </div>
+
+ <p id="community">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_community">Community</a>
+ </p>
+ <div class="collapse" id="collapse_community">
+ <li><a href="/docs/Community/index.html">•Community</a></li>
+ <li><a href="/docs/Community/Downloads.html">•Downloads</a></li>
+ <li><a href="/docs/Community/NewCommitterProcess.html">•Committer
Process</a></li>
+ <li><a href="/docs/Community/ReleaseProcessForCppComponents.html">•Release
Process For CPP Components</a></li>
+ <li><a
href="/docs/Community/ReleaseProcessForJavaComponents.html">•Release Process
For Java Components</a></li>
+ <li><a href="/docs/Community/Transitioning.html">•Transitioning from prior
GitHub Site</a></li>
+ <li><a href="/docs/Community/WhoUses.html">•Who Uses</a></li>
+ </div>
+
+ <p id="research">
+ <a data-toggle="collapse" class="menu collapsed"
href="#collapse_research">Research</a>
+ </p>
+ <div class="collapse" id="collapse_research">
+ <li><a href="/docs/Community/Research.html">•Research</a></li>
+ </div>
+</div>
+<!-- End _includes/toc.html -->
+
+
+<!-- Start _includes/tocScript.html -->
+<script>
+ (function () {
+
+ var findLineItem = function (path) {
+ return document.querySelector(`#toc [href="${path}"]`);
+ };
+
+ function findNavItem(path) {
+ return document.querySelector(`.nav [href="${path}"]`);
+ }
+
+ var highlighLineItem = function (element) {
+ element.classList.add('highlight');
+ };
+
+ var checkHasClass = function (element, className) {
+ return element.className.split(' ').find(function (item) { return item
=== className || '' })
+ }
+
+ var findAllCollapseParents = function (element) {
+ var collapseMenus = [];
+ var elementPointer = element;
+ while (elementPointer !== document.body) {
+ if (checkHasClass(elementPointer, 'collapse')) {
+ collapseMenus.push(elementPointer);
+ }
+ elementPointer = elementPointer.parentElement
+ }
+ return collapseMenus
+ };
+
+ var openMenuItem = function (element) {
+ // $(element).collapse('show') would start a transition, adding `in`
class instead.
+ element.classList.add('in');
+ };
+
+ var openAllFromList = function (elementList) {
+ elementList.forEach(openMenuItem);
+ };
+
+ var highlightAndOpenMenu = function () {
+ // Highlight & expand nav item in the TOC
+ var currentLineItem = findLineItem(document.location.pathname);
+ highlighLineItem(currentLineItem);
+ openAllFromList(findAllCollapseParents(currentLineItem));
+
+ // Highlight nav item in top navigation
+ highlighLineItem(findNavItem(document.location.pathname));
+ };
+
+ $(highlightAndOpenMenu);
+
+ }());
+</script>
+<!-- End _includes/tocScript.html -->
+
+ </div>
+ <!-- End ToC Block -->
+ <div class="col-md-9 doc-content">
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<h1 id="theta-sketch-and-tuple-sketch-set-operation-corner-cases">Theta Sketch
and Tuple Sketch Set Operation Corner Cases</h1>
+
+<p>The <em>TupleSketch</em> is an extension of the <em>ThetaSketch</em> and
both are part of the <em>Theta Sketch Framework</em><sup>1</sup>. In this
document, the term <em>Theta</em> (upper case) when referencing sketches will
refer to both the <em>ThetaSketch</em> and the <em>TupleSketch</em>. This is
not to be confused with the term <em>theta</em> (lower case), which refers to
the sketch variable that tracks the sampling probability of the sketch.</p>
+
+<p>Because Theta sketches provide the set operations of <em>intersection</em>
and <em>difference</em> (<em>A and not B</em> or just <em>A not B</em>), a
number of interesting corner cases arise that require some analysis to
determine how the code should handle them.</p>
+
+<p>Theta sketches track three key variables in addition to retained data:</p>
+
+<ul>
+ <li>
+ <p><em>theta</em>: This is the current sampling probability of the sketch,
expressed mathematically as a 64-bit, double-precision floating-point value
between 0.0 and 1.0. Internally, however, the sketch stores this value as a
64-bit, signed, long integer (usually identified as <em>thetaLong</em> in the
code), where the maximum positive value (<em>Long.MAX_VALUE</em>) is interpreted
as the double 1.0. In this document we will only refer to the mathematical
quantity <em>theta</em>.</p>
+ </li>
+ <li>
+ <p><em>retained entries</em> or <em>count</em>: This is the number of hash
values currently retained in the sketch. It can never be less than zero.</p>
+ </li>
+ <li>
+ <p><em>empty</em>:</p>
+ <ul>
+ <li>By definition, if <em>empty = true</em>, the number of <em>retained
entries</em> must be zero. However, the value of <em>theta</em> can be 1.0 or
less than 1.0.</li>
+ <li>If <em>empty = false</em>, the <em>retained entries</em> can be zero
or greater than zero, and <em>theta</em> can be 1.0 or less than 1.0.</li>
+ </ul>
+ </li>
+</ul>
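+<p>The relationship between <em>thetaLong</em> and <em>theta</em> described above can be sketched in a few lines of Java. This is only an illustration of the stated mapping, not the library's code: the class name <em>ThetaLongDemo</em> and both helper methods are hypothetical.</p>

```java
// Illustrative sketch only: ThetaLongDemo and its methods are hypothetical
// names, not part of the DataSketches API.
final class ThetaLongDemo {

  /** Interprets a non-negative thetaLong as a fraction of Long.MAX_VALUE. */
  static double toTheta(long thetaLong) {
    return thetaLong / (double) Long.MAX_VALUE;
  }

  /**
   * Inverse mapping from a probability in [0.0, 1.0] to a long.
   * A double-to-long cast saturates, so 1.0 maps to Long.MAX_VALUE.
   */
  static long toThetaLong(double theta) {
    return (long) (theta * Long.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println(toTheta(Long.MAX_VALUE)); // the "not sampling" case
    System.out.println(toThetaLong(0.5));        // a sketch sampling at p = 0.5
  }
}
```

+<p>Because the corner-case analysis only asks whether <em>theta</em> is 1.0 or less than 1.0, the equivalent integer test in this model is whether <em>thetaLong</em> equals <em>Long.MAX_VALUE</em>.</p>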
+
+<p>We have developed a shorthand notation that records the state of these three
variables as <em>{theta, retained entries, empty}</em>. When analyzing the
corner cases of the set operations, we only need to know whether <em>theta</em>
is 1.0 or less than 1.0, whether <em>retained entries</em> is zero or greater
than zero, and whether <em>empty</em> is true or false. These are further
abbreviated as:</p>
+
+<ul>
+ <li><em>theta</em> can be <em>1.0</em> or <em><1.0</em></li>
+ <li><em>retained entries</em> can be either <em>0</em> or <em>>0</em></li>
+ <li><em>empty</em> can be either <em>T</em> or <em>F</em></li>
+</ul>
+
+<p>Each of the above three variables can be represented as a Boolean variable.
Thus, there are 2<sup>3</sup> = 8 possible combinations of the three variables.</p>
+
+<p><sup>1</sup> Anirban Dasgupta, Kevin J. Lang, Lee Rhodes, and Justin
Thaler. A framework for estimating stream expression cardinalities. In
<em>EDBT/ICDT Proceedings ’16</em>, pages 6:1–6:17, 2016.</p>
+
+<h2 id="valid-states-of-a-sketch">Valid States of a Sketch</h2>
+
+<p>Of the eight possible combinations of the three variables and using the
above notation, there are five valid states of a <em>Theta</em> sketch.</p>
+
+<h3 id="new10-0-t">New{1.0, 0, T}</h3>
+<p>When a new sketch is created, <em>theta</em> is set to 1.0, <em>retained
entries</em> is set to zero, and <em>empty</em> is true. This state can also
occur as the result of a set operation: the operation creates a new sketch to
hold the result, but there is no data to load into it. The operation then
effectively returns a new sketch that is untouched and unaffected by the
input arguments to the set operation.</p>
+
+<h3 id="exact10-0-f">Exact{1.0, >0, F}</h3>
+<p>All of the <em>Theta</em> sketches have an input buffer that is effectively
a list of items received by the sketch. If the number of unique input values
does not exceed the size of that buffer, the sketch is in <em>exact</em> mode.
There is no probabilistic estimation involved so <em>theta = 1.0</em>, which
indicates that all unique values presented to the sketch are in the buffer.
<em>retained entries</em> is the count of those values in the buffer, and the
sketch is clearly not <em [...]
+
+<h3 id="estimation10-0-f">Estimation{<1.0, >0, F}</h3>
+<p>Here, the number of unique inputs to the sketch has exceeded the size of the
input buffer, so the sketch must start choosing which values to retain and
reduces the value of <em>theta</em> accordingly. <em>theta
< 1.0</em>, <em>retained entries > 0</em>, and <em>empty = F</em>.</p>
+
+<h3 id="newdegen10-0-t">NewDegen{<1.0, 0, T}</h3>
+<p>This is a new sketch where the user has set the sampling probability <em>p
< 1.0</em> and the sketch has not yet been presented with any data. Internally,
at initialization, <em>theta</em> is set to <em>p</em>, so if <em>p = 0.5</em>,
<em>theta</em> will be set to <em>0.5</em>. Since the sketch has not seen any
data, <em>retained entries = 0</em> and <em>empty = T</em>. This is a
degenerate form of a new sketch, hence its name.</p>
+
+<h3 id="resultdegen10-0-f">ResultDegen{<1.0, 0, F}</h3>
+<p>This requires some explanation. Imagine the intersection of two estimating
sketches where the values retained in the two sketches are disjoint (i.e., no
overlap). Since the two sketches chose their internal values at random, there
remains some probability that there could be common values in an exactly
computed intersection, but it just so happens that one of the two sketches did
not select any of them in the random sampling process. Therefore, the
<em>retained entries = 0</em>. The [...]
+
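+<p>The scenario above can be modeled with a small, self-contained Java sketch. It is a simplified model, not the library's implementation: the class <em>IntersectModel</em> is hypothetical, hash values are represented as doubles in [0, 1), and the rules used (result <em>theta</em> is the minimum of the two inputs, only common hashes below that <em>theta</em> are retained, and an empty input yields a New{1.0, 0, T} result) are assumptions made for illustration.</p>

```java
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of a Theta sketch and its intersection.
// Hashes are modeled as doubles in [0, 1); this is NOT the DataSketches API.
final class IntersectModel {
  final double theta;        // sampling probability
  final Set<Double> hashes;  // retained hash values, all below theta
  final boolean empty;       // true only if no data was ever seen

  IntersectModel(double theta, Set<Double> hashes, boolean empty) {
    this.theta = theta;
    this.hashes = hashes;
    this.empty = empty;
  }

  /** Assumed rule set: min theta, common hashes below it, empty if either is. */
  static IntersectModel intersect(IntersectModel a, IntersectModel b) {
    if (a.empty || b.empty) {
      return new IntersectModel(1.0, new TreeSet<>(), true); // New{1.0, 0, T}
    }
    double theta = Math.min(a.theta, b.theta);
    Set<Double> common = new TreeSet<>();
    for (double h : a.hashes) {
      if (h < theta && b.hashes.contains(h)) {
        common.add(h);
      }
    }
    return new IntersectModel(theta, common, false);
  }

  public static void main(String[] args) {
    // Two estimating sketches whose retained values happen to be disjoint.
    IntersectModel a = new IntersectModel(0.5, new TreeSet<>(Set.of(0.1, 0.3)), false);
    IntersectModel b = new IntersectModel(0.4, new TreeSet<>(Set.of(0.2, 0.35)), false);
    IntersectModel r = intersect(a, b);
    System.out.println("{" + r.theta + ", " + r.hashes.size() + ", " + r.empty + "}");
  }
}
```

+<p>In this model the result has <em>theta < 1.0</em>, no retained entries, and <em>empty = F</em>, which is the ResultDegen{<1.0, 0, F} state.</p>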
+<h3 id="summary-table-of-the-valid-states-of-a-sketch">Summary Table of the
Valid States of a Sketch</h3>
+<p>The <em>Has Seen Data</em> column is not an independent variable, but helps
with the interpretation of the state.</p>
+
+<p>We can assign a single octal digit ID to each state where</p>
+
+<ul>
+ <li><em>theta = 1.0 := 4, else 0</em></li>
+ <li><em>retained entries >0 := 2, else 0</em></li>
+ <li><em>empty = true := 1, else 0</em></li>
+</ul>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: center">Shorthand Notation</th>
+ <th style="text-align: center">theta</th>
+ <th style="text-align: center">retained entries</th>
+ <th style="text-align: center">empty</th>
+ <th style="text-align: center">Has Seen Data</th>
+ <th style="text-align: center">ID</th>
+ <th style="text-align: center">Comments</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1.0</td>
+ <td style="text-align: center">0</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">5</td>
+ <td style="text-align: center">New Sketch, p=1.0 (default)</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">1.0</td>
+ <td style="text-align: center">>0</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">6</td>
+ <td style="text-align: center">Exact Mode</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center"><1.0</td>
+ <td style="text-align: center">>0</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">2</td>
+ <td style="text-align: center">Estimation Mode</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">NewDegen {<1.0,0,T}<sup>2</sup></td>
+ <td style="text-align: center"><1.0</td>
+ <td style="text-align: center">0</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">1</td>
+ <td style="text-align: center">New Sketch, user sets p<1.0</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}<sup>3</sup></td>
+ <td style="text-align: center"><1.0</td>
+ <td style="text-align: center">0</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">0</td>
+ <td style="text-align: center">Valid Intersect or AnotB result</td>
+ </tr>
+ </tbody>
+</table>
+
+<p><sup>2</sup> <em>New Degenerate</em>: New Empty Sketch where the user sets
<em>p < 1.0</em>. This can be safely reinterpreted as {1.0,0,T} because it
has not seen any data.<br />
+<sup>3</sup> <em>Result Degenerate</em>: Can appear as the result of an
Intersection or AnotB of certain combinations of sketches.</p>
+
+<h2 id="invalid-states-of-a-sketch">Invalid States of a Sketch</h2>
+<p>The remaining three combinations of the variables represent internal errors
and should not occur.
+The <em>Has Seen Data</em> column is not an independent variable, but helps
with the interpretation of the state.</p>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: center">Theta</th>
+ <th style="text-align: center">Retained Entries</th>
+ <th style="text-align: center">Empty Flag</th>
+ <th style="text-align: center">Has Seen Data</th>
+ <th style="text-align: center">Comments</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: center">1.0</td>
+ <td style="text-align: center">0</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">If it has seen data, it cannot have Theta =
1.0 AND Entries = 0.</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">1.0</td>
+ <td style="text-align: center">>0</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">If it has not seen data, Entries
cannot be > 0.</td>
+ </tr>
+ <tr>
+ <td style="text-align: center"><1.0</td>
+ <td style="text-align: center">>0</td>
+ <td style="text-align: center">T</td>
+ <td style="text-align: center">F</td>
+ <td style="text-align: center">If it has not seen data, Entries
cannot be > 0.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="combinations-of-states-of-two-sketches">Combinations of States of Two
Sketches</h2>
+<p>Each sketch can have 5 valid states, which means we can have 25
combinations of states of two sketches as expanded in the following table.</p>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: center">ID</th>
+ <th style="text-align: center">Sketch A</th>
+ <th style="text-align: center">Sketch B</th>
+ <th style="text-align: center">Intersection Result</th>
+ <th style="text-align: center">AnotB Result</th>
+ <th style="text-align: center">Result Actions</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: center">00</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">2,2</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">01</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {ThetaA,0,F}</td>
+ <td style="text-align: center">1,3</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">02</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">2,2</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">05</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {ThetaA,0,F}</td>
+ <td style="text-align: center">1,3</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">06</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">New {ThetaA,0,F}</td>
+ <td style="text-align: center">2,3</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">10</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">11</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">12</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">15</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">16</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">20</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">Trim A by minTheta</td>
+ <td style="text-align: center">2,4</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">21</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">Sketch A</td>
+ <td style="text-align: center">1,5</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">22</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">Full Intersect</td>
+ <td style="text-align: center">Full AnotB</td>
+ <td style="text-align: center">6,7</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">25</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">Sketch A</td>
+ <td style="text-align: center">1,5</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">26</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">Full Intersect</td>
+ <td style="text-align: center">Full AnotB</td>
+ <td style="text-align: center">6,7</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">50</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">51</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">52</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">55</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">56</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">1,1</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">60</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">ResultDegen {<1.0,0,F}</td>
+ <td style="text-align: center">New {minTheta,0,F}</td>
+ <td style="text-align: center">Trim A by minTheta</td>
+ <td style="text-align: center">2,4</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">61</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">NewDegen {<1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">Sketch A</td>
+ <td style="text-align: center">1,5</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">62</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">Estimation {<1.0,>0,F}</td>
+ <td style="text-align: center">Full Intersect</td>
+ <td style="text-align: center">Full AnotB</td>
+ <td style="text-align: center">6,7</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">65</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">New {1.0,0,T}</td>
+ <td style="text-align: center">Sketch A</td>
+ <td style="text-align: center">1,5</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">66</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">Exact {1.0,>0,F}</td>
+ <td style="text-align: center">Full Intersect</td>
+ <td style="text-align: center">Full AnotB</td>
+ <td style="text-align: center">6,7</td>
+ </tr>
+ </tbody>
+</table>
+
+<p>The description of each column:</p>
+
+<ul>
+ <li>ID: two octal digits, the first digit represents the state of Sketch A,
the second digit represents the state of Sketch B.</li>
+ <li>Sketch A State</li>
+ <li>Sketch B State</li>
+ <li>Intersection Result</li>
+ <li>AnotB Result</li>
+ <li>Result Actions: the octal code of the Intersection Result followed by the
octal code of the AnotB Result. The result codes are given by the following
table:</li>
+</ul>
+
+<table>
+ <thead>
+ <tr>
+ <th style="text-align: center">Result Action</th>
+ <th style="text-align: center">Result Code</th>
+ <th style="text-align: left">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td style="text-align: center">New{1.0,0,T}</td>
+ <td style="text-align: center">1</td>
+ <td style="text-align: left">New empty sketch</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">New{min,0,F}</td>
+ <td style="text-align: center">2</td>
+ <td style="text-align: left">Min=min(thetaA,thetaB)</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">New{thA,0,F}</td>
+ <td style="text-align: center">3</td>
+ <td style="text-align: left">thA=theta of A</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">SkA Min</td>
+ <td style="text-align: center">4</td>
+ <td style="text-align: left">Trim A by minTheta</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">Sketch A</td>
+ <td style="text-align: center">5</td>
+ <td style="text-align: left">Sketch A exactly</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">Full Inter</td>
+ <td style="text-align: center">6</td>
+ <td style="text-align: left">Full intersect</td>
+ </tr>
+ <tr>
+ <td style="text-align: center">Full AnotB</td>
+ <td style="text-align: center">7</td>
+ <td style="text-align: left">Full AnotB</td>
+ </tr>
+ </tbody>
+</table>
+
+<p>Note that the results of a <em>Full Intersect</em> or a <em>Full AnotB</em>
will require further interpretation of the resulting state.
+For example, if the resulting sketch is <em>{1.0,0,?}</em>, then a
<em>New{1.0,0,T}</em> is returned.
+If the resulting sketch is <em>{<1.0,0,?}</em> then a
<em>ResultDegen{<1.0,0,F}</em> is returned.<br />
+Otherwise, the sketch returned will be an estimating or exact <em>{theta,
>0, F}</em>.</p>
+
+<h2 id="testing">Testing</h2>
+<p>The above information is encoded as a model in the class
<em>org.apache.datasketches.SetOperationCornerCases.java</em>. This class is
made up of enums and static methods that quickly determine what actions to take
based on the state of the input arguments. The model is independent of the
sketch implementation: it applies whether the set operation is performed on
Theta Sketches or Tuple Sketches and, when translated, can be used in other
languages as well.</p>
+
+<p>Before this model was put to use, an extensive set of tests was designed to
test any potential implementation against it. These tests are slightly
different for the Tuple Sketch than for the Theta Sketch because the Tuple
Sketch has more combinations to test, but the model is the same.</p>
+
+<ul>
+ <li>The tests for the Theta Sketch can be found in the class
<em>org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest.java</em></li>
+ <li>The tests for the Tuple Sketch can be found in the class
<em>org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest.java</em></li>
+</ul>
+
+<p>The details of how this model is used in run-time code can be found in the
class <em>org.apache.datasketches.tuple.AnotB.java</em>.</p>
+
+
+ </div> <!-- End content -->
+ </div> <!-- End row -->
+ </div> <!-- End Container -->
+
+<!-- Start _include/page_footer.html -->
+<footer class="ds-footer">
+ <div class="container">
+ <div class="text-center">
+ <p>
+ <div>Copyright © 2020 <a href="https://www.apache.org">Apache Software
Foundation</a>,
+ Licensed under the Apache License, Version 2.0. All Rights
Reserved.<br/>
+ Apache DataSketches, Apache, the Apache feather logo, and the Apache
DataSketches project logos are trademarks of The Apache Software
Foundation.<br/>
+ All other marks mentioned may be trademarks or registered trademarks
of their respective owners.
+ </div>
+ </p>
+ </div>
+ </div>
+</footer>
+<!-- End _include/page_footer.html -->
+
+</body>
+
+</html>
+<!-- End _layouts/doc_page.html-->
\ No newline at end of file
diff --git a/output/docs/Theta/ThetaSetOpsCornerCases.md
b/output/docs/Theta/ThetaSetOpsCornerCases.md
deleted file mode 100644
index fdb1e89..0000000
--- a/output/docs/Theta/ThetaSetOpsCornerCases.md
+++ /dev/null
@@ -1,143 +0,0 @@
-# Theta Sketch and Tuple Sketch Set Operation Corner Cases
-
-The *TupleSketch* is an extension of the *ThetaSketch* and both are part of
the *Theta Sketch Framework*<sup>1</sup>. In this document, the term *Theta*
(upper case) when referencing sketches will refer to both the *ThetaSketch* and
the *TupleSketch*. This is not to be confused with the term *theta* (lower
case), which refers to the sketch variable that tracks the sampling probability
of the sketch.
-
-Because Theta sketches provide the set operations of *intersection* and
*difference* (*A and not B* or just *A not B*), a number of interesting corner
cases arise that require some analysis to determine how the code should handle
them.
-
-Theta sketches track three key variables in addition to retained data:
-
-* *theta*: This is the current sampling probability of the sketch, expressed
mathematically as a 64-bit, double-precision floating-point value between 0.0
and 1.0. Internally, however, the sketch stores this value as a 64-bit, signed,
long integer (usually identified as *thetaLong* in the code), where the maximum
positive value (*Long.MAX_VALUE*) is interpreted as the double 1.0. In this
document we will only refer to the mathematical quantity *theta*.
-
-* *retained entries* or *count*: This is the number of hash values currently
retained in the sketch. It can never be less than zero.
-
-* *empty*:
- * By definition, if *empty = true*, the number of *retained entries* must
be zero. However, the value of *theta* can be 1.0 or less than 1.0.
- * If *empty* = false, the *retained entries* can be zero or greater than
zero, and *theta* can be 1.0 or less than 1.0.
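
The relationship between *thetaLong* and *theta* described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the conversion is a plain division by *Long.MAX_VALUE*; the function names are hypothetical and are not part of the DataSketches API.

```python
# Illustrative only: these names are hypothetical, not DataSketches API.
LONG_MAX_VALUE = 2**63 - 1  # the value of Java's Long.MAX_VALUE


def theta_from_long(theta_long: int) -> float:
    """Map the internal integer form of theta to a double in [0.0, 1.0]."""
    if not 0 <= theta_long <= LONG_MAX_VALUE:
        raise ValueError("theta_long must be in [0, Long.MAX_VALUE]")
    return theta_long / LONG_MAX_VALUE


def is_theta_one(theta_long: int) -> bool:
    """Corner-case analysis only needs 'theta = 1.0' versus 'theta < 1.0'.

    The comparison is done on the integer form, because integers just below
    Long.MAX_VALUE round to 1.0 when converted to a double.
    """
    return theta_long == LONG_MAX_VALUE
```

Comparing the integer form directly avoids misclassifying a sketch whose *theta* is only marginally below 1.0.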
-
-We have developed a shorthand notation for these three variables to record
their state as *{theta, retained entries, empty}*. When analyzing the corner
cases of the set operations, we only need to know whether *theta* is 1.0 or
less than 1.0, *retained entries* is zero or greater than zero, and *empty* is
true or false. These are further abbreviated as
-
-* *theta* can be *1.0* or *<1.0*
-* *retained entries* can be either *0* or *>0*
-* *empty* can be either *T* or *F*
-
-Each of the above three variables can be represented as a boolean variable.
Thus, there are 8 possible combinations of the three variables.
-
-<sup>1</sup> Anirban Dasgupta, Kevin J. Lang, Lee Rhodes, and Justin Thaler. A
framework for estimating stream expression cardinalities. In *EDBT/ICDT
Proceedings ’16*, pages 6:1–6:17, 2016.
-
-## Valid States of a Sketch
-
-Of the eight possible combinations of the three variables and using the above
notation, there are five valid states of a *Theta* sketch.
-
-### New{1.0, 0, T}
-When a new sketch is created, *theta* is set to 1.0, *retained entries* is set
to zero, and *empty* is true. This state can also occur as the result of a set
operation, where the operation creates a new sketch to potentially load result
data into the sketch but there is no data to load into the sketch. So it
effectively returns a new sketch that has been untouched and unaffected by the
input arguments to the set operation.
-
-### Exact{1.0, >0, F}
-All of the *Theta* sketches have an input buffer that is effectively a list of
items received by the sketch. If the number of unique input values does not
exceed the size of that buffer, the sketch is in *exact* mode. There is no
probabilistic estimation involved so *theta = 1.0*, which indicates that all
unique values presented to the sketch are in the buffer. *retained entries* is
the count of those values in the buffer, and the sketch is clearly not *empty*.
-
-### Estimation{<1.0, >0, F}
-Here, the number of inputs to the sketch has exceeded the size of the input
buffer, so the sketch must start choosing which values to retain and reduce the
value of *theta* accordingly. Thus *theta < 1.0*, *retained entries > 0*, and
*empty = F*.
-
-### NewDegen{<1.0, 0, T}
-This is a new sketch where the user has set the sampling probability, *p <
1.0*, and the sketch has not been presented with any data. Internally at
initialization, *theta* is set to *p*, so if *p = 0.5*, *theta* will be set to
*0.5*. Since the sketch has not seen any data, *retained entries = 0* and
*empty = T*. This is a degenerate form of a new sketch, hence its name.
-
-### ResultDegen{<1.0, 0, F}
-This requires some explanation. Imagine the intersection of two estimating
sketches where the values retained in the two sketches are disjoint (i.e., no
overlap). Since the two sketches chose their internal values at random, there
remains some probability that there could be common values in an exactly
computed intersection, but it just so happens that one of the two sketches did
not select any of them in the random sampling process. Therefore, the
*retained entries = 0*. The value *1. [...]
-
-### Summary Table of the Valid States of a Sketch
-The *Has Seen Data* column is not an independent variable, but helps with the
interpretation of the state.
-
-We can assign a single octal digit ID to each state where
-
-* *theta = 1.0 := 4, else 0*
-* *retained entries >0 := 2, else 0*
-* *empty = true := 1, else 0*
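
The three rules above amount to packing three booleans into one octal digit. A minimal Python sketch of that encoding follows; the function name `state_id` is hypothetical and chosen only for this illustration.

```python
def state_id(theta: float, retained_entries: int, empty: bool) -> int:
    """Return the single octal digit ID of a sketch state.

    theta = 1.0          contributes 4
    retained entries > 0 contributes 2
    empty = true         contributes 1
    """
    digit = 0
    if theta == 1.0:
        digit += 4
    if retained_entries > 0:
        digit += 2
    if empty:
        digit += 1
    return digit
```

For example, a new sketch `{1.0, 0, T}` maps to 4 + 0 + 1 = 5, and an estimating sketch `{<1.0, >0, F}` maps to 0 + 2 + 0 = 2.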
-
-| Shorthand Notation | theta | retained entries | empty | Has Seen Data | ID | Comments |
-|:---------------------------------:|:-----:|:----------------:|:-----:|:-------------:|:--:|:-------------------------------:|
-| New {1.0,0,T} | 1.0 | 0 | T | F | 5 | New Sketch, p=1.0 (default) |
-| Exact {1.0,>0,F} | 1.0 | >0 | F | T | 6 | Exact Mode |
-| Estimation {<1.0,>0,F} | <1.0 | >0 | F | T | 2 | Estimation Mode |
-| NewDegen {<1.0,0,T}<sup>2</sup> | <1.0 | 0 | T | F | 1 | New Sketch, user sets p<1.0 |
-| ResultDegen {<1.0,0,F}<sup>3</sup> | <1.0 | 0 | F | T | 0 | Valid Intersect or AnotB result |
-
-<sup>2</sup> *New Degenerate*: New Empty Sketch where the user sets *p < 1.0*.
This can be safely reinterpreted as {1.0,0,T} because it has not seen any
data.<br>
-<sup>3</sup> *Result Degenerate*: Can appear as the result of an Intersection
or AnotB of certain combinations of sketches.
-
-## Invalid States of a Sketch
-The remaining three combinations of the variables represent internal errors
and should not occur.
-The *Has Seen Data* column is not an independent variable, but helps with the
interpretation of the state.
-
-| Theta | Retained Entries | Empty Flag | Has Seen Data | Comments |
-|:-----:|:----------------:|:----------:|:-------------:|:--------------------------------------------------:|
-| 1.0 | 0 | F | T | If it has seen data, it cannot have Theta = 1.0 AND Entries = 0. |
-| 1.0 | >0 | T | F | If it has not seen data, Entries cannot be > 0. |
-| <1.0 | >0 | T | F | If it has not seen data, Entries cannot be > 0. |
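
As a cross-check, the split between valid and invalid states can be derived mechanically from the octal IDs of the summary table: the five valid IDs are 0, 1, 2, 5 and 6, so the remaining combinations of the three variables are 3, 4 and 7. The Python sketch below assumes that encoding; all names in it are hypothetical.

```python
# Valid state IDs and their shorthand names, taken from the summary table.
VALID_STATES = {
    5: "New",
    6: "Exact",
    2: "Estimation",
    1: "NewDegen",
    0: "ResultDegen",
}


def state_id(theta_is_one: bool, has_entries: bool, empty: bool) -> int:
    """Pack the three boolean variables into one octal digit."""
    return (4 if theta_is_one else 0) + (2 if has_entries else 0) + (1 if empty else 0)


def classify(theta_is_one: bool, has_entries: bool, empty: bool) -> str:
    """Return the shorthand name of a valid state, or raise for an invalid one."""
    sid = state_id(theta_is_one, has_entries, empty)
    if sid not in VALID_STATES:
        raise ValueError(f"invalid sketch state, ID {sid}")
    return VALID_STATES[sid]


def invalid_ids() -> list:
    """IDs of the combinations that do not appear in the summary table."""
    return sorted(set(range(8)) - set(VALID_STATES))
```

Under this encoding the three invalid IDs correspond to `{<1.0, >0, T}` (3), `{1.0, 0, F}` (4) and `{1.0, >0, T}` (7).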
-
-
-## Combinations of States of Two Sketches
-Each sketch can have 5 valid states, which means we can have 25 combinations
of states of two sketches as expanded in the following table.
-
-| ID | Sketch A | Sketch B | Intersection Result | AnotB Result | Result Actions |
-|:--:|:----------------------:|:----------------------:|:--------------------:|:-------------------:|:--------------:|
-| 00 | ResultDegen {<1.0,0,F} | ResultDegen {<1.0,0,F} | New {minTheta,0,F} | New {minTheta,0,F} | 2,2 |
-| 01 | ResultDegen {<1.0,0,F} | NewDegen {<1.0,0,T} | New {1.0,0,T} | New {ThetaA,0,F} | 1,3 |
-| 02 | ResultDegen {<1.0,0,F} | Estimation {<1.0,>0,F} | New {minTheta,0,F} | New {minTheta,0,F} | 2,2 |
-| 05 | ResultDegen {<1.0,0,F} | New {1.0,0,T} | New {1.0,0,T} | New {ThetaA,0,F} | 1,3 |
-| 06 | ResultDegen {<1.0,0,F} | Exact {1.0,>0,F} | New {minTheta,0,F} | New {ThetaA,0,F} | 2,3 |
-| 10 | NewDegen {<1.0,0,T} | ResultDegen {<1.0,0,F} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 11 | NewDegen {<1.0,0,T} | NewDegen {<1.0,0,T} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 12 | NewDegen {<1.0,0,T} | Estimation {<1.0,>0,F} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 15 | NewDegen {<1.0,0,T} | New {1.0,0,T} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 16 | NewDegen {<1.0,0,T} | Exact {1.0,>0,F} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 20 | Estimation {<1.0,>0,F} | ResultDegen {<1.0,0,F} | New {minTheta,0,F} | Trim A by minTheta | 2,4 |
-| 21 | Estimation {<1.0,>0,F} | NewDegen {<1.0,0,T} | New {1.0,0,T} | Sketch A | 1,5 |
-| 22 | Estimation {<1.0,>0,F} | Estimation {<1.0,>0,F} | Full Intersect | Full AnotB | 6,7 |
-| 25 | Estimation {<1.0,>0,F} | New {1.0,0,T} | New {1.0,0,T} | Sketch A | 1,5 |
-| 26 | Estimation {<1.0,>0,F} | Exact {1.0,>0,F} | Full Intersect | Full AnotB | 6,7 |
-| 50 | New {1.0,0,T} | ResultDegen {<1.0,0,F} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 51 | New {1.0,0,T} | NewDegen {<1.0,0,T} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 52 | New {1.0,0,T} | Estimation {<1.0,>0,F} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 55 | New {1.0,0,T} | New {1.0,0,T} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 56 | New {1.0,0,T} | Exact {1.0,>0,F} | New {1.0,0,T} | New {1.0,0,T} | 1,1 |
-| 60 | Exact {1.0,>0,F} | ResultDegen {<1.0,0,F} | New {minTheta,0,F} | Trim A by minTheta | 2,4 |
-| 61 | Exact {1.0,>0,F} | NewDegen {<1.0,0,T} | New {1.0,0,T} | Sketch A | 1,5 |
-| 62 | Exact {1.0,>0,F} | Estimation {<1.0,>0,F} | Full Intersect | Full AnotB | 6,7 |
-| 65 | Exact {1.0,>0,F} | New {1.0,0,T} | New {1.0,0,T} | Sketch A | 1,5 |
-| 66 | Exact {1.0,>0,F} | Exact {1.0,>0,F} | Full Intersect | Full AnotB | 6,7 |
-
-The description of each column:
-
-* ID: two octal digits, the first digit represents the state of Sketch A, the
second digit represents the state of Sketch B.
-* Sketch A State
-* Sketch B State
-* Intersection Result
-* AnotB Result
-* Result Actions: the octal code of the Intersection Result followed by the
octal code of the AnotB Result. The result codes are given by the following
table:
-
-| Result Action | Result Code | Description |
-|:-------------:|:-----------:|:-----------------------|
-| New{1.0,0,T} | 1 | New empty sketch |
-| New{min,0,F} | 2 | Min=min(thetaA,thetaB) |
-| New{thA,0,F} | 3 | thA=theta of A |
-| SkA Min | 4 | Trim A by minTheta |
-| Sketch A | 5 | Sketch A exactly |
-| Full Inter | 6 | Full intersect |
-| Full AnotB | 7 | Full AnotB |
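
The 25-row table and the result codes can be held in a small lookup structure. The Python sketch below transcribes the table above, keyed by the pair of state IDs; the names `CORNER_CASES`, `RESULT_ACTIONS` and `lookup` are hypothetical and do not mirror the Java class mentioned under Testing.

```python
# Result code -> result action, as listed in the result code table.
RESULT_ACTIONS = {
    1: "New{1.0,0,T}",
    2: "New{min,0,F}",
    3: "New{thA,0,F}",
    4: "SkA Min",
    5: "Sketch A",
    6: "Full Inter",
    7: "Full AnotB",
}

# (state ID of A, state ID of B) -> (intersection code, AnotB code)
CORNER_CASES = {
    (0, 0): (2, 2), (0, 1): (1, 3), (0, 2): (2, 2), (0, 5): (1, 3), (0, 6): (2, 3),
    (1, 0): (1, 1), (1, 1): (1, 1), (1, 2): (1, 1), (1, 5): (1, 1), (1, 6): (1, 1),
    (2, 0): (2, 4), (2, 1): (1, 5), (2, 2): (6, 7), (2, 5): (1, 5), (2, 6): (6, 7),
    (5, 0): (1, 1), (5, 1): (1, 1), (5, 2): (1, 1), (5, 5): (1, 1), (5, 6): (1, 1),
    (6, 0): (2, 4), (6, 1): (1, 5), (6, 2): (6, 7), (6, 5): (1, 5), (6, 6): (6, 7),
}


def lookup(a_id: int, b_id: int) -> tuple:
    """Return the (intersection action, AnotB action) for two valid state IDs."""
    inter_code, anotb_code = CORNER_CASES[(a_id, b_id)]
    return RESULT_ACTIONS[inter_code], RESULT_ACTIONS[anotb_code]
```

A `KeyError` from `lookup` signals that at least one of the two IDs is not a valid state.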
-
-
-Note that the results of a *Full Intersect* or a *Full AnotB* will require
further interpretation of the resulting state.
-For example, if the resulting sketch is *{1.0,0,?}*, then a *New{1.0,0,T}* is
returned.
-If the resulting sketch is *{<1.0,0,?}* then a *ResultDegen{<1.0,0,F}* is
returned.
-Otherwise, the sketch returned will be an estimating or exact *{theta, >0, F}*.
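
The three rules in the note above can be read as a small normalization step applied to the raw output of a full operation. The following Python sketch is one possible rendering of that step; the function name is hypothetical and the state is returned as a plain tuple purely for illustration.

```python
def interpret_full_result(theta: float, retained_entries: int) -> tuple:
    """Return the {theta, retained entries, empty} state to report after a
    Full Intersect or Full AnotB, given the raw theta and entry count."""
    if retained_entries > 0:
        # Estimating or exact result: {theta, >0, F}
        return (theta, retained_entries, False)
    if theta == 1.0:
        # {1.0,0,?} is reported as New{1.0,0,T}
        return (1.0, 0, True)
    # {<1.0,0,?} is reported as ResultDegen{<1.0,0,F}
    return (theta, 0, False)
```

Note that the raw empty flag is not an input: with zero retained entries the reported flag depends only on whether *theta* equals 1.0.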
-
-## Testing
-The above information is encoded as a model in the class
*org.apache.datasketches.SetOperationCornerCases.java*. This class is made up
of enums and static methods that quickly determine what actions to take based
on the state of the input arguments. The model is independent of the sketch
implementation: it applies whether the set operation is performed on Theta
Sketches or Tuple Sketches and, when translated, can be used in other
languages as well.
-
-Before this model was put to use, an extensive set of tests was designed to
test any potential implementation against it. These tests are slightly
different for the Tuple Sketch than for the Theta Sketch because the Tuple
Sketch has more combinations to test, but the model is the same.
-
-* The tests for the Theta Sketch can be found in the class
*org.apache.datasketches.theta.CornerCaseThetaSetOperationsTest.java*
-* The tests for the Tuple Sketch can be found in the class
*org.apache.datasketches.tuple.aninteger.CornerCaseTupleSetOperationsTest.java*
-
-The details of how this model is used in run-time code can be found in the
class *org.apache.datasketches.tuple.AnotB.java*.
-
-