This is an automated email from the ASF dual-hosted git repository.
charlie pushed a commit to branch add-python-plotting
in repository
https://gitbox.apache.org/repos/asf/datasketches-characterization.git
The following commit(s) were added to refs/heads/add-python-plotting by this
push:
new 058c211 Added space profile and analysis
058c211 is described below
commit 058c2116dc059cd8987eb65305015330b28507b7
Author: c-dickens <[email protected]>
AuthorDate: Mon Aug 19 16:09:41 2024 +0100
Added space profile and analysis
---
.../resources/filters/BloomFilterSpaceJob.conf | 2 +-
src/notebooks/FilterErrorProfile.ipynb | 1242 ++++++++++++++++++++
src/notebooks/filterSizeProfile.ipynb | 1119 ++++++++++++++++++
.../BloomFilterSizeProfile20240703_045451PST.txt | 203 ++++
...QuotientFilterSizeProfile20240703_045733PST.txt | 200 ++++
.../BloomFilterSizeProfile20240703_014140PST.txt | 203 ++++
...entFilterSizeProfile20240703_014255PST_1E-6.txt | 200 ++++
7 files changed, 3168 insertions(+), 1 deletion(-)
diff --git a/src/main/resources/filters/BloomFilterSpaceJob.conf
b/src/main/resources/filters/BloomFilterSpaceJob.conf
index c6008a4..2729d79 100644
--- a/src/main/resources/filters/BloomFilterSpaceJob.conf
+++ b/src/main/resources/filters/BloomFilterSpaceJob.conf
@@ -40,6 +40,6 @@ FileNameDateFormat=yyyyMMdd'_'HHmmssz
ReadableDateFormat=yyyy/MM/dd HH:mm:ss z
#Job Profile
-JobProfile=org.apache.datasketches.characterization.filters.BloomFilterSizeProfile
+JobProfile=org.apache.datasketches.characterization.filters.BloomFilterSpaceProfile
targetFpp=1e-3
diff --git a/src/notebooks/FilterErrorProfile.ipynb
b/src/notebooks/FilterErrorProfile.ipynb
new file mode 100644
index 0000000..f359b3e
--- /dev/null
+++ b/src/notebooks/FilterErrorProfile.ipynb
@@ -0,0 +1,1242 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "df8bb271",
+ "metadata": {},
+ "source": [
+ "# Error Profile: varying the signature length.\n",
+ "\n",
+ "### 1. Introduction\n",
+ "\n",
+ "This notebooks compares the **error** behaviour of the Bloom and Quotient
Filter as the number of hash bits for each filter is varied.\n",
+ "Both filters are designed for _approximate set membership_ queries and
have the property that, upon querying for the membership of an element:\n",
+ "1. Quering any item inserted into the sketch will always return `true`
and **never** return `false` -- _\"no false negative property\"._\n",
+ "2. Some items that were not in the set used to build the filter **may**
return `true` -- this is known as a _false positive_.\n",
+ "\n",
+ "The **false positive probability (fpr)** is the probability that the
filter returns `true` to any query item not in the set used to build the
filter. We test this by building a filter $S$ over a fixed input cardinality
$n$ and then generating a query set $X$ that has empty intersection with the
input set. For every item in $X$, we count the number of times that the filter
returns `true` to the approximate membership query and divide this by $|X|$.
Over large enough sets $X$ and mu [...]
+ "For each filter, a signature is generated and the number of hash bits is
the length of the signature.\n",
+ "The experiment is repeated over different signature lengths and the
performance is compared to the theoretically expected curves.\n",
+ "\n",
+ "## 2. Types of Filter\n",
+ "\n",
+ "### 2.1 Bloom Filter ###\n",
+ "\n",
+ "A **Bloom Filter** is an array of $m$ bits. The array is populated using
$k$ independent hash functions and the signature is the tuple of index
locations in the array \n",
+ "$h(x) = (h_1(x), h_2(x), \\dots, h_k(x))$ with each $h_j(x) \\in \\{0, 1,
2, \\dots, m-1 \\}$.\n",
+ "Bloom filter accepts $n < m$ items with the specific $n$ coupled to both
the length $m$ and the number of hash functions, $k$.\n",
+ "The Bloom filter fpr is approximately $\\epsilon = (1 -
e^{-\\frac{nk}{m}})^k$ which is minimised with the choice of $k = \\ln(2)
\\frac{m}{n}$.\n",
+ "\n",
+ "An optimally initialised Bloom filter has $k = \\frac{m}{n} \\ln(2)$ so
that the **space** in bits is $m = \\frac{k}{\\ln(2) n}$ and the **false
positive probability** is approximately:\n",
+ "\\begin{align}\n",
+ "\\epsilon &\\approx (1 - e^{\\frac{n}{m}k})^k \\\\ \n",
+ "&=(1 - e^{\\frac{n}{m}\\frac{m}{n} \\ln(2)})^{\\frac{m}{n} \\ln(2)} \\\\
\n",
+ "&= 2^{-\\frac{m}{n} \\ln(2)} \\tag{*} \\\\ \n",
+ "&= 2^{-k}.\n",
+ "\\end{align}\n",
+ "\n",
+ "### 2.2 Quotient Filter ###\n",
+ "\n",
+ "A **Quotient Filter** is an array of $k=2^Q$ slots, with each slot
storing $b = 3 + f$ bits in total: $3$ bits are used for state and\n",
+ "a further $f$ bits are used for a fingerprint that partially identifies
the item in a given slot.\n",
+ "We think of the fingerprint length, $f$, as being the signature length as
it is this portion of a hash function output that \n",
+ "is related to the accuracy of a generated filter.\n",
+ "To generate a fingerprint, we take a hash function $h(x)$ and use $Q$ of
the hash bits to for the slot selection and $f$ of them for the fingerprint \n",
+ "selection. \n",
+ "Any further hash bits may be discarded and the total *space** usage in
bits is $m = 2^Q(3+f)$.\n",
+ "\n",
+ "The $Q$ slot bits are implicitly stored via the array index and any items
that land in the same slot but have distinct fingerprints have their \n",
+ "collision resolved through the three metadata state bits.\n",
+ "The $Q$ bits used for the address space represent a value for each item
that refers to its _canonical slot_ -- the location an item would be stored
were \n",
+ "there no hash collisions.\n",
+ "Collisions are resolved similarly to linear probing with the combination
of state and fingerprint bits being used to disambiguate items.\n",
+ "The Quotient Filter can, conceptually, accept up to $n = 2^Q$ items,
however, this would cause complete saturation of the filter so isn't practical.
\n",
+ "Rather, a **load factor**, $\\alpha$, threshold between zero and one is
chosen and only this fraction of the slots are used which acts as a bound on
the maximum number of items that can be inserted into the sketch so that
$\\alpha \\le \\frac{n}{k}$.\n",
+ "\n",
+ "A false positive is obtained if an unseen element hashes to a slot that
is canonical for a previously seen element and has a matching fingerprint in
the correct location.\n",
+ "The probability of the former event, assuming that $n$ items have been
inserted, is at most $\\alpha$ while the probability for the latter\n",
+ "The false positive rate using $r$ bit fingerprints is approximately:\n",
+ "\\begin{align*}\n",
+ "\\epsilon &\\approx \\alpha 2^{-r}.\n",
+ "\\end{align*}\n",
+ "\n",
+ "## 2. Results\n",
+ "\n",
+ "First we show the theoretical type of tradeoff that we hope to find in
the characterisation profiles.\n",
+ "For both filters, we expect the false positive rate to exponentially
decrease with the number of hash bits used."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "b9634ba5-37f9-47fd-ab24-eeb36e427cf3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd \n",
+ "import matplotlib.pyplot as plt\n",
+ "from matplotlib.ticker import AutoMinorLocator\n",
+ "from typing import Tuple\n",
+ "%matplotlib inline\n",
+ "plt.rcParams['font.size'] = 14"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "5bc3ab73",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'False positive rate vs. Number of hash bits')"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABAYAAALLCAYAAABq/ICJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd1QU1/v48fcsLL1YAFEQsKDSLAgYK5ZEjSZqbKCJRk3s6ZrmJ9ZoLFFjejOJJSqaWBJboolRY4oiShELiCKIHSkiUnd+f/hlfxLaroJYntc5nKMzd+48d/buwjx7515FVVUVIYQQQgghhBBCPJQ01R2AEEIIIYQQQgghqo8kBoQQQgghhBBCiIeYJAaEEEIIIYQQQoiHmCQGhBBCCCGEEEKIh5gkBoQQQgghhBBCiIeYJAaEEEIIIYQQQoiHmCQGhBBCCCGEEEKIh5gkBoQQQgghhBBCiIeYJAaE
[...]
+ "text/plain": [
+ "<Figure size 1200x800 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "num_hash_bits = np.arange(1,33)\n",
+ "\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(12, 8))\n",
+ "ax.plot(num_hash_bits, 2.**(-num_hash_bits), label=\"Optimal Bloom
Filter\", color=\"blue\") \n",
+ "\n",
+ "colours = [\"silver\", \"grey\", \"black\"]\n",
+ "for i, alpha in enumerate([0.5, 0.8, 0.9]):\n",
+ " theory_str = \"$\\epsilon_{{}} = {} \\cdot 2^{{-f}}$\".format(alpha,
alpha)\n",
+ " ax.plot(num_hash_bits, alpha*2.**(-num_hash_bits), label=theory_str,
color=colours[i])\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\")\n",
+ "ax.legend()\n",
+ "ax.set_xlabel(r\"Number of hash bits\")\n",
+ "ax.set_ylabel(\"False Positive Rate\")\n",
+ "ax.set_title(\"False positive rate vs. Number of hash bits\")\n",
+ "#fig.savefig(\"false-positive-rates.pdf\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "777bb727",
+ "metadata": {},
+ "source": [
+ "Note that for a _fixed_ bitmap length and input size, simply increasing
the number of hash functions in the Bloom Filter does not improve
performance.\n",
+ "When the number of hash functions increases, there is competing
behaviour: on the one hand, more bits are set which can increase the number of
false positives reported.\n",
+ "On the other hand, a false positive is triggered only when all of the bit
positions are set, so using more hash functions means that the requirement is
higher.\n",
+ "In contrast, because the false positive probability of a Quotient Filter
is only related to the fingerprint length (rather than also the number of items
inserted), increasing\n",
+ "the length of the fingerprint yields a monotonically decreasing false
positive rate.\n",
+ "We can see this feature in the following plot."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "636b5abf",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(5e-06, 2.3405078559743058)"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABAYAAAKxCAYAAADARa4uAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeXyM1/7A8c/s2XdbLBGRxE4V1aoQWkpbdLO1LrqpanXR9raKJGh7Xa5fW3F7Ve+tXbWl9F4UVVE7tVeQILFvScieWc/vj8mMjJlEVhHO22teyTzPmec5z3gy85zvc873KIQQAkmSJEmSJEmSJEmS7knK6q6AJEmSJEmSJEmSJEnVRwYGJEmSJEmSJEmSJOkeJgMDkiRJkiRJkiRJknQPk4EBSZIkSZIkSZIkSbqHycCAJEmSJEmSJEmSJN3DZGBAkiRJkiRJkiRJku5hMjAgSZIkSZIkSZIkSfcw
[...]
+ "text/plain": [
+ "<Figure size 1200x800 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "ns = [1.5E8]\n",
+ "ms = [4, 8, 12, 16, 24]\n",
+ "ks = np.arange(1, 26)\n",
+ "\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(12, 8))\n",
+ "\n",
+ "\n",
+ "# bloom plots \n",
+ "bf_lines = []\n",
+ "for n in ns:\n",
+ " for r in ms:\n",
+ " m = r*n\n",
+ " bf_l, = ax.plot(ks, (1. - np.exp(-ks*n / m))**ks,
label=f\"({n:.1e}, {m:.1e})\")\n",
+ " bf_lines.append(bf_l)\n",
+ "opt_bf_l, = ax.plot(ks, 2.**(-ks), label=r\"${fpr} = 2^{-f}$\",
color=\"magenta\", linewidth=2)\n",
+ "bf_lines.append(opt_bf_l)\n",
+ "bf_legend = ax.legend(title=\"Bloom Filter: (n, m)\", handles=bf_lines,
loc=\"upper left\")\n",
+ "ax.add_artist(bf_legend)\n",
+ "\n",
+ "# quotient plots\n",
+ "qf_lines = []\n",
+ "linestyles = [\":\", \"-.\", \"--\"]\n",
+ "for ii, alpha in enumerate([0.5, 0.75, 0.9]):\n",
+ " qf_l, = ax.plot(ks, alpha*2.**(-ks), label=f\"${alpha:.2f} \\cdot
2^{{-f}}$\", linestyle=linestyles[ii], color=\"black\")\n",
+ " qf_lines.append(qf_l)\n",
+ "\n",
+ "qf_legend = ax.legend(handles=qf_lines, title=\"Quotient filter FPR\",
loc=\"upper right\")\n",
+ "\n",
+ "ax.set_xlabel(r\"Number of hash bits: $h$\")\n",
+ "ax.set_ylabel(\"False Positive Rate\")\n",
+ "ax.set_yscale(\"log\")\n",
+ "ax.grid()\n",
+ "\n",
+ "ax.set_ylim(bottom=5e-6)\n",
+ "#fig.savefig(\"image/fpr-vs-numh.pdf\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "06245ebd",
+ "metadata": {},
+ "source": [
+ "Depending on the parameter setting, miscalibrating the Bloom filter
parameters can lead to sharper changes in error performance."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "60b8bae1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<matplotlib.legend.Legend at 0x12f181410>"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABAMAAAKxCAYAAAAmbGVqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAACtxElEQVR4nOzdeVxU9f7H8fcZdmRTwQ0VxL0yNQ3NNTOXzNJWzbJd27s3ra52K6tfZZa35bbd0m5qaWlZWrfMLffU3DVLxQUEXEEEZR2Y+f0BTBIuMAycGXg9Hw8fNeecOXyGL5Tznu/38zXsdrtdAAAAAACgxrCYXQAAAAAAAKhahAEAAAAAANQwhAEAAAAAANQwhAEAAAAAANQwhAEAAAAAANQwhAEAAAAAANQwhAEAAAAAANQwhAEAAAAAANQw3mYXUF3ZbDYdOnRIwcHBMgzD7HIAAAAAANWc3W7XqVOn1KhR
[...]
+ "text/plain": [
+ "<Figure size 1200x800 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "n = int(0.9*(1<<25)) # chosen for illustration\n",
+ "m = 1<<28\n",
+ "ks = np.arange(1, 17)\n",
+ "fig, ax = plt.subplots(figsize=(12, 8))\n",
+ "ax.plot(ks, (1. - np.exp(-ks*n / m))**ks, label=\"Bloom Filter FPR\",
marker=\"*\")\n",
+ "ax.set_xlabel(\"Number of Hash Functions\")\n",
+ "ax.set_ylabel(\"False Positive Rate\")\n",
+ "ax.grid()\n",
+ "ax.legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ba3dd229",
+ "metadata": {},
+ "source": [
+ "### Experiment 1. Validating the theoretical performance.\n",
+ "\n",
+ "In this plot we demonstrate that the expected behaviour above is borne
out in practice.\n",
+ "\n",
+ "The experimental configuration can be found at the bottom of the input
result file and in this case is as follows.\n",
+ "Set the input cardinality to $n = 0.8 * 2^20$; vary the number of hash
bits from a lower to an upper bound and use the number of hash bits to
determine the number of trials at each hash bit value; \n",
+ "for the Bloom filter, use these parameters to determine an optimal filter
and for the Quotient filter just use the number of hash bits as the fingerprint
length."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "2012f2a5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def bloom_filter_fpr(n:int, k:int) -> float:\n",
+ " \"\"\"Returns the theoretical false positive rate of a Bloom filter
with n elements, and k hash functions.\"\"\"\n",
+ " m = k * n\n",
+ " m /= np.log(2.)\n",
+ " m = int(m)\n",
+ " return (1 - (1 - 1/m)**(k*n))**k\n",
+ "\n",
+ "def quotient_filter_fpr(alpha: float, bits_per_entry: int) -> float:\n",
+ " \"\"\"\n",
+ " Returns the theoretical false positive rate of a Quotient filter with
alpha load factor that \n",
+ " uses `bits_per_entry` many bits for each entry added to the
sketch.\n",
+ " \"\"\"\n",
+ " return 2**(3 - alpha*bits_per_entry)\n",
+ "\n",
+ "from scipy.stats import binom as binom_dist\n",
+ "def probability_sum(num_slots:int, num_elements:int,
fingerprint_length:int) -> float:\n",
+ " \"\"\"\n",
+ " The vectorized part of this code more efficiently evaluates \n",
+ " for k in range(m+1):\n",
+ " probs[k] = dist.pmf(k)\n",
+ " probs[k] *= (1. - (1. - 2.**(-f))**k)\n",
+ "\n",
+ " This sum is well approximated by alpha*2^(-f) for alpha = n / m\n",
+ " \"\"\"\n",
+ " m = num_slots\n",
+ " n = num_elements\n",
+ " f = fingerprint_length\n",
+ " assert isinstance(m, int), \"num_slots must be an integer\"\n",
+ " assert isinstance(n, int), \"num_elements must be an integer\"\n",
+ " assert isinstance(f, int), \"fingerprint_length must be an
integer\"\n",
+ " dist = binom_dist(n, 1./m)\n",
+ " probs = dist.pmf(np.arange(m+1))\n",
+ " probs *= (1. - (1. - 2.**(-f))**np.arange(m+1))\n",
+ " return np.sum(probs)\n",
+ "\n",
+ "n = int(0.9 * 2**20)\n",
+ "theoretical_bloom_results = [] \n",
+ "theoretical_quotient_results = []\n",
+ "num_test_bits = np.arange(4,25)\n",
+ "\n",
+ "for k in num_test_bits:\n",
+ " theoretical_bloom_results.append([k, k/np.log(2), bloom_filter_fpr(n,
k)])\n",
+ " theoretical_quotient_results.append([k, k, quotient_filter_fpr(0.9,
k)])\n",
+ "\n",
+ "\n",
+ "bloom_theory_accuracy = pd.DataFrame(theoretical_bloom_results,
columns=['k', 'bitsPerEntry', 'FPR'])\n",
+ "quotient_theory_accuracy = pd.DataFrame(theoretical_quotient_results,
columns=['k', 'bitsPerEntry', 'FPR'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "71f122df",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def parse_accuracy_results(filename:str) -> Tuple[pd.DataFrame, float,
int]:\n",
+ " \"\"\"\n",
+ " Takes an input file name in .txt format, strips out the extra strings
and returns \n",
+ " the main results in a dataframe.\n",
+ "\n",
+ " The universe size is the largest power of 2 that is used to insert
into the sketch and \n",
+ " the capacity is how far along the power of two interval used for
maximum cardinality.\n",
+ "\n",
+ " Eg. capacity = 0.9 and lgU = 20 means that we insert 0.9 * 2^20
elements into the sketch.\n",
+ " \"\"\"\n",
+ " with open(filename) as f:\n",
+ " lines = f.readlines()[2:]\n",
+ " ignore_from_idx = lines.index(\"PROPERTIES:\\n\")\n",
+ " results = lines[:ignore_from_idx-1] # there is an empty line
preceding the ignore_from_idx\n",
+ " results_clean = [r.strip(\"\\n\") for r in results]\n",
+ " results_df = pd.DataFrame(sub.split(\"\\t\") for sub in
results_clean[1:])\n",
+ " results_df.columns = results_clean[0].split(\"\\t\")\n",
+ " for col in results_df.columns:\n",
+ " results_df[col] = pd.to_numeric(results_df[col])\n",
+ "\n",
+ " # get the universe size and capacity \n",
+ " properties = lines[ignore_from_idx:]\n",
+ " properties_clean = [p.strip(\"\\n\") for p in properties]\n",
+ " strings_to_find = ['Universe_capacity', 'Universe_lgU']\n",
+ " found_strings = [s for s in properties_clean if any(substring in s
for substring in strings_to_find)]\n",
+ " values = [float(s.split('=')[1]) for s in found_strings]\n",
+ " capacity = values[0]\n",
+ " lgU = int(values[1])\n",
+ " return results_df, capacity, lgU"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "dfc34112",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bloom_results_df, bloom_thld, bloom_lgU =
parse_accuracy_results(\"BloomFilterAccuracyProfile20240612_050335PST.txt\")\n",
+ "quotient_results_df, q90, _ =
parse_accuracy_results(\"QuotientFilterAccuracyProfile20240617_065441PST.txt\")\n",
+ "quotient_results_df_75, q75, _ =
parse_accuracy_results(\"QuotientFilterAccuracyProfile20240612_062110PST.txt\")\n",
+ "m = 2**20\n",
+ "q_alpha = 0.9\n",
+ "q_n = int(q_alpha*m)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "3c28b6f5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "20"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "bloom_lgU"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "080f7541",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>numHashes</th>\n",
+ " <th>bitsPerEntry_bloom</th>\n",
+ " <th>FPR_bloom</th>\n",
+ " <th>filterSizeBits_bloom</th>\n",
+ " <th>bitsPerEntry_quotient</th>\n",
+ " <th>FPR_quotient</th>\n",
+ " <th>filterSizeBits_quotient</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>10</td>\n",
+ " <td>14</td>\n",
+ " <td>1.016590e-03</td>\n",
+ " <td>13614976</td>\n",
+ " <td>10</td>\n",
+ " <td>7.020040e-03</td>\n",
+ " <td>10485760</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>11</td>\n",
+ " <td>15</td>\n",
+ " <td>4.788000e-04</td>\n",
+ " <td>14976512</td>\n",
+ " <td>11</td>\n",
+ " <td>3.261530e-03</td>\n",
+ " <td>11534336</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>12</td>\n",
+ " <td>17</td>\n",
+ " <td>2.345400e-04</td>\n",
+ " <td>16337984</td>\n",
+ " <td>12</td>\n",
+ " <td>1.752870e-03</td>\n",
+ " <td>12582912</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>13</td>\n",
+ " <td>18</td>\n",
+ " <td>1.299970e-04</td>\n",
+ " <td>17699520</td>\n",
+ " <td>13</td>\n",
+ " <td>8.584560e-04</td>\n",
+ " <td>13631488</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>10</th>\n",
+ " <td>14</td>\n",
+ " <td>20</td>\n",
+ " <td>5.921320e-05</td>\n",
+ " <td>19060992</td>\n",
+ " <td>14</td>\n",
+ " <td>4.600410e-04</td>\n",
+ " <td>14680064</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>11</th>\n",
+ " <td>15</td>\n",
+ " <td>21</td>\n",
+ " <td>2.975460e-05</td>\n",
+ " <td>20422464</td>\n",
+ " <td>15</td>\n",
+ " <td>2.182010e-04</td>\n",
+ " <td>15728640</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>12</th>\n",
+ " <td>16</td>\n",
+ " <td>23</td>\n",
+ " <td>1.885760e-05</td>\n",
+ " <td>21784000</td>\n",
+ " <td>16</td>\n",
+ " <td>1.101220e-04</td>\n",
+ " <td>16777216</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>13</th>\n",
+ " <td>17</td>\n",
+ " <td>24</td>\n",
+ " <td>7.708870e-06</td>\n",
+ " <td>23145472</td>\n",
+ " <td>17</td>\n",
+ " <td>5.435940e-05</td>\n",
+ " <td>17825792</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>14</th>\n",
+ " <td>18</td>\n",
+ " <td>25</td>\n",
+ " <td>3.326770e-06</td>\n",
+ " <td>24507008</td>\n",
+ " <td>18</td>\n",
+ " <td>2.869890e-05</td>\n",
+ " <td>18874368</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>15</th>\n",
+ " <td>19</td>\n",
+ " <td>27</td>\n",
+ " <td>2.005160e-06</td>\n",
+ " <td>25868480</td>\n",
+ " <td>19</td>\n",
+ " <td>1.393830e-05</td>\n",
+ " <td>19922944</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>16</th>\n",
+ " <td>20</td>\n",
+ " <td>28</td>\n",
+ " <td>8.742010e-07</td>\n",
+ " <td>27229952</td>\n",
+ " <td>20</td>\n",
+ " <td>6.834670e-06</td>\n",
+ " <td>20971520</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>17</th>\n",
+ " <td>21</td>\n",
+ " <td>30</td>\n",
+ " <td>5.129610e-07</td>\n",
+ " <td>28591488</td>\n",
+ " <td>21</td>\n",
+ " <td>3.294510e-06</td>\n",
+ " <td>22020096</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>18</th>\n",
+ " <td>22</td>\n",
+ " <td>31</td>\n",
+ " <td>2.861020e-07</td>\n",
+ " <td>29952960</td>\n",
+ " <td>22</td>\n",
+ " <td>1.688800e-06</td>\n",
+ " <td>23068672</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>19</th>\n",
+ " <td>23</td>\n",
+ " <td>33</td>\n",
+ " <td>1.490120e-07</td>\n",
+ " <td>31314496</td>\n",
+ " <td>23</td>\n",
+ " <td>8.621390e-07</td>\n",
+ " <td>24117248</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>20</th>\n",
+ " <td>24</td>\n",
+ " <td>34</td>\n",
+ " <td>5.501970e-08</td>\n",
+ " <td>32675968</td>\n",
+ " <td>24</td>\n",
+ " <td>4.779830e-07</td>\n",
+ " <td>25165824</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " numHashes bitsPerEntry_bloom FPR_bloom filterSizeBits_bloom
\\\n",
+ "6 10 14 1.016590e-03 13614976
\n",
+ "7 11 15 4.788000e-04 14976512
\n",
+ "8 12 17 2.345400e-04 16337984
\n",
+ "9 13 18 1.299970e-04 17699520
\n",
+ "10 14 20 5.921320e-05 19060992
\n",
+ "11 15 21 2.975460e-05 20422464
\n",
+ "12 16 23 1.885760e-05 21784000
\n",
+ "13 17 24 7.708870e-06 23145472
\n",
+ "14 18 25 3.326770e-06 24507008
\n",
+ "15 19 27 2.005160e-06 25868480
\n",
+ "16 20 28 8.742010e-07 27229952
\n",
+ "17 21 30 5.129610e-07 28591488
\n",
+ "18 22 31 2.861020e-07 29952960
\n",
+ "19 23 33 1.490120e-07 31314496
\n",
+ "20 24 34 5.501970e-08 32675968
\n",
+ "\n",
+ " bitsPerEntry_quotient FPR_quotient filterSizeBits_quotient \n",
+ "6 10 7.020040e-03 10485760 \n",
+ "7 11 3.261530e-03 11534336 \n",
+ "8 12 1.752870e-03 12582912 \n",
+ "9 13 8.584560e-04 13631488 \n",
+ "10 14 4.600410e-04 14680064 \n",
+ "11 15 2.182010e-04 15728640 \n",
+ "12 16 1.101220e-04 16777216 \n",
+ "13 17 5.435940e-05 17825792 \n",
+ "14 18 2.869890e-05 18874368 \n",
+ "15 19 1.393830e-05 19922944 \n",
+ "16 20 6.834670e-06 20971520 \n",
+ "17 21 3.294510e-06 22020096 \n",
+ "18 22 1.688800e-06 23068672 \n",
+ "19 23 8.621390e-07 24117248 \n",
+ "20 24 4.779830e-07 25165824 "
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "combined_df = pd.merge(bloom_results_df, quotient_results_df,
on=\"numHashes\", suffixes=(\"_bloom\", \"_quotient\"))\n",
+ "combined_df.drop([\"numQueryPoints_bloom\", \"numQueryPoints_quotient\",
\"numTrials_bloom\", \"numTrials_quotient\"], axis=1, inplace=True)\n",
+ "combined_df.tail(15)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "3fc0d0fc",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>numHashes</th>\n",
+ " <th>bitsPerEntry</th>\n",
+ " <th>FPR</th>\n",
+ " <th>filterSizeBits</th>\n",
+ " <th>numQueryPoints</th>\n",
+ " <th>numTrials</th>\n",
+ " <th>fingerprintLength</th>\n",
+ " <th>theoreticalFPR</th>\n",
+ " <th>scaledFPR</th>\n",
+ " <th>exactCalcFPR</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>4</td>\n",
+ " <td>4</td>\n",
+ " <td>3.590670e-01</td>\n",
+ " <td>4194304</td>\n",
+ " <td>32</td>\n",
+ " <td>608</td>\n",
+ " <td>1</td>\n",
+ " <td>5.000000e-01</td>\n",
+ " <td>4.500000e-01</td>\n",
+ " <td>3.623718e-01</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>5</td>\n",
+ " <td>5</td>\n",
+ " <td>2.019490e-01</td>\n",
+ " <td>5242880</td>\n",
+ " <td>64</td>\n",
+ " <td>412</td>\n",
+ " <td>2</td>\n",
+ " <td>2.500000e-01</td>\n",
+ " <td>2.250000e-01</td>\n",
+ " <td>2.014837e-01</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>6</td>\n",
+ " <td>6</td>\n",
+ " <td>1.035220e-01</td>\n",
+ " <td>6291456</td>\n",
+ " <td>128</td>\n",
+ " <td>299</td>\n",
+ " <td>3</td>\n",
+ " <td>1.250000e-01</td>\n",
+ " <td>1.125000e-01</td>\n",
+ " <td>1.064026e-01</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>7</td>\n",
+ " <td>7</td>\n",
+ " <td>5.340250e-02</td>\n",
+ " <td>7340032</td>\n",
+ " <td>256</td>\n",
+ " <td>228</td>\n",
+ " <td>4</td>\n",
+ " <td>6.250000e-02</td>\n",
+ " <td>5.625000e-02</td>\n",
+ " <td>5.469720e-02</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>8</td>\n",
+ " <td>8</td>\n",
+ " <td>2.748400e-02</td>\n",
+ " <td>8388608</td>\n",
+ " <td>512</td>\n",
+ " <td>181</td>\n",
+ " <td>5</td>\n",
+ " <td>3.125000e-02</td>\n",
+ " <td>2.812500e-02</td>\n",
+ " <td>2.773316e-02</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>9</td>\n",
+ " <td>9</td>\n",
+ " <td>1.373830e-02</td>\n",
+ " <td>9437184</td>\n",
+ " <td>1024</td>\n",
+ " <td>147</td>\n",
+ " <td>6</td>\n",
+ " <td>1.562500e-02</td>\n",
+ " <td>1.406250e-02</td>\n",
+ " <td>1.396408e-02</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>10</td>\n",
+ " <td>10</td>\n",
+ " <td>7.020040e-03</td>\n",
+ " <td>10485760</td>\n",
+ " <td>2048</td>\n",
+ " <td>122</td>\n",
+ " <td>7</td>\n",
+ " <td>7.812500e-03</td>\n",
+ " <td>7.031250e-03</td>\n",
+ " <td>7.006586e-03</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>11</td>\n",
+ " <td>11</td>\n",
+ " <td>3.261530e-03</td>\n",
+ " <td>11534336</td>\n",
+ " <td>4096</td>\n",
+ " <td>103</td>\n",
+ " <td>8</td>\n",
+ " <td>3.906250e-03</td>\n",
+ " <td>3.515625e-03</td>\n",
+ " <td>3.509451e-03</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>8</th>\n",
+ " <td>12</td>\n",
+ " <td>12</td>\n",
+ " <td>1.752870e-03</td>\n",
+ " <td>12582912</td>\n",
+ " <td>8192</td>\n",
+ " <td>89</td>\n",
+ " <td>9</td>\n",
+ " <td>1.953125e-03</td>\n",
+ " <td>1.757813e-03</td>\n",
+ " <td>1.756268e-03</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>9</th>\n",
+ " <td>13</td>\n",
+ " <td>13</td>\n",
+ " <td>8.584560e-04</td>\n",
+ " <td>13631488</td>\n",
+ " <td>16384</td>\n",
+ " <td>77</td>\n",
+ " <td>10</td>\n",
+ " <td>9.765625e-04</td>\n",
+ " <td>8.789063e-04</td>\n",
+ " <td>8.785198e-04</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>10</th>\n",
+ " <td>14</td>\n",
+ " <td>14</td>\n",
+ " <td>4.600410e-04</td>\n",
+ " <td>14680064</td>\n",
+ " <td>32768</td>\n",
+ " <td>67</td>\n",
+ " <td>11</td>\n",
+ " <td>4.882812e-04</td>\n",
+ " <td>4.394531e-04</td>\n",
+ " <td>4.393564e-04</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>11</th>\n",
+ " <td>15</td>\n",
+ " <td>15</td>\n",
+ " <td>2.182010e-04</td>\n",
+ " <td>15728640</td>\n",
+ " <td>65536</td>\n",
+ " <td>60</td>\n",
+ " <td>12</td>\n",
+ " <td>2.441406e-04</td>\n",
+ " <td>2.197266e-04</td>\n",
+ " <td>2.197023e-04</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>12</th>\n",
+ " <td>16</td>\n",
+ " <td>16</td>\n",
+ " <td>1.101220e-04</td>\n",
+ " <td>16777216</td>\n",
+ " <td>131072</td>\n",
+ " <td>53</td>\n",
+ " <td>13</td>\n",
+ " <td>1.220703e-04</td>\n",
+ " <td>1.098633e-04</td>\n",
+ " <td>1.098572e-04</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>13</th>\n",
+ " <td>17</td>\n",
+ " <td>17</td>\n",
+ " <td>5.435940e-05</td>\n",
+ " <td>17825792</td>\n",
+ " <td>262144</td>\n",
+ " <td>48</td>\n",
+ " <td>14</td>\n",
+ " <td>6.103516e-05</td>\n",
+ " <td>5.493164e-05</td>\n",
+ " <td>5.493011e-05</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>14</th>\n",
+ " <td>18</td>\n",
+ " <td>18</td>\n",
+ " <td>2.869890e-05</td>\n",
+ " <td>18874368</td>\n",
+ " <td>524288</td>\n",
+ " <td>43</td>\n",
+ " <td>15</td>\n",
+ " <td>3.051758e-05</td>\n",
+ " <td>2.746582e-05</td>\n",
+ " <td>2.746543e-05</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>15</th>\n",
+ " <td>19</td>\n",
+ " <td>19</td>\n",
+ " <td>1.393830e-05</td>\n",
+ " <td>19922944</td>\n",
+ " <td>1048576</td>\n",
+ " <td>39</td>\n",
+ " <td>16</td>\n",
+ " <td>1.525879e-05</td>\n",
+ " <td>1.373291e-05</td>\n",
+ " <td>1.373281e-05</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>16</th>\n",
+ " <td>20</td>\n",
+ " <td>20</td>\n",
+ " <td>6.834670e-06</td>\n",
+ " <td>20971520</td>\n",
+ " <td>2097152</td>\n",
+ " <td>36</td>\n",
+ " <td>17</td>\n",
+ " <td>7.629395e-06</td>\n",
+ " <td>6.866455e-06</td>\n",
+ " <td>6.866429e-06</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>17</th>\n",
+ " <td>21</td>\n",
+ " <td>21</td>\n",
+ " <td>3.294510e-06</td>\n",
+ " <td>22020096</td>\n",
+ " <td>4194304</td>\n",
+ " <td>33</td>\n",
+ " <td>18</td>\n",
+ " <td>3.814697e-06</td>\n",
+ " <td>3.433228e-06</td>\n",
+ " <td>3.433220e-06</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>18</th>\n",
+ " <td>22</td>\n",
+ " <td>22</td>\n",
+ " <td>1.688800e-06</td>\n",
+ " <td>23068672</td>\n",
+ " <td>8388608</td>\n",
+ " <td>30</td>\n",
+ " <td>19</td>\n",
+ " <td>1.907349e-06</td>\n",
+ " <td>1.716614e-06</td>\n",
+ " <td>1.716612e-06</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>19</th>\n",
+ " <td>23</td>\n",
+ " <td>23</td>\n",
+ " <td>8.621390e-07</td>\n",
+ " <td>24117248</td>\n",
+ " <td>16777216</td>\n",
+ " <td>28</td>\n",
+ " <td>20</td>\n",
+ " <td>9.536743e-07</td>\n",
+ " <td>8.583069e-07</td>\n",
+ " <td>8.583062e-07</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>20</th>\n",
+ " <td>24</td>\n",
+ " <td>24</td>\n",
+ " <td>4.779830e-07</td>\n",
+ " <td>25165824</td>\n",
+ " <td>33554432</td>\n",
+ " <td>26</td>\n",
+ " <td>21</td>\n",
+ " <td>4.768372e-07</td>\n",
+ " <td>4.291534e-07</td>\n",
+ " <td>4.291532e-07</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " numHashes bitsPerEntry FPR filterSizeBits
numQueryPoints \\\n",
+ "0 4 4 3.590670e-01 4194304
32 \n",
+ "1 5 5 2.019490e-01 5242880
64 \n",
+ "2 6 6 1.035220e-01 6291456
128 \n",
+ "3 7 7 5.340250e-02 7340032
256 \n",
+ "4 8 8 2.748400e-02 8388608
512 \n",
+ "5 9 9 1.373830e-02 9437184
1024 \n",
+ "6 10 10 7.020040e-03 10485760
2048 \n",
+ "7 11 11 3.261530e-03 11534336
4096 \n",
+ "8 12 12 1.752870e-03 12582912
8192 \n",
+ "9 13 13 8.584560e-04 13631488
16384 \n",
+ "10 14 14 4.600410e-04 14680064
32768 \n",
+ "11 15 15 2.182010e-04 15728640
65536 \n",
+ "12 16 16 1.101220e-04 16777216
131072 \n",
+ "13 17 17 5.435940e-05 17825792
262144 \n",
+ "14 18 18 2.869890e-05 18874368
524288 \n",
+ "15 19 19 1.393830e-05 19922944
1048576 \n",
+ "16 20 20 6.834670e-06 20971520
2097152 \n",
+ "17 21 21 3.294510e-06 22020096
4194304 \n",
+ "18 22 22 1.688800e-06 23068672
8388608 \n",
+ "19 23 23 8.621390e-07 24117248
16777216 \n",
+ "20 24 24 4.779830e-07 25165824
33554432 \n",
+ "\n",
+ " numTrials fingerprintLength theoreticalFPR scaledFPR
exactCalcFPR \n",
+ "0 608 1 5.000000e-01 4.500000e-01
3.623718e-01 \n",
+ "1 412 2 2.500000e-01 2.250000e-01
2.014837e-01 \n",
+ "2 299 3 1.250000e-01 1.125000e-01
1.064026e-01 \n",
+ "3 228 4 6.250000e-02 5.625000e-02
5.469720e-02 \n",
+ "4 181 5 3.125000e-02 2.812500e-02
2.773316e-02 \n",
+ "5 147 6 1.562500e-02 1.406250e-02
1.396408e-02 \n",
+ "6 122 7 7.812500e-03 7.031250e-03
7.006586e-03 \n",
+ "7 103 8 3.906250e-03 3.515625e-03
3.509451e-03 \n",
+ "8 89 9 1.953125e-03 1.757813e-03
1.756268e-03 \n",
+ "9 77 10 9.765625e-04 8.789063e-04
8.785198e-04 \n",
+ "10 67 11 4.882812e-04 4.394531e-04
4.393564e-04 \n",
+ "11 60 12 2.441406e-04 2.197266e-04
2.197023e-04 \n",
+ "12 53 13 1.220703e-04 1.098633e-04
1.098572e-04 \n",
+ "13 48 14 6.103516e-05 5.493164e-05
5.493011e-05 \n",
+ "14 43 15 3.051758e-05 2.746582e-05
2.746543e-05 \n",
+ "15 39 16 1.525879e-05 1.373291e-05
1.373281e-05 \n",
+ "16 36 17 7.629395e-06 6.866455e-06
6.866429e-06 \n",
+ "17 33 18 3.814697e-06 3.433228e-06
3.433220e-06 \n",
+ "18 30 19 1.907349e-06 1.716614e-06
1.716612e-06 \n",
+ "19 28 20 9.536743e-07 8.583069e-07
8.583062e-07 \n",
+ "20 26 21 4.768372e-07 4.291534e-07
4.291532e-07 "
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "quotient_results_df[\"fingerprintLength\"] =
quotient_results_df[\"numHashes\"] - 3\n",
+ "quotient_results_df[\"theoreticalFPR\"] =
2.**(-quotient_results_df[\"fingerprintLength\"])\n",
+ "quotient_results_df[\"scaledFPR\"] =
0.9*2.**(-quotient_results_df[\"fingerprintLength\"])\n",
+ "quotient_results_df[\"exactCalcFPR\"] = pd.Series([probability_sum(m,
q_n, f) for f in quotient_results_df[\"fingerprintLength\"]])\n",
+ "\n",
+ "quotient_results_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "8d180df5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for q in [q90, q75]:\n",
+ " q_n = int(q*m)\n",
+ " quotient_results_df_75[\"fingerprintLength\"] =
quotient_results_df_75[\"numHashes\"] - 3\n",
+ " quotient_results_df_75[\"theoreticalFPR\"] =
2.**(-quotient_results_df_75[\"fingerprintLength\"])\n",
+ " quotient_results_df_75[\"exactCalcFPR\"] =
pd.Series([probability_sum(m, q_n, f) for f in
quotient_results_df_75[\"fingerprintLength\"]])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "1d71441f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABAIAAAK1CAYAAABSP0xCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd1hT5/vH8XfC3qigOEBURNx7I4IDq7ZqndU6wNaFW9qvo3VWax217l1lWBx11q2oiKPi3lgngqhVHCAiEEh+f/AjSkFFRQl6v64rV+s5T875nAQj584zFBqNRoMQQgghhBBCCCE+CcrcDiCEEEIIIYQQQogPRwoBQgghhBBCCCHEJ0QKAUIIIYQQQgghxCdECgFCCCGEEEIIIcQnRAoBQgghhBBCCCHEJ0QKAUIIIYQQQgghxCdECgFCCCGEEEIIIcQnRAoBQgghhBBCCCHEJ0Q/twN8rNRqNbdv
[...]
+ "text/plain": [
+ "<Figure size 1200x800 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 8))\n",
+ "\n",
+ "ax.plot(bloom_results_df[\"numHashes\"], bloom_results_df[\"FPR\"],
label=\"Bloom\", color=\"royalblue\")\n",
+ "\n",
+ "ax.plot(quotient_results_df[\"fingerprintLength\"],
quotient_results_df[\"FPR\"], label=\"quotient:0.9\", color=\"black\")\n",
+ "ax.plot(quotient_results_df[\"fingerprintLength\"],
2.**(-quotient_results_df[\"fingerprintLength\"]), label=\"General bound:
$2^{-f}$\", color=\"deeppink\", linestyle=\":\")\n",
+ "\n",
+ "\n",
+ "ax.plot(quotient_results_df_75[\"fingerprintLength\"],
quotient_results_df_75[\"FPR\"], label=\"quotient:0.75\", color=\"grey\")\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=2)\n",
+ "ax.legend()\n",
+ "ax.set_xticks(quotient_results_df[\"fingerprintLength\"])\n",
+ "ax.set_yticks([2.**(-j) for j in range(1,25)])\n",
+ "ax.set_xlabel(\"Number of hash space bits $f$\")\n",
+ "ax.set_ylabel(\"False Positive Rate\")\n",
+ "fig.savefig(\"image/accuracy-vs-num-hashes.pdf\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f3a14a50",
+ "metadata": {},
+ "source": [
+ "Both error rates roughly follow the curve $2^{-f}$. This means that
increasing the number of hash bits space by one reduces the false positive rate
by a factor of two. The false positive rate of the Quotient filter is slightly
smaller when it is less-densely filled. \n",
+ "\n",
+ "We can also explore the space efficiency: namely the number of bits used
per element added to the filter. We see that for lower accuracy (high fpr)
regimes, the Bloom filter is more efficient, \n",
+ "while the converse is true for higher accuracy (low fpr) regimes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 85,
+ "id": "62b845e0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 0, 'Number of hash space bits $f$')"
+ ]
+ },
+ "execution_count": 85,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA/EAAAK1CAYAAACAf25bAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAADFXklEQVR4nOzdd3hU1cLF4d+kkgQSepOOcEGKfCIg0ktClaKIFKkKgiDSe5JJ6F2aoiBVERAUAeldQOGigCCCdJBeE5JASDLz/XEkV6QYhiRnkqz3eXyc2XNmZmUMkpWzz94Wu91uR0REREREREScnovZAUREREREREQkYVTiRURERERERFIIlXgRERERERGRFEIlXkRERERERCSFUIkXERERERERSSFU4kVERERERERSCJV4ERERERERkRRCJV5EREREREQkhXAzO4CzsdlsXLhwgQwZMmCxWMyOIyIiIiIiIqmc
[...]
+ "text/plain": [
+ "<Figure size 1200x800 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 8))\n",
+ "\n",
+ "ax.plot(bloom_results_df[\"numHashes\"],
bloom_results_df[\"bitsPerEntry\"], label=\"Bloom\", color=\"blue\")\n",
+ "ax.plot(quotient_results_df[\"fingerprintLength\"],
quotient_results_df[\"bitsPerEntry\"], label=\"quotient\", color=\"black\")\n",
+ "ax.grid()\n",
+ "ax.legend()\n",
+ "ax.set_xticks(quotient_results_df[\"fingerprintLength\"])\n",
+ "ax.set_ylabel(\"Bits per entry\")\n",
+ "ax.set_xlabel(\"Number of hash space bits $f$\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 86,
+ "id": "91d13517",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0, 0.5, 'False Positive Rate')"
+ ]
+ },
+ "execution_count": 86,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABAIAAAKxCAYAAADJrg5UAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAAD8b0lEQVR4nOzdd1gU19vG8e/Si4I1liiCEjXF3rvYe+8VRVNMYo0xJjFqEmM0xhRNU0EQBXvv2Du2aEzsBTsqFkBF6r5/7E/eEBsgsKD357r2isycmXnOetg4z855jsFoNBoRERERERERkZeChbkDEBEREREREZGMo0SAiIiIiIiIyEtEiQARERERERGRl4gSASIiIiIiIiIvESUCRERERERERF4iSgSIiIiIiIiIvESUCBARERERERF5iSgRICIiIiIiIvISsTJ3AC+qhIQErly5Qvbs2TEYDOYOR0RERERERF5w
[...]
+ "text/plain": [
+ "<Figure size 1200x800 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 8))\n",
+ "\n",
+ "ax.plot(bloom_results_df[\"bitsPerEntry\"], bloom_results_df[\"FPR\"],
label=\"Bloom\", color=\"blue\")\n",
+ "\n",
+ "ax.plot(quotient_results_df[\"bitsPerEntry\"],
quotient_results_df[\"FPR\"], label=\"quotient:0.9\", color=\"black\")\n",
+ "ax.plot(quotient_results_df_75[\"bitsPerEntry\"],
quotient_results_df_75[\"FPR\"], label=\"quotient:0.75\", color=\"grey\")\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=2)\n",
+ "ax.legend()\n",
+ "ax.set_xlabel(\"bits per entry\")\n",
+ "ax.set_ylabel(\"False Positive Rate\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 99,
+ "id": "a0342a9b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA/EAAAI0CAYAAABVk6kmAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAAD4wklEQVR4nOzdd3gU1dfA8e/spickIfQWSkKVphB674hSBARUuogK0qtSQhcIRREVBOlNsYBKkSJFlKLSRCmhCwJCICFAgOze94/9ZV9CypbsshtyPs+TR5g5986ZOTuYuzNzR1NKKYQQQgghhBBCCOH2dK5OQAghhBBCCCGEENaRQbwQQgghhBBCCJFJyCBeCCGEEEIIIYTIJGQQL4QQQgghhBBCZBIyiBdCCCGEEEIIITIJGcQLIYQQQgghhBCZhAzihRBCCCGEEEKITEIG8UIIIYQQQgghRCYhg3ghhBBCCCGE
[...]
+ "text/plain": [
+ "<Figure size 1200x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12,6))\n",
+ "\n",
+ "\n",
+ "NRML_CNST = 1E6 \n",
+ "ax.plot(bloom_results_df[\"FPR\"],
bloom_results_df[\"filterSizeBits\"]/NRML_CNST,\n",
+ " label=\"Bloom Filter\", color=\"red\", linestyle=\"-\")\n",
+ "ax.plot(quotient_results_df[\"FPR\"],
quotient_results_df[\"filterSizeBits\"]/NRML_CNST,\n",
+ " label=\"Quotient Filter\", color=\"blue\", linestyle=\"-\")\n",
+ "\n",
+ "\n",
+ "ax.legend()\n",
+ "#ax.set_yscale(\"log\", base=2)\n",
+ "ax.set_xscale(\"log\", base=10)\n",
+ "ax.set_ylabel(\"Sketch size (Mb)\")\n",
+ "ax.set_xlabel(\"False Positive Rate\")\n",
+ "\n",
+ "ax.set_ylim(3,30)\n",
+ "ax.set_title(\"Filter size (bits) vs False Positive Rate\")\n",
+ "ax.yaxis.set_minor_locator(AutoMinorLocator())\n",
+ "ax.grid(which=\"both\", alpha=0.5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4536560c",
+ "metadata": {},
+ "source": [
+ "Consistent with the prior plots, we see that the there are two regions of
behaviour. \n",
+ "We control the false positive rate for both filters, using an
optimally-configured Bloom filter and a quotient filter with the number of
fingerprints that achieves the fixed FPR. For large false positive rates, a
Bloom filter is smaller than Quotient filter. However, if a smaller false
positive rate is desired, then a Quotient filter has a lower space footprint."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d418104f",
+ "metadata": {},
+ "source": [
+ "## References\n",
+ "<a id=\"1\">[1]</a> \n",
+ "Bender, Michael A., et al. \"Don't thrash: How to cache your hash on
flash.\" 3rd Workshop on Hot Topics in Storage and File Systems (HotStorage
11). 2011."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "density-venv",
+ "language": "python",
+ "name": "density_venv"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/notebooks/filterSizeProfile.ipynb
b/src/notebooks/filterSizeProfile.ipynb
new file mode 100644
index 0000000..1b45b98
--- /dev/null
+++ b/src/notebooks/filterSizeProfile.ipynb
@@ -0,0 +1,1119 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Size vs Cardinality for Bloom and Quotient Filters\n",
+ "\n",
+ "We investigate the size of the filters as the cardinality of the input
set increases. \n",
+ "Linear space is required to solve the approximate set membership problem,
so we expect the size of the filters to *increase* as the input cardinality
increases for a fixed \n",
+ "false positive rate (fpr). However, we are interested in the relative
performance of the Bloom and Quotient filters at a fixed false positive rate.
This enables us to contrast the performance of each filter.\n",
+ "\n",
+ "## Experiment 1. False Positive Rate $10^{-6}$\n",
+ "\n",
+ "Fix a target false positive rate and then configure the filters
appropriately for increasing input size $n$.\n",
+ "The Quotient filter has its number of slots rounded to the next power of
two larger than the current $n$ and the Bloom filter is set optimally for any
given $n$.\n",
+ "As a result, the Bloom filter results are as optimistic as can be."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "from matplotlib.ticker import AutoMinorLocator\n",
+ "from typing import Tuple\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def parse_size_results(filename:str) -> pd.DataFrame:\n",
+ " with open(filename) as f:\n",
+ " lines = f.readlines()[2:] # first two lines not needed\n",
+ " ignore_from_idx = lines.index(\"PROPERTIES:\\n\")\n",
+ " results = lines[:ignore_from_idx-1] # there is an empty line
preceding the ignore_from_idx\n",
+ " results_clean = [r.strip(\"\\n\") for r in results]\n",
+ " results_df = pd.DataFrame(sub.split(\"\\t\") for sub in
results_clean[1:])\n",
+ " results_df.columns = results_clean[0].split(\"\\t\")\n",
+ " results_df.rename(columns={\"TrueU\" : \"inputCardinality\"},
inplace=True)\n",
+ " for col in results_df.columns:\n",
+ " results_df[col] = pd.to_numeric(results_df[col])\n",
+ " return results_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bf_sizes =
parse_size_results(\"1e-6/BloomFilterSizeProfile20240703_014140PST.txt\")\n",
+ "qf_sizes =
parse_size_results(\"1e-6/QuotientFilterSizeProfile20240703_014255PST_1E-6.txt\")
# 1e-6\n",
+ "\n",
+ "for df in [bf_sizes, qf_sizes]:\n",
+ " df[\"bitsPerElement\"] = df[\"Size\"] / df[\"inputCardinality\"]\n",
+ "\n",
+ "for qf in [qf_sizes]:\n",
+ " qf[\"numSlots\"] = np.ceil(np.log2(qf[\"inputCardinality\"] / 0.9
)).astype(int) # nb this is because the suggestion function always uses 0.9 and
may change in futuer.\n",
+ " qf[\"loadFactor\"] = qf[\"inputCardinality\"] / (2**qf[\"numSlots\"])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>inputCardinality</th>\n",
+ " <th>Size</th>\n",
+ " <th>NumTrials</th>\n",
+ " <th>FalsePositiveRate</th>\n",
+ " <th>NumHashBits</th>\n",
+ " <th>bitsPerElement</th>\n",
+ " <th>numSlots</th>\n",
+ " <th>loadFactor</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>3</td>\n",
+ " <td>176</td>\n",
+ " <td>16384</td>\n",
+ " <td>9.536743e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>58.666667</td>\n",
+ " <td>2</td>\n",
+ " <td>0.750000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>4</td>\n",
+ " <td>184</td>\n",
+ " <td>10922</td>\n",
+ " <td>2.384186e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>46.000000</td>\n",
+ " <td>3</td>\n",
+ " <td>0.500000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>5</td>\n",
+ " <td>184</td>\n",
+ " <td>8192</td>\n",
+ " <td>4.768372e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>36.800000</td>\n",
+ " <td>3</td>\n",
+ " <td>0.625000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>6</td>\n",
+ " <td>184</td>\n",
+ " <td>6553</td>\n",
+ " <td>4.768372e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>30.666667</td>\n",
+ " <td>3</td>\n",
+ " <td>0.750000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>7</td>\n",
+ " <td>352</td>\n",
+ " <td>5461</td>\n",
+ " <td>2.384186e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>50.285714</td>\n",
+ " <td>3</td>\n",
+ " <td>0.875000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>...</th>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>169</th>\n",
+ " <td>794672</td>\n",
+ " <td>24117248</td>\n",
+ " <td>1024</td>\n",
+ " <td>1.192093e-06</td>\n",
+ " <td>20</td>\n",
+ " <td>30.348682</td>\n",
+ " <td>20</td>\n",
+ " <td>0.757858</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>170</th>\n",
+ " <td>851708</td>\n",
+ " <td>24117248</td>\n",
+ " <td>1024</td>\n",
+ " <td>1.192093e-06</td>\n",
+ " <td>20</td>\n",
+ " <td>28.316334</td>\n",
+ " <td>20</td>\n",
+ " <td>0.812252</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>171</th>\n",
+ " <td>912838</td>\n",
+ " <td>24117248</td>\n",
+ " <td>1024</td>\n",
+ " <td>2.384186e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>26.420075</td>\n",
+ " <td>20</td>\n",
+ " <td>0.870550</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>172</th>\n",
+ " <td>978356</td>\n",
+ " <td>48234496</td>\n",
+ " <td>1024</td>\n",
+ " <td>7.152557e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>49.301579</td>\n",
+ " <td>21</td>\n",
+ " <td>0.466516</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>173</th>\n",
+ " <td>1048576</td>\n",
+ " <td>48234496</td>\n",
+ " <td>1024</td>\n",
+ " <td>2.384186e-07</td>\n",
+ " <td>20</td>\n",
+ " <td>46.000000</td>\n",
+ " <td>21</td>\n",
+ " <td>0.500000</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>174 rows × 8 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " inputCardinality Size NumTrials FalsePositiveRate
NumHashBits \\\n",
+ "0 3 176 16384 9.536743e-07
20 \n",
+ "1 4 184 10922 2.384186e-07
20 \n",
+ "2 5 184 8192 4.768372e-07
20 \n",
+ "3 6 184 6553 4.768372e-07
20 \n",
+ "4 7 352 5461 2.384186e-07
20 \n",
+ ".. ... ... ... ...
... \n",
+ "169 794672 24117248 1024 1.192093e-06
20 \n",
+ "170 851708 24117248 1024 1.192093e-06
20 \n",
+ "171 912838 24117248 1024 2.384186e-07
20 \n",
+ "172 978356 48234496 1024 7.152557e-07
20 \n",
+ "173 1048576 48234496 1024 2.384186e-07
20 \n",
+ "\n",
+ " bitsPerElement numSlots loadFactor \n",
+ "0 58.666667 2 0.750000 \n",
+ "1 46.000000 3 0.500000 \n",
+ "2 36.800000 3 0.625000 \n",
+ "3 30.666667 3 0.750000 \n",
+ "4 50.285714 3 0.875000 \n",
+ ".. ... ... ... \n",
+ "169 30.348682 20 0.757858 \n",
+ "170 28.316334 20 0.812252 \n",
+ "171 26.420075 20 0.870550 \n",
+ "172 49.301579 21 0.466516 \n",
+ "173 46.000000 21 0.500000 \n",
+ "\n",
+ "[174 rows x 8 columns]"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "qf_sizes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# merged_df = pd.merge(bf_sizes, bf_sizes_less_one,
on='inputCardinality')\n",
+ "# merged_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Size (bits) vs Input Cardinality')"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA+wAAAIoCAYAAADz12GeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydeVhc5d3+75mBYWCGZdjCMuxLIAshQIhGMaTRRq1aG9NaazXRvtbWahOtttX21aaLdrVJLS61dWnfusQk2v5aq7UxMWkWEkgIEEiABBJ2CAwDM8PALOf3B5kjw2xn5hwOM+T7uS4unfPczzLP+XLnOZxnkTAMw4AgCIIgCIIgCIIgiIBCOtcNIAiCIAiCIAiCIAjCGXpgJwiCIAiCIAiCIIgAhB7YCYIgCIIgCIIgCCIAoQd2giAIgiAIgiAIgghA6IGdIAiCIAiCIAiCIAIQemAnCIIgCIIgCIIg
[...]
+ "text/plain": [
+ "<Figure size 1200x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 6))\n",
+ "SCALE = 1\n",
+ "ax.plot(bf_sizes[\"inputCardinality\"], bf_sizes[\"Size\"]/SCALE,
color=\"blue\", marker=\"o\", label=r\"Bloom: $fpp = 10^{-6}$\")\n",
+ "\n",
+ "# Quotient \n",
+ "ax.plot(qf_sizes[\"inputCardinality\"], qf_sizes[\"Size\"]/SCALE,
color=\"black\", marker=\"o\", label=r\"Quotient: $fpp = 10^{-6}$\")\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=10)\n",
+ "ax.set_xscale(\"log\", base=10)\n",
+ "ax.set_ylabel(\"Size (bits)\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "ax.set_title(\"Size (bits) vs Input Cardinality\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Size (bits) vs Input Cardinality')"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA+wAAAIoCAYAAADz12GeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydeVxc1d3/3zPDMjBsQ4AQQhIIhJCdAEnc4x7jVq1W2/pYo9ZqrfvWulRrtW5Vq7Wp2vrUpfWpW9X211qtVWs0JiQhIZAVSCCBEJbAMDAMAzPM/f2Bc8syy52Zy4Uh5/168dKZ8znne+bcL5+cy9xzjk6SJAmBQCAQCAQCgUAgEAgEEwr9eHdAIBAIBAKBQCAQCAQCwWjEDbtAIBAIBAKBQCAQCAQTEHHDLhAIBAKBQCAQCAQCwQRE3LALBAKBQCAQCAQCgUAwARE37AKBQCAQCAQCgUAgEExAxA27
[...]
+ "text/plain": [
+ "<Figure size 1200x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 6))\n",
+ "SCALE = 1\n",
+ "ax.plot(bf_sizes[\"inputCardinality\"], bf_sizes[\"Size\"]/SCALE,
color=\"blue\", label=r\"Bloom: $fpp = 10^{-6}$\")\n",
+ "ax.plot(qf_sizes[\"inputCardinality\"], qf_sizes[\"Size\"]/SCALE,
color=\"black\", label=r\"Quotient: $fpp = 10^{-6}$\", linestyle=\"--\")\n",
+ "\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=10)\n",
+ "ax.set_xscale(\"log\", base=10)\n",
+ "ax.set_ylabel(\"Size (bits)\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "ax.set_title(\"Size (bits) vs Input Cardinality\")\n",
+ "#fig.savefig(\"image/space-vs-cardinality.pdf\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Looking closely, we can see that there is a small regime where the
Quotient filter space in bits seems to drop beneath that for the Bloom filter.
Let's zoom in on that region in the following plots.\n",
+ "First, we can see that as $n$ increases, the load factor is not constant.
The load factor is the ratio of the number of elements in the filter to the
filter's length. This behaviour is to be expected since the Quotient filter
length must always be a power of two, so once a certain (parameterised) \n",
+ "threshold is met, then the filter length must be doubled. In this
example, we see this behaviour in the vertical drops between about $90\\%$ to
$\\approx 45\\%$ load factor."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[<matplotlib.axis.XTick at 0x125f1fd10>,\n",
+ " <matplotlib.axis.XTick at 0x125f5d450>,\n",
+ " <matplotlib.axis.XTick at 0x125f1f9d0>,\n",
+ " <matplotlib.axis.XTick at 0x125fe2390>,\n",
+ " <matplotlib.axis.XTick at 0x125fe8750>,\n",
+ " <matplotlib.axis.XTick at 0x125feabd0>,\n",
+ " <matplotlib.axis.XTick at 0x125ff0ed0>,\n",
+ " <matplotlib.axis.XTick at 0x125ff30d0>,\n",
+ " <matplotlib.axis.XTick at 0x125ff00d0>,\n",
+ " <matplotlib.axis.XTick at 0x125ff5e50>,\n",
+ " <matplotlib.axis.XTick at 0x125ffc210>,\n",
+ " <matplotlib.axis.XTick at 0x125ffe410>,\n",
+ " <matplotlib.axis.XTick at 0x126000650>,\n",
+ " <matplotlib.axis.XTick at 0x125ff4750>,\n",
+ " <matplotlib.axis.XTick at 0x126003050>,\n",
+ " <matplotlib.axis.XTick at 0x1260092d0>,\n",
+ " <matplotlib.axis.XTick at 0x12600b490>,\n",
+ " <matplotlib.axis.XTick at 0x1260116d0>,\n",
+ " <matplotlib.axis.XTick at 0x126001a10>,\n",
+ " <matplotlib.axis.XTick at 0x126018290>,\n",
+ " <matplotlib.axis.XTick at 0x12601a4d0>]"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABboAAAIRCAYAAACMHQu3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOy9eXhdV3nv/zk6tuTIlmzLtmTLR5ad2JnIAKQkECA0JCGFkECgaRhKSggkTIUfuXApNGGm4UKhuYEMDggopVxyS7nctqQtNGUoFJpLRiAhceJRlmzLgyzJ1njO+f2xs/basiVbRzpn77X3+n6ehwdtWTp765v33Wutd73rfXPlcrmMEEIIIYQQQgghhBBCCJFS6pJ+ACGEEEIIIYQQQgghhBBiLijQLYQQQgghhBBCCCGEECLVKNAthBBCCCGEEEIIIYQQItUo0C2EEEIIIYQQQgghhBAi1SjQLYQQ
[...]
+ "text/plain": [
+ "<Figure size 1800x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(18, 6))\n",
+ "ax.plot(qf_sizes[\"inputCardinality\"], qf_sizes[\"loadFactor\"],
color=\"black\", marker=\".\", label=r\"Size ratio\")\n",
+ "\n",
+ "ax.set_ylabel(\"Load Factor\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.set_xscale(\"log\", base=2)\n",
+ "ax.grid()\n",
+ "ax.set_xticks([2**i for i in range(0, 21)])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABboAAAIRCAYAAACMHQu3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydeZgcVbn/P1U9Mz1rz2Sf7CsJIUAWCBGC7BKQiCCKC27wwxXUiMvVq7LI9XJFveJ2Xe81iih65YJKWAQURBDIQlgkBAKBBLKvMz1LT3dV/f7orpruru6equrqpU6dz/P0k+6equ7qb973nFPvec97FMMwDCQSiUQikUgkEolEIpFIJBKJRCIJKGqtL0AikUgkEolEIpFIJBKJRCKRSCSScpCBbolEIpFIJBKJRCKRSCQSiUQikQQaGeiWSCQSiUQikUgkEolEIpFIJBJJoJGBbolEIpFIJBKJRCKR
[...]
+ "text/plain": [
+ "<Figure size 1800x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(18, 6))\n",
+ "ax.plot(bf_sizes[\"inputCardinality\"],
bf_sizes[\"Size\"]/qf_sizes[\"Size\"], color=\"grey\", marker=\".\",
label=r\"Size ratio\")\n",
+ "ax.axhline(y=1, color='red', linestyle='--', linewidth=2, label='Bloom
filter smaller below this line')\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Plot vertical lines at 0.9*2^j for each j where we double to length of
the quotient filter\n",
+ "j_values = range(2, 21)\n",
+ "jval_col = \"magenta\"\n",
+ "ival_col = \"skyblue\"\n",
+ "for i, j in enumerate(j_values):\n",
+ " x = 0.9 * 2**j\n",
+ " if i == len(j_values) - 1:\n",
+ " ax.axvline(x=x, color=jval_col, linestyle=':', alpha=0.5,
label=r\"$0.9 \\cdot 2^j$\")\n",
+ " else:\n",
+ " ax.axvline(x=x, color=jval_col, linestyle=':', alpha=0.5)\n",
+ "\n",
+ "\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_xscale(\"log\", base=2)\n",
+ "ax.set_ylabel(\"Size Ratio\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "#fig.savefig(\"image/bloom-quotient-ratio.pdf\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We see that _for a fixed false positive rate_, the Bloom filter is
smaller than the Quotient filter in most regimes, unless the Quotient filter
close to, but not exceeding $90\\%$.\n",
+ "\n",
+ "### Experiment 1a.\n",
+ "\n",
+ "We can also do a rough analysis of the false positive rate over input
cardinalities (note that a more refined analysis will follow).\n",
+ "In this plot we see that the Bloom filter probability approximations do
not work well in the small cardinality regime and then approach the $2^{-f}$
bound for large cardinalities. On the other hand, the Quotient filter error
behaviour consistently remains below the $2^{-f}$ probability error bound.
There are a small number of exceptions, but these are deferred for later
analysis."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Measured False Positive Rate vs. Input Cardinality
(zeros occur at non-uniform points)')"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA/QAAAIoCAYAAADHrqCCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydd3hUVf6435lJmcykd0ICgYQSQgkkhFVRQJoo2MVusC12kcVd9auyloVdbFgQ2yquq7uu3bVjRZEWIPQaQgglJCHJJJOemfP7g9/czaSQAW6YOfG8z5MHcufMve/9zLkn98w953MMQgiBQqFQKBQKhUKhUCgUCqkweltAoVAoFAqFQqFQKBQKxfGjOvQKhUKhUCgUCoVCoVBIiOrQKxQKhUKhUCgUCoVCISGqQ69QKBQKhUKhUCgUCoWEqA69QqFQKBQKhUKhUCgUEqI69AqFQqFQKBQKhUKhUEiI
[...]
+ "text/plain": [
+ "<Figure size 1200x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 6))\n",
+ "ax.plot(bf_sizes[\"inputCardinality\"], bf_sizes[\"FalsePositiveRate\"],
color=\"blue\", marker=\"o\", label=r\"Bloom target $fpp = 10^{-6}$\")\n",
+ "\n",
+ "# Quotient \n",
+ "ax.plot(qf_sizes[\"inputCardinality\"], qf_sizes[\"FalsePositiveRate\"],
color=\"black\", label=r\"Quotient target $fpp = 10^{-6}$\")\n",
+ "ax.plot(qf_sizes[\"inputCardinality\"][::3],
qf_sizes[\"FalsePositiveRate\"][::3], color=\"black\", marker=\"o\",
linestyle=\"None\") # just for markers ::3 aribitrarily chosen.\n",
+ "\n",
+ "xvals = np.array([2**j for j in range(21)])\n",
+ "ax.plot(xvals, (1e-6)*np.ones_like(xvals), label=\"Target FPP =
$10^{-6}$\", color=\"red\", linestyle=\"--\")\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=10)\n",
+ "ax.set_ylim(1e-7, 1)\n",
+ "ax.set_xscale(\"log\", base=2)\n",
+ "ax.set_xticks([2**i for i in range(0, 21)])\n",
+ "ax.set_ylabel(\"FPR (measured)\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "ax.set_title(\"Measured False Positive Rate vs. Input Cardinality (zeros
occur at non-uniform points)\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Experiment 2. Changing the false positive rate\n",
+ "\n",
+ "We repeat the above with a larger false positive rate. The main \n",
+ "takeaway is that the Quotient filter is less attractive in this setting,
consistent with prior plots."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "bf_sizes_1e3 =
parse_size_results(\"1e-3/BloomFilterSizeProfile20240703_045451PST.txt\")\n",
+ "qf_sizes_1e3 =
parse_size_results(\"1e-3/QuotientFilterSizeProfile20240703_045733PST.txt\")\n",
+ "\n",
+ "for df in [bf_sizes_1e3, qf_sizes_1e3]:\n",
+ " df[\"bitsPerElement\"] = df[\"Size\"] / df[\"inputCardinality\"]\n",
+ "\n",
+ "for qf in [qf_sizes_1e3]:\n",
+ " qf[\"numSlots\"] = np.ceil(np.log2(qf[\"inputCardinality\"] / 0.9
)).astype(int) # nb this is because the suggestion function always uses 0.9 and
may change in futuer.\n",
+ " qf[\"loadFactor\"] = qf[\"inputCardinality\"] / (2**qf[\"numSlots\"])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>inputCardinality</th>\n",
+ " <th>Size</th>\n",
+ " <th>NumTrials</th>\n",
+ " <th>FalsePositiveRate</th>\n",
+ " <th>NumHashBits</th>\n",
+ " <th>bitsPerElement</th>\n",
+ " <th>numSlots</th>\n",
+ " <th>loadFactor</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>3</td>\n",
+ " <td>96</td>\n",
+ " <td>16384</td>\n",
+ " <td>0.000244</td>\n",
+ " <td>10</td>\n",
+ " <td>32.000000</td>\n",
+ " <td>2</td>\n",
+ " <td>0.750000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>4</td>\n",
+ " <td>104</td>\n",
+ " <td>10922</td>\n",
+ " <td>0.000732</td>\n",
+ " <td>10</td>\n",
+ " <td>26.000000</td>\n",
+ " <td>3</td>\n",
+ " <td>0.500000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>5</td>\n",
+ " <td>104</td>\n",
+ " <td>8192</td>\n",
+ " <td>0.000488</td>\n",
+ " <td>10</td>\n",
+ " <td>20.800000</td>\n",
+ " <td>3</td>\n",
+ " <td>0.625000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>6</td>\n",
+ " <td>104</td>\n",
+ " <td>6553</td>\n",
+ " <td>0.000244</td>\n",
+ " <td>10</td>\n",
+ " <td>17.333333</td>\n",
+ " <td>3</td>\n",
+ " <td>0.750000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>7</td>\n",
+ " <td>192</td>\n",
+ " <td>5461</td>\n",
+ " <td>0.000732</td>\n",
+ " <td>10</td>\n",
+ " <td>27.428571</td>\n",
+ " <td>3</td>\n",
+ " <td>0.875000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>...</th>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>169</th>\n",
+ " <td>794672</td>\n",
+ " <td>13631488</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>10</td>\n",
+ " <td>17.153603</td>\n",
+ " <td>20</td>\n",
+ " <td>0.757858</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>170</th>\n",
+ " <td>851708</td>\n",
+ " <td>13631488</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000244</td>\n",
+ " <td>10</td>\n",
+ " <td>16.004884</td>\n",
+ " <td>20</td>\n",
+ " <td>0.812252</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>171</th>\n",
+ " <td>912838</td>\n",
+ " <td>13631488</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000977</td>\n",
+ " <td>10</td>\n",
+ " <td>14.933086</td>\n",
+ " <td>20</td>\n",
+ " <td>0.870550</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>172</th>\n",
+ " <td>978356</td>\n",
+ " <td>27262976</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000244</td>\n",
+ " <td>10</td>\n",
+ " <td>27.866110</td>\n",
+ " <td>21</td>\n",
+ " <td>0.466516</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>173</th>\n",
+ " <td>1048576</td>\n",
+ " <td>27262976</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000000</td>\n",
+ " <td>10</td>\n",
+ " <td>26.000000</td>\n",
+ " <td>21</td>\n",
+ " <td>0.500000</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>174 rows × 8 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " inputCardinality Size NumTrials FalsePositiveRate
NumHashBits \\\n",
+ "0 3 96 16384 0.000244
10 \n",
+ "1 4 104 10922 0.000732
10 \n",
+ "2 5 104 8192 0.000488
10 \n",
+ "3 6 104 6553 0.000244
10 \n",
+ "4 7 192 5461 0.000732
10 \n",
+ ".. ... ... ... ...
... \n",
+ "169 794672 13631488 1024 0.000000
10 \n",
+ "170 851708 13631488 1024 0.000244
10 \n",
+ "171 912838 13631488 1024 0.000977
10 \n",
+ "172 978356 27262976 1024 0.000244
10 \n",
+ "173 1048576 27262976 1024 0.000000
10 \n",
+ "\n",
+ " bitsPerElement numSlots loadFactor \n",
+ "0 32.000000 2 0.750000 \n",
+ "1 26.000000 3 0.500000 \n",
+ "2 20.800000 3 0.625000 \n",
+ "3 17.333333 3 0.750000 \n",
+ "4 27.428571 3 0.875000 \n",
+ ".. ... ... ... \n",
+ "169 17.153603 20 0.757858 \n",
+ "170 16.004884 20 0.812252 \n",
+ "171 14.933086 20 0.870550 \n",
+ "172 27.866110 21 0.466516 \n",
+ "173 26.000000 21 0.500000 \n",
+ "\n",
+ "[174 rows x 8 columns]"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "qf_sizes_1e3"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 252,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>inputCardinality</th>\n",
+ " <th>Size</th>\n",
+ " <th>NumTrials</th>\n",
+ " <th>FalsePositiveRate</th>\n",
+ " <th>NumHashBits</th>\n",
+ " <th>bitsPerElement</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>3</td>\n",
+ " <td>64</td>\n",
+ " <td>16384</td>\n",
+ " <td>0.006104</td>\n",
+ " <td>11</td>\n",
+ " <td>21.333333</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>4</td>\n",
+ " <td>64</td>\n",
+ " <td>10922</td>\n",
+ " <td>0.016968</td>\n",
+ " <td>11</td>\n",
+ " <td>16.000000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>5</td>\n",
+ " <td>128</td>\n",
+ " <td>8192</td>\n",
+ " <td>0.003174</td>\n",
+ " <td>10</td>\n",
+ " <td>25.600000</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>6</td>\n",
+ " <td>128</td>\n",
+ " <td>6553</td>\n",
+ " <td>0.002930</td>\n",
+ " <td>11</td>\n",
+ " <td>21.333333</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>7</td>\n",
+ " <td>128</td>\n",
+ " <td>5461</td>\n",
+ " <td>0.002808</td>\n",
+ " <td>11</td>\n",
+ " <td>18.285714</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>...</th>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " <td>...</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>169</th>\n",
+ " <td>794672</td>\n",
+ " <td>11425472</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000488</td>\n",
+ " <td>10</td>\n",
+ " <td>14.377595</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>170</th>\n",
+ " <td>851708</td>\n",
+ " <td>12245568</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.001465</td>\n",
+ " <td>10</td>\n",
+ " <td>14.377660</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>171</th>\n",
+ " <td>912838</td>\n",
+ " <td>13124416</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.001465</td>\n",
+ " <td>10</td>\n",
+ " <td>14.377596</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>172</th>\n",
+ " <td>978356</td>\n",
+ " <td>14066432</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.001709</td>\n",
+ " <td>10</td>\n",
+ " <td>14.377621</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>173</th>\n",
+ " <td>1048576</td>\n",
+ " <td>15076032</td>\n",
+ " <td>1024</td>\n",
+ " <td>0.000732</td>\n",
+ " <td>10</td>\n",
+ " <td>14.377625</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>174 rows × 6 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " inputCardinality Size NumTrials FalsePositiveRate
NumHashBits \\\n",
+ "0 3 64 16384 0.006104
11 \n",
+ "1 4 64 10922 0.016968
11 \n",
+ "2 5 128 8192 0.003174
10 \n",
+ "3 6 128 6553 0.002930
11 \n",
+ "4 7 128 5461 0.002808
11 \n",
+ ".. ... ... ... ...
... \n",
+ "169 794672 11425472 1024 0.000488
10 \n",
+ "170 851708 12245568 1024 0.001465
10 \n",
+ "171 912838 13124416 1024 0.001465
10 \n",
+ "172 978356 14066432 1024 0.001709
10 \n",
+ "173 1048576 15076032 1024 0.000732
10 \n",
+ "\n",
+ " bitsPerElement \n",
+ "0 21.333333 \n",
+ "1 16.000000 \n",
+ "2 25.600000 \n",
+ "3 21.333333 \n",
+ "4 18.285714 \n",
+ ".. ... \n",
+ "169 14.377595 \n",
+ "170 14.377660 \n",
+ "171 14.377596 \n",
+ "172 14.377621 \n",
+ "173 14.377625 \n",
+ "\n",
+ "[174 rows x 6 columns]"
+ ]
+ },
+ "execution_count": 252,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "bf_sizes_1e3"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Size (bits) vs Input Cardinality')"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA+wAAAIoCAYAAADz12GeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOy9eXwb1b3+/0jyIlvyvtvyFtvZ7Tix40BCiEOAAC1QQoALlCa0pZQ1gV/pve1tWbrALfdemrQ10NIl0BK2JC18Wy4UQtKkJHFiJ16SOPESJ/Fux5ZlS7IkS5rfH0ZTyxpJI814rHE+79fLL5DOc5Y58/GTOZ6zKBiGYUAQBEEQBEEQBEEQREihnOkGEARBEARBEARBEAThCQ3YCYIgCIIgCIIgCCIEoQE7QRAEQRAEQRAEQYQgNGAnCIIgCIIgCIIgiBCEBuwEQRAEQRAEQRAEEYLQgJ0gCIIgCIIg
[...]
+ "text/plain": [
+ "<Figure size 1200x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 6))\n",
+ "SCALE = 1\n",
+ "ax.plot(bf_sizes_1e3[\"inputCardinality\"], bf_sizes_1e3[\"Size\"]/SCALE,
color=\"blue\", marker=\"o\", label=r\"Bloom: $fpp = 10^{-3}$\")\n",
+ "\n",
+ "# Quotient \n",
+ "ax.plot(qf_sizes_1e3[\"inputCardinality\"], qf_sizes_1e3[\"Size\"]/SCALE,
color=\"black\", marker=\"o\", label=r\"Quotient: $fpp = 10^{-3}$\")\n",
+ "\n",
+ "\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=10)\n",
+ "ax.set_xscale(\"log\", base=10)\n",
+ "ax.set_ylabel(\"Size (bits)\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "ax.set_title(\"Size (bits) vs Input Cardinality\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Size ratio of Bloom Filter to Quotient Filter')"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABboAAAIoCAYAAACvaF3SAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3wUdf4/8NfuppdNCKlASKjSexEsoKIcCipn9zwBFRAVRE79yp2ncBbu8FQ45bCcCraf9SyHiGJBEZUOShUwoRhqKulb5vdHMsNOPrvJzmT7vJ6PRx5sNrs7kxfv98zks7OfMUmSJIGIiIiIiIiIiIiIKEyZg70CREREREREREREREStwYFuIiIiIiIiIiIiIgprHOgmIiIiIiIiIiIiorDGgW4iIiIiIiIiIiIiCmsc6CYiIiIiIiIiIiKisMaBbiIiIiIiIiIiIiIKaxzoJiIiIiIiIiIiIqKw
[...]
+ "text/plain": [
+ "<Figure size 1800x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(18, 6))\n",
+ "ax.plot(bf_sizes_1e3[\"inputCardinality\"],
bf_sizes_1e3[\"Size\"]/qf_sizes_1e3[\"Size\"], color=\"grey\", marker=\".\",
label=r\"Size ratio (fpp <= $10^{-3}$)\")\n",
+ "ax.axhline(y=1, color='red', linestyle='--', linewidth=2, label='Bloom
filter smaller below this line')\n",
+ "\n",
+ "# Plot vertical lines at 0.9*2^j for each j\n",
+ "j_values = range(2, 21)\n",
+ "jval_col = \"magenta\"\n",
+ "for i, j in enumerate(j_values):\n",
+ " x = 0.9 * 2**j\n",
+ " if i == len(j_values) - 1:\n",
+ " ax.axvline(x=x, color=jval_col, linestyle=':', alpha=0.5,
label=r\"$0.9 \\cdot 2^j$\")\n",
+ " else:\n",
+ " ax.axvline(x=x, color=jval_col, linestyle=':', alpha=0.5)\n",
+ "\n",
+ "\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_xscale(\"log\", base=2)\n",
+ "ax.set_ylabel(\"Size Ratio\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "ax.set_title(\"Size ratio of Bloom Filter to Quotient Filter\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Measured False Positive Rate vs. Input Cardinality
(zeros occur at non-uniform points)')"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAAA/QAAAIoCAYAAADHrqCCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/SrBM8AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydd3hUVd6A35lJmcykd0ICCQkllCSQEFZBQQEREcWGXbAtdpHFXXRVrLBrxVXEtpa1fa4NXQt2RVEDAUKvIYTQQkmdSZ853x/ZuZtJIRO4YebgeZ8nD+TOmXvf+5tzT+6Ze87vGIQQAoVCoVAoFAqFQqFQKBRSYfS2gEKhUCgUCoVCoVAoFIquozr0CoVCoVAoFAqFQqFQSIjq0CsUCoVCoVAoFAqFQiEhqkOvUCgUCoVCoVAoFAqFhKgOvUKhUCgUCoVCoVAoFBKiOvQKhUKhUCgUCoVCoVBIiOrQ
[...]
+ "text/plain": [
+ "<Figure size 1200x600 with 1 Axes>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, ax = plt.subplots(figsize=(12, 6))\n",
+ "ax.plot(bf_sizes_1e3[\"inputCardinality\"],
bf_sizes_1e3[\"FalsePositiveRate\"], color=\"blue\", marker=\"o\",
label=r\"Bloom target $fpp = 10^{-3}$\")\n",
+ "\n",
+ "# Quotient \n",
+ "ax.plot(qf_sizes_1e3[\"inputCardinality\"],
qf_sizes_1e3[\"FalsePositiveRate\"], color=\"black\", label=r\"Quotient target
$fpp = 10^{-3}$\")\n",
+ "ax.plot(qf_sizes_1e3[\"inputCardinality\"][::3],
qf_sizes_1e3[\"FalsePositiveRate\"][::3], color=\"black\", marker=\"o\",
linestyle=\"None\") \n",
+ "\n",
+ "xvals = np.array([2**j for j in range(21)])\n",
+ "ax.plot(xvals, (1e-3)*np.ones_like(xvals), label=\"Target FPP =
$10^{-e}$\", color=\"red\", linestyle=\"--\")\n",
+ "ax.legend()\n",
+ "ax.grid()\n",
+ "ax.set_yscale(\"log\", base=10)\n",
+ "ax.set_ylim(1e-7, 1)\n",
+ "ax.set_xscale(\"log\", base=2)\n",
+ "ax.set_xticks([2**i for i in range(0, 21)])\n",
+ "ax.set_ylabel(\"FPR (measured)\")\n",
+ "ax.set_xlabel(\"Input Cardinality\")\n",
+ "ax.grid(which='both', linestyle='--', linewidth=0.5)\n",
+ "ax.set_title(\"Measured False Positive Rate vs. Input Cardinality (zeros
occur at non-uniform points)\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "From these plots we see roughly the same error behaviour as when the fpr
is $1e-6$, however the gaps between the two filters have reduced. This
supports the theoretical understanding that if higher accuracy is desired then
Quotient filter should be more strongly considered compared to at lower false
positive rates."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "density-venv",
+ "language": "python",
+ "name": "density_venv"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/src/results/1e-3/BloomFilterSizeProfile20240703_045451PST.txt
b/src/results/1e-3/BloomFilterSizeProfile20240703_045451PST.txt
new file mode 100644
index 0000000..61827cc
--- /dev/null
+++ b/src/results/1e-3/BloomFilterSizeProfile20240703_045451PST.txt
@@ -0,0 +1,203 @@
+START JOB BloomFilterSizeProfile
+Date Time: 2024/07/03 04:54:51 PST
+TrueU Size NumTrials FalsePositiveRate NumHashBits
+3 64 16384 0.006103515625 11
+4 64 10922 0.0169677734375 11
+5 128 8192 0.003173828125 10
+6 128 6553 0.0029296875 11
+7 128 5461 0.0028076171875 11
+8 128 4681 0.0025634765625 11
+9 192 4096 7.32421875E-4 11
+10 192 3640 7.32421875E-4 10
+11 192 3276 8.544921875E-4 11
+12 192 2978 0.001953125 10
+13 192 2730 0.002197265625 10
+14 256 2520 0.002197265625 11
+15 256 2340 0.001708984375 10
+16 256 2184 0.0028076171875 11
+17 256 2048 0.001708984375 10
+18 320 1927 4.8828125E-4 10
+20 320 1820 0.00146484375 10
+21 320 1638 0.001220703125 10
+23 384 1560 9.765625E-4 10
+24 384 1424 0.001220703125 10
+26 384 1365 0.001220703125 10
+28 448 1260 7.32421875E-4 10
+30 448 1170 0.001708984375 10
+32 512 1092 0.00146484375 10
+34 512 1024 0.002197265625 10
+37 576 1024 4.8828125E-4 10
+39 576 1024 7.32421875E-4 10
+42 640 1024 2.44140625E-4 10
+45 704 1024 0.0 10
+49 768 1024 4.8828125E-4 10
+52 768 1024 0.001708984375 10
+56 832 1024 7.32421875E-4 10
+60 896 1024 0.001708984375 10
+64 960 1024 7.32421875E-4 10
+69 1024 1024 0.00244140625 10
+74 1088 1024 0.003173828125 10
+79 1152 1024 4.8828125E-4 10
+84 1216 1024 4.8828125E-4 10
+91 1344 1024 4.8828125E-4 10
+97 1408 1024 4.8828125E-4 10
+104 1536 1024 7.32421875E-4 10
+111 1600 1024 0.001953125 10
+119 1728 1024 0.00146484375 10
+128 1856 1024 7.32421875E-4 10
+137 1984 1024 0.00146484375 10
+147 2176 1024 2.44140625E-4 10
+158 2304 1024 9.765625E-4 10
+169 2432 1024 0.001220703125 10
+181 2624 1024 4.8828125E-4 10
+194 2816 1024 9.765625E-4 10
+208 3008 1024 2.44140625E-4 10
+223 3264 1024 4.8828125E-4 10
+239 3456 1024 0.001708984375 10
+256 3712 1024 9.765625E-4 10
+274 3968 1024 0.001220703125 10
+294 4288 1024 7.32421875E-4 10
+315 4544 1024 0.00146484375 10
+338 4864 1024 4.8828125E-4 10
+362 5248 1024 7.32421875E-4 10
+388 5632 1024 0.001220703125 10
+416 6016 1024 2.44140625E-4 10
+446 6464 1024 0.001220703125 10
+478 6912 1024 0.00146484375 10
+512 7424 1024 0.001220703125 10
+549 7936 1024 7.32421875E-4 10
+588 8512 1024 0.00146484375 10
+630 9088 1024 9.765625E-4 10
+676 9728 1024 4.8828125E-4 10
+724 10432 1024 0.00146484375 10
+776 11200 1024 9.765625E-4 10
+832 11968 1024 0.00146484375 10
+891 12864 1024 4.8828125E-4 10
+955 13760 1024 0.0 10
+1024 14784 1024 0.001708984375 10
+1097 15808 1024 2.44140625E-4 10
+1176 16960 1024 0.00146484375 10
+1261 18176 1024 0.002685546875 10
+1351 19456 1024 2.44140625E-4 10
+1448 20864 1024 9.765625E-4 10
+1552 22336 1024 9.765625E-4 10
+1663 23936 1024 9.765625E-4 10
+1783 25664 1024 7.32421875E-4 10
+1911 27520 1024 4.8828125E-4 10
+2048 29504 1024 4.8828125E-4 10
+2195 31616 1024 7.32421875E-4 10
+2353 33856 1024 7.32421875E-4 10
+2521 36288 1024 0.001220703125 10
+2702 38912 1024 9.765625E-4 10
+2896 41664 1024 4.8828125E-4 10
+3104 44672 1024 9.765625E-4 10
+3327 47872 1024 7.32421875E-4 10
+3566 51328 1024 7.32421875E-4 10
+3822 54976 1024 2.44140625E-4 10
+4096 58944 1024 4.8828125E-4 10
+4390 63168 1024 9.765625E-4 10
+4705 67648 1024 9.765625E-4 10
+5043 72512 1024 0.00146484375 10
+5405 77760 1024 4.8828125E-4 10
+5793 83328 1024 4.8828125E-4 10
+6208 89280 1024 0.001953125 10
+6654 95680 1024 0.001708984375 10
+7132 102592 1024 0.00146484375 10
+7643 109888 1024 4.8828125E-4 10
+8192 117824 1024 7.32421875E-4 10
+8780 126272 1024 9.765625E-4 10
+9410 135296 1024 0.001708984375 10
+10086 145024 1024 9.765625E-4 10
+10809 155456 1024 4.8828125E-4 10
+11585 166592 1024 9.765625E-4 10
+12417 178560 1024 9.765625E-4 10
+13308 191360 1024 4.8828125E-4 10
+14263 205120 1024 0.002685546875 10
+15287 219840 1024 7.32421875E-4 10
+16384 235584 1024 0.00146484375 10
+17560 252480 1024 9.765625E-4 10
+18820 270592 1024 4.8828125E-4 10
+20171 290048 1024 4.8828125E-4 10
+21619 310848 1024 4.8828125E-4 10
+23170 333184 1024 0.001708984375 10
+24834 357056 1024 9.765625E-4 10
+26616 382720 1024 0.001220703125 10
+28526 410176 1024 7.32421875E-4 10
+30574 439616 1024 7.32421875E-4 10
+32768 471168 1024 0.00146484375 10
+35120 504960 1024 4.8828125E-4 10
+37641 541248 1024 9.765625E-4 10
+40342 580032 1024 0.001220703125 10
+43238 621696 1024 9.765625E-4 10
+46341 666304 1024 4.8828125E-4 10
+49667 714112 1024 4.8828125E-4 10
+53232 765376 1024 9.765625E-4 10
+57052 820288 1024 0.001220703125 10
+61147 879168 1024 9.765625E-4 10
+65536 942272 1024 2.44140625E-4 10
+70240 1009920 1024 4.8828125E-4 10
+75281 1082368 1024 9.765625E-4 10
+80684 1160064 1024 4.8828125E-4 10
+86475 1243328 1024 7.32421875E-4 10
+92682 1332544 1024 4.8828125E-4 10
+99334 1428224 1024 0.0 10
+106464 1530752 1024 7.32421875E-4 10
+114105 1640576 1024 0.001708984375 10
+122295 1758336 1024 2.44140625E-4 10
+131072 1884544 1024 7.32421875E-4 10
+140479 2019776 1024 9.765625E-4 10
+150562 2164736 1024 4.8828125E-4 10
+161369 2320128 1024 2.44140625E-4 10
+172951 2486656 1024 7.32421875E-4 10
+185364 2665088 1024 7.32421875E-4 10
+198668 2856384 1024 0.001220703125 10
+212927 3061440 1024 7.32421875E-4 10
+228210 3281152 1024 4.8828125E-4 10
+244589 3516608 1024 9.765625E-4 10
+262144 3769024 1024 0.001708984375 10
+280959 4039552 1024 7.32421875E-4 10
+301124 4329472 1024 0.001220703125 10
+322737 4640192 1024 9.765625E-4 10
+345901 4973248 1024 9.765625E-4 10
+370728 5330176 1024 0.00146484375 10
+397336 5712768 1024 4.8828125E-4 10
+425854 6122816 1024 0.00146484375 10
+456419 6562240 1024 0.00146484375 10
+489178 7033216 1024 0.00146484375 10
+524288 7538048 1024 0.001708984375 10
+561918 8079040 1024 4.8828125E-4 10
+602249 8658944 1024 0.00146484375 10
+645474 9280384 1024 0.001708984375 10
+691802 9946496 1024 0.00146484375 10
+741455 10660352 1024 9.765625E-4 10
+794672 11425472 1024 4.8828125E-4 10
+851708 12245568 1024 0.00146484375 10
+912838 13124416 1024 0.00146484375 10
+978356 14066432 1024 0.001708984375 10
+1048576 15076032 1024 7.32421875E-4 10
+
+PROPERTIES:
+FileNameDateFormat=yyyyMMdd'_'HHmmssz
+JobProfile=org.apache.datasketches.characterization.filters.BloomFilterSizeProfile
+OutputFileName=BloomFilterSizeProfile20240703_045451PST.txt
+OutputFileNameData=BloomFilterSizeProfile20240703_045451PST.tsv
+ReadableDateFormat=yyyy/MM/dd HH:mm:ss z
+targetFpp=1e-3
+TimeZone=PST
+TimeZoneOffset=-28800000
+Trials_interData=true
+Trials_lgMaxBpU=5
+Trials_lgMaxT=14
+Trials_lgMaxU=20
+Trials_lgMinBpU=1
+Trials_lgMinT=10
+Trials_lgMinU=0
+Trials_lgQK=12
+Trials_postPMFs=false
+Trials_TPPO=1
+Trials_UPPO=10
+
+Total Job Time: 0:00:00.788
+END JOB BloomFilterSizeProfile
+
+
diff --git a/src/results/1e-3/QuotientFilterSizeProfile20240703_045733PST.txt
b/src/results/1e-3/QuotientFilterSizeProfile20240703_045733PST.txt
new file mode 100644
index 0000000..a12fa3c
--- /dev/null
+++ b/src/results/1e-3/QuotientFilterSizeProfile20240703_045733PST.txt
@@ -0,0 +1,200 @@
+START JOB QuotientFilterSizeProfile
+Date Time: 2024/07/03 04:57:33 PST
+TrueU Size NumTrials FalsePositiveRate NumHashBits
+3 96 16384 2.44140625E-4 10
+4 104 10922 7.32421875E-4 10
+5 104 8192 4.8828125E-4 10
+6 104 6553 2.44140625E-4 10
+7 192 5461 7.32421875E-4 10
+8 208 4681 0.0 10
+9 208 4096 2.44140625E-4 10
+10 208 3640 7.32421875E-4 10
+11 208 3276 9.765625E-4 10
+12 208 2978 0.001220703125 10
+13 208 2730 7.32421875E-4 10
+14 384 2520 0.001220703125 10
+15 416 2340 2.44140625E-4 10
+16 416 2184 4.8828125E-4 10
+17 416 2048 2.44140625E-4 10
+18 416 1927 7.32421875E-4 10
+20 416 1820 9.765625E-4 10
+21 416 1638 4.8828125E-4 10
+23 416 1560 0.001220703125 10
+24 416 1424 4.8828125E-4 10
+26 416 1365 9.765625E-4 10
+28 768 1260 2.44140625E-4 10
+30 832 1170 7.32421875E-4 10
+32 832 1092 2.44140625E-4 10
+34 832 1024 4.8828125E-4 10
+37 832 1024 7.32421875E-4 10
+39 832 1024 2.44140625E-4 10
+42 832 1024 4.8828125E-4 10
+45 832 1024 0.00146484375 10
+49 832 1024 2.44140625E-4 10
+52 832 1024 0.001220703125 10
+56 832 1024 0.001220703125 10
+60 1664 1024 2.44140625E-4 10
+64 1664 1024 7.32421875E-4 10
+69 1664 1024 0.001220703125 10
+74 1664 1024 4.8828125E-4 10
+79 1664 1024 4.8828125E-4 10
+84 1664 1024 4.8828125E-4 10
+91 1664 1024 2.44140625E-4 10
+97 1664 1024 9.765625E-4 10
+104 1664 1024 4.8828125E-4 10
+111 1664 1024 9.765625E-4 10
+119 3328 1024 7.32421875E-4 10
+128 3328 1024 2.44140625E-4 10
+137 3328 1024 7.32421875E-4 10
+147 3328 1024 4.8828125E-4 10
+158 3328 1024 2.44140625E-4 10
+169 3328 1024 2.44140625E-4 10
+181 3328 1024 4.8828125E-4 10
+194 3328 1024 0.001220703125 10
+208 3328 1024 2.44140625E-4 10
+223 3328 1024 4.8828125E-4 10
+239 6656 1024 9.765625E-4 10
+256 6656 1024 2.44140625E-4 10
+274 6656 1024 4.8828125E-4 10
+294 6656 1024 2.44140625E-4 10
+315 6656 1024 0.001220703125 10
+338 6656 1024 9.765625E-4 10
+362 6656 1024 4.8828125E-4 10
+388 6656 1024 0.00146484375 10
+416 6656 1024 4.8828125E-4 10
+446 6656 1024 9.765625E-4 10
+478 13312 1024 2.44140625E-4 10
+512 13312 1024 0.0 10
+549 13312 1024 7.32421875E-4 10
+588 13312 1024 0.001220703125 10
+630 13312 1024 9.765625E-4 10
+676 13312 1024 4.8828125E-4 10
+724 13312 1024 0.00146484375 10
+776 13312 1024 0.001220703125 10
+832 13312 1024 9.765625E-4 10
+891 13312 1024 7.32421875E-4 10
+955 26624 1024 4.8828125E-4 10
+1024 26624 1024 4.8828125E-4 10
+1097 26624 1024 4.8828125E-4 10
+1176 26624 1024 9.765625E-4 10
+1261 26624 1024 9.765625E-4 10
+1351 26624 1024 4.8828125E-4 10
+1448 26624 1024 0.0 10
+1552 26624 1024 4.8828125E-4 10
+1663 26624 1024 2.44140625E-4 10
+1783 26624 1024 7.32421875E-4 10
+1911 53248 1024 4.8828125E-4 10
+2048 53248 1024 0.001220703125 10
+2195 53248 1024 2.44140625E-4 10
+2353 53248 1024 9.765625E-4 10
+2521 53248 1024 0.001220703125 10
+2702 53248 1024 4.8828125E-4 10
+2896 53248 1024 0.0 10
+3104 53248 1024 2.44140625E-4 10
+3327 53248 1024 7.32421875E-4 10
+3566 53248 1024 0.0 10
+3822 106496 1024 2.44140625E-4 10
+4096 106496 1024 7.32421875E-4 10
+4390 106496 1024 7.32421875E-4 10
+4705 106496 1024 4.8828125E-4 10
+5043 106496 1024 0.0 10
+5405 106496 1024 2.44140625E-4 10
+5793 106496 1024 0.00146484375 10
+6208 106496 1024 2.44140625E-4 10
+6654 106496 1024 9.765625E-4 10
+7132 106496 1024 7.32421875E-4 10
+7643 212992 1024 7.32421875E-4 10
+8192 212992 1024 0.0 10
+8780 212992 1024 4.8828125E-4 10
+9410 212992 1024 9.765625E-4 10
+10086 212992 1024 2.44140625E-4 10
+10809 212992 1024 2.44140625E-4 10
+11585 212992 1024 9.765625E-4 10
+12417 212992 1024 0.0 10
+13308 212992 1024 4.8828125E-4 10
+14263 212992 1024 0.001708984375 10
+15287 425984 1024 0.0 10
+16384 425984 1024 2.44140625E-4 10
+17560 425984 1024 9.765625E-4 10
+18820 425984 1024 4.8828125E-4 10
+20171 425984 1024 7.32421875E-4 10
+21619 425984 1024 9.765625E-4 10
+23170 425984 1024 9.765625E-4 10
+24834 425984 1024 9.765625E-4 10
+26616 425984 1024 7.32421875E-4 10
+28526 425984 1024 9.765625E-4 10
+30574 851968 1024 4.8828125E-4 10
+32768 851968 1024 4.8828125E-4 10
+35120 851968 1024 0.0 10
+37641 851968 1024 4.8828125E-4 10
+40342 851968 1024 2.44140625E-4 10
+43238 851968 1024 4.8828125E-4 10
+46341 851968 1024 2.44140625E-4 10
+49667 851968 1024 0.001220703125 10
+53232 851968 1024 0.00146484375 10
+57052 851968 1024 4.8828125E-4 10
+61147 1703936 1024 4.8828125E-4 10
+65536 1703936 1024 2.44140625E-4 10
+70240 1703936 1024 0.0 10
+75281 1703936 1024 7.32421875E-4 10
+80684 1703936 1024 9.765625E-4 10
+86475 1703936 1024 4.8828125E-4 10
+92682 1703936 1024 2.44140625E-4 10
+99334 1703936 1024 9.765625E-4 10
+106464 1703936 1024 7.32421875E-4 10
+114105 1703936 1024 0.001220703125 10
+122295 3407872 1024 7.32421875E-4 10
+131072 3407872 1024 2.44140625E-4 10
+140479 3407872 1024 2.44140625E-4 10
+150562 3407872 1024 0.001220703125 10
+161369 3407872 1024 9.765625E-4 10
+172951 3407872 1024 7.32421875E-4 10
+185364 3407872 1024 0.001220703125 10
+198668 3407872 1024 4.8828125E-4 10
+212927 3407872 1024 9.765625E-4 10
+228210 3407872 1024 9.765625E-4 10
+244589 6815744 1024 4.8828125E-4 10
+262144 6815744 1024 4.8828125E-4 10
+280959 6815744 1024 0.001220703125 10
+301124 6815744 1024 9.765625E-4 10
+322737 6815744 1024 4.8828125E-4 10
+345901 6815744 1024 4.8828125E-4 10
+370728 6815744 1024 9.765625E-4 10
+397336 6815744 1024 2.44140625E-4 10
+425854 6815744 1024 7.32421875E-4 10
+456419 6815744 1024 7.32421875E-4 10
+489178 13631488 1024 0.0 10
+524288 13631488 1024 9.765625E-4 10
+561918 13631488 1024 7.32421875E-4 10
+602249 13631488 1024 7.32421875E-4 10
+645474 13631488 1024 4.8828125E-4 10
+691802 13631488 1024 0.0 10
+741455 13631488 1024 0.00146484375 10
+794672 13631488 1024 0.0 10
+851708 13631488 1024 2.44140625E-4 10
+912838 13631488 1024 9.765625E-4 10
+978356 27262976 1024 2.44140625E-4 10
+1048576 27262976 1024 0.0 10
+
+PROPERTIES:
+FileNameDateFormat=yyyyMMdd'_'HHmmssz
+JobProfile=org.apache.datasketches.characterization.filters.QuotientFilterSizeProfile
+OutputFileName=QuotientFilterSizeProfile20240703_045733PST.txt
+OutputFileNameData=QuotientFilterSizeProfile20240703_045733PST.tsv
+ReadableDateFormat=yyyy/MM/dd HH:mm:ss z
+targetFpp=1e-3
+TimeZone=PST
+TimeZoneOffset=-28800000
+Trials_lgMaxBpU=5
+Trials_lgMaxT=14
+Trials_lgMaxU=20
+Trials_lgMinBpU=1
+Trials_lgMinT=10
+Trials_lgMinU=0
+Trials_TPPO=1
+Trials_UPPO=10
+
+Total Job Time: 0:00:01.478
+END JOB QuotientFilterSizeProfile
+
+
diff --git a/src/results/1e-6/BloomFilterSizeProfile20240703_014140PST.txt
b/src/results/1e-6/BloomFilterSizeProfile20240703_014140PST.txt
new file mode 100644
index 0000000..9921d58
--- /dev/null
+++ b/src/results/1e-6/BloomFilterSizeProfile20240703_014140PST.txt
@@ -0,0 +1,203 @@
+START JOB BloomFilterSizeProfile
+Date Time: 2024/07/03 01:41:40 PST
+TrueU Size NumTrials FalsePositiveRate NumHashBits
+3 128 16384 0.003031015396118164 21
+4 128 10922 0.0032535791397094727 21
+5 192 8192 8.099079132080078E-4 20
+6 192 6553 9.753704071044922E-4 20
+7 256 5461 0.0012896060943603516 21
+8 256 4681 0.001676797866821289 21
+9 320 4096 1.94549560546875E-4 20
+10 320 3640 7.684230804443359E-4 20
+11 320 3276 4.975795745849609E-4 20
+12 384 2978 5.576610565185547E-4 20
+13 384 2730 5.486011505126953E-4 20
+14 448 2520 2.186298370361328E-4 20
+15 448 2340 8.077621459960938E-4 20
+16 512 2184 8.361339569091797E-4 20
+17 512 2048 0.0014133453369140625 20
+18 576 1927 1.4328956604003906E-4 20
+20 576 1820 1.7786026000976562E-4 20
+21 640 1638 3.0040740966796875E-4 20
+23 704 1560 1.1301040649414062E-4 20
+24 704 1424 1.3494491577148438E-4 20
+26 768 1365 3.4618377685546875E-4 20
+28 832 1260 1.2993812561035156E-4 20
+30 896 1170 1.0633468627929688E-4 20
+32 960 1092 9.1552734375E-5 20
+34 1024 1024 4.734992980957031E-4 20
+37 1088 1024 1.0943412780761719E-4 20
+39 1152 1024 1.0752677917480469E-4 20
+42 1216 1024 9.369850158691406E-5 20
+45 1344 1024 8.606910705566406E-5 20
+49 1472 1024 6.198883056640625E-5 20
+52 1536 1024 1.9502639770507812E-4 20
+56 1664 1024 1.010894775390625E-4 20
+60 1728 1024 7.081031799316406E-5 20
+64 1856 1024 5.2928924560546875E-5 20
+69 2048 1024 2.2220611572265625E-4 20
+74 2176 1024 4.6253204345703125E-5 20
+79 2304 1024 5.984306335449219E-5 20
+84 2432 1024 4.38690185546875E-5 20
+91 2624 1024 4.3392181396484375E-5 20
+97 2816 1024 5.507469177246094E-5 20
+104 3008 1024 4.363059997558594E-5 20
+111 3200 1024 3.8623809814453125E-5 20
+119 3456 1024 3.4332275390625E-5 20
+128 3712 1024 2.9087066650390625E-5 20
+137 3968 1024 3.361701965332031E-5 20
+147 4288 1024 2.7418136596679688E-5 20
+158 4544 1024 2.6226043701171875E-5 20
+169 4864 1024 3.0040740966796875E-5 20
+181 5248 1024 2.0265579223632812E-5 20
+194 5632 1024 2.8133392333984375E-5 20
+208 6016 1024 2.09808349609375E-5 20
+223 6464 1024 1.6689300537109375E-5 20
+239 6912 1024 1.7881393432617188E-5 20
+256 7424 1024 1.5974044799804688E-5 20
+274 7936 1024 1.33514404296875E-5 20
+294 8512 1024 1.1205673217773438E-5 20
+315 9088 1024 1.71661376953125E-5 20
+338 9728 1024 9.5367431640625E-6 20
+362 10432 1024 9.5367431640625E-6 20
+388 11200 1024 9.298324584960938E-6 20
+416 11968 1024 1.0013580322265625E-5 20
+446 12864 1024 9.775161743164062E-6 20
+478 13760 1024 8.58306884765625E-6 20
+512 14784 1024 8.106231689453125E-6 20
+549 15808 1024 8.344650268554688E-6 20
+588 16960 1024 8.344650268554688E-6 20
+630 18176 1024 4.76837158203125E-6 20
+676 19456 1024 7.3909759521484375E-6 20
+724 20864 1024 6.198883056640625E-6 20
+776 22336 1024 5.245208740234375E-6 20
+832 23936 1024 4.5299530029296875E-6 20
+891 25664 1024 2.86102294921875E-6 20
+955 27520 1024 6.9141387939453125E-6 20
+1024 29504 1024 4.5299530029296875E-6 20
+1097 31552 1024 4.5299530029296875E-6 20
+1176 33856 1024 1.6689300537109375E-6 20
+1261 36288 1024 5.245208740234375E-6 20
+1351 38912 1024 3.0994415283203125E-6 20
+1448 41664 1024 3.5762786865234375E-6 20
+1552 44672 1024 4.291534423828125E-6 20
+1663 47872 1024 1.9073486328125E-6 20
+1783 51328 1024 2.6226043701171875E-6 20
+1911 54976 1024 5.0067901611328125E-6 20
+2048 58944 1024 3.0994415283203125E-6 20
+2195 63168 1024 1.9073486328125E-6 20
+2353 67712 1024 2.86102294921875E-6 20
+2521 72512 1024 1.6689300537109375E-6 20
+2702 77760 1024 2.6226043701171875E-6 20
+2896 83328 1024 3.0994415283203125E-6 20
+3104 89280 1024 7.152557373046875E-7 20
+3327 95680 1024 9.5367431640625E-7 20
+3566 102592 1024 1.6689300537109375E-6 20
+3822 109952 1024 2.86102294921875E-6 20
+4096 117824 1024 2.86102294921875E-6 20
+4390 126272 1024 1.6689300537109375E-6 20
+4705 135296 1024 2.1457672119140625E-6 20
+5043 145024 1024 2.6226043701171875E-6 20
+5405 155456 1024 2.6226043701171875E-6 20
+5793 166592 1024 9.5367431640625E-7 20
+6208 178560 1024 1.1920928955078125E-6 20
+6654 191360 1024 9.5367431640625E-7 20
+7132 205120 1024 1.9073486328125E-6 20
+7643 219776 1024 2.1457672119140625E-6 20
+8192 235584 1024 1.6689300537109375E-6 20
+8780 252480 1024 1.430511474609375E-6 20
+9410 270592 1024 2.384185791015625E-6 20
+10086 290048 1024 7.152557373046875E-7 20
+10809 310848 1024 1.9073486328125E-6 20
+11585 333184 1024 1.9073486328125E-6 20
+12417 357056 1024 2.384185791015625E-6 20
+13308 382720 1024 2.1457672119140625E-6 20
+14263 410176 1024 9.5367431640625E-7 20
+15287 439616 1024 1.430511474609375E-6 20
+16384 471168 1024 1.430511474609375E-6 20
+17560 504960 1024 4.76837158203125E-7 20
+18820 541184 1024 1.430511474609375E-6 20
+20171 580032 1024 1.9073486328125E-6 20
+21619 621696 1024 1.1920928955078125E-6 20
+23170 666304 1024 1.6689300537109375E-6 20
+24834 714112 1024 1.430511474609375E-6 20
+26616 765376 1024 1.1920928955078125E-6 20
+28526 820288 1024 1.430511474609375E-6 20
+30574 879168 1024 4.76837158203125E-7 20
+32768 942272 1024 2.384185791015625E-7 20
+35120 1009920 1024 1.9073486328125E-6 20
+37641 1082432 1024 1.430511474609375E-6 20
+40342 1160064 1024 9.5367431640625E-7 20
+43238 1243328 1024 1.430511474609375E-6 20
+46341 1332544 1024 1.9073486328125E-6 20
+49667 1428224 1024 1.1920928955078125E-6 20
+53232 1530752 1024 1.1920928955078125E-6 20
+57052 1640576 1024 1.430511474609375E-6 20
+61147 1758336 1024 9.5367431640625E-7 20
+65536 1884544 1024 1.430511474609375E-6 20
+70240 2019776 1024 2.6226043701171875E-6 20
+75281 2164736 1024 2.384185791015625E-6 20
+80684 2320128 1024 2.384185791015625E-7 20
+86475 2486656 1024 9.5367431640625E-7 20
+92682 2665088 1024 7.152557373046875E-7 20
+99334 2856384 1024 1.430511474609375E-6 20
+106464 3061440 1024 7.152557373046875E-7 20
+114105 3281152 1024 2.384185791015625E-7 20
+122295 3516672 1024 2.384185791015625E-6 20
+131072 3769024 1024 7.152557373046875E-7 20
+140479 4039552 1024 7.152557373046875E-7 20
+150562 4329472 1024 1.1920928955078125E-6 20
+161369 4640256 1024 1.6689300537109375E-6 20
+172951 4973248 1024 1.1920928955078125E-6 20
+185364 5330176 1024 9.5367431640625E-7 20
+198668 5712768 1024 7.152557373046875E-7 20
+212927 6122816 1024 1.1920928955078125E-6 20
+228210 6562240 1024 4.76837158203125E-7 20
+244589 7033216 1024 2.384185791015625E-7 20
+262144 7538048 1024 9.5367431640625E-7 20
+280959 8079040 1024 7.152557373046875E-7 20
+301124 8658880 1024 4.76837158203125E-7 20
+322737 9280384 1024 2.6226043701171875E-6 20
+345901 9946496 1024 1.6689300537109375E-6 20
+370728 10660352 1024 1.430511474609375E-6 20
+397336 11425472 1024 7.152557373046875E-7 20
+425854 12245568 1024 7.152557373046875E-7 20
+456419 13124416 1024 4.76837158203125E-7 20
+489178 14066432 1024 1.1920928955078125E-6 20
+524288 15076032 1024 2.6226043701171875E-6 20
+561918 16158080 1024 4.76837158203125E-7 20
+602249 17317824 1024 9.5367431640625E-7 20
+645474 18560768 1024 1.430511474609375E-6 20
+691802 19892928 1024 4.76837158203125E-7 20
+741455 21320704 1024 1.6689300537109375E-6 20
+794672 22850944 1024 9.5367431640625E-7 20
+851708 24491072 1024 1.1920928955078125E-6 20
+912838 26248832 1024 1.430511474609375E-6 20
+978356 28132800 1024 1.430511474609375E-6 20
+1048576 30152000 1024 7.152557373046875E-7 20
+
+PROPERTIES:
+FileNameDateFormat=yyyyMMdd'_'HHmmssz
+JobProfile=org.apache.datasketches.characterization.filters.BloomFilterSizeProfile
+OutputFileName=BloomFilterSizeProfile20240703_014140PST.txt
+OutputFileNameData=BloomFilterSizeProfile20240703_014140PST.tsv
+ReadableDateFormat=yyyy/MM/dd HH:mm:ss z
+targetFpp=1e-6
+TimeZone=PST
+TimeZoneOffset=-28800000
+Trials_interData=true
+Trials_lgMaxBpU=5
+Trials_lgMaxT=14
+Trials_lgMaxU=20
+Trials_lgMinBpU=1
+Trials_lgMinT=10
+Trials_lgMinU=0
+Trials_lgQK=12
+Trials_postPMFs=false
+Trials_TPPO=1
+Trials_UPPO=10
+
+Total Job Time: 0:00:34.216
+END JOB BloomFilterSizeProfile
+
+
diff --git
a/src/results/1e-6/QuotientFilterSizeProfile20240703_014255PST_1E-6.txt
b/src/results/1e-6/QuotientFilterSizeProfile20240703_014255PST_1E-6.txt
new file mode 100644
index 0000000..02ee7f7
--- /dev/null
+++ b/src/results/1e-6/QuotientFilterSizeProfile20240703_014255PST_1E-6.txt
@@ -0,0 +1,200 @@
+START JOB QuotientFilterSizeProfile
+Date Time: 2024/07/03 01:42:55 PST
+TrueU Size NumTrials FalsePositiveRate NumHashBits
+3 176 16384 9.5367431640625E-7 20
+4 184 10922 2.384185791015625E-7 20
+5 184 8192 4.76837158203125E-7 20
+6 184 6553 4.76837158203125E-7 20
+7 352 5461 2.384185791015625E-7 20
+8 368 4681 9.5367431640625E-7 20
+9 368 4096 2.384185791015625E-7 20
+10 368 3640 2.384185791015625E-7 20
+11 368 3276 7.152557373046875E-7 20
+12 368 2978 4.76837158203125E-7 20
+13 368 2730 7.152557373046875E-7 20
+14 704 2520 2.384185791015625E-7 20
+15 736 2340 7.152557373046875E-7 20
+16 736 2184 7.152557373046875E-7 20
+17 736 2048 2.384185791015625E-7 20
+18 736 1927 2.384185791015625E-7 20
+20 736 1820 7.152557373046875E-7 20
+21 736 1638 9.5367431640625E-7 20
+23 736 1560 2.384185791015625E-7 20
+24 736 1424 7.152557373046875E-7 20
+26 736 1365 7.152557373046875E-7 20
+28 1408 1260 9.5367431640625E-7 20
+30 1472 1170 4.76837158203125E-7 20
+32 1472 1092 4.76837158203125E-7 20
+34 1472 1024 0.0 20
+37 1472 1024 4.76837158203125E-7 20
+39 1472 1024 2.384185791015625E-7 20
+42 1472 1024 4.76837158203125E-7 20
+45 1472 1024 2.384185791015625E-7 20
+49 1472 1024 7.152557373046875E-7 20
+52 1472 1024 1.1920928955078125E-6 20
+56 1472 1024 9.5367431640625E-7 20
+60 2944 1024 2.384185791015625E-7 20
+64 2944 1024 2.384185791015625E-7 20
+69 2944 1024 0.0 20
+74 2944 1024 2.384185791015625E-7 20
+79 2944 1024 2.384185791015625E-7 20
+84 2944 1024 4.76837158203125E-7 20
+91 2944 1024 2.384185791015625E-7 20
+97 2944 1024 9.5367431640625E-7 20
+104 2944 1024 4.76837158203125E-7 20
+111 2944 1024 7.152557373046875E-7 20
+119 5888 1024 0.0 20
+128 5888 1024 1.430511474609375E-6 20
+137 5888 1024 4.76837158203125E-7 20
+147 5888 1024 9.5367431640625E-7 20
+158 5888 1024 7.152557373046875E-7 20
+169 5888 1024 7.152557373046875E-7 20
+181 5888 1024 0.0 20
+194 5888 1024 1.430511474609375E-6 20
+208 5888 1024 9.5367431640625E-7 20
+223 5888 1024 7.152557373046875E-7 20
+239 11776 1024 7.152557373046875E-7 20
+256 11776 1024 2.384185791015625E-7 20
+274 11776 1024 7.152557373046875E-7 20
+294 11776 1024 7.152557373046875E-7 20
+315 11776 1024 7.152557373046875E-7 20
+338 11776 1024 2.384185791015625E-7 20
+362 11776 1024 7.152557373046875E-7 20
+388 11776 1024 9.5367431640625E-7 20
+416 11776 1024 9.5367431640625E-7 20
+446 11776 1024 1.1920928955078125E-6 20
+478 23552 1024 4.76837158203125E-7 20
+512 23552 1024 7.152557373046875E-7 20
+549 23552 1024 0.0 20
+588 23552 1024 2.384185791015625E-7 20
+630 23552 1024 9.5367431640625E-7 20
+676 23552 1024 9.5367431640625E-7 20
+724 23552 1024 1.430511474609375E-6 20
+776 23552 1024 7.152557373046875E-7 20
+832 23552 1024 1.1920928955078125E-6 20
+891 23552 1024 4.76837158203125E-7 20
+955 47104 1024 4.76837158203125E-7 20
+1024 47104 1024 0.0 20
+1097 47104 1024 4.76837158203125E-7 20
+1176 47104 1024 4.76837158203125E-7 20
+1261 47104 1024 1.430511474609375E-6 20
+1351 47104 1024 2.384185791015625E-7 20
+1448 47104 1024 1.430511474609375E-6 20
+1552 47104 1024 4.76837158203125E-7 20
+1663 47104 1024 9.5367431640625E-7 20
+1783 47104 1024 4.76837158203125E-7 20
+1911 94208 1024 7.152557373046875E-7 20
+2048 94208 1024 2.384185791015625E-7 20
+2195 94208 1024 7.152557373046875E-7 20
+2353 94208 1024 2.384185791015625E-7 20
+2521 94208 1024 2.384185791015625E-7 20
+2702 94208 1024 1.1920928955078125E-6 20
+2896 94208 1024 7.152557373046875E-7 20
+3104 94208 1024 4.76837158203125E-7 20
+3327 94208 1024 4.76837158203125E-7 20
+3566 94208 1024 4.76837158203125E-7 20
+3822 188416 1024 4.76837158203125E-7 20
+4096 188416 1024 4.76837158203125E-7 20
+4390 188416 1024 0.0 20
+4705 188416 1024 4.76837158203125E-7 20
+5043 188416 1024 7.152557373046875E-7 20
+5405 188416 1024 1.1920928955078125E-6 20
+5793 188416 1024 7.152557373046875E-7 20
+6208 188416 1024 9.5367431640625E-7 20
+6654 188416 1024 9.5367431640625E-7 20
+7132 188416 1024 2.384185791015625E-7 20
+7643 376832 1024 0.0 20
+8192 376832 1024 1.1920928955078125E-6 20
+8780 376832 1024 4.76837158203125E-7 20
+9410 376832 1024 4.76837158203125E-7 20
+10086 376832 1024 2.384185791015625E-7 20
+10809 376832 1024 2.384185791015625E-7 20
+11585 376832 1024 4.76837158203125E-7 20
+12417 376832 1024 1.1920928955078125E-6 20
+13308 376832 1024 4.76837158203125E-7 20
+14263 376832 1024 9.5367431640625E-7 20
+15287 753664 1024 4.76837158203125E-7 20
+16384 753664 1024 1.1920928955078125E-6 20
+17560 753664 1024 2.384185791015625E-7 20
+18820 753664 1024 4.76837158203125E-7 20
+20171 753664 1024 1.1920928955078125E-6 20
+21619 753664 1024 4.76837158203125E-7 20
+23170 753664 1024 1.1920928955078125E-6 20
+24834 753664 1024 7.152557373046875E-7 20
+26616 753664 1024 4.76837158203125E-7 20
+28526 753664 1024 7.152557373046875E-7 20
+30574 1507328 1024 7.152557373046875E-7 20
+32768 1507328 1024 9.5367431640625E-7 20
+35120 1507328 1024 4.76837158203125E-7 20
+37641 1507328 1024 4.76837158203125E-7 20
+40342 1507328 1024 7.152557373046875E-7 20
+43238 1507328 1024 9.5367431640625E-7 20
+46341 1507328 1024 7.152557373046875E-7 20
+49667 1507328 1024 9.5367431640625E-7 20
+53232 1507328 1024 1.1920928955078125E-6 20
+57052 1507328 1024 4.76837158203125E-7 20
+61147 3014656 1024 7.152557373046875E-7 20
+65536 3014656 1024 2.384185791015625E-7 20
+70240 3014656 1024 4.76837158203125E-7 20
+75281 3014656 1024 4.76837158203125E-7 20
+80684 3014656 1024 0.0 20
+86475 3014656 1024 2.384185791015625E-7 20
+92682 3014656 1024 2.384185791015625E-7 20
+99334 3014656 1024 4.76837158203125E-7 20
+106464 3014656 1024 1.1920928955078125E-6 20
+114105 3014656 1024 7.152557373046875E-7 20
+122295 6029312 1024 4.76837158203125E-7 20
+131072 6029312 1024 0.0 20
+140479 6029312 1024 7.152557373046875E-7 20
+150562 6029312 1024 1.1920928955078125E-6 20
+161369 6029312 1024 2.384185791015625E-7 20
+172951 6029312 1024 4.76837158203125E-7 20
+185364 6029312 1024 1.1920928955078125E-6 20
+198668 6029312 1024 2.384185791015625E-7 20
+212927 6029312 1024 7.152557373046875E-7 20
+228210 6029312 1024 9.5367431640625E-7 20
+244589 12058624 1024 2.384185791015625E-7 20
+262144 12058624 1024 7.152557373046875E-7 20
+280959 12058624 1024 2.384185791015625E-7 20
+301124 12058624 1024 7.152557373046875E-7 20
+322737 12058624 1024 4.76837158203125E-7 20
+345901 12058624 1024 7.152557373046875E-7 20
+370728 12058624 1024 4.76837158203125E-7 20
+397336 12058624 1024 7.152557373046875E-7 20
+425854 12058624 1024 9.5367431640625E-7 20
+456419 12058624 1024 7.152557373046875E-7 20
+489178 24117248 1024 2.384185791015625E-7 20
+524288 24117248 1024 4.76837158203125E-7 20
+561918 24117248 1024 2.384185791015625E-7 20
+602249 24117248 1024 0.0 20
+645474 24117248 1024 4.76837158203125E-7 20
+691802 24117248 1024 1.430511474609375E-6 20
+741455 24117248 1024 7.152557373046875E-7 20
+794672 24117248 1024 1.1920928955078125E-6 20
+851708 24117248 1024 1.1920928955078125E-6 20
+912838 24117248 1024 2.384185791015625E-7 20
+978356 48234496 1024 7.152557373046875E-7 20
+1048576 48234496 1024 2.384185791015625E-7 20
+
+PROPERTIES:
+FileNameDateFormat=yyyyMMdd'_'HHmmssz
+JobProfile=org.apache.datasketches.characterization.filters.QuotientFilterSizeProfile
+OutputFileName=QuotientFilterSizeProfile20240703_014255PST.txt
+OutputFileNameData=QuotientFilterSizeProfile20240703_014255PST.tsv
+ReadableDateFormat=yyyy/MM/dd HH:mm:ss z
+targetFpp=1e-6
+TimeZone=PST
+TimeZoneOffset=-28800000
+Trials_lgMaxBpU=5
+Trials_lgMaxT=14
+Trials_lgMaxU=20
+Trials_lgMinBpU=1
+Trials_lgMinT=10
+Trials_lgMinU=0
+Trials_TPPO=1
+Trials_UPPO=10
+
+Total Job Time: 0:00:42.539
+END JOB QuotientFilterSizeProfile
+
+
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]