This is an automated email from the ASF dual-hosted git repository.
leerho pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datasketches-website.git
The following commit(s) were added to refs/heads/master by this push:
new 2bc85e16 Merge branch 'master' of
[email protected]:apache/datasketches-website.git
2bc85e16 is described below
commit 2bc85e166f918a76c75277491f2dc552770e02be
Author: Lee Rhodes <[email protected]>
AuthorDate: Tue Jul 9 15:18:52 2024 -0700
Merge branch 'master' of [email protected]:apache/datasketches-website.git
---
docs/Architecture/SketchesByComponent.md | 202 +++++++++++++++++--------------
1 file changed, 108 insertions(+), 94 deletions(-)
diff --git a/docs/Architecture/SketchesByComponent.md
b/docs/Architecture/SketchesByComponent.md
index 6ae3578d..2a0b5605 100644
--- a/docs/Architecture/SketchesByComponent.md
+++ b/docs/Architecture/SketchesByComponent.md
@@ -19,11 +19,11 @@ layout: doc_page
specific language governing permissions and limitations
under the License.
-->
-# Sketches by [Component
Repository](https://github.com/apache?utf8=%E2%9C%93&q=datasketches)
+# Sketches by [Component
Repository](https://datasketches.apache.org/docs/Architecture/Components.html)
The DataSketches Library is organized into the following repository groups:
-## Java
+## Java Sketches
### datasketches-java
This repository has the core-java sketching classes, which are leveraged by
some of the other repositories.
@@ -33,33 +33,31 @@ This code is versioned and the latest release can be
obtained from
<b>High-level Repositories Structure</b>
-Sketches-core Packages. | Package Description
--------------------------------|---------------------
-org.apache.datasketches | Common functions and utilities
-org.apache.datasketches.cpc | New Unique Counting Sketch with better
accuracy per size than HLL
-org.apache.datasketches.fdt | Frequent Distinct Tuples Sketch.
-org.apache.datasketches.frequencies | Frequent Item Sketches, for both longs
and generics
-org.apache.datasketches.hash | The 128-bit MurmurHash3 and adaptors
-org.apache.datasketches.hll | Unique counting HLL sketches for both
heap and off-heap.
-org.apache.datasketches.hllmap | The (HLL) Unique Count Map Sketch
-org.apache.datasketches.kll | Quantiles sketch with better accuracy
per size than the standard quantiles sketch. Includes PMF, CDF functions, for
floats, doubles. On-heap & off-heap.
-org.apache.datasketches.quantiles | Standard Quantiles sketch, plus PMF and
CDF functions, for doubles and generics. On-heap & off-heap.
-org.apache.datasketches.req | Relative Error Quantiles (REQ) sketch,
plus PMF and CDF functions for floats, on-heap. Extremely high accuracy for
very high ranks (e.g., 99.999%ile), or very low ranks (e.g., .00001%ile.
-org.apache.datasketches.sampling | Weighted and uniform reservoir sampling
with generics
-org.apache.datasketches.theta | Unique counting Theta Sketches for both
on-heap & off-heap
-org.apache.datasketches.tuple | Tuple sketches for both primitives and
generics
-org.apache.datasketches.tuple.adouble | A Tuple sketch with a Summary of a
single double
-org.apache.datasketches.tuple.aninteger | A Tuple sketch with a Summary of a
single integer
-org.apache.datasketches.tuple.Strings | A Tuple sketch with a Summary of an
array of Strings
-
-### datasketches-memory
-This code is versioned and the latest release can be obtained from
-[Downloads](https://datasketches.apache.org/docs/Community/Downloads.html).
-
-Memory Packages | Package Description
--------------------------------|---------------------
-org.apache.datasketches.memory | Low level, high-performance Memory
data-structure management primarily for off-heap.
-
+Packages (org.apache.datasketches.*) | Description
+----------------------------------------|---------------------
+common | Common functions and utilities
+cpc | New Unique Counting Sketch with better accuracy per size
than HLL
+fdt | Frequent Distinct Tuples Sketch.
+filters | Bloom filter, Quotient filter, etc.
+frequencies | Frequent Item Sketches, for both longs and generics
+hash | The 128-bit MurmurHash3 and adaptors
+hll | Unique counting HLL sketches for both heap and off-heap.
+hllmap | The (HLL) Unique Count Map Sketch
+kll | Quantiles sketch with better accuracy per size than the
standard quantiles sketch. Includes PMF, CDF functions, for floats, doubles.
On-heap & off-heap.
+partitions | Special tools to enable large-scale partitioning using the
quantiles sketches.
+quantiles | Standard Quantiles sketch, plus PMF and CDF functions, for
doubles and generics. On-heap & off-heap.
+quantilescommon | Common functions used by all the quantiles sketches.
+req | Relative Error Quantiles (REQ) sketch, plus PMF and CDF
functions for floats, on-heap. Extremely high accuracy for very high ranks
(e.g., 99.999%ile), or very low ranks (e.g., .00001%ile).
+sampling | Weighted and uniform reservoir sampling with generics
+theta | Unique counting Theta Sketches for both on-heap & off-heap
+thetacommon | Common functions used by all the Theta and Tuple sketches
+tuple | Tuple sketches for both primitives and generics
+tuple.adouble | A Tuple sketch with a Summary of a single double
+tuple.arrayofdoubles | Dedicated implementation of a Tuple sketch with an
array of doubles Summary.
+tuple.aninteger | A Tuple sketch with a Summary of a single integer
+tuple.Strings | A Tuple sketch with a Summary of an array of Strings
+
+## Java Platform Adaptors
### datasketches-hive
This repository contains Hive UDFs and UDAFs for use within Hadoop grid
environments.
@@ -68,15 +66,16 @@ Users of this code are advised to use Maven to bring in all
the required depende
This code is versioned and the latest release can be obtained from
[Downloads](https://datasketches.apache.org/docs/Community/Downloads.html).
-Sketches-hive Packages | Package Description
+Packages (org.apache.datasketches.*) | Description
-------------------------------------|---------------------
-org.apache.datasketches.hive.cpc | Hive UDF and UDAFs for CPC sketches
-org.apache.datasketches.hive.frequencies | Hive UDF and UDAFs for Frequent
Items sketches
-org.apache.datasketches.hive.hll | Hive UDF and UDAFs for HLL sketches
-org.apache.datasketches.hive.kll | Hive UDF and UDAFs for KLL sketches
-org.apache.datasketches.hive.quantiles | Hive UDF and UDAFs for Quantiles
sketches
-org.apache.datasketches.hive.theta | Hive UDF and UDAFs for Theta
sketches
-org.apache.datasketches.hive.tuple | Hive UDF and UDAFs for Tuple
sketches
+common | Common functions
+hive.cpc | Hive UDF and UDAFs for CPC sketches
+hive.frequencies | Hive UDF and UDAFs for Frequent Items sketches
+hive.hll | Hive UDF and UDAFs for HLL sketches
+hive.kll | Hive UDF and UDAFs for KLL sketches
+hive.quantiles | Hive UDF and UDAFs for Quantiles sketches
+hive.theta | Hive UDF and UDAFs for Theta sketches
+hive.tuple | Hive UDF and UDAFs for Tuple sketches
### datasketches-pig
This repository contains Pig User Defined Functions (UDF) for use within
Hadoop grid environments.
@@ -85,67 +84,45 @@ Users of this code are advised to use Maven to bring in all
the required depende
This code is versioned and the latest release can be obtained from
[Downloads](https://datasketches.apache.org/docs/Community/Downloads.html).
-Sketches-pig Packages | Package Description
------------------------------------|---------------------
-org.apache.datasketches.pig.cpc | Pig UDFs for CPC sketches
-org.apache.datasketches.pig.frequencies | Pig UDFs for Frequent Items sketches
-org.apache.datasketches.pig.hash | Pig UDFs for MurmerHash3
-org.apache.datasketches.pig.hll | Pig UDFs for HLL sketches
-org.apache.datasketches.pig.kll | Pig UDFs for KLL sketches
-org.apache.datasketches.pig.quantiles | Pig UDFs for Quantiles sketches
-org.apache.datasketches.pig.sampling. | Pig UDFs for Sampling sketches
-org.apache.datasketches.pig.theta | Pig UDFs for Theta sketches
-org.apache.datasketches.pig.tuple | Pig UDFs for Tuple sketches
-
-
-### datasketches-characterization
-This relatively new repository is for Java and C++ code that we use to
characterize the accuracy and speed performance of the sketches in
-the library and is constantly being updated. Examples of the job command
files used for various tests can be found in the src/main/resources directory.
-Some of these tests can run for hours depending on its configuration. This
component is not formally released and code must be obtained from
-the [GitHub site](https://github.com/apache/datasketches-characterization).
-
-Characterization Packages | Package Description
-------------------------------------------------|---------------------
-org.apache.datasketches.characterization | Common functions and
utilities
-org.apache.datasketches.characterization.concurrent | Concurrent Theta Sketch
-org.apache.datasketches.characterization.cpc | Compressed
Probabilistic Counting Sketch
-org.apache.datasketches.characterization.fdt | Frequent Distinct
Tuples Sketch
-org.apache.datasketches.characterization.frequencies | Frequent Items Sketches
-org.apache.datasketches.characterization.hash | Hash function
performance
-org.apache.datasketches.characterization.hll | HyperLogLog Sketcch
-org.apache.datasketches.characterization.memory | Memory performance
-org.apache.datasketches.characterization.quantiles | Quantiles performance
-org.apache.datasketches.characterization.theta | Theta Sketch
-org.apache.datasketches.characterization.uniquecount | Base Profiles for
Unique Counting Sketches
-
-### datasketches-server
-This is a new repository for our experimental docker/container server that
enables easy access to the core sketches in the library via HTTP.
-This component is not formally released and code must be obtained from
-the [GitHub site](https://github.com/apache/datasketches-server).
-
-#### C++ Characterizations
-* CPC
-* Frequent Items
-* HLL
-* KLL
-* Theta
-
-
-### datasketches-vector
-This component implements the [Frequent Directions
Algorithm](/docs/Community/Research.html) [GLP16]. It is still experimental in
that the theoretical work has not yet supplied a suitable measure of error for
production work. It can be used as is, but it will not go through a formal
Apache Release until we can find a way to provide better error properties. It
has a dependence on the Memory component.
-This component is not formally released and code must be obtained from
-the [GitHub site](https://github.com/apache/datasketches-vector).
-
-
-## C++ and Python
+Packages (org.apache.datasketches.*) | Description
+-------------------------------------|---------------------
+pig.cpc | Pig UDFs for CPC sketches
+pig.frequencies | Pig UDFs for Frequent Items sketches
+pig.hash | Pig UDFs for MurmurHash3
+pig.hll | Pig UDFs for HLL sketches
+pig.kll | Pig UDFs for KLL sketches
+pig.quantiles | Pig UDFs for Quantiles sketches
+pig.sampling | Pig UDFs for Sampling sketches
+pig.theta | Pig UDFs for Theta sketches
+pig.tuple | Pig UDFs for Tuple sketches
+
+## C++ Sketches
### datasketches-cpp
These are the evolving C++ implementations of the same sketches that are
available in Java.
These implementations are *binary compatible* with their counterparts in Java.
-In other words, a sketch created and stored in C++ can be opened and read in
Java and visa-versa.
+In other words, a sketch created and serialized in C++ can be opened and read
in Java and vice versa.
This code is versioned and the latest release can be obtained from
[Downloads](https://datasketches.apache.org/docs/Community/Downloads.html).
+Directory | Description
+------------------|---------------------
+common | Common functions
+count | Count-Min Sketch
+cpc | CPC Sketch
+density | Density Sketch
+fi | Frequent Items Sketch
+hll | HLL Sketch
+kll | KLL Sketch
+quantiles | Classic Quantiles Sketch
+req | REQ Sketch
+sampling | Sampling sketches
+tdigest | t-Digest Sketch
+theta | Theta sketches
+tuple | Tuple sketches
+
+## C++ Platform Adaptors
+
This site also has our [Python
adaptors](https://github.com/apache/datasketches-cpp/tree/master/python) that
basically wrap the C++ implementations, making the high performance C++
implementations available from Python.
### datasketches-postgresql
@@ -153,9 +130,46 @@ This site provides the postgres-specific adaptors that
wrap the C++ implementati
them available to the PostgreSQL database users. PostgreSQL users should
download the PostgreSQL extension from
[pgxn.org](https://pgxn.org/dist/datasketches/). For examples refer to the
README on the component site. This code is versioned and the latest release can
be obtained from
[Downloads](https://datasketches.apache.org/docs/Community/Downloads.html).
+Files (src/*) | Description
+-----------------------|---------------------
+aod_sketch_c_adapter.h | Tuple Array-Of-Doubles Sketch
+cpc_sketch_c_adapter.h | CPC Sketch
+frequent_strings_sketch_c_adapter.h | Frequent Strings Sketch
+hll_sketch_c_adapter.h | HLL Sketch
+kll_double_sketch_c_adapter.h | KLL Doubles Sketch
+kll_float_sketch_c_adapter.h | KLL Floats Sketch
+quantiles_double_sketch_c_adapter.h | Classic Doubles Quantiles Sketch
+req_float_sketch_c_adapter.h | REQ Floats Sketch
+theta_sketch_c_adapter.h | Theta Sketch
+
+## Python Sketches
+
+### datasketches-python
+Files (src/*) | Description
+-----------------------|---------------------
+count_wrapper.cpp | Count-Min Sketch
+cpc_wrapper.cpp | CPC Sketch
+density_wrapper.cpp | Density Sketch
+ebpps_wrapper.cpp | EB-PPS Sampling Sketch
+fi_wrapper.cpp | Frequent Items Sketch
+hll_wrapper.cpp | HLL Sketch
+kll_wrapper.cpp | KLL Sketch
+quantiles_wrapper.cpp | Classic Quantiles Sketch
+req_wrapper.cpp | REQ Sketch
+theta_wrapper.cpp | Theta sketches
+tuple_wrapper.cpp | Tuple sketches
+vector_of_kll.cpp | KLL Vector
+vo_wrapper.cpp | VarOpt Sketch
+
+## Other
+### datasketches-server
+This is a new repository for our experimental Docker/container
server that enables easy access to the core sketches in the library via HTTP.
+This component is not formally released and code must be obtained from
+the [GitHub site](https://github.com/apache/datasketches-server).
-
-
-
+### datasketches-vector
+This experimental component implements the [Frequent Directions
Algorithm](/docs/Community/Research.html) [GLP16]. It is still experimental in
that the theoretical work has not yet supplied a suitable measure of error for
production work. It can be used as is, but it will not go through a formal
Apache Release until we can find a way to provide better error properties. It
has a dependence on the Memory component.
+This component is not formally released and code must be obtained from
+the [GitHub site](https://github.com/apache/datasketches-vector).