Folks, I hope everyone is safe and healthy during these challenging times!

Some updates:

   - The website Downloads
   <https://datasketches.apache.org/docs/Community/Downloads.html> page has
   been completely redesigned and automated.  Whenever one of our components
   is released to dist, a step in our Release Process
   <https://dist.apache.org/repos/dist/dev/incubator/datasketches/scripts/APACHE_JAVA_RELEASE_STEPS.md>
   runs a script that automatically updates the Downloads page with the
   latest release versions.
   - We have also added three new TODO lists for Java
   <https://github.com/apache/incubator-datasketches-java/projects/1>, C++
   <https://github.com/apache/incubator-datasketches-cpp/projects/1> and
   the Website
   <https://github.com/apache/incubator-datasketches-website/projects/1>.
   These are brand new and will be filling up with tasks soon.
   - There are a number of new additions to the website that should make it
   easier for users to find the right sketches for their applications:
      - Sketch Features Matrix
      <https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html>.
      This provides, in one view, a comparison of the major features of the
      different sketches and sketch families in the library.
      - Features Matrix for Distinct Count Sketches
      <https://datasketches.apache.org/docs/DistinctCountFeaturesMatrix.html>.
      Our library has a wide variety of sketches for counting distinct values,
      each with different capabilities and trade-offs for different
      applications.  This matrix tries to remove some of the mystery by
      highlighting the major differences between the various distinct counting
      sketches.
      - HLL vs CPC Figures of Merit
      <https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html>.
      There is always a lot of interest in the Flajolet et al. HyperLogLog
      (HLL) sketch.  Not only do we have leading implementations of the HLL
      sketch, but our team has also developed a new *Compressed Probabilistic
      Counting* (CPC) sketch that outperforms the HLL sketch in terms of
      accuracy per unit of stored space.  This new sketch is discussed
      briefly on our Research
      <https://datasketches.apache.org/docs/Community/Research.html> page,
      which also links to the theoretical paper
      <https://arxiv.org/abs/1708.06839> that describes the new algorithm.
      There is also a new section in the Distinct Counting part of the
      website documentation that discusses the CPC sketch, along with
      programming examples.
      - Sketches by Component Repository
      <https://datasketches.apache.org/docs/Architecture/SketchesByComponent.html>.
      This new page organizes the library by its major repository components
      and lists the sketches that are available in each component.
      - Sketch Criteria for Library Inclusion
      <https://datasketches.apache.org/docs/Architecture/SketchCriteria.html>.
      For new contributors to the library, this page outlines our current
      criteria for including new sketch algorithms in the library.

As always, we look forward to your comments and suggestions!

Cheers,

Lee.
