davecromberge commented on a change in pull request #369:
URL: https://github.com/apache/datasketches-java/pull/369#discussion_r741475255



##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -177,23 +215,61 @@ public void notB(final Sketch<S> skB) {
    *
    * @param skB The incoming Theta sketch for the second (or following) argument <i>B</i>.
    */
+  @SuppressWarnings("unchecked")
   public void notB(final org.apache.datasketches.theta.Sketch skB) {
-    if (empty_ || skB == null || skB.isEmpty()) { return; }
-    //skB is not empty
-    final long thetaLongB = skB.getThetaLong();
-    thetaLong_ = Math.min(thetaLong_, thetaLongB);
+    if (skB == null) { return; } //ignore
 
-    //process B
-    final DataArrays<S> daB = getResultArraysTheta(thetaLong_, curCount_, hashArr_, summaryArr_, skB);
-    hashArr_ = daB.hashArr;
-    summaryArr_ = daB.summaryArr;
-
-    curCount_ = hashArr_.length;
-    empty_ = curCount_ == 0 && thetaLong_ == Long.MAX_VALUE;
+    final long thetaLongB = skB.getThetaLong();
+    final int countB = skB.getRetainedEntries();
+    final boolean emptyB = skB.isEmpty();
+
+    final int id =
+        SetOperationCornerCases.createCornerCaseId(thetaLong_, curCount_, empty_, thetaLongB, countB, emptyB);
+    final CornerCase cCase = CornerCase.idToCornerCase(id);
+    final AnotbResult anotbResult = cCase.getAnotbResult();
+
+    switch (anotbResult) {
+      case NEW_1_0_T: {
+        reset();
+        break;
+      }
+      case RESULTDEGEN_MIN_0_F: {
+        reset();
+        thetaLong_ = min(thetaLong_, thetaLongB);
+        empty_ = false;
+        break;
+      }
+      case RESULTDEGEN_THA_0_F: {
+        empty_ = false;
+        curCount_ = 0;
+        //thetaLong_ is ok
+        break;
+      }
+      case SKA_TRIM: {
+        thetaLong_ = min(thetaLong_, thetaLongB);
+        final DataArrays<S> da = trimDataArrays(hashArr_, summaryArr_,thetaLong_);
+        hashArr_ = da.hashArr;
+        curCount_ = (hashArr_ == null) ? 0 : hashArr_.length;
+        summaryArr_ = da.summaryArr;
+        break;
+      }
+      case SKETCH_A: {
+        break; //result is already in A
+      }
+      case FULL_ANOTB: { //both A and B should have valid entries.
+        thetaLong_ = min(thetaLong_, skB.getThetaLong());

Review comment:
       `thetaLongB` is already available in this scope and can be used here
       instead of calling `skB.getThetaLong()` again.

##########
File path: src/test/java/org/apache/datasketches/tuple/aninteger/MikhailsBugTupleTest.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.tuple.aninteger;
+
+import org.apache.datasketches.tuple.AnotB;
+import org.apache.datasketches.tuple.CompactSketch;
+import org.apache.datasketches.tuple.Intersection;
+import org.testng.annotations.Test;
+
+/**
+ * Issue #368, from Mikhail Lavrinovich 12 OCT 2021
+ * The failure was AnotB(estimating {<1.0,1,F}, Intersect(estimating{<1.0,1,F}, newDegenerative{<1.0,0,T},
+ * Which should be equal to AnotB(estimating{<1.0,1,F}, new{1.0,0,T} = estimating{<1.0, 1, F}. The AnotB
+ * threw a null pointer exception because it was not properly handling sketches with zero entries.
+ */
+public class MikhailsBugTupleTest {
+
+  @Test
+  public void mikhailsBug() {
+    IntegerSketch x = new IntegerSketch(12, 2, 0.1f, IntegerSummary.Mode.Min);
+    IntegerSketch y = new IntegerSketch(12, 2, 0.1f, IntegerSummary.Mode.Min);
+    x.update(1L, 1);
+    IntegerSummarySetOperations setOperations =
+        new IntegerSummarySetOperations(IntegerSummary.Mode.Min, IntegerSummary.Mode.Min);
+    Intersection<IntegerSummary> intersection = new Intersection<>(setOperations);
+    CompactSketch<IntegerSummary> intersect = intersection.intersect(x, y);
+    AnotB.aNotB(x, intersect); // NPE was here
+  }
+
+  //@Test
+  public void withTuple() {
+    IntegerSketch x = new IntegerSketch(12, 2, 0.1f, IntegerSummary.Mode.Min);
+    IntegerSketch y = new IntegerSketch(12, 2, 0.1f, IntegerSummary.Mode.Min);
+    x.update(1L, 1);
+    println("Tuple x: Estimating {<1.0,1,F}");
+    println(x.toString());
+    println("Tuple y: NewDegenerative {<1.0,0,T}");
+    println(y.toString());
+    IntegerSummarySetOperations setOperations =
+        new IntegerSummarySetOperations(IntegerSummary.Mode.Min, IntegerSummary.Mode.Min);
+    Intersection<IntegerSummary> intersection = new Intersection<>(setOperations);
+    CompactSketch<IntegerSummary> intersect = intersection.intersect(x, y);
+    println("Tuple Intersect(Estimating, NewDegen) = new {1.0, 0, T}");
+    println(intersect.toString());
+    CompactSketch<IntegerSummary> csk = AnotB.aNotB(x, intersect);
+    println("Tuple AnotB(Estimating, New) = estimating {<1.0, 1, F}");
+    println(csk.toString());

Review comment:
       Would it be worth using `SetOperationCornerCases` here to assert that the
       expected corner case is matched, or is that unnecessary?

##########
File path: src/test/java/org/apache/datasketches/theta/CornerCaseThetaSetOperationsTest.java
##########
@@ -0,0 +1,558 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.theta;
+
+import org.testng.annotations.Test;
+
+public class CornerCaseThetaSetOperationsTest {
+
+  /* Hash Values
+   * 9223372036854775807  Theta = 1.0
+   *
+   * 6730918654704304314  hash(3L)[0] >>> 1    GT_MIDP
+   * 4611686018427387904  Theta for p = 0.5f = MIDP
+   * 2206043092153046979  hash(2L)[0] >>> 1    LT_MIDP_V
+   * 1498732507761423037  hash(5L)[0] >>> 1    LTLT_MIDP_V
+   *
+   * 1206007004353599230  hash(6L)[0] >>> 1    GT_LOWP_V
+   *  922337217429372928  Theta for p = 0.1f = LOWP
+   *  593872385995628096  hash(4L)[0] >>> 1    LT_LOWP_V
+   *  405753591161026837  hash(1L)[0] >>> 1    LTLT_LOWP_V
+   */
+
+  private static final long GT_MIDP_V   = 3L;
+  private static final float MIDP       = 0.5f;
+  private static final long LT_MIDP_V   = 2L;
+
+  private static final long GT_LOWP_V   = 6L;
+  private static final float LOWP       = 0.1f;
+  private static final long LT_LOWP_V   = 4L;
+
+  private static final double MIDP_THETA = MIDP;
+  private static final double LOWP_THETA = LOWP;
+
+  private enum SkType {
+    NEW,          //{ 1.0,  0, T} Bin: 101  Oct: 05
+    EXACT,        //{ 1.0, >0, F} Bin: 111  Oct: 07, specify only value
+    ESTIMATION,   //{<1.0, >0, F} Bin: 010  Oct: 02, specify only value
+    NEW_DEGEN,    //{<1.0,  0, T} Bin: 001  Oct: 01, specify only p
+    RESULT_DEGEN  //{<1.0,  0, F} Bin: 000  Oct: 0, specify p, value
+  }

Review comment:
       The website assigns 6 as the octal ID for the EXACT sketch type, which
       conflicts with the 07 listed here. I have created a PR to update this on
       the website docs.
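
       For cross-checking the Bin/Oct columns, the ID can be recomputed
       mechanically. The sketch below assumes a bit layout inferred from the
       NEW, ESTIMATION, NEW_DEGEN and RESULT_DEGEN rows (theta == 1.0, then
       count > 0, then empty); `SketchStateId` and `stateId` are hypothetical
       names, not library API. Under that assumption {1.0, >0, F} comes out as
       binary 110, so the EXACT row may deserve a second look.

```java
/** Hypothetical helper, not part of the library: recomputes the 3-bit sketch state ID. */
final class SketchStateId {

  /**
   * Assumed bit layout (inferred from the enum comments, not confirmed by the PR):
   * bit 2 = theta == 1.0, bit 1 = retained count > 0, bit 0 = empty flag.
   */
  static int stateId(final double theta, final int count, final boolean empty) {
    final int thetaBit = (theta == 1.0) ? 4 : 0;
    final int countBit = (count > 0) ? 2 : 0;
    final int emptyBit = empty ? 1 : 0;
    return thetaBit | countBit | emptyBit;
  }

  public static void main(final String[] args) {
    // Each line prints the sketch type and its ID in octal.
    System.out.println("NEW          " + Integer.toOctalString(stateId(1.0, 0, true)));
    System.out.println("EXACT        " + Integer.toOctalString(stateId(1.0, 1, false)));
    System.out.println("ESTIMATION   " + Integer.toOctalString(stateId(0.5, 1, false)));
    System.out.println("NEW_DEGEN    " + Integer.toOctalString(stateId(0.5, 0, true)));
    System.out.println("RESULT_DEGEN " + Integer.toOctalString(stateId(0.5, 0, false)));
  }
}
```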

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -143,19 +142,58 @@ public void setA(final Sketch<S> skA) {
    *
    * @param skB The incoming Tuple sketch for the second (or following) argument <i>B</i>.
    */
+  @SuppressWarnings("unchecked")
   public void notB(final Sketch<S> skB) {
-    if (empty_ || skB == null || skB.isEmpty() || hashArr_ == null) { return; }
-    //skB is not empty
-    final long thetaLongB = skB.getThetaLong();
-    thetaLong_ = Math.min(thetaLong_, thetaLongB);
-
-    //process B
-    final DataArrays<S> daB = getResultArraysTuple(thetaLong_, curCount_, hashArr_, summaryArr_, skB);
-    hashArr_ = daB.hashArr;
-    summaryArr_ = daB.summaryArr;
+    if (skB == null) { return; } //ignore
 
-    curCount_ = hashArr_.length;
-    empty_ = curCount_ == 0 && thetaLong_ == Long.MAX_VALUE;
+    final long thetaLongB = skB.getThetaLong();
+    final int countB = skB.getRetainedEntries();
+    final boolean emptyB = skB.isEmpty();
+
+    final int id =
+        SetOperationCornerCases.createCornerCaseId(thetaLong_, curCount_, empty_, thetaLongB, countB, emptyB);
+    final CornerCase cCase = CornerCase.idToCornerCase(id);
+    final AnotbResult anotbResult = cCase.getAnotbResult();
+
+    switch (anotbResult) {
+      case NEW_1_0_T: {
+        reset();
+        break;

Review comment:
       When sketch A is a NewDegen {<1.0,0,T}, why does the result not follow
       the min-theta rule?

##########
File path: src/main/java/org/apache/datasketches/tuple/AnotB.java
##########
@@ -143,19 +142,58 @@ public void setA(final Sketch<S> skA) {
    *
    * @param skB The incoming Tuple sketch for the second (or following) argument <i>B</i>.
    */
+  @SuppressWarnings("unchecked")
   public void notB(final Sketch<S> skB) {
-    if (empty_ || skB == null || skB.isEmpty() || hashArr_ == null) { return; }
-    //skB is not empty
-    final long thetaLongB = skB.getThetaLong();
-    thetaLong_ = Math.min(thetaLong_, thetaLongB);
-
-    //process B
-    final DataArrays<S> daB = getResultArraysTuple(thetaLong_, curCount_, hashArr_, summaryArr_, skB);
-    hashArr_ = daB.hashArr;
-    summaryArr_ = daB.summaryArr;
+    if (skB == null) { return; } //ignore
 
-    curCount_ = hashArr_.length;
-    empty_ = curCount_ == 0 && thetaLong_ == Long.MAX_VALUE;
+    final long thetaLongB = skB.getThetaLong();
+    final int countB = skB.getRetainedEntries();
+    final boolean emptyB = skB.isEmpty();
+
+    final int id =
+        SetOperationCornerCases.createCornerCaseId(thetaLong_, curCount_, empty_, thetaLongB, countB, emptyB);
+    final CornerCase cCase = CornerCase.idToCornerCase(id);
+    final AnotbResult anotbResult = cCase.getAnotbResult();
+
+    switch (anotbResult) {
+      case NEW_1_0_T: {
+        reset();
+        break;
+      }
+      case RESULTDEGEN_MIN_0_F: {
+        reset();
+        thetaLong_ = min(thetaLong_, thetaLongB);
+        empty_ = false;
+        break;
+      }
+      case RESULTDEGEN_THA_0_F: {
+        empty_ = false;
+        curCount_ = 0;
+        //thetaLong_ is ok
+        break;

Review comment:
       It is fascinating that Theta sketch estimates are still calculated when
       the result contains no retained entries. As you mention on the website,
       the process is probably too complex to explain here, but for reference,
       is it implemented in `BinomialBoundsN.java`?
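
       One way to see why a {<1.0, 0, F} result still carries information is a
       textbook argument, sketched below. It is an illustration under stated
       assumptions (independent sampling with probability theta) and is not
       claimed to be what `BinomialBoundsN.java` does; `ZeroRetainedBounds` and
       its methods are hypothetical names.

```java
/** Hypothetical illustration only; not the library's bounds computation. */
final class ZeroRetainedBounds {

  /** Point estimate of a theta-style sample: retained count scaled by 1/theta. */
  static double estimate(final int retained, final double theta) {
    return retained / theta;
  }

  /**
   * Solves (1 - theta)^n = delta for n, assuming each of n distinct items is
   * kept independently with probability theta (0 < theta < 1). For larger n,
   * the chance that every item missed the sample drops below delta.
   */
  static double upperBoundWhenNoneRetained(final double theta, final double delta) {
    return Math.log(delta) / Math.log1p(-theta);
  }

  public static void main(final String[] args) {
    // With zero retained entries the point estimate is zero...
    System.out.println(estimate(0, 0.5));
    // ...but theta < 1.0 still allows a non-zero upper bound.
    System.out.println(upperBoundWhenNoneRetained(0.5, 0.05));
    System.out.println(upperBoundWhenNoneRetained(0.1, 0.05));
  }
}
```

       The smaller theta is, the more items could plausibly have been missed,
       which is why the result cannot simply be collapsed to an empty sketch.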

##########
File path: src/main/java/org/apache/datasketches/tuple/CompactSketch.java
##########
@@ -240,6 +242,34 @@ public int getCountLessThanThetaLong(final long thetaLong) {
     return bytes;
   }
 
+  @SuppressWarnings("unchecked")
+  CompactSketch<S> trimToTheta(final long thetaLong) {
+    final QuickSelectSketch<S> qsSk =
+        new QuickSelectSketch<>(this.getRetainedEntries(), ResizeFactor.X1.lg(), null);
+    int countOut = 0;
+    final SketchIterator<S> it = iterator();
+
+    while (it.next()) {
+      final long hash = it.getHash();
+      final S summary = it.getSummary();
+      if (hash < thetaLong) {
+        qsSk.insert(it.getHash(), (S)summary.copy());
+        countOut++;
+      }
+    }
+
+    qsSk.setThetaLong(thetaLong);
+    if (countOut == 0) {
+      if (thetaLong == Long.MAX_VALUE) {
+        return new CompactSketch<>(null, null, thetaLong, true);
+      } else {
+        return new CompactSketch<>(null, null, thetaLong, false);
+      }

Review comment:
       If `thetaLong` is below `Long.MAX_VALUE` only because the sketch was
       constructed as a NewDegen {<1.0,0,T}, is `thetaLong == Long.MAX_VALUE`
       still a valid check for emptiness?
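
       To make the question concrete, the helper below mirrors the
       `countOut == 0` branch of the diff in isolation. `TrimEmptyFlag` and
       `emptyAfterTrim` are hypothetical names, and treating any result with
       surviving entries as non-empty is my assumption, since that part of the
       method is not shown in this hunk.

```java
/** Hypothetical stand-in for the empty-flag decision at the end of trimToTheta. */
final class TrimEmptyFlag {

  /**
   * Mirrors the branch in the diff: with no surviving entries the result is
   * flagged empty only when thetaLong == Long.MAX_VALUE. A result with
   * surviving entries is assumed to be non-empty.
   */
  static boolean emptyAfterTrim(final int countOut, final long thetaLong) {
    return countOut == 0 && thetaLong == Long.MAX_VALUE;
  }

  public static void main(final String[] args) {
    final long midTheta = 1L << 62; // theta = 0.5 expressed as a long
    System.out.println(emptyAfterTrim(0, Long.MAX_VALUE)); // true
    System.out.println(emptyAfterTrim(0, midTheta));       // false
  }
}
```

       The second call is the case in question: the decision depends only on
       the count and the theta passed in, so the empty flag of the source
       sketch plays no part in it.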

##########
File path: src/test/java/org/apache/datasketches/theta/CornerCaseThetaSetOperationsTest.java
##########
@@ -0,0 +1,558 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.theta;
+
+import org.testng.annotations.Test;
+
+public class CornerCaseThetaSetOperationsTest {
+
+  /* Hash Values
+   * 9223372036854775807  Theta = 1.0
+   *
+   * 6730918654704304314  hash(3L)[0] >>> 1    GT_MIDP
+   * 4611686018427387904  Theta for p = 0.5f = MIDP
+   * 2206043092153046979  hash(2L)[0] >>> 1    LT_MIDP_V
+   * 1498732507761423037  hash(5L)[0] >>> 1    LTLT_MIDP_V
+   *
+   * 1206007004353599230  hash(6L)[0] >>> 1    GT_LOWP_V
+   *  922337217429372928  Theta for p = 0.1f = LOWP
+   *  593872385995628096  hash(4L)[0] >>> 1    LT_LOWP_V
+   *  405753591161026837  hash(1L)[0] >>> 1    LTLT_LOWP_V
+   */
+
+  private static final long GT_MIDP_V   = 3L;
+  private static final float MIDP       = 0.5f;
+  private static final long LT_MIDP_V   = 2L;
+
+  private static final long GT_LOWP_V   = 6L;
+  private static final float LOWP       = 0.1f;
+  private static final long LT_LOWP_V   = 4L;
+
+  private static final double MIDP_THETA = MIDP;
+  private static final double LOWP_THETA = LOWP;
+
+  private enum SkType {
+    NEW,          //{ 1.0,  0, T} Bin: 101  Oct: 05
+    EXACT,        //{ 1.0, >0, F} Bin: 111  Oct: 07, specify only value
+    ESTIMATION,   //{<1.0, >0, F} Bin: 010  Oct: 02, specify only value
+    NEW_DEGEN,    //{<1.0,  0, T} Bin: 001  Oct: 01, specify only p
+    RESULT_DEGEN  //{<1.0,  0, F} Bin: 000  Oct: 0, specify p, value
+  }
+
+  //NOTE: 0 values in getSketch are not used.
+
+  private static void checks(
+      UpdateSketch thetaA,
+      UpdateSketch thetaB,
+      double resultInterTheta,
+      int resultInterCount,
+      boolean resultInterEmpty,
+      double resultAnotbTheta,
+      int resultAnotbCount,
+      boolean resultAnotbEmpty) {
+    CompactSketch csk;
+
+    //Intersection
+    Intersection inter = SetOperation.builder().buildIntersection();
+
+    csk = inter.intersect(thetaA, thetaB);
+    checkResult("Intersect Stateless Theta, Theta", csk, resultInterTheta, resultInterCount, resultInterEmpty);
+    csk = inter.intersect(thetaA.compact(), thetaB.compact());
+    checkResult("Intersect Stateless Theta, Theta", csk, resultInterTheta, resultInterCount, resultInterEmpty);
+
+    //AnotB
+    AnotB anotb = SetOperation.builder().buildANotB();
+
+    csk = anotb.aNotB(thetaA, thetaB);
+    checkResult("AnotB Stateless Theta, Theta", csk, resultAnotbTheta, resultAnotbCount, resultAnotbEmpty);
+    csk = anotb.aNotB(thetaA.compact(), thetaB.compact());
+    checkResult("AnotB Stateless Theta, Theta", csk, resultAnotbTheta, resultAnotbCount, resultAnotbEmpty);
+
+    anotb.setA(thetaA);
+    anotb.notB(thetaB);
+    csk = anotb.getResult(true);
+    checkResult("AnotB Stateful Theta, Theta", csk, resultAnotbTheta, resultAnotbCount, resultAnotbEmpty);
+
+    anotb.setA(thetaA.compact());
+    anotb.notB(thetaB.compact());
+    csk = anotb.getResult(true);
+    checkResult("AnotB Stateful Theta, Theta", csk, resultAnotbTheta, resultAnotbCount, resultAnotbEmpty);
+  }
+
+
+  @Test
+  public void newNew() {
+    UpdateSketch thetaA = getSketch(SkType.NEW,    0, 0);
+    UpdateSketch thetaB = getSketch(SkType.NEW,    0, 0);
+    final double resultInterTheta = 1.0;
+    final int resultInterCount = 0;
+    final boolean resultInterEmpty = true;
+    final double resultAnotbTheta = 1.0;
+    final int resultAnotbCount = 0;
+    final boolean resultAnotbEmpty = true;
+
+    checks(thetaA, thetaB, resultInterTheta, resultInterCount, resultInterEmpty,
+        resultAnotbTheta, resultAnotbCount, resultAnotbEmpty);
+  }
+
+  @Test
+  public void newExact() {
+    UpdateSketch thetaA = getSketch(SkType.NEW,    0, 0);
+    UpdateSketch thetaB = getSketch(SkType.EXACT,  0, GT_MIDP_V);
+    final double resultInterTheta = 1.0;
+    final int resultInterCount = 0;
+    final boolean resultInterEmpty = true;
+    final double resultAnotbTheta = 1.0;
+    final int resultAnotbCount = 0;
+    final boolean resultAnotbEmpty = true;
+
+    checks(thetaA, thetaB, resultInterTheta, resultInterCount, resultInterEmpty,
+        resultAnotbTheta, resultAnotbCount, resultAnotbEmpty);
+  }

Review comment:
       What effect does a sampling probability of zero have on whether a sketch
       stores a value in its buffer? This question refers to line 118, where an
       EXACT sketch is created with `GT_MIDP_V`, a value whose hash is greater
       than the MIDP theta.
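
       For context, here is how I understand retention, using the hash values
       from the comment table at the top of the test. The conversion from `p`
       to a theta long is my assumption (it happens to reproduce the table's
       values for 0.5f and 0.1f), and `ThetaRetention` is a hypothetical class,
       not library code. Whether the library actually treats `p = 0` this way
       is exactly what I am asking, given the NOTE that 0 values in
       `getSketch` are not used.

```java
/** Hypothetical illustration of the "hash < theta" retention rule. */
final class ThetaRetention {

  /** Assumed conversion from sampling probability p to a theta long. */
  static long thetaLongFor(final float p) {
    return (long) ((double) p * Long.MAX_VALUE);
  }

  /** A hash is kept only if it is strictly below the theta long. */
  static boolean isRetained(final long hash, final long thetaLong) {
    return hash < thetaLong;
  }

  public static void main(final String[] args) {
    final long hashOf3 = 6730918654704304314L; // hash(3L)[0] >>> 1, GT_MIDP
    final long hashOf2 = 2206043092153046979L; // hash(2L)[0] >>> 1, LT_MIDP_V

    System.out.println(isRetained(hashOf3, Long.MAX_VALUE));     // theta = 1.0
    System.out.println(isRetained(hashOf3, thetaLongFor(0.5f))); // above MIDP theta
    System.out.println(isRetained(hashOf2, thetaLongFor(0.5f))); // below MIDP theta
    System.out.println(isRetained(hashOf2, thetaLongFor(0.0f))); // literal p = 0
  }
}
```

       Under this reading, a literal `p = 0` would give a theta of zero and
       nothing could be retained, so I assume the 0 passed for EXACT is a
       placeholder and the sketch is built with theta = 1.0.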




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


