davecromberge commented on code in PR #411:
URL: https://github.com/apache/datasketches-java/pull/411#discussion_r946748248


##########
src/main/java/org/apache/datasketches/DoublesSortedView.java:
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches;
+
+/**
+ * The Sorted View for double values.
+ *
+ * @author Alexander Saydakov
+ * @author Lee Rhodes
+ */
+public interface DoublesSortedView extends SortedView {
+
+  /**
+   * Gets the quantile based on the given normalized rank, and the given search criterion.
+   * @param normalizedRank the given normalized rank, which must be in the range [0.0, 1.0].
+   * @param searchCrit the given search criterion to use.
+   * @return the associated quantile value.
+   */
+  double getQuantile(double normalizedRank, QuantileSearchCriteria searchCrit);
+
+  /**
+   * Gets the normalized rank based on the given quantile value.
+   * @param value the given quantile value
+   * @param searchCrit the given search criterion to use.
+   * @return the normalized rank, which is a number in the range [0.0, 1.0].
+   */
+  double getRank(double value, QuantileSearchCriteria searchCrit);
+
+  /**
+   * Returns an array of values where each value is a number in the range [0.0, 1.0].
+   * The size of this array is one larger than the size of the input splitPoints array.
+   *
+   * <p>If <i>isCdf</i> is true, the points in the returned array are monotonically increasing and end with the
+   * value 1.0. Each value represents a point along the cumulative distribution function that approximates
+   * the CDF of the input data stream. Therefore, each point represents the fractional density of the distribution
+   * between from zero. For example, if one of the returned values is 0.5, then the splitPoint corresponding to that
+   * value would be the median of the distribution.</p>
+   *
+   * <p>If <i>isCdf</i> is false, the points in the returned array are not monotonic and represent the descrete
+   * derivative of the CDF, or the Probablity Mass Function (PMF). Each returned point represents the fractional
+   * area of the total distribution which lies between the previous point (or zero) and the given point, which
+   * corresponds to the given splitPoint.<p>
+   *
+   * @param splitPoints the given array of quantile values or splitPoints. This is a sorted, unique, monotonic array
+   * of values in the range of (minValue, maxValue). This array should not include either the minValue or the maxValue.
+   * The returned array will have one extra interval representing the very top of the distribution.
+   * @param isCdf if true, a CDF will be returned, otherwise, a PMF will be returned.
+   * @param searchCrit if INCLUSIVE, each interval within the distribution will include its top value and exclude its
+   * bottom value. Otherwise, it will be the reverse.  The only exception is that the top portion will always include
+   * the top value retained by the sketch.
+   * @return an array of points that correspond to the given splitPoints, and represents the data distributio
+   * as a CDF or PMF.
+   */
+  double[] getPmfOrCdf(double[] splitPoints, boolean isCdf, QuantileSearchCriteria searchCrit);

Review Comment:
   Boolean parameters can make code harder to read and maintain. Could this
method be split into two methods instead, or would that lead to significant
duplication?
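
   To illustrate the split, here is a minimal sketch under stated assumptions:
the `View` interface, `getCdf`, `getPmf` and `exactView` are hypothetical names,
the search criterion parameter is omitted for brevity, and the toy
implementation counts values `<=` each split point. Default methods keep a
single implementation point, so duplication stays small.

```java
/** Hypothetical sketch: none of these names are the library's API. */
final class SplitPmfCdfSketch {

  /** Stand-in for the interface under review, reduced to one method. */
  interface View {
    // The boolean-flag method stays as the single implementation point.
    double[] getPmfOrCdf(double[] splitPoints, boolean isCdf);

    // Intention-revealing wrappers, so callers never pass a bare boolean.
    default double[] getCdf(double[] splitPoints) {
      return getPmfOrCdf(splitPoints, true);
    }

    default double[] getPmf(double[] splitPoints) {
      return getPmfOrCdf(splitPoints, false);
    }
  }

  /** Toy exact view over the given values (assumption: value <= splitPoint counts). */
  static View exactView(final double[] values) {
    return (splitPoints, isCdf) -> {
      final double[] out = new double[splitPoints.length + 1];
      for (int i = 0; i < splitPoints.length; i++) {
        int count = 0;
        for (final double v : values) {
          if (v <= splitPoints[i]) { count++; }
        }
        out[i] = (double) count / values.length;
      }
      out[splitPoints.length] = 1.0; // the top interval closes the CDF at 1.0
      if (!isCdf) {
        // Discrete derivative of the CDF: subtract the previous point, back to front.
        for (int i = out.length - 1; i > 0; i--) { out[i] -= out[i - 1]; }
      }
      return out;
    };
  }
}
```

   A caller would then write `view.getCdf(splitPoints)` or
`view.getPmf(splitPoints)` and the intent is visible at the call site.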



##########
src/test/java/org/apache/datasketches/GenericInequalitySearchTest.java:
##########
@@ -109,7 +109,7 @@ private void checkBinarySearchFloatLimits(final Float[] arr, final int low, fina
     v = highV + 1;
     res = find(arr, low, high, v, LT, comparator);
     println(desc(arr, low, high, v, res, LT, comparator));
-    assertEquals(res, high);
+    assertEquals(res, high); //??

Review Comment:
   The `//??` comment here is meaningless. Either state the open question or
remove it.



##########
src/test/java/org/apache/datasketches/ReflectUtility.java:
##########
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches;
+
+import java.lang.reflect.Constructor;
+import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+
+import static org.apache.datasketches.QuantileSearchCriteria.*;
+import static org.testng.Assert.assertEquals;
+
+import org.apache.datasketches.req.ReqSketchSortedView;
+import org.testng.annotations.Test;
+
+public final class ReflectUtility {
+
+  private ReflectUtility() {}
+
+  static final Class<?> REQ_SV;
+  static final Class<?> KLL_FLOATS_SV;
+  static final Class<?> KLL_DOUBLES_SV;
+
+  static final Constructor<?> REQ_SV_CTOR;
+  static final Constructor<?> KLL_FLOATS_SV_CTOR;
+  static final Constructor<?> KLL_DOUBLES_SV_CTOR;
+
+  static {
+    REQ_SV = getClass("org.apache.datasketches.req.ReqSketchSortedView");
+    KLL_FLOATS_SV = getClass("org.apache.datasketches.kll.KllFloatsSketchSortedView");
+    KLL_DOUBLES_SV = getClass("org.apache.datasketches.kll.KllDoublesSketchSortedView");
+
+    REQ_SV_CTOR = getConstructor(REQ_SV, float[].class, long[].class, long.class);
+    KLL_FLOATS_SV_CTOR = getConstructor(KLL_FLOATS_SV, float[].class, long[].class, long.class);
+    KLL_DOUBLES_SV_CTOR = getConstructor(KLL_DOUBLES_SV, double[].class, long[].class, long.class);
+  }
+
+  @Test //Example
+  public void checkCtr() throws Exception {
+    float[] farr = { 10, 20, 30 };
+    long[] larr = { 1, 2, 3 };
+    long n = 3;
+    ReqSketchSortedView reqSV =
+        (ReqSketchSortedView) REQ_SV_CTOR.newInstance(farr, larr, n);
+    float q = reqSV.getQuantile(1.0, INCLUSIVE);
+    assertEquals(q, 30f);
+  }

Review Comment:
   Why is it necessary to access these constructors via reflection?
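
   For context, the usual reason to construct via reflection is that the
constructor is not visible from the calling code. Whether that is the case in
this PR is an assumption, not something the diff confirms. A minimal,
self-contained sketch of the mechanism, with hypothetical names `Hidden` and
`viaReflection`:

```java
import java.lang.reflect.Constructor;

/** Hypothetical sketch of reflective construction; not the PR's ReflectUtility. */
final class ReflectCtorSketch {

  /** Stand-in for a class whose constructor is not public. */
  static final class Hidden {
    private final long n;

    private Hidden(final long n) { this.n = n; }

    long getN() { return n; }
  }

  /** Looks up the non-public constructor, makes it accessible, and invokes it. */
  static Hidden viaReflection(final long n) {
    try {
      final Constructor<Hidden> ctor = Hidden.class.getDeclaredConstructor(long.class);
      ctor.setAccessible(true); // bypasses the Java access check for this constructor
      return ctor.newInstance(n);
    } catch (final ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

   If the constructors are reachable without this (for example from a test in
the same package), a direct `new` call would be simpler and type-checked.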



##########
src/main/java/org/apache/datasketches/kll/KllQuantilesHelper.java:
##########
@@ -102,6 +88,7 @@ public static int chunkContainingPos(final long[] wtArr, final long pos) {
   //
   // A) and B) provide the invariants for our binary search.
   // Observe that they are satisfied by the initial conditions:  l = 0 and r = len.
+  @Deprecated

Review Comment:
   These deprecation notices should include the preferred alternative as well
as the version that this notice applies to (i.e. since ...)
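
   A minimal sketch of such a notice, assuming hypothetical method names
(`legacyFind`, `firstIndexGreaterThan`) and a placeholder version `x.y.z`. It
uses the Javadoc `@deprecated` tag, which works on Java 8. From Java 9 onward
the annotation itself also accepts `since` and `forRemoval` attributes.

```java
/** Hypothetical sketch of a documented deprecation; names and version are placeholders. */
final class DeprecationNoticeSketch {

  /**
   * Returns the index of the first element strictly greater than the given value.
   *
   * @param sortedArr the given sorted array
   * @param value the given value
   * @return the index, or the array length if no element is greater
   * @deprecated since x.y.z (placeholder version). Use
   * {@link #firstIndexGreaterThan(long[], long)} instead.
   */
  @Deprecated
  static int legacyFind(final long[] sortedArr, final long value) {
    return firstIndexGreaterThan(sortedArr, value); // delegate so behavior cannot drift
  }

  /** The preferred replacement that the notice points to. */
  static int firstIndexGreaterThan(final long[] sortedArr, final long value) {
    for (int i = 0; i < sortedArr.length; i++) {
      if (sortedArr[i] > value) { return i; }
    }
    return sortedArr.length;
  }
}
```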



##########
src/main/java/org/apache/datasketches/DoublesSortedView.java:
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches;
+
+/**
+ * The Sorted View for double values.
+ *
+ * @author Alexander Saydakov
+ * @author Lee Rhodes
+ */
+public interface DoublesSortedView extends SortedView {
+
+  /**
+   * Gets the quantile based on the given normalized rank, and the given search criterion.
+   * @param normalizedRank the given normalized rank, which must be in the range [0.0, 1.0].
+   * @param searchCrit the given search criterion to use.
+   * @return the associated quantile value.
+   */
+  double getQuantile(double normalizedRank, QuantileSearchCriteria searchCrit);
+
+  /**
+   * Gets the normalized rank based on the given quantile value.
+   * @param value the given quantile value
+   * @param searchCrit the given search criterion to use.
+   * @return the normalized rank, which is a number in the range [0.0, 1.0].
+   */
+  double getRank(double value, QuantileSearchCriteria searchCrit);
+
+  /**
+   * Returns an array of values where each value is a number in the range [0.0, 1.0].
+   * The size of this array is one larger than the size of the input splitPoints array.
+   *
+   * <p>If <i>isCdf</i> is true, the points in the returned array are monotonically increasing and end with the
+   * value 1.0. Each value represents a point along the cumulative distribution function that approximates
+   * the CDF of the input data stream. Therefore, each point represents the fractional density of the distribution
+   * between from zero. For example, if one of the returned values is 0.5, then the splitPoint corresponding to that
+   * value would be the median of the distribution.</p>

Review Comment:
   `density of the distribution between from zero` may need to be reworded.



##########
src/test/java/org/apache/datasketches/CrossCheckQuantilesTest.java:
##########
@@ -19,300 +19,219 @@
 
 package org.apache.datasketches;
 
-import static org.apache.datasketches.CrossCheckQuantilesTest.PrimType.DOUBLE;
-import static org.apache.datasketches.CrossCheckQuantilesTest.PrimType.FLOAT;
-import static org.apache.datasketches.CrossCheckQuantilesTest.SkType.CLASSIC;
-import static org.apache.datasketches.CrossCheckQuantilesTest.SkType.KLL;
-import static org.apache.datasketches.CrossCheckQuantilesTest.SkType.REQ;
-import static org.apache.datasketches.CrossCheckQuantilesTest.SkType.REQ_NO_DEDUP;
-import static org.apache.datasketches.CrossCheckQuantilesTest.SkType.REQ_SV;
+
+import static org.apache.datasketches.QuantileSearchCriteria.INCLUSIVE;
+import static org.apache.datasketches.QuantileSearchCriteria.NON_INCLUSIVE;
+import static org.apache.datasketches.QuantileSearchCriteria.NON_INCLUSIVE_STRICT;
+import static org.apache.datasketches.ReflectUtility.KLL_DOUBLES_SV_CTOR;
+import static org.apache.datasketches.ReflectUtility.KLL_FLOATS_SV_CTOR;
+import static org.apache.datasketches.ReflectUtility.REQ_SV_CTOR;
 import static org.testng.Assert.assertEquals;
 
+import org.apache.datasketches.kll.KllDoublesSketch;
+import org.apache.datasketches.kll.KllDoublesSketchSortedView;
 import org.apache.datasketches.kll.KllFloatsSketch;
-import org.apache.datasketches.quantiles.DoublesSketch;
-import org.apache.datasketches.quantiles.UpdateDoublesSketch;
+import org.apache.datasketches.kll.KllFloatsSketchSortedView;
 import org.apache.datasketches.req.ReqSketch;
-import org.apache.datasketches.req.ReqSketchBuilder;
 import org.apache.datasketches.req.ReqSketchSortedView;
 import org.testng.annotations.Test;
 
 public class CrossCheckQuantilesTest {
 
-  enum SkType { REQ, REQ_SV, REQ_NO_DEDUP, KLL, CLASSIC }
+  final int k = 32; //all sketches are in exact mode
 
-  enum PrimType { DOUBLE, FLOAT }
+  //These test sets are specifically designed to test some tough corner cases so don't mess with them
+  //  unless you know what you are doing.
+  //These sets must start with 10 and be multiples of 10.

Review Comment:
   Do you think that it is worth documenting what the corner cases are? Or
are they self-evident to the expert reader?



##########
src/main/java/org/apache/datasketches/req/ReqSketch.java:
##########
@@ -336,23 +323,17 @@ public double getRankLowerBound(final double rank, final int numStdDev) {
 
   @Override
   public double[] getRanks(final float[] values) {
-    return getRanks(values, ltEq);
+    return getRanks(values, ltEq == true ? INCLUSIVE : NON_INCLUSIVE);
   }
 
   @Override
-  public double[] getRanks(final float[] values, final boolean inclusive) {
+  public double[] getRanks(final float[] values, final QuantileSearchCriteria searchCrit) {
     if (isEmpty()) { return null; }
+    refreshSortedView();

Review Comment:
   Why is the sorted view refreshed in read-only functions? Would it be better
to shift that cost to updates of the sketch, or is this necessary because the
sketch may be backed by direct memory and modified behind the scenes?
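
   For illustration, a minimal sketch of one common pattern in which a refresh
call on the read path is cheap: updates only invalidate the cached view, and
the refresh rebuilds it lazily when it is stale. This is an assumption about a
possible design, not a description of how `refreshSortedView()` behaves in this
PR. All names below are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of a lazily rebuilt sorted view; not the ReqSketch implementation. */
final class LazySortedViewSketch {
  private double[] items = new double[0];
  private double[] sortedView = null; // null marks the cached view as stale
  int rebuilds = 0;                   // counts actual rebuilds, for demonstration only

  /** Updates stay cheap: append the item and invalidate the cached view. */
  void update(final double value) {
    items = Arrays.copyOf(items, items.length + 1);
    items[items.length - 1] = value;
    sortedView = null;
  }

  /** No-op when the view is current; sorts only after an intervening update. */
  private void refreshSortedView() {
    if (sortedView == null) {
      sortedView = items.clone();
      Arrays.sort(sortedView);
      rebuilds++;
    }
  }

  /** Normalized rank counting values less than or equal to the given value. */
  double getRank(final double value) {
    if (items.length == 0) { return Double.NaN; }
    refreshSortedView();
    int count = 0;
    for (final double v : sortedView) {
      if (v <= value) { count++; }
    }
    return (double) count / sortedView.length;
  }
}
```

   With this pattern, a burst of updates pays for a single sort at the next
query. It does not by itself cover the case of memory modified behind the
scenes, which would need a different staleness check.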



##########
src/test/java/org/apache/datasketches/kll/KllMiscFloatsTest.java:
##########
@@ -35,10 +35,22 @@
 /**
  * @author Lee Rhodes
  */
-public class MiscFloatsTest {
+public class KllMiscFloatsTest {
   static final String LS = System.getProperty("line.separator");
   private final MemoryRequestServer memReqSvr = new DefaultMemoryRequestServer();
 
+  @Test
+  public void checkConvertToCumulative() {
+    long[] array = {1,2,3,2,1};
+    long out = KllHelper.convertToCumulative(array);
+    assertEquals(out, 9);
+  }
+
+  @Test
+  public void checkSortedViewConstruction() {
+
+  }

Review Comment:
   This test requires implementation or a suitable TODO comment.



##########
src/main/java/org/apache/datasketches/QuantileSearchCriteria.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches;
+
+/**
+ * These search criteria are used by the KLL, REQ and Classic Quantiles sketches in the DataSketches library.
+ *
+ * @see <a href="https://datasketches.apache.org/docs/Quantiles/SketchingQuantilesAndRanksTutorial.html">
+ * Sketching Quantiles and Ranks Tutorial</a>
+ *
+ * @author Lee Rhodes
+ */
+public enum QuantileSearchCriteria {
+
+  /**
+   * <b>Definition of INCLUSIVE <i>getQuantile(r)</i> search:</b><br>
+   * Given rank <i>r</i>, return the quantile of the smallest rank that is
+   * strictly greater than or equal to <i>r</i>.
+   *
+   * <p><b>Definition of INCLUSIVE <i>getRank(q)</i> search:</b><br>
+   * Given quantile <i>q</i>, return the rank, <i>r</i>, of the largest quantile that is
+   * less than or equal to <i>q</i>.</p>
+   */
+  INCLUSIVE,
+
+  /**
+   * <b>Definition of NON_INCLUSIVE <i>getQuantile(r)</i> search:</b><br>
+   * Given rank <i>r</i>, return the quantile of the smallest rank that is
+   * strictly greater than <i>r</i>.
+   *
+   * <p>If the given rank is is equal to 1.0, or there is no quantile that satisfies this criterion
+   * the method will return the largest quantile value retained by the sketch as a convenience.
+   * This is not strictly mathematically correct, but very convenient as it is most often what we expect and
+   * avoids having to return a <i>NaN</i>.</p>
+   *
+   * <p><b>Definition of NON_INCLUSIVE <i>getRank(q)</i> search:</b><br>
+   * Given quantile <i>q</i>, return the rank, <i>r</i>, of the largest quantile that is
+   * strictly less than <i>q</i>.</p>
+   *
+   * <p>If there is no quantile value that is strictly less than <i>q</i>,
+   * the method will return a rank of zero.</p>
+   *
+   */
+  NON_INCLUSIVE,

Review Comment:
   Would `exclusive` be preferred, or is non-inclusive used for duality?
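
   To make the two `getRank(q)` definitions quoted above concrete, here is a
minimal sketch over an exact sorted array. The enum constant `EXCLUSIVE` is the
hypothetical name from the question (the PR calls it `NON_INCLUSIVE`), and the
`rank` helper is illustrative only.

```java
/** Hypothetical sketch of inclusive versus exclusive rank; not the library's code. */
final class RankCriteriaSketch {

  /** EXCLUSIVE is the name floated in the review; the PR uses NON_INCLUSIVE. */
  enum Criteria { INCLUSIVE, EXCLUSIVE }

  /**
   * Normalized rank of q: the fraction of values that are
   * less than or equal to q (INCLUSIVE) or strictly less than q (EXCLUSIVE).
   */
  static double rank(final double[] values, final double q, final Criteria crit) {
    int count = 0;
    for (final double v : values) {
      final boolean counted = (crit == Criteria.INCLUSIVE) ? (v <= q) : (v < q);
      if (counted) { count++; }
    }
    return (double) count / values.length;
  }
}
```

   For the values {10, 20, 20, 30} and q = 20, the inclusive rank is 0.75 and
the exclusive rank is 0.25. The two differ only by the weight of the items
equal to q.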


