leerho commented on code in PR #475:
URL: https://github.com/apache/datasketches-java/pull/475#discussion_r1409954772
##########
src/main/java/org/apache/datasketches/kll/KllItemsSketchSortedView.java:
##########
@@ -132,18 +204,36 @@ public double[] getPMF(final T[] splitPoints, final QuantileSearchCriteria searc
public T getQuantile(final double rank, final QuantileSearchCriteria searchCrit) {
if (isEmpty()) { throw new SketchesArgumentException(EMPTY_MSG); }
QuantilesUtil.checkNormalizedRankBounds(rank);
+ final int index = getQuantileIndex(rank, searchCrit);
+ return getQuantileFromIndex(index);
+ }
+
+ private T getQuantileFromIndex(final int index) { return quantiles[index]; }
Review Comment:
No, it is not. When constructing the partition boundary arrays, I need to
obtain both the quantile and the cumulative weight at the same time. Without
the index, I would have to search the quantiles array and the cumWeights
array separately. This way, I only need to do one search: the index is a
direct result of the underlying search. Once I have the index, getting the
quantile and the cumulative weight are trivial lookups.
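
As an illustration of the point above, here is a minimal sketch with one search followed by two array lookups. `MiniSortedView` is a hypothetical, simplified stand-in and not the actual `KllItemsSketchSortedView`; the inclusive-style search rule and the `String` item type are assumptions made only for this example.

```java
// Hypothetical, simplified stand-in for a sorted view (not the library class).
final class MiniSortedView {
  private final String[] quantiles;  // items in sorted order
  private final long[] cumWeights;   // cumulative weights, parallel to quantiles
  private final long totalN;

  MiniSortedView(final String[] quantiles, final long[] cumWeights) {
    this.quantiles = quantiles;
    this.cumWeights = cumWeights;
    this.totalN = cumWeights[cumWeights.length - 1];
  }

  // The single binary search. Assumption: an inclusive-style rule that returns
  // the first index whose cumulative weight is >= ceil(normRank * totalN).
  int getQuantileIndex(final double normRank) {
    final long target = (long) Math.ceil(normRank * totalN);
    int lo = 0;
    int hi = cumWeights.length - 1;
    while (lo < hi) {
      final int mid = (lo + hi) >>> 1;
      if (cumWeights[mid] >= target) { hi = mid; } else { lo = mid + 1; }
    }
    return lo;
  }

  // Both lookups reuse the same index, so no second search is needed.
  String getQuantileFromIndex(final int index) { return quantiles[index]; }

  long getCumWeightFromIndex(final int index) { return cumWeights[index]; }

  public static void main(final String[] args) {
    final MiniSortedView sv = new MiniSortedView(
        new String[] {"a", "b", "c", "d"}, new long[] {1L, 3L, 6L, 10L});
    final int index = sv.getQuantileIndex(0.5);
    System.out.println(sv.getQuantileFromIndex(index) + " " + sv.getCumWeightFromIndex(index));
  }
}
```

With only a `getQuantile(rank)` style method, the caller would need a second search over `cumWeights` to recover the matching cumulative weight.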
##########
src/main/java/org/apache/datasketches/kll/KllItemsSketchSortedView.java:
##########
@@ -116,10 +132,66 @@ public long[] getCumulativeWeights() {
return cumWeights.clone();
}
- @Override //implemented here because it needs the comparator
+ @Override
+ public T getMaxItem() {
+ return maxItem;
+ }
+
+ @Override
+ public T getMinItem() {
+ return minItem;
+ }
+
+ @Override
+ public long getN() {
+ return totalN;
+ }
+
+ @Override
+ public double[] getNormalizedRanks() {
+ return normRanks.clone();
+ }
+
+ @Override
+ @SuppressWarnings("unchecked")
+ public GenericPartitionBoundaries<T> getPartitionBoundaries(final int numEquallySized,
+ final QuantileSearchCriteria searchCrit) {
+ if (isEmpty()) { throw new IllegalArgumentException(QuantilesAPI.EMPTY_MSG); }
+ final long totalN = this.totalN;
+ final int svLen = cumWeights.length;
+ //adjust ends of sortedView arrays
+ cumWeights[0] = 1L;
+ cumWeights[svLen - 1] = totalN;
+ normRanks[0] = 1.0 / totalN;
+ normRanks[svLen - 1] = 1.0;
+ quantiles[0] = this.getMinItem();
+ quantiles[svLen - 1] = this.getMaxItem();
+
+ final double[] evSpNormRanks = evenlySpacedDoubles(0, 1.0, numEquallySized + 1);
+ final int len = evSpNormRanks.length;
+ final T[] evSpQuantiles = (T[]) Array.newInstance(clazz, len);
+
+ final long[] evSpNatRanks = new long[len];
+ for (int i = 0; i < len; i++) {
+ final int index = getQuantileIndex(evSpNormRanks[i], searchCrit);
+ evSpQuantiles[i] = getQuantileFromIndex(index);
+ evSpNatRanks[i] = getCumWeightFromIndex(index);
+ }
+ final GenericPartitionBoundaries<T> gpb = new GenericPartitionBoundaries<>(
+ this.totalN,
+ evSpQuantiles.clone(),
Review Comment:
I looked at this more closely. We don't need the clones here. This will be
fixed.
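
One plausible reading of why the clones are unnecessary, sketched here under assumptions: an array that is freshly allocated inside the method and never shared has no other reference, so copying it before handing it to a constructor protects nothing. `Boundaries` and `CloneDemo` are hypothetical names for illustration, not the real `GenericPartitionBoundaries`, and the defensive copy in the getter is an assumption about how such a holder might protect its state.

```java
// Hypothetical holder class (not the library's GenericPartitionBoundaries).
final class Boundaries {
  private final long[] natRanks;

  // Takes ownership of the array as-is; no copy on the way in.
  Boundaries(final long[] natRanks) { this.natRanks = natRanks; }

  // Assumption: a defensive copy on the way out keeps internal state private.
  long[] getNatRanks() { return natRanks.clone(); }
}

final class CloneDemo {
  static Boundaries build(final int len) {
    final long[] ranks = new long[len]; // freshly allocated, local to this method
    for (int i = 0; i < len; i++) { ranks[i] = i + 1L; }
    // No clone needed: 'ranks' goes out of scope after this call, so the
    // holder ends up with the only reference to the array.
    return new Boundaries(ranks);
  }

  public static void main(final String[] args) {
    final Boundaries b = build(3);
    final long[] copy = b.getNatRanks();
    copy[0] = 99L; // mutating the returned copy does not affect the holder
    System.out.println(b.getNatRanks()[0]);
  }
}
```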
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]