kadirozde commented on code in PR #2428:
URL: https://github.com/apache/phoenix/pull/2428#discussion_r3192229704


##########
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##########
@@ -792,6 +792,76 @@ public int compare(QueryPlan plan1, QueryPlan plan2) {
     return stopAtBestPlan ? bestCandidates.subList(0, 1) : bestCandidates;
   }
 
+  /**
+   * Variant of {@link 
org.apache.phoenix.compile.ScanRanges#getBoundPkColumnCount} for the cost
+   * comparator: only honors the extra span reported by {@code slotSpan[i]} 
when the slot's ranges
+   * genuinely span multiple PK columns via byte-level encoding. A slot whose 
only ranges are
+   * non-point finite-bounded scalars (e.g. {@code BETWEEN a AND b} on a 
single PK col) under V2
+   * can carry an artificial {@code slotSpan[i] > 0} set by
+   * {@link 
org.apache.phoenix.compile.keyspace.KeyRangeExtractor#emitV1Projection} as a
+   * V1-compat marker for {@link org.apache.phoenix.filter.SkipScanFilter}'s 
per-region cursor
+   * stepping. That marker is needed for SkipScanFilter correctness on 
CDC-style scans but
+   * would inflate the PK-col count the cost comparator ranks plans by, 
letting a plan with a
+   * narrow bounded range on one PK col tie with a plan whose compound range 
genuinely covers
+   * multiple PK cols. Tie on primary key falls through to weaker tiebreakers 
that flip the
+   * plan choice (see CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3 
and its
+   * Upsert/Delete variants).
+   * <p>
+   * Rule: {@code slotSpan[i]} is honored unless every range in the slot is a 
non-point
+   * finite-bounded scalar. Point keys (single-key compound bytes encoding 
multiple PK cols,
+   * e.g. from RVC-IN or tuple equality) and ranges with an UNBOUND side 
(where trailing cols
+   * inherit the unbounded side's min/max at the byte level) both represent 
genuine multi-col
+   * narrowing and are counted as {@code slotSpan[i]+1}.
+   */
+  private static int effectiveBoundPkColumnCount(
+      org.apache.phoenix.compile.ScanRanges scanRanges) {
+    java.util.List<java.util.List<org.apache.phoenix.query.KeyRange>> ranges =
+      scanRanges.getRanges();
+    int[] slotSpan = scanRanges.getSlotSpans();
+    int count = 0;
+    boolean hasUnbound = false;
+    int nRanges = ranges.size();
+    for (int i = 0; i < nRanges && !hasUnbound; i++) {
+      java.util.List<org.apache.phoenix.query.KeyRange> orRanges = 
ranges.get(i);
+      boolean slotHasUnbound = false;
+      boolean slotAllBoundedNonPoint = true;
+      for (org.apache.phoenix.query.KeyRange range : orRanges) {
+        if (range == org.apache.phoenix.query.KeyRange.EVERYTHING_RANGE) {
+          return count;
+        }
+        if (range.isUnbound()) {
+          slotHasUnbound = true;
+          hasUnbound = true;
+        }
+        if (range.isSingleKey() || range.isUnbound()) {
+          slotAllBoundedNonPoint = false;
+        }
+      }
+      int span = (slotSpan != null && i < slotSpan.length) ? slotSpan[i] : 0;
+      // V2-only artificial extension: {@code 
KeyRangeExtractor.emitV1Projection} bumps
+      // {@code slotSpan[last]} to span trailing unconstrained PK cols for
+      // {@link SkipScanFilter} cursor stepping, even when the slot holds a 
single-column
+      // finite-bounded scalar range whose bytes don't actually encode 
trailing cols.
+      // For cost ranking we want to count only columns genuinely narrowed by 
the scan
+      // bounds, so discount that shape to 1.
+      //
+      // Discriminator: the extension path ALWAYS has >=2 slots (at least one 
leading
+      // concretely-narrowed slot plus the last bumped slot). A single-slot 
shape with
+      // {@code slotSpan > 0} is the compound-emission path — a genuine 
multi-col range
+      // whose bytes concatenate {@code span+1} columns — and keeps {@code 
span + 1}.
+      // See CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3 
(multi-slot shape
+      // must discount) vs. 
RowTimestampIT.testAutomaticallySettingRowTimestampWith*
+      // (single-slot compound must not discount).
+      boolean isArtificialExtension = nRanges > 1 && i == nRanges - 1;

Review Comment:
   Good catch on the single-slot extension case — the nRanges > 1 guard was too 
coarse and would have over-counted it. But i == nRanges - 1 alone would 
re-introduce a different regression: single-slot compound ranges 
(RowTimestampIT.testAutomaticallySettingRowTimestampWith* — a PK1=v AND PK2 
BETWEEN a AND b data-table plan emits one slot with slotSpan=[1] and a compound 
range concatenating PK1's bytes + separator + PK2's bytes) are genuine 2-col 
narrowing that must count as span+1, not 1.
   
   The distinguishing signal is byte-vs-schema: a compound range's bytes decode 
into span+1 PK fields; a V1-compat extension's bytes cover one field. I 
switched the discriminator to a byte-length decode against the schema so both 
cases resolve correctly:
   
   Multi-slot extension 
(CostBasedDecisionIT.testCostOverridesStaticPlanOrdering3): bytes cover 1 field 
→ discount to 1.
   Single-slot extension (your new case): bytes cover 1 field → discount to 1.
   Single-slot compound (RowTimestampIT): bytes cover span+1 fields → keep 
span+1.
   Point-key compound 
(testQueryOptimizerShouldSelectThePlanWithMoreNumberOfPKColumns): gated out by 
slotAllBoundedNonPoint=false, falls through to span+1.
   Verified: RowTimestampIT 24/24, CostBasedDecisionIT 20/20, 
WhereOptimizerTest 142/142, QueryOptimizerTest 48/48.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to