[ 
https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=632309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-632309
 ]

ASF GitHub Bot logged work on HIVE-25061:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Aug/21 12:42
            Start Date: 02/Aug/21 12:42
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on a change in pull request #2225:
URL: https://github.com/apache/hive/pull/2225#discussion_r680937890



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java
##########
@@ -664,6 +632,87 @@ public Object computeValue(Object row) throws 
HiveException {
    */
   public abstract boolean isEqual(Object v1, Object v2);
 
+  protected int binarySearchBack(int r, PTFPartition p, Object sortKey, int 
amt,
+      Order order) throws HiveException {
+    boolean isOrderDesc = order.equals(Order.DESC);
+    Object rowVal = sortKey;
+
+    int rMin = -1;  // tracks lowest possible number fulfilling the range 
requirement
+    int rMax = r; // tracks highest possible number fulfilling the range 
requirement
+
+    boolean isDistanceGreater = true;
+    while (true) {
+      if (rMin == rMax) {
+        return rMin;
+      }
+      isDistanceGreater =
+          isDistanceGreater(isOrderDesc ? rowVal : sortKey, isOrderDesc ? 
sortKey : rowVal, amt);
+      if (!isDistanceGreater && r == 0) {

Review comment:
       we need this, otherwise there is a scenario, when the algorithm finds 
that we need to step further, even beyond 0, so:
   ```
   isDistanceGreater = false
   r == 0
   ```
   in this case rMax becomes rowId (which is 0), and as rMin is still -1, so:
   ```
   rowId = rMin + (rMax - rMin) / 2 = -1
   ```
   
   which could lead to this exception:
   ```
   java.lang.IndexOutOfBoundsException: Index: -1, Size: 21
        at java.util.LinkedList.checkElementIndex(LinkedList.java:555)
        at java.util.LinkedList.get(LinkedList.java:476)
        at 
org.apache.hadoop.hive.ql.udf.ptf.TestBoundaryCache.lambda$setupMocks$3(TestBoundaryCache.java:256)
        at 
org.apache.hadoop.hive.ql.exec.PTFPartition.getAt(PTFPartition.java:120)
        at 
org.apache.hadoop.hive.ql.udf.ptf.ValueBoundaryScanner.computeValueUseCache(ValueBoundaryScanner.java:302)
        at 
org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.binarySearchBack(ValueBoundaryScanner.java:657)
        at 
org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.binarySearchBack(ValueBoundaryScanner.java:635)
        at 
org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeStartPreceding(ValueBoundaryScanner.java:423)
        at 
org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeStartPreceding(ValueBoundaryScanner.java:397)
        at 
org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeStart(ValueBoundaryScanner.java:387)
        at 
org.apache.hadoop.hive.ql.udf.ptf.SingleValueBoundaryScanner.computeStart(ValueBoundaryScanner.java:385)
   ```
   
   I can put comment for clarity's sake




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 632309)
    Time Spent: 2h 40m  (was: 2.5h)

> PTF: Improve ValueBoundaryScanner
> ---------------------------------
>
>                 Key: HIVE-25061
>                 URL: https://issues.apache.org/jira/browse/HIVE-25061
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> -First, I need to check whether TreeMap is really needed for our case.-
> It turned out a binary-ish search approach could help in range calculation, 
> as we're searching in an ordered set of values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to