abiesps commented on issue #15197:
URL: https://github.com/apache/lucene/issues/15197#issuecomment-3339845848

   OK, I have a working solution. The code is at a very early stage (I still need to clean up a lot).
   
   **Approach** 
   
   At a high level, this is what I am doing:  
   1. Start traversing the BKD tree as is.
   2. If the cell is inside the query:
      a) The call goes to a variant of the `visitDocIDs` method, which checks `isLeaf`.
      b) If `isLeaf` is true, do not read the leaf node yet. Instead, get its `leafOrdinal`. If this leaf ordinal directly follows the last matching leaf ordinal (which I store in the visitor), i.e. `leafOrdinal == visitor.lastMatchingLeafOrdinal() + 1`, do not call prefetch, since kernel readahead should hopefully take care of it. Otherwise, if it is not a contiguous match (`leafOrdinal != visitor.lastMatchingLeafOrdinal() + 1`), call prefetch on the first page of this leaf node's file pointer. Also store this leaf node's file pointer in the visitor so that its matching doc IDs can be visited later. For early termination, also pass the number of matching points to the visitor, computed as `int count = isLastLeaf() ? lastLeafNodePointCount : config.maxPointsInLeafNode();`.
      c) If it is not a leaf node, continue the recursion as is.
   
   3. For the rest of the traversal, the code remains unchanged.
   4. Once the traversal is complete, I call `visitDocIDs` on the leaf file pointers I stored for visiting later.
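The steps above can be illustrated with a minimal, self-contained Java sketch (Java 16+ for the record). It is not the actual patch and uses no Lucene APIs. It assumes a toy complete binary tree over leaf ordinals, a query expressed as `[start, end)` ranges of leaf ordinals, a leaf file pointer of `ordinal * LEAF_BYTES`, and a prefetch that is only recorded rather than issued. `DeferredLeafCollector`, `relate`, `traverse`, `visitInside`, and the constants are hypothetical names. In this toy model a single leaf is always fully inside or fully outside the query, so the crossing leaves that the real traversal reads eagerly never occur.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class DeferredLeafVisitSketch {

  static final long LEAF_BYTES = 4096; // hypothetical on-disk size of one leaf block
  static final int MAX_POINTS_IN_LEAF = 512; // stands in for config.maxPointsInLeafNode()
  static final int LAST_LEAF_POINT_COUNT = 100; // stands in for lastLeafNodePointCount

  enum Relation { INSIDE, OUTSIDE, CROSSES }

  /** Hypothetical visitor state: last matching leaf ordinal, deferred leaf fps, point count. */
  static final class DeferredLeafCollector {
    private final LongConsumer prefetchFirstPage;
    private final List<Long> deferredLeafFps = new ArrayList<>();
    private int lastMatchingLeafOrdinal = Integer.MIN_VALUE; // no leaf has matched yet
    private long matchingPoints;

    DeferredLeafCollector(LongConsumer prefetchFirstPage) {
      this.prefetchFirstPage = prefetchFirstPage;
    }

    int lastMatchingLeafOrdinal() {
      return lastMatchingLeafOrdinal;
    }

    /** Records a leaf that is fully inside the query without reading it. */
    void onInsideLeaf(int leafOrdinal, long leafFp, int pointCount) {
      // Contiguous leaves are left to kernel readahead; others get their first page prefetched.
      if (leafOrdinal != lastMatchingLeafOrdinal() + 1) {
        prefetchFirstPage.accept(leafFp);
      }
      lastMatchingLeafOrdinal = leafOrdinal;
      deferredLeafFps.add(leafFp); // visited after the traversal
      matchingPoints += pointCount; // available for early termination
    }
  }

  record Result(List<Long> prefetchedFps, List<Long> visitedFps, long matchingPoints) {}

  /** Relates the leaf-ordinal range [lo, hi) to a query given as [start, end) ranges. */
  static Relation relate(int lo, int hi, int[][] query) {
    boolean overlaps = false;
    for (int[] q : query) {
      if (q[0] <= lo && hi <= q[1]) return Relation.INSIDE;
      if (q[0] < hi && lo < q[1]) overlaps = true;
    }
    return overlaps ? Relation.CROSSES : Relation.OUTSIDE;
  }

  /** Cell inside the query: recurse to each leaf and defer it instead of reading it. */
  static void visitInside(int lo, int hi, int numLeaves, DeferredLeafCollector c) {
    if (hi - lo == 1) {
      int count = lo == numLeaves - 1 ? LAST_LEAF_POINT_COUNT : MAX_POINTS_IN_LEAF;
      c.onInsideLeaf(lo, lo * LEAF_BYTES, count);
      return;
    }
    int mid = (lo + hi) >>> 1; // not a leaf: continue the recursion
    visitInside(lo, mid, numLeaves, c);
    visitInside(mid, hi, numLeaves, c);
  }

  static void traverse(int lo, int hi, int numLeaves, int[][] query, DeferredLeafCollector c) {
    Relation r = relate(lo, hi, query);
    if (r == Relation.OUTSIDE) return;
    if (r == Relation.INSIDE) {
      visitInside(lo, hi, numLeaves, c);
      return;
    }
    int mid = (lo + hi) >>> 1; // CROSSES: only internal nodes can cross in this model
    traverse(lo, mid, numLeaves, query, c);
    traverse(mid, hi, numLeaves, query, c);
  }

  static Result run(int numLeaves, int[][] query) {
    List<Long> prefetched = new ArrayList<>();
    DeferredLeafCollector c = new DeferredLeafCollector(fp -> prefetched.add(fp));
    traverse(0, numLeaves, numLeaves, query, c);
    List<Long> visited = new ArrayList<>();
    for (long fp : c.deferredLeafFps) {
      visited.add(fp); // real code would read the leaf at fp and visit its doc IDs here
    }
    return new Result(prefetched, visited, c.matchingPoints);
  }

  public static void main(String[] args) {
    Result r = run(16, new int[][] {{2, 6}, {9, 12}, {15, 16}});
    System.out.println("prefetched: " + r.prefetchedFps());
    System.out.println("visited:    " + r.visitedFps());
    System.out.println("points:     " + r.matchingPoints());
  }
}
```

With 16 leaves and the query `{{2, 6}, {9, 12}, {15, 16}}`, only leaves 2, 9, and 15 start a new run of matching ordinals, so only their file pointers are prefetched; all eight matching leaves are still visited after the traversal.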
   
   
   
   
   **Benchmark Results**
   
   I ran a cold-index test using the benchmark I wrote (https://github.com/abiesps/lucene-learnings/blob/main/src/main/java/com/sps/lucene/learnings/LuceneBKDTraversalPrefetchBenchmark.java) and am seeing the following results:
   
   
   Iteration | p50 with prefetching (ns) | p50 without prefetching (ns) | p90 with prefetching (ns) | p90 without prefetching (ns) | p99 with prefetching (ns) | p99 without prefetching (ns)
   -- | -- | -- | -- | -- | -- | --
   1 | 1428747 | 1873421 | 3204614 | 4011134 | 6223980 | 7440390
   2 | 1423662 | 1868136 | 3257577 | 3999692 | 6045371 | 7452711
   3 | 1431170 | 1881138 | 3286315 | 4026663 | 6614156 | 7655345
   4 | 1463966 | 1882202 | 3351880 | 4031743 | 6434328 | 7826272
   5 | 1448586 | 1860455 | 3253163 | 3993159 | 5995588 | 7826855
   6 | 1451429 | 1882752 | 3235150 | 4013533 | 5681142 | 7661777
   7 | 1447876 | 1840063 | 3301089 | 4025598 | 6419166 | 7884995
   8 | 1427931 | 1835098 | 3188111 | 4007786 | 6166556 | 7742863
   9 | 1473577 | 1868620 | 3358062 | 4106067 | 7045409 | 8093894
   10 | 1444055 | 1863376 | 3342137 | 3995365 | 6286142 | 7813705
   Average across all iterations | 1444099.9 | 1865526.1 | 3277809.8 | 4021074 | 6291183.8 | 7739880.7
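As a quick worked check on the averages row, the relative latency reduction from prefetching at each percentile is (my arithmetic, rounded to one decimal place):

$$
\begin{aligned}
\text{reduction} &= \frac{t_{\text{without}} - t_{\text{with}}}{t_{\text{without}}} \\
\text{p50:}\ \frac{1865526.1 - 1444099.9}{1865526.1} &\approx 22.6\% \\
\text{p90:}\ \frac{4021074 - 3277809.8}{4021074} &\approx 18.5\% \\
\text{p99:}\ \frac{7739880.7 - 6291183.8}{7739880.7} &\approx 18.7\%
\end{aligned}
$$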
   
   For each iteration, I ran 1000 identical queries, explicitly clearing the page cache between the runs with and without prefetching.
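Each iteration reports p50/p90/p99 over its 1000 queries. The sketch below shows one way such percentiles could be computed from per-query latencies, assuming the nearest-rank method (the linked benchmark may compute them differently). `LatencyPercentiles`, `percentile`, and `measure` are hypothetical names. Clearing the page cache is OS-specific (on Linux, typically `sync` followed by writing `3` to `/proc/sys/vm/drop_caches` as root) and is not shown.

```java
import java.util.Arrays;

public class LatencyPercentiles {

  /** Nearest-rank percentile over ascending-sorted samples; p is an integer percent in [1, 100]. */
  static long percentile(long[] sortedNanos, int p) {
    if (sortedNanos.length == 0) throw new IllegalArgumentException("no samples");
    if (p < 1 || p > 100) throw new IllegalArgumentException("p must be in [1, 100]");
    // 1-based rank = ceil(p * n / 100), computed with integer arithmetic to avoid rounding error.
    long rank = ((long) p * sortedNanos.length + 99) / 100;
    return sortedNanos[(int) rank - 1];
  }

  /** Runs the query `runs` times and returns {p50, p90, p99} of the wall-clock latencies in ns. */
  static long[] measure(Runnable query, int runs) {
    long[] nanos = new long[runs];
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      query.run();
      nanos[i] = System.nanoTime() - start;
    }
    Arrays.sort(nanos);
    return new long[] {percentile(nanos, 50), percentile(nanos, 90), percentile(nanos, 99)};
  }

  public static void main(String[] args) {
    long[] samples = new long[1000]; // deterministic samples 1..1000 ns
    for (int i = 0; i < samples.length; i++) samples[i] = i + 1;
    // prints "500 900 990"
    System.out.println(percentile(samples, 50) + " " + percentile(samples, 90) + " " + percentile(samples, 99));
  }
}
```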
   
   
   
   
   
   
   

