sanjeet006py commented on code in PR #8237:
URL: https://github.com/apache/hbase/pull/8237#discussion_r3249760526


##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableRegionLocator.java:
##########
@@ -112,6 +113,54 @@ default CompletableFuture<List<HRegionLocation>> getRegionLocations(byte[] row)
    */
   CompletableFuture<List<HRegionLocation>> getAllRegionLocations();
 
+  /**
+   * Bulk lookup of region locations from {@code hbase:meta} in a single RPC, starting at
+   * {@code startKey} (region start-key boundary, inclusive) and returning at most {@code limit}
+   * regions in start-key order.
+   * <p/>
+   * The returned list includes all replicas of each region (matching
+   * {@link #getAllRegionLocations()}), and the result is also written to the connection's region
+   * location cache.
+   * <p/>
+   * Ordering: regions are returned in ascending region start-key order (the natural order of
+   * {@code hbase:meta} rows for a single table). Within each region, replicas are returned in
+   * ascending replica-id order (replica 0, then 1, then 2, ...). Split parents are filtered out,
+   * which may cause a page to contain fewer than {@code limit} regions but never disturbs ordering

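The Javadoc excerpt above describes a paging contract: inclusive start key, at most `limit` regions, replicas in replica-id order, split parents dropped. A minimal in-memory sketch of one page follows, assuming (as the note about pages with fewer than `limit` regions suggests) that `limit` caps the meta rows looked at before split parents are filtered. `MetaPageSketch`, `MetaRow`, `Location` and `fetchPage` are hypothetical names, not the PR's method or HBase classes, and plain `String` keys stand in for `byte[]` row keys.

```java
import java.util.ArrayList;
import java.util.List;

public class MetaPageSketch {

  // Hypothetical stand-in for one hbase:meta row of a single table; String keys replace byte[].
  public record MetaRow(String startKey, int replicaCount, boolean offlineSplitParent) {}

  // Hypothetical stand-in for an HRegionLocation: region start key plus replica id.
  public record Location(String startKey, int replicaId) {
    @Override
    public String toString() {
      return startKey + "#" + replicaId;
    }
  }

  // Assumed semantics: look at no more than `limit` rows at or after startKey (inclusive),
  // drop offline split parents, and emit replicas 0..n-1 of each surviving region.
  // `rows` must already be sorted by start key, as hbase:meta rows are for one table.
  public static List<Location> fetchPage(List<MetaRow> rows, String startKey, int limit) {
    List<Location> page = new ArrayList<>();
    int scanned = 0;
    for (MetaRow row : rows) {
      if (row.startKey().compareTo(startKey) < 0) {
        continue; // before the requested start-key boundary
      }
      if (scanned == limit) {
        break; // row budget for this page is used up
      }
      scanned++;
      if (row.offlineSplitParent()) {
        continue; // filtered out, so the page can hold fewer than `limit` regions
      }
      for (int replicaId = 0; replicaId < row.replicaCount(); replicaId++) {
        page.add(new Location(row.startKey(), replicaId));
      }
    }
    return page;
  }

  public static void main(String[] args) {
    List<MetaRow> meta = List.of(
        new MetaRow("", 2, false),
        new MetaRow("b", 2, true),  // split parent still waiting for janitor cleanup
        new MetaRow("b", 2, false), // daughter A starts at the parent's start key
        new MetaRow("d", 2, false));
    // Only one region comes back even though limit is 2, and ordering is unaffected.
    System.out.println(fetchPage(meta, "b", 2)); // [b#0, b#1]
  }
}
```

Counting the split parent against `limit` is the design choice that produces a short page; the sketch makes no claim about how a caller picks the next page's start key, since the excerpt is cut off before that is described.
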
Review Comment:
   Thanks for flagging. Let me share my understanding, and please correct me if I've got this wrong.
   
   I think merge parents don't need filtering here because of how splits and merges record cleanup state in `hbase:meta`:
   
   - **Split:** the parent row stays around with `setOffline(true).setSplit(true)` (see `RegionStateStore#splitRegion`) and carries `info:splitA`/`info:splitB` pointers. The catalog janitor seems to use that row to track whether the daughters still hold HFile references, and only deletes the parent row and its HDFS directory once those references are gone. So during that window, meta scans do see the parent, which is why `excludeOfflinedSplitParents` exists.
   
   - **Merge:** from what I can see in `RegionStateStore#mergeRegions`, the same multi-mutate that commits the merge *deletes* every parent row and writes a single child row with `info:merge_0000`, `info:merge_0001`, … pointing back at the parents. So the "waiting on HFile cleanup" bookkeeping lives on the child row, not on the parents.
   
   If that's right, a meta scan should never see merge parents, and the merged child row is itself the live region for that key range, which is what we'd want to return. This also seems to match the existing behavior of `getAllRegionLocations()`; both share `CollectRegionLocationsVisitor(excludeOfflinedSplitParents=true)`.
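
To make the split/merge asymmetry concrete, here is a toy Java model of the meta bookkeeping described above. `MetaCleanupSketch`, `Row`, `split`, `merge`, `mergeParentsOf` and `visibleRegions` are hypothetical names, not HBase code; the model keys rows by region name instead of the real `table,startKey,regionId` row key and ignores HFile references and the janitor itself. It only illustrates why a filter on offline split parents would be enough if merge parents are already deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MetaCleanupSketch {

  // Hypothetical, heavily simplified meta row: only the flags and pointers discussed above.
  public record Row(String name, boolean offline, boolean split,
      List<String> splitDaughters, List<String> mergeParents) {}

  // Keyed by region name; real hbase:meta rows are keyed by table,startKey,regionId.
  private final Map<String, Row> meta = new TreeMap<>();

  public void add(String name) {
    meta.put(name, new Row(name, false, false, List.of(), List.of()));
  }

  // Split: the parent row is kept, marked offline + split, and points at its daughters.
  public void split(String parent, String daughterA, String daughterB) {
    meta.put(parent, new Row(parent, true, true, List.of(daughterA, daughterB), List.of()));
    add(daughterA);
    add(daughterB);
  }

  // Merge: parent rows are deleted in the same step; only the child remembers them.
  public void merge(String child, String... parents) {
    for (String p : parents) {
      meta.remove(p);
    }
    meta.put(child, new Row(child, false, false, List.of(), List.of(parents)));
  }

  // The merged child is the only row that still references the merge parents.
  public List<String> mergeParentsOf(String child) {
    return meta.get(child).mergeParents();
  }

  // Mirrors the idea behind excludeOfflinedSplitParents=true, not the real visitor code.
  public List<String> visibleRegions(boolean excludeOfflinedSplitParents) {
    List<String> out = new ArrayList<>();
    for (Row r : meta.values()) {
      if (excludeOfflinedSplitParents && r.offline() && r.split()) {
        continue;
      }
      out.add(r.name());
    }
    return out;
  }

  public static void main(String[] args) {
    MetaCleanupSketch s = new MetaCleanupSketch();
    s.add("r1");
    s.add("r2");
    s.add("r3");
    s.split("r1", "r1a", "r1b");
    s.merge("r23", "r2", "r3");
    System.out.println(s.visibleRegions(false)); // [r1, r1a, r1b, r23]
    System.out.println(s.visibleRegions(true));  // [r1a, r1b, r23]
  }
}
```

In this model only the split parent `r1` ever needs filtering; `r2` and `r3` are simply absent, and the merged child `r23` is returned as the live region.
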
   
   Does this line up with your understanding? Happy to add a comment near the visitor call site noting this if it'd help future readers, or to revisit the filter if I'm missing a case.



-- 
This is an automated message from the Apache Git Service.