sanjeet006py commented on code in PR #8237:
URL: https://github.com/apache/hbase/pull/8237#discussion_r3249760526
##########
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableRegionLocator.java:
##########
@@ -112,6 +113,54 @@ default CompletableFuture<List<HRegionLocation>> getRegionLocations(byte[] row)
*/
CompletableFuture<List<HRegionLocation>> getAllRegionLocations();
+ /**
+ * Bulk lookup of region locations from {@code hbase:meta} in a single RPC, starting at
+ * {@code startKey} (region start-key boundary, inclusive) and returning at most {@code limit}
+ * regions in start-key order.
+ * <p/>
+ * The returned list includes all replicas of each region (matching
+ * {@link #getAllRegionLocations()}), and the result is also written to the connection's region
+ * location cache.
+ * <p/>
+ * Ordering: regions are returned in ascending region start-key order (the natural order of
+ * {@code hbase:meta} rows for a single table). Within each region, replicas are returned in
+ * ascending replica-id order (replica 0, then 1, then 2, ...). Split parents are filtered out,
+ * which may cause a page to contain fewer than {@code limit} regions but never disturbs ordering
Review Comment:
Thanks for flagging — let me share my understanding and please correct me if
I've got this wrong.
I think merge parents don't need filtering here because of how splits vs
merges record cleanup state in `hbase:meta`:
- **Split:** the parent row stays around with
`setOffline(true).setSplit(true)` (see `RegionStateStore#splitRegion`) and
carries `info:splitA`/`info:splitB` pointers. The catalog janitor seems to use
that row to track whether daughters still hold HFile references, and only
deletes the parent row + HDFS dir once references are gone. So during that
window, meta scans do see the parent — which is why
`excludeOfflinedSplitParents` exists.
- **Merge:** from what I can see in `RegionStateStore#mergeRegions`, the
same multi-mutate that commits the merge *deletes* every parent row and writes
a single child row with `info:merge0000`, `info:merge0001`, … pointing back
at the parents. So the "waiting on HFile cleanup" bookkeeping lives on the
child row, not on the parents.
If that's right, a meta scan should never see merge parents, and the merged
child row is itself the live region for that key range — which is what we'd
want to return. This also seems to match `getAllRegionLocations()`'s existing
behavior; both share
`CollectRegionLocationsVisitor(excludeOfflinedSplitParents=true)`.
Does this line up with your understanding? Happy to add a comment near the
visitor call site noting this if it'd help future readers, or revisit the
filter if I'm missing a case.
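
To make the reasoning above concrete, here is a minimal, self-contained sketch of the filtering I have in mind. It is a toy model and not HBase code: `MetaScanSketch`, `MetaRow`, and `liveRegions` are hypothetical names, and the only assumption carried over from the real code is that a split parent is recognizable by being both offline and split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of hbase:meta rows. Not HBase code; every name here is hypothetical. */
public class MetaScanSketch {

  /** One meta row: a region name plus the two flags a split parent carries. */
  static final class MetaRow {
    final String regionName;
    final boolean offline;
    final boolean split;

    MetaRow(String regionName, boolean offline, boolean split) {
      this.regionName = regionName;
      this.offline = offline;
      this.split = split;
    }
  }

  /** Drops offlined split parents and keeps every other row in scan order. */
  static List<String> liveRegions(List<MetaRow> scannedRows) {
    List<String> live = new ArrayList<>();
    for (MetaRow row : scannedRows) {
      if (row.offline && row.split) {
        continue; // split parent still waiting on cleanup; its daughters cover the range
      }
      live.add(row.regionName);
    }
    return live;
  }

  public static void main(String[] args) {
    List<MetaRow> rows = Arrays.asList(
      new MetaRow("parent[a,m)", true, true),      // split parent, still present in meta
      new MetaRow("daughter[a,g)", false, false),
      new MetaRow("daughter[g,m)", false, false),
      new MetaRow("merged[m,z)", false, false));   // merge parents were deleted at commit
    System.out.println(liveRegions(rows));
  }
}
```

In this model the merged child needs no special case: its parents never show up in the scan, so the only rows to skip are split parents, and skipping them can shrink a page without reordering it.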