mihir6692 commented on code in PR #5675:
URL: https://github.com/apache/hbase/pull/5675#discussion_r1485211063
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java:
##########
@@ -510,19 +510,38 @@ public Void call() {
private Void readColumnFamily(Table table, ColumnFamilyDescriptor column) {
byte[] startKey = null;
- Get get = null;
Scan scan = null;
ResultScanner rs = null;
StopWatch stopWatch = new StopWatch();
startKey = region.getStartKey();
// Can't do a get on empty start row so do a Scan of first element if any instead.
if (startKey.length > 0) {
- get = new Get(startKey);
+ Get get = new Get(startKey);
get.setCacheBlocks(false);
get.setFilter(new FirstKeyOnlyFilter());
get.addFamily(column.getName());
+ // Converting get object to scan to enable RAW SCAN.
+ // This will work for all the regions of the HBase tables except first region of the table.
+ scan = new Scan(get);
+ scan.setRaw(rawScanEnabled);
} else {
scan = new Scan();
+ // In case of first region of the HBase Table, we do not have start-key for the region.
+ // For Region Canary, we only need scan a single row/cell in the region to make sure that
+ // region is accessible.
+ //
+ // When HBase table has more than 1 empty regions at start of the row-key space, Canary will
+ // create multiple scan object to find first available row in the table by scanning all the
+ // regions in sequence until it can find first available row.
+ //
+ // This could result in multiple millions of scans based on the size of table and number of
+ // empty regions in sequence. In test environment, A table no data and 1000 empty regions,
+ // Single canary run was creating close to half million to 1 million scans to successfully
+ // do canary run for the table.
+ //
+ // Since First region of the table doesn't have any start key, We should set End Key as
+ // stop row and set inclusive=false to limit scan to single region only.
Review Comment:
Added some more comments, as well as a TODO for the future.
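
For anyone following along, here is a rough, HBase-free sketch of why the exclusive stop row matters for the first region. It is only a toy model, not the Canary code: `Region`, `emptyTable` and `regionsTouched` are hypothetical names made up for illustration, and regions are modelled as plain key ranges. In the real client the bound would presumably be expressed with something like `Scan.withStopRow(endKey, false)`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (assumption: not HBase code). A scan that starts at the beginning
// of the table keeps moving to the next region until it finds a row or
// reaches its stop row.
class BoundedScanSketch {

    /** Hypothetical stand-in for a region covering [startKey, endKey). */
    static final class Region {
        final String startKey;   // "" means start of the table
        final String endKey;     // "" means end of the table
        final List<String> rows; // row keys stored in this region

        Region(String startKey, String endKey, List<String> rows) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.rows = rows;
        }
    }

    /** Builds a table of n consecutive regions that hold no rows. */
    static List<Region> emptyTable(int n) {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String start = (i == 0) ? "" : String.format("%04d", i);
            String end = (i == n - 1) ? "" : String.format("%04d", i + 1);
            regions.add(new Region(start, end, new ArrayList<>()));
        }
        return regions;
    }

    /**
     * Counts the regions visited by a scan that starts at the first region and
     * stops at the first row found. stopRow is exclusive; null means unbounded.
     */
    static int regionsTouched(List<Region> regions, String stopRow) {
        int touched = 0;
        for (Region r : regions) {
            // A region starting at or after the exclusive stop row is out of range.
            if (stopRow != null && r.startKey.compareTo(stopRow) >= 0) {
                break;
            }
            touched++;
            if (!r.rows.isEmpty()) {
                break; // first available row found
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        List<Region> table = emptyTable(1000);
        // Unbounded scan from the first region walks every empty region.
        System.out.println("unbounded: " + regionsTouched(table, null));
        // Stop row = end key of the first region, exclusive: one region only.
        System.out.println("bounded:   " + regionsTouched(table, table.get(0).endKey));
    }
}
```

The model only counts regions visited by a single scan from the first region; it does not try to reproduce the scan counts seen in the test environment.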