Emilio Setiadarma created NIFI-12825:
----------------------------------------
Summary: Implement processor to get row key ranges for HBase regions
Key: NIFI-12825
URL: https://issues.apache.org/jira/browse/NIFI-12825
Project: Apache NiFi
Issue Type: New Feature
Reporter: Emilio Setiadarma
Assignee: Emilio Setiadarma
A common way to parallelize scan operations against HBase is to scan by row key
range. HBase splits each table into regions, each of which holds a contiguous
range of row keys. These ranges are mutually exclusive, and together they cover
every row key in the table.
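
The two properties above (no overlap, full coverage) can be sketched in plain Java. This is only an illustration: KeyRange and RegionRangeCheck are hypothetical names, not HBase or NiFi classes, and keys are modeled as Strings, whereas HBase compares raw bytes. The sketch assumes the usual HBase convention that a region's start key is inclusive, its end key is exclusive, and an empty key means "unbounded".

```java
import java.util.List;

// Hypothetical model of one region's row key range (not an HBase class).
final class KeyRange {
    final String start; // inclusive; "" means unbounded
    final String end;   // exclusive; "" means unbounded

    KeyRange(String start, String end) {
        this.start = start;
        this.end = end;
    }

    boolean contains(String rowKey) {
        boolean afterStart = start.isEmpty() || rowKey.compareTo(start) >= 0;
        boolean beforeEnd = end.isEmpty() || rowKey.compareTo(end) < 0;
        return afterStart && beforeEnd;
    }
}

class RegionRangeCheck {
    // True when the ordered ranges start unbounded, end unbounded, and every
    // range begins exactly where the previous one ends, with boundaries that
    // strictly increase (so there is no gap and no overlap).
    static boolean coversKeySpace(List<KeyRange> ordered) {
        if (ordered.isEmpty()) return false;
        if (!ordered.get(0).start.isEmpty()) return false;
        if (!ordered.get(ordered.size() - 1).end.isEmpty()) return false;
        String previous = "";
        for (int i = 1; i < ordered.size(); i++) {
            String boundary = ordered.get(i - 1).end;
            if (boundary.compareTo(previous) <= 0) return false;
            if (!boundary.equals(ordered.get(i).start)) return false;
            previous = boundary;
        }
        return true;
    }

    // Number of ranges that contain the key; 1 for a well-formed table.
    static long owners(List<KeyRange> ranges, String rowKey) {
        return ranges.stream().filter(r -> r.contains(rowKey)).count();
    }

    public static void main(String[] args) {
        List<KeyRange> regions = List.of(
            new KeyRange("", "g"), new KeyRange("g", "p"), new KeyRange("p", ""));
        System.out.println(coversKeySpace(regions)); // true
        System.out.println(owners(regions, "g"));    // 1
    }
}
```

Because the end key is exclusive, the boundary key "g" belongs only to the second range, which is what makes the ranges mutually exclusive.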
The current manual approach to parallelizing scans by row key range is to open
the HBase shell and run the "list_regions" command to obtain the ranges. The
main downside of this approach is that row key ranges are not static: HBase
regions may split, creating two regions that divide the original row key range
at a point in the middle.
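
To illustrate why a manually copied list of ranges goes stale, here is a minimal sketch of what a split does to the list of boundaries. RegionSplitSketch is a hypothetical name and the String-based ranges are a simplification; the sketch does not call HBase, and the split key is supplied by the caller rather than chosen the way HBase chooses it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase code. Each range is {startKey, endKey} with an
// inclusive start, an exclusive end, and "" meaning unbounded on that side.
class RegionSplitSketch {
    // Returns a new list in which the range at `index` is replaced by two
    // daughter ranges that meet at splitKey. The input list is not modified.
    static List<String[]> split(List<String[]> ranges, int index, String splitKey) {
        String[] parent = ranges.get(index);
        boolean afterStart = parent[0].isEmpty() || splitKey.compareTo(parent[0]) > 0;
        boolean beforeEnd = parent[1].isEmpty() || splitKey.compareTo(parent[1]) < 0;
        if (splitKey.isEmpty() || !afterStart || !beforeEnd) {
            throw new IllegalArgumentException("split key must lie strictly inside the parent range");
        }
        List<String[]> result = new ArrayList<>(ranges);
        result.set(index, new String[] {parent[0], splitKey});
        result.add(index + 1, new String[] {splitKey, parent[1]});
        return result;
    }

    // True when both lists describe exactly the same boundaries in the same order.
    static boolean sameBoundaries(List<String[]> a, List<String[]> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (!Arrays.equals(a.get(i), b.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Ranges as they were copied by hand at some earlier time.
        List<String[]> cached = List.of(new String[] {"", "m"}, new String[] {"m", ""});
        // The first region later splits at "f".
        List<String[]> current = split(cached, 0, "f");
        System.out.println(current.size());                  // 3
        System.out.println(sameBoundaries(cached, current)); // false
    }
}
```

After the split, the copied list still describes two ranges while the table has three regions, so a flow built from the old list no longer lines up with the actual region boundaries.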
Providing a way for NiFi to obtain the row key range of each HBase region would
make it easier to build a flow that performs scans against HBase parallelized
by row key range. Once the row key ranges are known, they can be fed into a
scanning processor (e.g. ScanHBase).
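
The idea of feeding each range into its own scan can be sketched as follows. This is not the proposed processor and it does not use the HBase or NiFi APIs: ParallelRangeScanSketch is a hypothetical name, an in-memory sorted map stands in for the table, and a thread pool stands in for the parallel scans that a flow would run. Each range is a {startKey, endKey} pair with an inclusive start, an exclusive end, and "" meaning unbounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a TreeMap stands in for an HBase table.
class ParallelRangeScanSketch {
    // Returns a view of the rows in [start, end); "" means unbounded.
    static NavigableMap<String, String> scan(
            NavigableMap<String, String> table, String start, String end) {
        NavigableMap<String, String> view = table;
        if (!start.isEmpty()) view = view.tailMap(start, true);
        if (!end.isEmpty()) view = view.headMap(end, false);
        return view;
    }

    // Runs one scan task per range in parallel and returns the row count of
    // each range, in the same order as the input ranges.
    static List<Integer> countPerRange(
            NavigableMap<String, String> table, List<String[]> ranges) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String[] range : ranges) {
                Callable<Integer> task = () -> scan(table, range[0], range[1]).size();
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get());
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("range scan failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : List.of("apple", "banana", "grape", "kiwi", "peach", "zebra")) {
            table.put(key, "value-" + key);
        }
        List<String[]> ranges = List.of(
            new String[] {"", "g"}, new String[] {"g", "p"}, new String[] {"p", ""});
        System.out.println(countPerRange(table, ranges)); // [2, 2, 2]
    }
}
```

Because the ranges do not overlap and together cover the key space, the per-range counts add up to the total number of rows, so each row is read by exactly one task.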
--
This message was sent by Atlassian Jira
(v8.20.10#820010)