[ https://issues.apache.org/jira/browse/NIFI-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Burgess updated NIFI-12825:
--------------------------------
    Status: Patch Available  (was: Open)

> Implement processor to get row key ranges for HBase regions
> -----------------------------------------------------------
>
>                 Key: NIFI-12825
>                 URL: https://issues.apache.org/jira/browse/NIFI-12825
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Emilio Setiadarma
>            Assignee: Emilio Setiadarma
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common way to parallelize scan operations against HBase is to scan by row
> key range. In the HBase architecture, tables are split into regions, each
> holding a range of row keys. These ranges are mutually exclusive and together
> cover every row key in the table.
> The current manual approach to parallelizing scans by row key range is to
> open the HBase shell and run the "list_regions" command to obtain the ranges.
> This approach has downsides, the most important being that row key ranges are
> not static: HBase regions may split, creating two regions that divide the
> original row key range in the middle.
> Providing a way for NiFi to obtain these row key ranges per HBase region
> would make it easier to build a flow that performs HBase scans parallelized
> by row key range. Once the row key ranges are known, they can be fed directly
> into a scanning processor (e.g. ScanHBase).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
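
As an illustration of the row-key-range approach described in the issue, here is a minimal sketch in Python using only the standard library. It does not use the HBase client or any NiFi API. The table is a hypothetical in-memory list of row keys, and the names ROWS, REGIONS, scan_range, split_region and parallel_scan are all assumptions made for this example. It follows the HBase convention that a range's start key is inclusive, its end key is exclusive, and an empty key means "unbounded".

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for an HBase table: a sorted list of row keys.
ROWS = sorted(f"row{i:03d}".encode() for i in range(100))

# Hypothetical region boundaries as (start_key, end_key) pairs.
# Start is inclusive, end is exclusive; b"" means "unbounded".
REGIONS = [(b"", b"row025"), (b"row025", b"row060"), (b"row060", b"")]


def scan_range(rows, start, end):
    """Return the row keys in [start, end), treating b"" as unbounded."""
    lo = bisect_left(rows, start) if start else 0
    hi = bisect_left(rows, end) if end else len(rows)
    return rows[lo:hi]


def split_region(regions, index, split_key):
    """Simulate a region split: replace one range with two adjacent ranges."""
    start, end = regions[index]
    if not split_key or (start and split_key <= start) or (end and split_key >= end):
        raise ValueError("split key must fall strictly inside the region")
    return regions[:index] + [(start, split_key), (split_key, end)] + regions[index + 1:]


def parallel_scan(rows, regions):
    """Run one scan per region range concurrently and concatenate the results."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        chunks = pool.map(lambda r: scan_range(rows, *r), regions)
        return [key for chunk in chunks for key in chunk]


if __name__ == "__main__":
    # The ranges are disjoint and cover the table, so the union is every row.
    print(parallel_scan(ROWS, REGIONS) == ROWS)
    # After a split there is one more range, and the coverage is unchanged.
    after_split = split_region(REGIONS, 1, b"row040")
    print(len(after_split), parallel_scan(ROWS, after_split) == ROWS)
```

Because the ranges are disjoint and together cover the whole key space, each row is returned by exactly one scan. The split_region helper shows why a list of ranges copied by hand can go stale: after a split, the previously recorded ranges no longer correspond one-to-one with regions.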