[ 
https://issues.apache.org/jira/browse/NIFI-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12825:
--------------------------------
    Status: Patch Available  (was: Open)

> Implement processor to get row key ranges for HBase regions
> -----------------------------------------------------------
>
>                 Key: NIFI-12825
>                 URL: https://issues.apache.org/jira/browse/NIFI-12825
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Emilio Setiadarma
>            Assignee: Emilio Setiadarma
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> A common way to parallelize scan operations against HBase is to scan by row
> key ranges. HBase splits each table into regions, and each region holds a
> contiguous range of row keys. These ranges are mutually exclusive and
> together cover every row key in the table.
> The current manual approach is to open the HBase shell and run the
> "list_regions" command to obtain the row key ranges. This approach has
> downsides, the most important being that row key ranges are not static:
> a region may split, creating two regions with the original row key range
> split in the middle.
> Providing a way for NiFi to obtain the row key range of each HBase region
> would make it easier to build a flow that scans HBase in parallel by row
> key range. Once the row key ranges are known, they could easily be fed
> into a scanning processor (e.g. ScanHBase).
>  
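As an illustration of the idea in the description above (not code from this issue or its patch), here is a minimal Python sketch of how region-style row key ranges partition the key space and how a split changes them. It assumes the usual HBase convention of an inclusive start key, an exclusive end key, and an empty key for an open bound. The function names `ranges_from_split_points`, `owning_range`, and `split_range` are hypothetical and exist only for this example; nothing here talks to a real cluster.

```python
from typing import List, Tuple

# (start_key, end_key): start is inclusive, end is exclusive.
# An empty key (b"") marks an open bound, as for the first/last region.
KeyRange = Tuple[bytes, bytes]


def ranges_from_split_points(split_points: List[bytes]) -> List[KeyRange]:
    """Build contiguous, non-overlapping ranges from region split points."""
    bounds = [b""] + sorted(split_points) + [b""]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def owning_range(ranges: List[KeyRange], row_key: bytes) -> KeyRange:
    """Return the single range that contains row_key."""
    matches = [
        (start, end)
        for start, end in ranges
        if start <= row_key and (end == b"" or row_key < end)
    ]
    # Ranges are mutually exclusive and cover all keys, so exactly one matches.
    if len(matches) != 1:
        raise ValueError("ranges do not partition the key space")
    return matches[0]


def split_range(ranges: List[KeyRange], target: KeyRange, split_key: bytes) -> List[KeyRange]:
    """Simulate a region split: replace target with two adjacent ranges."""
    start, end = target
    if not (start < split_key and (end == b"" or split_key < end)):
        raise ValueError("split key must fall strictly inside the target range")
    result: List[KeyRange] = []
    for r in ranges:
        if r == target:
            result.extend([(start, split_key), (split_key, end)])
        else:
            result.append(r)
    return result


if __name__ == "__main__":
    before = ranges_from_split_points([b"g", b"p"])
    after = split_range(before, (b"g", b"p"), b"k")
    print(before)
    print(after)
```

Each range in such a list could drive one parallel scan. Because a split replaces one range with two, a list captured earlier (for example, copied by hand from the shell) still covers every key but no longer lines up one-to-one with the regions.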



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
