[ https://issues.apache.org/jira/browse/NIFI-12825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824219#comment-17824219 ]
ASF subversion and git services commented on NIFI-12825:
--------------------------------------------------------
Commit bee65b8447303a49a5a244aed027ea387c96a2d8 in nifi's branch
refs/heads/main from Emilio Setiadarma
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=bee65b8447 ]
NIFI-12825: implemented ListHBaseRegions processor
Signed-off-by: Matt Burgess <[email protected]>
This closes #8439
> Implement processor to get row key ranges for HBase regions
> -----------------------------------------------------------
>
> Key: NIFI-12825
> URL: https://issues.apache.org/jira/browse/NIFI-12825
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Emilio Setiadarma
> Assignee: Emilio Setiadarma
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> A common way to parallelize scan operations against HBase is to scan by row
> key ranges. HBase splits each table into regions, each holding a range of
> row keys. These row key ranges are mutually exclusive, and together they
> cover every row key in the table.
> Currently, the manual approach to parallelizing HBase scans by row key
> range is to open the HBase shell and run the "list_regions" command to obtain
> the row key ranges. The main downside of this approach is that row key
> ranges are not static: a region may split, creating two regions that divide
> the original row key range in the middle.
> Providing a way for NiFi to obtain the row key range of each HBase region
> would make it easier to create a flow that scans HBase in parallel by row
> key range. Once the row key ranges are known, they can be fed into a
> scanning processor (e.g. ScanHBase).
>
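
The row key range model in the description above can be illustrated with a small sketch. It is not the ListHBaseRegions implementation and does not call HBase; the function names and sample keys are hypothetical. It assumes the usual HBase conventions: a range's start key is inclusive, its end key is exclusive, an empty key means "unbounded", and keys compare as raw bytes.

```python
from typing import List, Tuple

# (start inclusive, end exclusive); b"" stands for an unbounded start or end.
KeyRange = Tuple[bytes, bytes]


def ranges_from_split_points(split_points: List[bytes]) -> List[KeyRange]:
    """Turn region split points into contiguous, non-overlapping key ranges."""
    bounds = [b""] + sorted(split_points) + [b""]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def find_range(ranges: List[KeyRange], row_key: bytes) -> KeyRange:
    """Return the one range that contains row_key."""
    for start, end in ranges:
        if row_key >= start and (end == b"" or row_key < end):
            return (start, end)
    raise ValueError("ranges do not cover the key space")


def split_range(ranges: List[KeyRange], split_key: bytes) -> List[KeyRange]:
    """Model a region split: the range containing split_key becomes two ranges."""
    result: List[KeyRange] = []
    for start, end in ranges:
        inside = split_key > start and (end == b"" or split_key < end)
        if inside:
            result.extend([(start, split_key), (split_key, end)])
        else:
            result.append((start, end))
    return result


# Hypothetical table with three split points, i.e. four regions.
ranges = ranges_from_split_points([b"g", b"n", b"t"])
# After a split at b"k", a previously saved list of ranges is out of date.
ranges_after_split = split_range(ranges, b"k")
```

Each (start, end) pair could serve as the bounds of one parallel scan. Because a split changes the list, ranges copied by hand from the shell can go stale, which is why the description proposes having the flow obtain them.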