[ https://issues.apache.org/jira/browse/NIFI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186828#comment-17186828 ]
ASF subversion and git services commented on NIFI-7745:
-------------------------------------------------------
Commit 3952c70448f053cd3f7cea27665dc29ac0a999f4 in nifi's branch
refs/heads/main from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3952c70 ]
NIFI-7745: Add SampleRecord processor
This closes #4482.
Signed-off-by: Joey Frazee
> Add a SampleRecord processor
> ----------------------------
>
> Key: NIFI-7745
> URL: https://issues.apache.org/jira/browse/NIFI-7745
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Priority: Major
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Sampling records in a flowfile can be a helpful way to test with "real" data,
> especially for source systems that contain large datasets. It may not be
> possible on the source system to sample the data or test NiFi flows on
> smaller datasets from the source system(s). Sampling in NiFi may currently be
> possible (for example, QueryRecord with row numbers), but it is likely done
> in memory (in the QueryRecord case) or in a simplistic fashion.
> This Jira proposes a SampleRecord processor that should offer at least the
> following sampling options:
> * Interval Sampling (every Nth record)
> * Probabilistic Sampling (each record has a probability P of being chosen)
> * Reservoir Sampling (a sample of size K, with each record having equal
>   probability of being chosen)
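
For illustration only, the three sampling options listed in the issue can be
sketched in a few lines of Python. This is not the SampleRecord implementation
from the commit; the function names, the choice to emit the Nth, 2Nth, ...
records for interval sampling, and the optional rng parameter are all
assumptions made for this sketch. Each function consumes the records as a
stream, so none of them needs the whole dataset in memory; reservoir sampling
holds only K records at a time.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def interval_sample(records: Iterable[T], n: int) -> Iterator[T]:
    """Yield every n-th record (assumed here to mean the n-th, 2n-th, ...)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for position, record in enumerate(records, start=1):
        if position % n == 0:
            yield record


def probabilistic_sample(
    records: Iterable[T], p: float, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Yield each record independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    for record in records:
        # rng.random() is in [0.0, 1.0), so p=0 keeps nothing and p=1 keeps all.
        if rng.random() < p:
            yield record


def reservoir_sample(
    records: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k records, each equally likely to be chosen.

    This is the classic single-pass "Algorithm R"; it keeps at most k
    records in memory regardless of how many records are read.
    """
    if k < 0:
        raise ValueError("k must not be negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / (index + 1).
            slot = rng.randint(0, index)  # both ends inclusive
            if slot < k:
                reservoir[slot] = record
    return reservoir
```

For example, list(interval_sample(range(1, 11), 3)) keeps records 3, 6 and 9,
while reservoir_sample(records, 100) returns a uniform sample of 100 records
(or all of them, if fewer than 100 are read).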