[ https://issues.apache.org/jira/browse/NIFI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186828#comment-17186828 ]

ASF subversion and git services commented on NIFI-7745:
-------------------------------------------------------

Commit 3952c70448f053cd3f7cea27665dc29ac0a999f4 in nifi's branch 
refs/heads/main from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=3952c70 ]

NIFI-7745: Add SampleRecord processor

This closes #4482.

Signed-off-by: Joey Frazee <[email protected]>


> Add a SampleRecord processor
> ----------------------------
>
>                 Key: NIFI-7745
>                 URL: https://issues.apache.org/jira/browse/NIFI-7745
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Sampling records in a flowfile can be a helpful way to test with "real" data, 
> especially for source systems that contain large datasets. It may not be 
> possible to sample the data on the source system, or to test NiFi flows on 
> smaller datasets from the source system(s). Sampling in NiFi may currently be 
> possible (for example, QueryRecord with row numbers), but it is likely done 
> in-memory (in the QueryRecord case) or in a simplistic fashion.
> This Jira proposes a SampleRecord processor that should offer, at the least, 
> the following sampling options:
> * Interval Sampling (every Nth record)
> * Probabilistic Sampling (each record has a probability P of being chosen)
> * Reservoir Sampling (a sample of size K, with each record having equal 
>   probability of being chosen)
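
The three sampling options listed above can be illustrated with a minimal Python sketch. This is not the SampleRecord implementation (the processor itself is a NiFi extension); the function names, the optional `rng` parameter, and the choice to start interval sampling at the Nth record rather than the first are all assumptions made for illustration. All three functions stream over the input, and the reservoir variant (the classic "Algorithm R") holds at most K records in memory.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def interval_sample(records: Iterable[T], n: int) -> Iterator[T]:
    """Yield every n-th record (the n-th, 2n-th, ...). Hypothetical helper."""
    if n < 1:
        raise ValueError("n must be >= 1")
    for index, record in enumerate(records, start=1):
        if index % n == 0:
            yield record


def probabilistic_sample(
    records: Iterable[T], p: float, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Yield each record independently with probability p. Hypothetical helper."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    for record in records:
        # random() returns a float in [0.0, 1.0), so p=1.0 keeps every record
        # and p=0.0 keeps none.
        if rng.random() < p:
            yield record


def reservoir_sample(
    records: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k records, each equally likely to be kept (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be >= 0")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Pick a slot in [0, index]; replace only if it falls in the reservoir.
            j = rng.randint(0, index)
            if j < k:
                reservoir[j] = record
    return reservoir
```

For example, `list(interval_sample(range(1, 11), 3))` keeps the 3rd, 6th, and 9th records. Passing a seeded `random.Random` makes the probabilistic and reservoir variants repeatable, which can help when testing a flow against the same sample more than once.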



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
