[ https://issues.apache.org/jira/browse/NIFI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Witt updated NIFI-7745:
---------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Add a SampleRecord processor
> ----------------------------
>
>                 Key: NIFI-7745
>                 URL: https://issues.apache.org/jira/browse/NIFI-7745
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>             Fix For: 1.13.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Sampling records in a flowfile can be a helpful way to test with "real" data,
> especially for source systems that contain large datasets. It may not be
> possible to sample the data on the source system itself, or to test NiFi flows
> against smaller datasets from the source system(s). Sampling may currently be
> possible in NiFi (for example, QueryRecord with row numbers), but it is likely
> done in memory (in the QueryRecord case) or in a simplistic fashion.
> This Jira proposes a SampleRecord processor that should offer (at the least) 
> the following sampling options:
> * Interval Sampling (every Nth record)
> * Probabilistic Sampling (each record has a probability P of being chosen)
> * Reservoir Sampling (a sample of size K, with each record having an equal
>   probability of being chosen)
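
The three options listed above are standard one-pass sampling techniques. The following is a minimal Java sketch of each, assuming records arrive through a plain java.util.Iterator rather than NiFi's record reader/writer API. The class SamplingSketch and its method names are hypothetical illustrations, not the code merged for NIFI-7745; the reservoir method uses the classic "Algorithm R".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical illustration of the three sampling strategies proposed in
 * NIFI-7745. Works on a plain Iterator, not on NiFi's record API.
 */
final class SamplingSketch {

    private SamplingSketch() {
    }

    /** Interval sampling: keep every nth record (positions n, 2n, 3n, ...). */
    static <T> List<T> interval(Iterator<T> records, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<T> sample = new ArrayList<>();
        long position = 0;
        while (records.hasNext()) {
            T record = records.next();
            position++;
            if (position % n == 0) {
                sample.add(record);
            }
        }
        return sample;
    }

    /** Probabilistic sampling: keep each record independently with probability p. */
    static <T> List<T> probabilistic(Iterator<T> records, double p, Random rng) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        List<T> sample = new ArrayList<>();
        while (records.hasNext()) {
            T record = records.next();
            // nextDouble() is uniform in [0, 1), so this is true with probability p.
            if (rng.nextDouble() < p) {
                sample.add(record);
            }
        }
        return sample;
    }

    /**
     * Reservoir sampling (Algorithm R): a uniform sample of up to k records
     * from a stream of unknown length, in a single pass. Assumes fewer than
     * Integer.MAX_VALUE records so the counter fits in an int.
     */
    static <T> List<T> reservoir(Iterator<T> records, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (records.hasNext()) {
            T record = records.next();
            seen++;
            if (reservoir.size() < k) {
                // Fill phase: the first k records are always kept.
                reservoir.add(record);
            } else {
                // Record number `seen` replaces a random slot with probability k / seen.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, record);
                }
            }
        }
        return reservoir;
    }
}
```

For example, calling SamplingSketch.interval(records, 3) on ten records numbered 1 to 10 returns [3, 6, 9]. Interval and probabilistic sampling need only a counter or one random draw per record, so they could emit records as they stream; reservoir sampling must hold up to K records until the input ends, because any later record may still displace an earlier one.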



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
