[ https://issues.apache.org/jira/browse/NIFI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe Witt updated NIFI-7745:
---------------------------
    Fix Version/s: 1.13.0

> Add a SampleRecord processor
> ----------------------------
>
>                 Key: NIFI-7745
>                 URL: https://issues.apache.org/jira/browse/NIFI-7745
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>             Fix For: 1.13.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Sampling records in a flowfile can be a helpful way to test with "real" data,
> especially for source systems that contain large datasets. It may not be
> possible on the source system to sample the data or test NiFi flows on
> smaller datasets from the source system(s). Sampling in NiFi may currently be
> possible (such as QueryRecord with row numbers), but it is likely done
> in-memory (in the QueryRecord case) or in a simplistic fashion.
>
> This Jira proposes a SampleRecord processor that should offer (at least)
> the following sampling options:
> Interval Sampling (every Nth record)
> Probabilistic Sampling (each record has a probability P of being chosen)
> Reservoir Sampling (a sample of size K, with each record having an equal
> probability of being chosen)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
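The three sampling options listed in the issue can be illustrated with a minimal Java sketch. This is not the SampleRecord processor's actual code. It assumes records are modeled as a plain `List<T>` rather than NiFi's record reader/writer API, and the class and method names (`SamplingSketch`, `interval`, `probabilistic`, `reservoir`) are hypothetical. Reservoir sampling is shown using the classic one-pass "Algorithm R", which keeps only K records in memory regardless of input size. That property is relevant to the in-memory concern the issue raises about QueryRecord.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of the three sampling modes proposed in NIFI-7745.
 * Records are modeled as a plain List<T>; this is NOT the NiFi record API.
 */
public class SamplingSketch {

    /** Interval sampling: keep every Nth record (1-based positions N, 2N, 3N, ...). */
    public static <T> List<T> interval(List<T> records, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        List<T> out = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            if ((i + 1) % n == 0) {
                out.add(records.get(i));
            }
        }
        return out;
    }

    /** Probabilistic (Bernoulli) sampling: keep each record independently with probability p. */
    public static <T> List<T> probabilistic(List<T> records, double p, Random rng) {
        if (p < 0.0 || p > 1.0) throw new IllegalArgumentException("p must be in [0, 1]");
        List<T> out = new ArrayList<>();
        for (T r : records) {
            // nextDouble() is uniform in [0, 1), so this is true with probability p.
            if (rng.nextDouble() < p) {
                out.add(r);
            }
        }
        return out;
    }

    /**
     * Reservoir sampling (Algorithm R): a uniform random sample of size k taken in
     * a single pass, holding at most k records in memory at any time.
     */
    public static <T> List<T> reservoir(Iterable<T> records, int k, Random rng) {
        if (k < 0) throw new IllegalArgumentException("k must be non-negative");
        List<T> sample = new ArrayList<>(k);
        int seen = 0;
        for (T r : records) {
            seen++;
            if (sample.size() < k) {
                // Fill the reservoir with the first k records.
                sample.add(r);
            } else {
                // Pick a slot in [0, seen); the new record replaces a reservoir
                // entry with probability k / seen.
                int j = rng.nextInt(seen);
                if (j < k) {
                    sample.set(j, r);
                }
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 20; i++) records.add(i);
        Random rng = new Random(42); // fixed seed so runs are repeatable
        System.out.println("interval(5):       " + interval(records, 5));
        System.out.println("probabilistic(0.2): " + probabilistic(records, 0.2, rng));
        System.out.println("reservoir(4):      " + reservoir(records, 4, rng));
    }
}
```

With records 1 through 20, `interval(records, 5)` returns `[5, 10, 15, 20]`. The probabilistic sample's size varies from run to run (about P times the record count on average), while the reservoir sample always has exactly min(K, record count) records. This is the practical difference between the last two options.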