[ https://issues.apache.org/jira/browse/NIFI-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Burgess updated NIFI-7745:
-------------------------------
Status: Patch Available (was: In Progress)
> Add a SampleRecord processor
> ----------------------------
>
> Key: NIFI-7745
> URL: https://issues.apache.org/jira/browse/NIFI-7745
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Sampling the records in a flowfile is a helpful way to test with "real" data,
> especially when the source systems hold large datasets. It may not be
> possible to sample the data on the source system, or to test NiFi flows on
> smaller datasets from the source system(s). Sampling may already be possible
> in NiFi (for example, QueryRecord with row numbers), but it is likely done
> in memory (as in the QueryRecord case) or in a simplistic fashion.
> This Jira proposes a SampleRecord processor that should offer at least
> the following sampling options:
> * Interval Sampling (every Nth record)
> * Probabilistic Sampling (each record is chosen with probability P)
> * Reservoir Sampling (a sample of size K in which each record has an equal
>   probability of being chosen)
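
For readers unfamiliar with the three strategies listed above, here is a minimal Python sketch of each. It is an illustration only, not the SampleRecord implementation: the function names, the parameters `n`, `p`, `k`, and the optional `rng` argument are all assumptions made for this example. Each function reads its input as a stream, so only the reservoir variant holds more than one record in memory at a time (at most K).

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def interval_sample(records: Iterable[T], n: int) -> Iterator[T]:
    """Interval sampling: yield the n-th, 2n-th, 3n-th, ... record."""
    if n < 1:
        raise ValueError("n must be >= 1")
    for position, record in enumerate(records, start=1):
        if position % n == 0:
            yield record


def probabilistic_sample(
    records: Iterable[T], p: float, rng: Optional[random.Random] = None
) -> Iterator[T]:
    """Probabilistic sampling: keep each record independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    for record in records:
        # rng.random() is uniform on [0.0, 1.0), so p=1.0 keeps everything
        # and p=0.0 keeps nothing.
        if rng.random() < p:
            yield record


def reservoir_sample(
    records: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Reservoir sampling (Algorithm R): a uniform sample of up to k records
    from a stream whose length is not known in advance."""
    if k < 0:
        raise ValueError("k must be >= 0")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Pick a slot uniformly from 0..index (inclusive); the new record
            # replaces an existing one only if the slot falls in the reservoir.
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = record
    return reservoir
```

As a quick check, `list(interval_sample(range(1, 11), 3))` gives `[3, 6, 9]`. The output size of the probabilistic variant varies from run to run, while the reservoir variant always returns exactly K records when the input has at least K.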
--
This message was sent by Atlassian Jira
(v8.3.4#803005)