[ https://issues.apache.org/jira/browse/NIFI-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Sampson updated NIFI-11985:
---------------------------------
Status: Patch Available (was: In Progress)
> Implement a processor to consume documents from Elasticsearch indices
> ---------------------------------------------------------------------
>
> Key: NIFI-11985
> URL: https://issues.apache.org/jira/browse/NIFI-11985
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Chris Sampson
> Assignee: Chris Sampson
> Priority: Minor
>
> Elasticsearch can be used to store series data, i.e. data that is
> continually added to an index over time, where each document carries a
> {{date}} field or an incrementing (1-up) numeric {{long}} field.
> This is more likely with the advent of [Data
> Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html)
> or the recent [Time Series Data
> Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html),
> both of which use a {{@timestamp}} field to indicate when a document was
> added to the stream.
> There are use cases where NiFi users may want to consume new data from the
> Elasticsearch index/data stream after it's arrived, then pass it to another
> service.
> NiFi would need to:
> * know which field to use as the "series field" (e.g. {{@timestamp}})
> * track the last read "series field" value via State so that the same
> documents are not retrieved from Elasticsearch multiple times
> * allow for the optional specification of the "last read" field value, e.g.
> if a user wants to offset the start of the documents to be read (this value
> should only be used if a value doesn't also exist within the processor's
> State)
> * allow for the fact that the "last read" value will be blank when the
> processor is first run (and the value is not otherwise specified), meaning we
> want to retrieve all existing data
> * allow users to specify an optional Query Filter to apply to the search
> within Elasticsearch when finding documents to retrieve
> Possible implementations should consider using the {{SearchElasticsearch}}
> processor as a basis, which already uses State tracking between processor
> executions and allows for the retrieval of Elasticsearch documents in a
> paginated manner (thus avoiding pulling too much data in a single request).
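
The requirements listed above could be sketched roughly as follows. This is an illustration only, not the proposed processor: the function names, the `last_read_value` state key and the plain dict standing in for processor State are all hypothetical. Only the query body follows the standard Elasticsearch query DSL (`bool`/`filter`, `range`, `sort`).

```python
from typing import Any, Optional

# Hypothetical key under which the last read series value is kept in State.
STATE_KEY = "last_read_value"


def resolve_start_value(state: dict, initial_value: Optional[str]) -> Optional[str]:
    """Prefer the value held in State; use the configured initial value
    only when State holds nothing (e.g. on the first run)."""
    stored = state.get(STATE_KEY)
    return stored if stored not in (None, "") else initial_value


def build_search_body(series_field: str,
                      start_value: Optional[str],
                      query_filter: Optional[dict] = None,
                      page_size: int = 100) -> dict:
    """Build a search body that returns documents after start_value,
    oldest first, optionally narrowed by a user-supplied filter."""
    filters: list = []
    if start_value is not None:
        # Only documents strictly after the last read value.
        filters.append({"range": {series_field: {"gt": start_value}}})
    if query_filter is not None:
        filters.append(query_filter)
    # No start value and no filter: retrieve all existing data.
    query: Any = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {
        "size": page_size,
        "query": query,
        "sort": [{series_field: {"order": "asc"}}],
    }


def update_state(state: dict, hits: list, series_field: str) -> dict:
    """Return a copy of State recording the series value of the last hit.
    An empty page leaves State unchanged."""
    if not hits:
        return dict(state)
    new_state = dict(state)
    new_state[STATE_KEY] = hits[-1]["_source"][series_field]
    return new_state
```

A strict `gt` comparison is assumed here for simplicity. If several documents can share the same series value across page boundaries, some could be skipped, so a real implementation would probably need a tie-breaker or Elasticsearch's own pagination support.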