[jira] [Created] (NIFI-11985) Implement a processor to consume documents from Elasticsearch indices

Chris Sampson (Jira) Wed, 23 Aug 2023 12:27:07 -0700

Chris Sampson created NIFI-11985:
------------------------------------

             Summary: Implement a processor to consume documents from Elasticsearch indices
                 Key: NIFI-11985
                 URL: https://issues.apache.org/jira/browse/NIFI-11985
             Project: Apache NiFi
          Issue Type: New Feature
            Reporter: Chris Sampson



It is possible to use Elasticsearch to store series data, i.e. data that is 
continually added to an Elasticsearch index over time and ordered by a 
{{date}} field or an incrementing (1-up) numeric {{long}} field.

This usage is more likely with the advent of [Data 
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html)
or the more recent [Time Series Data 
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html),
both of which use a {{@timestamp}} field to indicate when a document was added 
to the stream.

There are use cases where NiFi users may want to consume new data from the 
Elasticsearch index/data stream after it's arrived, then pass it to another 
service.

NiFi would need to know which field to use as the "series field" (e.g. 
{{@timestamp}}) and track this via State so that the same documents are not 
retrieved from Elasticsearch multiple times. Possible implementations should 
consider using the {{SearchElasticsearch}} processor as a basis, which already 
uses State tracking between processor executions and allows for the retrieval 
of Elasticsearch documents in a paginated manner (thus avoiding pulling too 
much data in a single request).
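
The tracking described above could be sketched roughly as follows. This is an 
illustration only, not the proposed processor: the function names, the 
{{state}} dict (standing in for NiFi State) and the {{search}} callable 
(standing in for an Elasticsearch client) are all assumptions made for the 
example. It uses the standard {{range}}, {{sort}} and {{search_after}} parts of 
the Elasticsearch search body, and assumes the series field is a top-level 
field of each document.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def build_search_body(series_field: str, last_value: Optional[Any],
                      page_size: int,
                      search_after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build a search body that only matches documents newer than last_value."""
    if last_value is None:
        # First run: nothing tracked yet, so read from the start of the index.
        query: Dict[str, Any] = {"match_all": {}}
    else:
        query = {"range": {series_field: {"gt": last_value}}}
    body: Dict[str, Any] = {
        "size": page_size,
        "sort": [{series_field: "asc"}],  # oldest first, so state only moves forward
        "query": query,
    }
    if search_after is not None:
        body["search_after"] = search_after  # continue from the previous page
    return body


def consume_new_documents(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                          state: Dict[str, Any], series_field: str,
                          page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield documents not seen before, one page at a time, updating state."""
    last_value = state.get("last_value")
    search_after: Optional[List[Any]] = None
    while True:
        body = build_search_body(series_field, last_value, page_size, search_after)
        hits = search(body)["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit["_source"]
        # Remember where this page ended, both for paging and for the next run.
        search_after = hits[-1]["sort"]
        state["last_value"] = hits[-1]["_source"][series_field]
```

A strict "greater than" comparison is the simplest choice here, but it would 
skip a document that arrives later with the same series value as the last one 
tracked, so a real implementation would need to decide how to handle ties.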



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
