Chris Sampson created NIFI-11985:
------------------------------------
Summary: Implement a processor to consume documents from
Elasticsearch indices
Key: NIFI-11985
URL: https://issues.apache.org/jira/browse/NIFI-11985
Project: Apache NiFi
Issue Type: New Feature
Reporter: Chris Sampson
Elasticsearch can be used to store series data, i.e. documents are continually
added to an index over time, each with a {{date}} field or an incrementing
numeric {{long}} field.
This is more likely with the advent of [Data
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html)
or the recent [Time Series Data
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html),
both of which use a {{@timestamp}} field to indicate when a document was added
to the stream.
There are use cases where NiFi users may want to consume new data from the
Elasticsearch index/data stream once it has arrived, then pass it on to another
service.
NiFi would need to know which field to use as the "series field" (e.g.
{{@timestamp}}) and track the last value retrieved for that field via State so
that the same documents are not retrieved from Elasticsearch multiple times.
Possible implementations should consider using the {{SearchElasticsearch}}
processor as a basis, which already uses State tracking between processor
executions and allows for the retrieval of Elasticsearch documents in a
paginated manner (thus avoiding pulling too much data in a single request).
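A rough sketch of the tracking idea, for discussion only. It is written in
Python rather than as a NiFi processor, and the function names
({{build_query}}, {{next_state}}) are hypothetical, not existing NiFi or
Elasticsearch APIs. It assumes the series field is sortable and that the last
value seen is what gets persisted in State: each run asks for documents whose
series field is greater than the stored value, sorted ascending, one page at a
time.

```python
from typing import Any, Optional


def build_query(series_field: str, last_value: Optional[Any],
                page_size: int = 100) -> dict:
    """Build a search body for documents newer than last_value, oldest first."""
    # First run: nothing tracked yet, so start from the beginning of the index.
    query: dict = {"match_all": {}}
    if last_value is not None:
        # Later runs: only documents beyond the value held in State.
        query = {"range": {series_field: {"gt": last_value}}}
    return {
        "size": page_size,
        "query": query,
        "sort": [{series_field: {"order": "asc"}}],
    }


def next_state(series_field: str, hits: list,
               last_value: Optional[Any]) -> Optional[Any]:
    """Return the value to persist in State after handling one page of hits."""
    if not hits:
        # No new documents: keep the previously tracked value.
        return last_value
    # Hits are sorted ascending, so the last one carries the highest value.
    return hits[-1]["_source"][series_field]
```

One caveat with this simple form: a strict {{gt}} comparison could skip
documents that share the tracked value but fall on a later page, so a real
implementation would probably need a tie-breaker (for example a secondary sort
field) or a different comparison strategy.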
--
This message was sent by Atlassian Jira
(v8.20.10#820010)