[
https://issues.apache.org/jira/browse/NIFI-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Sampson updated NIFI-11985:
---------------------------------
Description:
It is possible to use Elasticsearch to store series data, i.e. data that is
continually added to an Elasticsearch index over time and carries a {{date}}
field or an incrementing numeric {{long}} field.
This is more likely with the advent of [Data
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html)
or the recent [Time Series Data
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html),
both of which use a {{@timestamp}} field to indicate when a document was added
to the stream.
There are use cases where NiFi users may want to consume new data from the
Elasticsearch index/data stream after it's arrived, then pass it to another
service.
NiFi would need to:
* know which field to use as the "series field" (e.g. {{@timestamp}})
* track the last read "series field" value via State so that the same documents
are not retrieved from Elasticsearch multiple times
* allow for the optional specification of the "last read" field value, e.g. if
a user wants to offset the start of the documents to be read (this value should
only be used if a value doesn't also exist within the processor's State)
* allow for the fact that the "last read" value will be blank when the
processor is first run (and the value is not otherwise specified), meaning we
want to retrieve all existing data
* allow for users to specify an optional Query Filter to apply to the search
within Elasticsearch when finding documents to retrieve
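
The requirements listed above could be sketched roughly as follows. This is an illustration only, not a proposed implementation: the function and parameter names ({{build_query}}, {{state_value}}, {{configured_value}}, {{query_filter}}) are hypothetical, and the sketch assumes the "series field" is compared with a {{range}} clause using {{gt}} and that results are sorted ascending on that field.

```python
from typing import Any, Dict, List, Optional


def resolve_last_read(state_value: Optional[Any], configured_value: Optional[Any]) -> Optional[Any]:
    """Pick the "last read" value: State wins, the configured value is a fallback.

    Returns None when neither is set (e.g. on the very first run),
    which means "retrieve all existing data".
    """
    if state_value not in (None, ""):
        return state_value
    if configured_value not in (None, ""):
        return configured_value
    return None


def build_query(
    series_field: str,
    state_value: Optional[Any] = None,
    configured_value: Optional[Any] = None,
    query_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch search body for documents newer than "last read"."""
    filters: List[Dict[str, Any]] = []

    # Optional user-supplied Query Filter (assumed to be a Query DSL clause).
    if query_filter is not None:
        filters.append(query_filter)

    # Only restrict on the series field when a "last read" value is known.
    last_read = resolve_last_read(state_value, configured_value)
    if last_read is not None:
        filters.append({"range": {series_field: {"gt": last_read}}})

    query: Dict[str, Any] = {"bool": {"filter": filters}} if filters else {"match_all": {}}

    # Ascending sort so the final hit carries the newest series value.
    return {"query": query, "sort": [{series_field: {"order": "asc"}}]}
```

One caveat with this sketch: a strict {{gt}} comparison would skip documents that arrive later with the same series value as the last one read, so a real implementation may need a tie-breaker or a small overlap.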
Possible implementations should consider using the {{SearchElasticsearch}}
processor as a basis, which already uses State tracking between processor
executions and allows for the retrieval of Elasticsearch documents in a
paginated manner (thus avoiding pulling too much data in a single request).
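
As a rough sketch of how paginated retrieval and State tracking might fit together: the loop below pages through results with Elasticsearch's {{search_after}} mechanism and records the newest series value after each page. Everything here is hypothetical ({{run_once}}, {{fetch_page}}, {{make_fake_backend}}, the {{last_read}} State key); the in-memory backend stands in for Elasticsearch so the example is self-contained, and it does not reflect how {{SearchElasticsearch}} is actually implemented.

```python
from typing import Any, Callable, Dict, List, Optional

Hit = Dict[str, Any]
FetchPage = Callable[[Optional[List[Any]], int], List[Hit]]


def run_once(fetch_page: FetchPage, series_field: str, page_size: int, state: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Read all available pages once, updating state["last_read"] as it goes."""
    emitted: List[Dict[str, Any]] = []
    search_after: Optional[List[Any]] = None
    while True:
        hits = fetch_page(search_after, page_size)
        if not hits:
            break  # no more documents for this execution
        emitted.extend(hit["_source"] for hit in hits)
        # Hits are sorted ascending, so the last hit has the newest value.
        state["last_read"] = hits[-1]["_source"][series_field]
        # search_after takes the sort values of the last hit of the page.
        search_after = hits[-1]["sort"]
    return emitted


def make_fake_backend(docs: List[Dict[str, Any]], series_field: str) -> FetchPage:
    """In-memory stand-in for an Elasticsearch search sorted on series_field."""
    ordered = sorted(docs, key=lambda d: d[series_field])

    def fetch_page(search_after: Optional[List[Any]], size: int) -> List[Hit]:
        start = 0
        if search_after is not None:
            # Skip everything up to and including the previous page's last value.
            start = sum(1 for d in ordered if d[series_field] <= search_after[0])
        page = ordered[start:start + size]
        return [{"_source": d, "sort": [d[series_field]]} for d in page]

    return fetch_page
```

For example, with five documents and a page size of two, {{run_once}} would issue four requests (the last returning an empty page) and leave the highest series value in the State map.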
was:
It is possible to use Elasticsearch to store series data, i.e. data is
continually added to an Elasticsearch index over time, with a {{date}} or a
1-up numeric {{long}} field.
This is more likely with the advent of [Data
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html)
or the recent [Time Series Data
Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html),
both of which use a {{@timestamp}} field to indicate when a document was added
to the stream.
There are use cases where NiFi users may want to consume new data from the
Elasticsearch index/data stream after it's arrived, then pass it to another
service.
NiFi would need to know which field to use as the "series field" (e.g.
{{@timestamp}}) and track this via State so that the same documents are not
retrieved from Elasticsearch multiple times. Possible implementations should
consider using the {{SearchElasticsearch}} processor as a basis, which already
uses State tracking between processor executions and allows for the retrieval
of Elasticsearch documents in a paginated manner (thus avoiding pulling too
much data in a single request).
> Implement a processor to consume documents from Elasticsearch indices
> ---------------------------------------------------------------------
>
> Key: NIFI-11985
> URL: https://issues.apache.org/jira/browse/NIFI-11985
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Chris Sampson
> Priority: Minor