[
https://issues.apache.org/jira/browse/NIFI-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773639#comment-17773639
]
ASF subversion and git services commented on NIFI-11985:
--------------------------------------------------------
Commit e4f43379577207eb0b4402647a56c972febeb3d8 in nifi's branch
refs/heads/support/nifi-1.x from Chris Sampson
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=e4f4337957 ]
NIFI-11985: Add ConsumeElasticsearch processor
Signed-off-by: Joe Gresock <[email protected]>
This closes #7671.
> Implement a processor to consume documents from Elasticsearch indices
> ---------------------------------------------------------------------
>
> Key: NIFI-11985
> URL: https://issues.apache.org/jira/browse/NIFI-11985
> Project: Apache NiFi
> Issue Type: New Feature
> Reporter: Chris Sampson
> Assignee: Chris Sampson
> Priority: Minor
> Fix For: 1.latest, 2.latest
>
> Attachments: NIFI-11985_Flow.json
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> It is possible to use Elasticsearch to store series data, i.e. data that is
> continually added to an Elasticsearch index over time, where each document
> carries a {{date}} field or an incrementing numeric {{long}} field.
> This is more likely with the advent of [Data
> Streams|https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html]
> or the recent [Time Series Data
> Streams|https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html],
> both of which use a {{@timestamp}} field to indicate when a document was
> added to the stream.
> There are use cases where NiFi users may want to consume new data from the
> Elasticsearch index/data stream after it's arrived, then pass it to another
> service.
> NiFi would need to:
> * know which field to use as the "series field" (e.g. {{@timestamp}})
> * track the last read "series field" value via State so that the same
> documents are not retrieved from Elasticsearch multiple times
> * allow for the optional specification of the "last read" field value, e.g.
> if a user wants to offset the start of the documents to be read (this value
> should only be used if a value doesn't also exist within the processor's
> State)
> * allow for the fact that the "last read" value will be blank when the
> processor is first run (and the value is not otherwise specified), meaning we
> want to retrieve all existing data
> * allow for users to specify an optional Query Filter to apply to the search
> within Elasticsearch when finding documents to retrieve
> Possible implementations should consider using the {{SearchElasticsearch}}
> processor as a basis, which already uses State tracking between processor
> executions and allows for the retrieval of Elasticsearch documents in a
> paginated manner (thus avoiding pulling too much data in a single request).
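
The requirements listed in the description above can be sketched roughly as
follows. This is a minimal Python illustration only, not the ConsumeElasticsearch
implementation (NiFi processors are written in Java). The function names, the
"last_value" state key and the default page size are hypothetical; the search
body uses only standard Elasticsearch query DSL constructs (bool/filter, range,
sort, size).

```python
from typing import Any, Dict, List, Optional

STATE_KEY = "last_value"  # hypothetical state key, not taken from the NiFi source


def resolve_last_value(state: Dict[str, Any], initial_value: Optional[Any]) -> Optional[Any]:
    """Prefer the value tracked in state; use the configured initial value
    only when state holds nothing. None means "read everything"."""
    if state.get(STATE_KEY) is not None:
        return state[STATE_KEY]
    return initial_value


def build_search_body(series_field: str,
                      last_value: Optional[Any],
                      query_filter: Optional[Dict[str, Any]] = None,
                      page_size: int = 1000) -> Dict[str, Any]:
    """Build a search body that returns documents newer than last_value,
    oldest first, optionally narrowed by a user-supplied filter clause."""
    filters: List[Dict[str, Any]] = []
    if last_value is not None:
        # Only fetch documents after the last one already read.
        filters.append({"range": {series_field: {"gt": last_value}}})
    if query_filter is not None:
        filters.append(query_filter)
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {
        "query": query,
        "sort": [{series_field: {"order": "asc"}}],
        "size": page_size,  # bounded page size avoids pulling everything at once
    }


def update_state(state: Dict[str, Any], hits: List[Dict[str, Any]], series_field: str) -> None:
    """Record the series value of the last hit. Hits are sorted ascending,
    so the last hit carries the largest value seen so far."""
    if hits:
        state[STATE_KEY] = hits[-1]["_source"][series_field]
```

On the first run, with empty state and no initial value, the sketch falls back
to a match_all query, which corresponds to retrieving all existing data. On
later runs the range clause excludes documents at or below the tracked value.
A strict "gt" comparison could skip documents that share the last-read value
but arrive later, so a real implementation may need to handle that case.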