[ https://issues.apache.org/jira/browse/NIFI-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397446#comment-15397446 ]
ASF GitHub Bot commented on NIFI-2417:
--------------------------------------
GitHub user gresockj opened a pull request:
https://github.com/apache/nifi/pull/733
NIFI-2417: Implementing QueryElasticsearchHttp and ScrollElasticsearchHttp
I have implemented these processors for my own project, and thought it
might be useful to submit them to NiFi. They are based on
FetchElasticsearchHttp, and have the following execution designs:
- QueryElasticsearchHttp - submits an ES query and pages through the
results in a single execution, emitting one flow file per document. Supports
both flow file input (when an attribute of the incoming flow file holds the
query to run) and execution without input.
- ScrollElasticsearchHttp - submits an ES query and uses the scroll API to
scroll through the results. The scroll_id for each respective page is kept in
the state management for the processor, and each subsequent execution of the
processor emits a single page of documents as a flow file. We found this to be
the most efficient way to scroll through a huge result set, as in the case of
reindexing Elasticsearch, without losing our place if NiFi goes down. The only
quirk is that the processor state must be cleared before another query can be
run, but this is documented in the processor and jibes with its use case: it
is only needed for rare events such as a reindex.
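The single-execution paging described for QueryElasticsearchHttp can be illustrated with a minimal Python sketch. It assumes Elasticsearch's from/size paging (the PR does not say which paging mechanism the processor uses), and search, SearchFn and make_fake_search are hypothetical stand-ins for the HTTP call to <index>/_search, not code from the pull request. Each yielded hit corresponds to one emitted flow file.

```python
from typing import Callable, Iterator, List

# Hypothetical signature for one page fetch: (query, from_, size) -> hits.
# In a real processor this would be an HTTP request to <index>/_search.
SearchFn = Callable[[dict, int, int], List[dict]]


def build_page_request(query: dict, from_: int, size: int) -> dict:
    """Request body for one page using Elasticsearch from/size paging."""
    return {"query": query, "from": from_, "size": size}


def query_all_pages(search: SearchFn, query: dict,
                    page_size: int) -> Iterator[dict]:
    """Page through all results in one execution, yielding one hit at a time
    (each hit would become its own flow file)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    from_ = 0
    while True:
        hits = search(query, from_, page_size)
        yield from hits
        if len(hits) < page_size:  # a short or empty page means we are done
            return
        from_ += page_size


def make_fake_search(docs: List[dict]) -> SearchFn:
    """Hypothetical in-memory search used only to exercise the paging loop."""
    def search(query: dict, from_: int, size: int) -> List[dict]:
        body = build_page_request(query, from_, size)
        return docs[body["from"]:body["from"] + body["size"]]
    return search


if __name__ == "__main__":
    docs = [{"_id": str(i)} for i in range(7)]
    flow_files = list(query_all_pages(make_fake_search(docs), {"match_all": {}}, 3))
    print(len(flow_files))  # 7 (pages of 3, 3 and 1)
```

Because Elasticsearch rejects requests where from + size exceeds index.max_result_window (10,000 by default), this style of paging only suits the small result sets the Query processor is meant for, which is one reason a separate scroll-based processor makes sense for huge result sets.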
Since the processors already work correctly in our system, I am no longer
authorized to put time into making major modifications to the code. As a
result, if any redesign of this code is desired, I will be unable to put time
toward it.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gresockj/nifi NIFI-2417
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/733.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #733
----
commit 5bbe09e2a7c4689bfa01588260ea89d2375e8356
Author: Joe Gresock
Date: 2016-07-28T11:44:29Z
NIFI-2417: Implementing QueryElasticsearchHttp and ScrollElasticsearchHttp
----
> Implement Query and Scroll processors for ElasticSearch
> -------------------------------------------------------
>
> Key: NIFI-2417
> URL: https://issues.apache.org/jira/browse/NIFI-2417
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Affects Versions: 1.0.0
> Reporter: Joseph Gresock
> Assignee: Joseph Gresock
> Priority: Minor
>
> FetchElasticsearchHttp allows users to select a single document from
> Elasticsearch in NiFi, but there is no way to run a query to retrieve
> multiple documents.
> We should add a QueryElasticsearchHttp processor for running a query and
> returning a flow file per result, for small result sets. This should allow
> both input and non-input execution.
> A separate ScrollElasticsearchHttp processor would also be useful for
> scrolling through a huge result set. This should use the state manager to
> maintain the scroll_id value, and pass it along when requesting the next page.
> As a result, this processor should not allow flow file input, but should
> retrieve one page per run.
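The scroll design (one page per run, with the scroll_id held in processor state) can also be sketched in Python. This is a minimal illustration under stated assumptions: a plain dict stands in for NiFi's state manager, FakeScrollCluster is a hypothetical in-memory substitute for the Elasticsearch scroll API (start() playing the role of the initial search with a scroll keep-alive, next_page() the role of a _search/scroll request), and the "finished" marker is just one possible way to require that state be cleared before a new query runs. None of these names come from the actual processor.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the processor's persisted state; in NiFi this
# would be kept by the state manager so it survives restarts.
State = Dict[str, str]


class FakeScrollCluster:
    """Hypothetical in-memory substitute for Elasticsearch's scroll API.

    start() plays the role of the initial search request with a scroll
    keep-alive; next_page() plays the role of a _search/scroll request
    carrying the scroll_id from the previous response.
    """

    def __init__(self, docs: List[dict], page_size: int) -> None:
        self.docs = docs
        self.page_size = page_size
        self.cursors: Dict[str, int] = {}  # scroll_id -> next offset
        self.counter = 0

    def _page(self, scroll_id: str, offset: int) -> dict:
        hits = self.docs[offset:offset + self.page_size]
        self.cursors[scroll_id] = offset + len(hits)
        # Mirrors the shape of an Elasticsearch response: _scroll_id + hits.hits
        return {"_scroll_id": scroll_id, "hits": {"hits": hits}}

    def start(self, query: dict) -> dict:
        self.counter += 1
        return self._page(f"scroll-{self.counter}", 0)

    def next_page(self, scroll_id: str) -> dict:
        return self._page(scroll_id, self.cursors[scroll_id])


def on_trigger(cluster: FakeScrollCluster, state: State,
               query: dict) -> Optional[List[dict]]:
    """One processor run: return one page (one flow file), or None when done."""
    if state.get("finished") == "true":
        return None  # state must be cleared before another query can run
    if "scroll_id" in state:
        response = cluster.next_page(state["scroll_id"])  # resume saved position
    else:
        response = cluster.start(query)  # first run: submit the query
    hits = response["hits"]["hits"]
    if not hits:
        state.pop("scroll_id", None)
        state["finished"] = "true"
        return None
    state["scroll_id"] = response["_scroll_id"]  # persist for the next run
    return hits
```

Because the scroll_id lives in state rather than in memory, a restarted processor picks up where it left off, provided the scroll context has not expired: Elasticsearch discards a scroll context once its keep-alive elapses without a follow-up request, so the keep-alive has to cover the gap between runs.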
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)