[ 
https://issues.apache.org/jira/browse/NIFI-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397446#comment-15397446
 ] 

ASF GitHub Bot commented on NIFI-2417:
--------------------------------------

GitHub user gresockj opened a pull request:

    https://github.com/apache/nifi/pull/733

    NIFI-2417: Implementing QueryElasticsearchHttp and ScrollElasticsearchHttp

    I have implemented these processors for my own project, and thought it 
might be useful to submit them to NiFi.  They are based on 
FetchElasticsearchHttp, and have the following execution designs:
    
    - QueryElasticsearchHttp - submits an ES query and pages through the 
results in a single execution, emitting one flow file per document.  Allows 
both flow file input (in case the flow file has an attribute with the query to 
run) and non-input execution.
    - ScrollElasticsearchHttp - submits an ES query and uses the scroll API to 
scroll through the results.  The scroll_id for each respective page is kept in 
the state management for the processor, and each subsequent execution of the 
processor emits a single page of documents as a flow file.  We found this to be 
the most efficient way to scroll through a huge result set, as in the case of 
reindexing Elasticsearch, without losing our place if NiFi goes down.  The only 
quirky thing is that the processor state must be cleared before another query 
can be run, but this is documented in the processor, and jibes with the use 
case of only being needed for rare events like a reindex.
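
    As an illustration of the ScrollElasticsearchHttp execution design 
described above (this is not code from the pull request), here is a minimal 
Python sketch of "one page per execution, with the scroll_id kept in state".  
Every name in it is hypothetical: FakeScrollBackend is an in-memory stand-in 
for Elasticsearch, a plain dict stands in for the processor's managed state, 
and the "finished" flag is an assumed way to model the need to clear state 
before another query can run.

```python
# Hypothetical sketch: none of these names come from the NiFi code base.

class FakeScrollBackend:
    """In-memory stand-in for Elasticsearch's scroll API (an assumption)."""

    def __init__(self, docs, page_size):
        self.docs = list(docs)
        self.page_size = page_size

    def start(self):
        # Stands in for the initial search request that opens a scroll.
        return self._page(0)

    def next_page(self, scroll_id):
        # Stands in for a follow-up scroll request using a saved scroll_id.
        return self._page(int(scroll_id))

    def _page(self, offset):
        hits = self.docs[offset:offset + self.page_size]
        # The fake scroll_id simply encodes the offset of the next page.
        return str(offset + len(hits)), hits


def run_once(state, backend):
    """One processor execution: emit at most one page, then save our place.

    `state` is a plain dict standing in for the processor's managed state.
    Returns the page of documents, or None when there is nothing to emit.
    """
    if state.get("finished"):
        # Assumed behavior: state must be cleared before a new query runs.
        return None
    scroll_id = state.get("scroll_id")
    if scroll_id is None:
        new_id, hits = backend.start()
    else:
        new_id, hits = backend.next_page(scroll_id)
    if not hits:
        state["finished"] = True
        return None
    # Persist the scroll_id so a restart resumes from the next page.
    state["scroll_id"] = new_id
    return hits


if __name__ == "__main__":
    backend = FakeScrollBackend(["a", "b", "c", "d", "e"], page_size=2)
    state = {}
    while (page := run_once(state, backend)) is not None:
        print(page)
```

    Because the only thing carried between executions is the contents of the 
state dict, restoring a saved copy of it after a restart picks up at the next 
page rather than starting the query over.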
    
    Since the processors already work correctly in our system, I am no longer 
authorized to put time into making major modifications to the code.  As a 
result, if any re-designs of this code are desired, I will be unable to put time 
toward it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gresockj/nifi NIFI-2417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #733
    
----
commit 5bbe09e2a7c4689bfa01588260ea89d2375e8356
Author: Joe Gresock <[email protected]>
Date:   2016-07-28T11:44:29Z

    NIFI-2417: Implementing QueryElasticsearchHttp and ScrollElasticsearchHttp

----


> Implement Query and Scroll processors for ElasticSearch
> -------------------------------------------------------
>
>                 Key: NIFI-2417
>                 URL: https://issues.apache.org/jira/browse/NIFI-2417
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Joseph Gresock
>            Assignee: Joseph Gresock
>            Priority: Minor
>
> FetchElasticsearchHttp allows users to select a single document from 
> Elasticsearch in NiFi, but there is no way to run a query to retrieve 
> multiple documents.
> We should add a QueryElasticsearchHttp processor for running a query and 
> returning a flow file per result, for small result sets.  This should allow 
> both input and non-input execution.  
> A separate ScrollElasticsearchHttp processor would also be useful for 
> scrolling through a huge result set.  This should use the state manager to 
> maintain the scroll_id value, and use this as input to the next scroll page.  
> As a result, this processor should not allow flow file input, but should 
> retrieve one page per run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
