Hello Yogesh,

A couple of quick pointers that might help:

1) Set InvokeHTTP to run only on the primary node. You don't want every node in the cluster pulling the same data.

2) After SplitJson, use site-to-site to send the splits across the cluster to parallelize the work. That sounds like overkill for what you've described, but it will certainly scale and get the point across.

3) You're pulling from an endpoint that does not offer queuing semantics, so duplicates are naturally something to consider. Since it appears to be an hourly dataset, I would add DetectDuplicate right after the HTTP pull of the JSON and schedule the pull to run every 10, 15, or 30 minutes or so. Take a look at the docs for setting up duplicate detection.
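The idea behind pointer 3 can be sketched outside NiFi. In NiFi, DetectDuplicate keeps its state in a distributed map cache (a DistributedMapCacheServer controller service reached through a DistributedMapCacheClientService), and its Age Off Duration property controls how long an identifier is remembered. The Python below is not NiFi's API; it is a minimal in-memory imitation of that behavior. One variation on the advice above is assumed here: deduplicating each split record by its ID rather than the whole response, on the assumption that every feature in the USGS GeoJSON feed carries a unique top-level "id". The names SeenCache and new_features are hypothetical.

```python
import json
from typing import Dict, List


class SeenCache:
    """Hypothetical in-memory stand-in for the shared cache DetectDuplicate
    consults. Each key is remembered for max_age seconds after first sight."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self._first_seen: Dict[str, float] = {}

    def is_duplicate(self, key: str, now: float) -> bool:
        # Age off entries older than max_age so the cache does not grow forever.
        for k in [k for k, t in self._first_seen.items() if now - t > self.max_age]:
            del self._first_seen[k]
        if key in self._first_seen:
            return True
        self._first_seen[key] = now
        return False


def new_features(payload: str, cache: SeenCache, now: float) -> List[dict]:
    """Split one GeoJSON response and keep only features not seen recently.
    Assumes every feature has a unique top-level "id"."""
    features = json.loads(payload).get("features", [])
    return [f for f in features if not cache.is_duplicate(f["id"], now)]


# Two overlapping pulls 15 minutes (900 s) apart; "b" appears in both.
cache = SeenCache(max_age=2 * 3600)
pull_1 = json.dumps({"features": [{"id": "a"}, {"id": "b"}]})
pull_2 = json.dumps({"features": [{"id": "b"}, {"id": "c"}]})
print([f["id"] for f in new_features(pull_1, cache, now=0)])    # ['a', 'b']
print([f["id"] for f in new_features(pull_2, cache, now=900)])  # ['c']
```

The age-off window has to be longer than the feed's one-hour window plus the polling interval; otherwise an event still present in the feed could expire from the cache and be loaded a second time. Two hours is an arbitrary safety margin chosen for the sketch.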
The pattern you've laid out makes sense, is quite straightforward, and is common.

Thanks
Joe

On Wed, Aug 10, 2016 at 6:31 AM, yogesh sharma <[email protected]> wrote:
> Hello Team,
>
> I am new to Apache NiFi and have started working on it. We currently have NiFi
> installed as a cluster with three nodes.
>
> I am getting duplicate data while implementing the use case below.
>
> Use case: I need to fetch data from the US Earthquake website
> (http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson)
> and load it incrementally.
>
> For that I am using the following processors:
>
> * InvokeHTTP
> * SplitJson
> * EvaluateJsonPath
> * ReplaceText
> * MergeContent
> * PutFile/PutHDFS
>
> I have attached my template as well.
> The issue I am facing is duplicate data: every time InvokeHTTP hits the API
> it gets all the available details, which can include records it has already
> fetched, so the same data is loaded into the target again.
>
> I need to load only unique data into the target. I found DetectDuplicate but
> don't know how to configure it. Can you tell me how to configure the services
> in a cluster? Or if you have any other solution, please let me know. We want
> to use NiFi in our upcoming project but are facing issues while implementing
> small POCs.
>
> Thanks
>
> Yogesh (+91-9689942310)
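For readers newer to NiFi, the processor chain in the quoted message can be pictured as a plain-Python data flow. This is only a rough analogue: NiFi processors are configured rather than coded, and the sample records, the extracted fields (id, mag, place), and the CSV-style output line are hypothetical choices, not taken from the attached template. The simple `seen` set marks where a duplicate check would sit; unlike DetectDuplicate it is in-memory only and never ages entries off.

```python
import json
from typing import Dict, List, Set

# Hypothetical sample shaped like a USGS GeoJSON summary feed (not real data).
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"id": "ev1", "properties": {"mag": 1.2, "place": "Somewhere A"}},
        {"id": "ev2", "properties": {"mag": 2.5, "place": "Somewhere B"}},
    ],
})


def split_json(payload: str) -> List[dict]:
    # SplitJson analogue: one record per element of $.features
    return json.loads(payload)["features"]


def evaluate_json_path(feature: dict) -> Dict[str, str]:
    # EvaluateJsonPath analogue: copy selected fields into "attributes"
    props = feature["properties"]
    return {"id": feature["id"], "mag": str(props["mag"]), "place": props["place"]}


def replace_text(attrs: Dict[str, str]) -> str:
    # ReplaceText analogue: rewrite the record as one CSV-style line
    return f'{attrs["id"]},{attrs["mag"]},{attrs["place"]}'


def merge_content(lines: List[str]) -> str:
    # MergeContent analogue: bundle records, newline-delimited
    return "".join(line + "\n" for line in lines)


def run_flow(payload: str, seen: Set[str]) -> str:
    lines = []
    for feature in split_json(payload):
        attrs = evaluate_json_path(feature)
        # Duplicate-check slot (simplified): skip IDs already loaded
        if attrs["id"] in seen:
            continue
        seen.add(attrs["id"])
        lines.append(replace_text(attrs))
    return merge_content(lines)


loaded_ids: Set[str] = set()
print(run_flow(SAMPLE, loaded_ids), end="")  # prints the ev1 and ev2 lines
print(run_flow(SAMPLE, loaded_ids), end="")  # second pull: prints nothing
```

Running the same payload twice shows the intended effect: the second pull produces no output because every ID was already loaded.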
