We did something similar, but kept a simple flat file recording where we left off: basically the date or a sequence number, used together with a custom flow processor. We also had the system holding the data send it to a directory that NiFi monitored with the GetFile processor. That approach requires something on the sending system to keep track of what has already been sent.
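A rough sketch of the flat-file idea, in Python for brevity. This is not our actual processor; the function names, the one-line checkpoint format, and the `send` callback are all made up for illustration, and it assumes each record carries an integer sequence number.

```python
import os
import tempfile
from pathlib import Path


def read_checkpoint(path: Path) -> int:
    """Return the last processed sequence number, or -1 if none is recorded."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return -1


def write_checkpoint(path: Path, seq: int) -> None:
    """Write via a temp file plus rename so a crash cannot leave a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        f.write(str(seq))
    os.replace(tmp, path)


def run_once(records, checkpoint: Path, send) -> int:
    """Send every (seq, payload) record newer than the checkpoint, oldest first.

    `send` is a hypothetical callback standing in for the hand-off to the flow.
    Returns the number of records sent.
    """
    last = read_checkpoint(checkpoint)
    sent = 0
    for seq, payload in sorted(records, key=lambda r: r[0]):
        if seq <= last:
            continue  # already handled on an earlier run
        send(payload)
        write_checkpoint(checkpoint, seq)  # advance only after a successful send
        sent += 1
    return sent
```

Because the checkpoint is advanced only after each send, a crash between the two steps means one record is sent again on restart, so the downstream side should tolerate an occasional duplicate.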
Ralph Spangler
Chief Engineer
L-3 NSS Data Tactics
7901 Jones Branch Drive, Suite 700
McLean, VA 22102
Office: (571) 257-0491
Cell: (321) 212-9552
Fax: (703) 506-6703
[email protected]

-----Original Message-----
From: Joe Witt [mailto:[email protected]]
Sent: Wednesday, April 08, 2015 12:35 AM
To: [email protected]
Subject: Re: Conflict Resolution Strategy

Kartik

OK, yes, so your reply is definitely in the NiFi wheelhouse.

For your original case, where you want to copy the data but retain the original object, there are a few ways to do it. One is to actually pull the data from its original location, send a copy to your analytic system, and also give a copy back to the original system.

If you truly must keep the original where it was, then there are really only 'ok' options. You then need NiFi to act as an idempotent receiver, which means it will keep state about what it has already grabbed a copy of and will avoid sending it through more than once. That sounds like no big deal, but it means some database, constantly checking the same things, and tension on clustering. It is in many ways something that isn't conducive to healthy dataflow. It can be done, but it isn't fun.

So before walking that path: is putting a copy of the data back in the original system, but not in a directory you are polling, an option?

Please feel free to subscribe to the mailing list so your notes will get through without delay.

Thanks
Joe

On Apr 7, 2015 11:36 PM, "Kartik Veerepalli" <[email protected]> wrote:
> Corey,
>
> My apologies for not making myself clear. But the points you listed
> are exactly what I meant.
>
> Joe: I did check out rsync, but we are planning to establish a
> continuous data flow pipeline from a wide range of servers, message
> buses, etc. to HDFS.
> We think Apache NiFi can be integrated/used as a data flow system with
> the Analytics as a Service platform that we are building. Thanks for
> the help.
>
> Kartik
>
