Good morning Kartik,
Joe is correct: the real question is what puts the original file in
place. If you have a "temporary landing folder" for GetFile to poll, then
the problem is very simple. Just have GetFile pick the files up and send
copies to the permanent location and to the outside systems. If the
location where files are created or delivered can't be changed, then there
are really only "ok" solutions, because something has to track the state of
each file (whether it is complete and whether it has already been picked
up). You could use file renaming, temp files, and a few other ideas, but
really the best approach is a neutral landing folder for the original
files, with NiFi placing them wherever you need them.
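A rough sketch of that landing-folder pattern, written as plain Python rather
than as a NiFi flow. The folder names, the drain_landing_folder helper, and
the convention that dot-prefixed files are still being written are all
assumptions made for illustration:

```python
import shutil
import tempfile
from pathlib import Path


def drain_landing_folder(landing: Path, destinations: list[Path]) -> list[str]:
    """Copy each finished file in `landing` to every destination, then delete it.

    Assumption (hypothetical convention): files whose names start with '.'
    are still being written by the producer, so they are skipped for now.
    """
    moved = []
    for src in sorted(landing.iterdir()):
        if not src.is_file() or src.name.startswith("."):
            continue  # skip directories and in-progress temp files
        for dest_dir in destinations:
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest_dir / src.name)  # copy data and metadata
        # The landing folder is scratch space, so removing the file here means
        # no record of "already sent" files has to be kept anywhere.
        src.unlink()
        moved.append(src.name)
    return moved


# Usage: one finished file and one still "in progress".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    landing = root / "landing"
    landing.mkdir()
    (landing / "report.csv").write_text("a,b\n1,2\n")
    (landing / ".partial.csv").write_text("still writing")
    archive, analytics = root / "archive", root / "analytics"
    moved = drain_landing_folder(landing, [archive, analytics])
    remaining = sorted(p.name for p in landing.iterdir())
    archived = sorted(p.name for p in archive.iterdir())
    analyzed = sorted(p.name for p in analytics.iterdir())
    print(moved, remaining, archived, analyzed)
```

Because each file leaves the landing folder once it has been copied, nothing
needs to remember what was already sent, which is the appeal of the neutral
folder. The sketch has no error handling, so a failure partway through could
leave a file copied to some destinations but not others.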
Sorry, I hope that isn't too confusing.
Corey
> On Apr 8, 2015, at 12:33 AM, Joe Witt <[email protected]> wrote:
>
> Kartik
>
> OK, yes: what you describe is definitely in the NiFi wheelhouse.
>
> For your original case, where you want to copy but retain the original
> object, there are a few ways to do it. One is to actually pull the data
> from its original location and send a copy to your analytic system and also
> give a copy back to the original system.
>
> If you truly must keep the original where it was, then there are really
> only 'ok' options. NiFi then needs to act as an idempotent receiver, which
> means it keeps state about what it has already grabbed a copy of and avoids
> sending the same thing through more than once. That sounds like no big
> deal, but it means maintaining some database of what has been seen,
> constantly re-checking the same things, and tension when clustering. In
> many ways it isn't conducive to healthy dataflow. It can be done, but it
> isn't fun.
>
> So before walking that path: is putting a copy of the data back in the
> original system, but not in a directory you are polling, an option?
>
> Please feel free to subscribe to the mailing list so your notes will get
> through without delay.
>
> Thanks
> Joe
> On Apr 7, 2015 11:36 PM, "Kartik Veerepalli" <[email protected]>
> wrote:
>
>> Corey,
>>
>>
>> My apologies for not making myself clear, but the points you listed are
>> exactly what I meant.
>>
>>
>> Joe: I did check out rsync, but we are planning to establish a continuous
>> data flow pipeline from a wide range of servers, message buses, etc. to
>> HDFS. We think Apache NiFi can be integrated with, or used as, the data
>> flow system for the Analytics as a Service Platform we are building.
>> Thanks for the help.
>>
>>
>> Kartik
>>