Sam, if I remember correctly, ListS3 has to list the entire bucket (optionally restricted to the matching prefix) once so that it can store the last-seen object key/version and its timestamp in the NiFi processor state. After this initial listing, ListS3 will behave as you describe and only emit new files put into the bucket. You can use Joe Witt's example to filter out any flow files from the initial run of ListS3.
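
To make the filtering idea concrete, here is a rough sketch in Python of the check RouteOnAttribute would be doing. It is not NiFi code. It assumes 's3.lastModified' is a string holding epoch milliseconds, and the names is_recent, filter_recent and window_ms are made up for illustration. In the processor itself, an untested Expression Language guess along the same lines would be ${s3.lastModified:gt(${now():toNumber():minus(3600000)})}, which should match objects modified within the last hour.

```python
# Hypothetical sketch of the "modified recently" check; not NiFi code.
# Assumption: 's3.lastModified' is epoch milliseconds stored as a string,
# the way flow file attributes are stored.

ONE_HOUR_MS = 60 * 60 * 1000


def is_recent(last_modified_ms, now_ms, window_ms=ONE_HOUR_MS):
    """True if the object was modified within window_ms before now_ms."""
    return last_modified_ms > now_ms - window_ms


def filter_recent(listings, now_ms, window_ms=ONE_HOUR_MS):
    """Keep only the listings whose 's3.lastModified' falls in the window."""
    return [
        obj
        for obj in listings
        if is_recent(int(obj["s3.lastModified"]), now_ms, window_ms)
    ]


if __name__ == "__main__":
    import time

    now_ms = int(time.time() * 1000)
    listings = [
        {"filename": "old.csv", "s3.lastModified": str(now_ms - 5 * ONE_HOUR_MS)},
        {"filename": "new.csv", "s3.lastModified": str(now_ms - 60 * 1000)},
    ]
    for obj in filter_recent(listings, now_ms):
        print(obj["filename"])
```

Anything that fails the check would go to the unmatched relationship and could be auto-terminated, so only the recent objects reach FetchS3. The one-hour window is arbitrary; pick whatever 'now' means for your flow.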

On Mon, Feb 13, 2017 at 9:50 PM, Joe Witt <[email protected]> wrote:
> Hello
>
> The ListS3 processor does write out important metadata such as
> 's3.lastModified'. With this you can likely filter out that which
> you're interested in. For example you could do something like
>
> ListS3 -> RouteOnAttribute -> FetchS3
>
> And in the route on attribute you can use expression language
> statements potentially to filter things out based on that attribute
> and some range of time from 'now'. I don't know what the expression
> language statement would be off-hand but it does seem like a strong
> path to evaluate.
>
> Thanks
> Joe
>
> On Mon, Feb 13, 2017 at 3:03 PM, sam <[email protected]> wrote:
> > Thanks for the help! Seems in this case, what is eating up space is the
> > ListS3 and FetchS3, its somehow reading all objects in the bucket though I
> > just wanted it to read only object added 'now'.
> >
> > --
> > View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Nifi-in-a-hung-state-tp14713p14730.html
> > Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength.
*-Philippians 4:12-13*
