ddewaele,

> 2. Sometimes, when syncing files to my S3 buckets, I notice that the ListS3
> processor is picking up the same file twice. Is there a way to avoid that ?

Joe's response is correct. If you upload an object to S3 that
overwrites an existing key, the modified date will change, and ListS3
will emit a flowfile for the "new" object with the same key. Likewise,
changing object metadata, enabling server-side encryption, etc. will
also change the object's modified date. The List->Fetch strategy works
well for a directory being used as a queue, but it doesn't always work
as well for monitoring an entire S3 bucket over time.
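
As a rough illustration of why the same key shows up twice, here is a
toy model of timestamp-based listing in Python. This is not NiFi's
actual implementation; list_new, the bucket dict and the dates are all
made up for the example, and I'm assuming the lister simply remembers
the newest modified date it has seen.

```python
from datetime import datetime, timezone

def list_new(objects, last_seen):
    """Return keys modified after last_seen, plus the updated watermark.

    `objects` maps key -> last-modified time (a stand-in for a bucket listing).
    """
    emitted = [key for key, modified in objects.items() if modified > last_seen]
    newest = max(objects.values(), default=last_seen)
    return emitted, max(newest, last_seen)

t0 = datetime(2016, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2016, 1, 2, tzinfo=timezone.utc)
t2 = datetime(2016, 1, 3, tzinfo=timezone.utc)

bucket = {"report.csv": t1}
first, watermark = list_new(bucket, t0)          # initial listing picks up the key

bucket["report.csv"] = t2                        # overwrite or metadata change bumps the date
second, watermark = list_new(bucket, watermark)  # same key is picked up again
```

Both first and second contain "report.csv", because the lister only
compares modified dates and has no way to tell an overwrite from a
genuinely new object.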

You may be able to achieve finer-grained control using event
notifications and an SQS queue, which I wrote about a while back:
https://adamlamar.github.io/2016-01-30-monitoring-an-s3-bucket-in-apache-nifi/

I suspect this will function a bit closer to your expectations and the
latency from object creation to NiFi receiving the event should be
much shorter as well.
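
To give an idea of what arrives on the queue, here is a minimal Python
sketch that pulls the bucket and key out of an S3 event notification.
It assumes S3 delivers directly to SQS (no SNS wrapper), so the message
body is the notification JSON itself. created_objects and the sample
bucket/key are hypothetical names for the example, not something from
NiFi or the blog post.

```python
import json
from urllib.parse import unquote_plus

def created_objects(message_body):
    """Yield (bucket, key) for each ObjectCreated record in one message body."""
    payload = json.loads(message_body)
    # Test events sent when the notification is first configured have no
    # "Records" list, so default to an empty one.
    for record in payload.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            s3 = record["s3"]
            # Keys are URL-encoded in notifications (spaces arrive as "+").
            yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])

# Hypothetical message body, trimmed to the fields used above.
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "incoming/my+file.csv"},
        },
    }]
})

events = list(created_objects(sample))
```

Filtering on the event name is what gives you the finer control: you
decide which kinds of change should produce a flowfile.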

Hope that helps,
Adam
