1. ListS3 uses the framework's state management (see the persistState() <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java#L140> and restoreState() <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java#L129> methods).
2. The ListS3 state tracks the latest modified timestamp and the keys at that timestamp that have already been processed. On subsequent runs, it excludes any object modified before that timestamp, or whose key is in the list of keys already processed at that timestamp. If files are re-written to S3 with new timestamps, I believe ListS3 will see them as new files.

Do you see duplication of both the FlowFile name and id fields, or just the name? Do you see the duplicates under any specific circumstances, such as after a processor or instance start/stop, or during periods of high or low flow volume?

On Sun, Jun 26, 2016 at 7:30 AM, ddewaele <[email protected]> wrote:
> Hi,
>
> I had a question on the ListS3 processor.
> I'm using it to monitor the content of an S3 bucket.
> The idea is that when new files come in, they need to be processed and sent
> through the dataflow, using a FetchS3Object to process the file. This all
> works, but I had 2 questions:
>
> 1. Where does the ListS3 processor keep its state? How does it know what files
> it has already processed, and is there a way to clear this state?
> 2. Sometimes, when syncing files to my S3 buckets, I notice that the ListS3
> processor is picking up the same file twice. Is there a way to avoid that?
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/ListS3-processor-question-duplicate-files-maintaining-state-tp12278.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
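For illustration, the de-duplication logic described in point 2 above can be sketched roughly as follows. This is a simplified, hypothetical version (class and method names are mine, not from the actual ListS3 source), assuming the state holds the newest lastModified timestamp seen plus the set of keys already emitted at exactly that timestamp:

```java
import java.util.Set;

// Hypothetical sketch of the ListS3 de-duplication check described above.
// Not the actual ListS3 implementation; names and structure are illustrative.
public class ListS3StateSketch {
    private final long latestTimestamp;    // newest lastModified tracked in state
    private final Set<String> seenKeys;    // keys already emitted at latestTimestamp

    public ListS3StateSketch(long latestTimestamp, Set<String> seenKeys) {
        this.latestTimestamp = latestTimestamp;
        this.seenKeys = seenKeys;
    }

    // Decide whether an object from the bucket listing should be emitted.
    public boolean shouldList(String key, long lastModified) {
        if (lastModified < latestTimestamp) {
            return false; // older than anything already tracked: skip
        }
        if (lastModified == latestTimestamp && seenKeys.contains(key)) {
            return false; // same timestamp and already emitted: skip
        }
        return true; // newer timestamp, or an unseen key at the same timestamp
    }
}
```

Note the consequence mentioned earlier: if an object is re-uploaded with the same key but a newer lastModified, shouldList returns true, so it would be listed again as a "new" file.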
