[
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297848#comment-16297848
]
Joseph Witt commented on NIFI-4715:
-----------------------------------
[~dmilan77] ListS3 is annotated with TriggerSerially, which means it can only
ever run with one thread. It is designed to be run this way exclusively. When you
say it runs in multi-threaded mode, are you saying you are able to have it run with
more than one thread? Can you share a screenshot?
It is designed to be single-threaded for the listing. The listing results can
then be sent around the cluster via the S2S (site-to-site) protocol and fetched
in parallel. This List/Fetch pattern is now extremely common for massive-scale
flows.
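As a rough illustration of the List/Fetch split (not NiFi code), the sketch
below lists keys on a single thread and then fetches them on a thread pool.
The class ListFetchSketch, its methods listKeys and fetchAll, and the
"object-N" key names are all hypothetical, and the S2S distribution step is
not modelled at all; a local thread pool stands in for the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names are NiFi APIs.
class ListFetchSketch {

    // Listing step: a single thread walks the (simulated) bucket, so any
    // listing state it keeps needs no synchronization.
    static List<String> listKeys(int objectCount) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < objectCount; i++) {
            keys.add("object-" + i);
        }
        return keys;
    }

    // Fetch step: several threads process the already-listed keys in parallel.
    static Set<String> fetchAll(List<String> keys, int fetchThreads) throws Exception {
        // This set is shared by the fetch threads, so it must be thread safe.
        Set<String> fetched = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(fetchThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String key : keys) {
                // Stand-in for the real fetch: just record the key.
                pending.add(pool.submit(() -> { fetched.add(key); }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } finally {
            pool.shutdown();
        }
        return fetched;
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = listKeys(1000);
        Set<String> fetched = fetchAll(keys, 8);
        System.out.println(keys.size() + " listed, " + fetched.size() + " fetched");
    }
}
```

The point of the split is that only the fetch side touches shared state from
multiple threads; the listing side stays serial, so a plain collection is
sufficient there.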
Please confirm whether there is a bug or a misunderstanding of how it works.
> ListS3 list duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
> Key: NIFI-4715
> URL: https://issues.apache.org/jira/browse/NIFI-4715
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.2.0, 1.3.0, 1.4.0
> Environment: All
> Reporter: Milan Das
>
> ListS3 state is implemented using a HashSet, which is not thread safe. When
> ListS3 operates in multi-threaded mode, it sometimes lists the same
> file from the S3 bucket more than once. It seems the HashSet data is getting
> corrupted.
> currentKeys = new HashSet<>(); // needs a thread-safe implementation, such as:
> // currentKeys = ConcurrentHashMap.newKeySet();