[ 
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milan Das updated NIFI-4715:
----------------------------
    Description: 
ListS3 state is implemented using a HashSet, which is not thread safe. When 
ListS3 runs in multi-threaded mode, it sometimes lists the same file from the 
S3 bucket more than once. It seems the HashSet data is getting corrupted.

currentKeys = new HashSet<>(); // should be a thread-safe set instead, e.g.:
// currentKeys = ConcurrentHashMap.newKeySet();
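
For illustration only, here is a minimal standalone sketch of the swap 
proposed above: several threads add keys to a set created with 
ConcurrentHashMap.newKeySet(), and no additions are lost. The class name 
CurrentKeysSketch, the helper addConcurrently, and the key strings are 
hypothetical and are not taken from the ListS3 source. The sketch shows only 
that such a set tolerates concurrent writers; it does not show that this is 
the cause of the duplicate listings.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of NiFi.
class CurrentKeysSketch {

    // Adds threads * keysPerThread distinct keys from several worker threads
    // and returns how many keys the set holds once all workers have finished.
    static int addConcurrently(int threads, int keysPerThread) throws InterruptedException {
        // Thread-safe replacement for "new HashSet<>()".
        Set<String> currentKeys = ConcurrentHashMap.newKeySet();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    // Made-up key names, unique per thread and index.
                    currentKeys.add("bucket/key-" + id + "-" + i);
                }
            });
        }

        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return currentKeys.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(addConcurrently(4, 1000));
    }
}
```

With a plain HashSet in the same harness, the final size is not guaranteed, 
because HashSet makes no promises under unsynchronized concurrent writes.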

Update:
This is not a HashSet issue:


  was:
ListS3 state is implemented using a HashSet, which is not thread safe. When 
ListS3 runs in multi-threaded mode, it sometimes lists the same file from the 
S3 bucket more than once. It seems the HashSet data is getting corrupted.

currentKeys = new HashSet<>(); // should be a thread-safe set instead, e.g.:
// currentKeys = ConcurrentHashMap.newKeySet();


> ListS3 lists duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4715
>                 URL: https://issues.apache.org/jira/browse/NIFI-4715
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>         Environment: All
>            Reporter: Milan Das
>         Attachments: List-S3-dup-issue.xml, screenshot-1.png
>
>
> ListS3 state is implemented using a HashSet, which is not thread safe. When 
> ListS3 runs in multi-threaded mode, it sometimes lists the same file from 
> the S3 bucket more than once. It seems the HashSet data is getting corrupted.
> currentKeys = new HashSet<>(); // should be a thread-safe set instead, e.g.:
> // currentKeys = ConcurrentHashMap.newKeySet();
> Update:
> This is not a HashSet issue:



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
