[
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298517#comment-16298517
]
Joseph Witt commented on NIFI-4715:
-----------------------------------
I'm not questioning whether duplicate listings are possible. When listing from
systems like this there is a range of interesting, complex cases to handle.
What I'm saying is that the suggested fix is likely unrelated to the stated
problem. Given that there is only a single thread, we're not having a
threading issue. It is not uncommon for people to use DetectDuplicate after a
processor such as this so the flow can handle being given the same filename
again.
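
A minimal sketch of the first-seen semantics DetectDuplicate provides
downstream, assuming a simple in-memory set keyed by filename (the class and
method names here are illustrative; the real processor tracks seen values in
a distributed cache):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative only: reports whether a filename has been seen before,
    // analogous to routing repeated listings away after a processor like ListS3.
    public class SeenFilenameFilter {
        private final Set<String> seen = ConcurrentHashMap.newKeySet();

        // Returns true the first time a filename is observed, false on repeats.
        public boolean firstTimeSeen(String filename) {
            return seen.add(filename);
        }
    }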
> ListS3 lists duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
> Key: NIFI-4715
> URL: https://issues.apache.org/jira/browse/NIFI-4715
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.2.0, 1.3.0, 1.4.0
> Environment: All
> Reporter: Milan Das
>
> ListS3 state is implemented using a HashSet. HashSet is not thread safe.
> When ListS3 operates in multi-threaded mode, it sometimes tries to list the
> same file from the S3 bucket twice. It seems the HashSet data is getting
> corrupted.
> currentKeys = new HashSet<>(); // not thread safe; should be something like
> currentKeys = ConcurrentHashMap.newKeySet();
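
For reference, the suggested replacement compiles along these lines (a sketch
only, with an illustrative bucket/key string; per the comment above, this
change likely does not address the reported duplicates):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the reporter's suggestion: a concurrent set view backed by
    // ConcurrentHashMap in place of the plain HashSet.
    public class CurrentKeysSketch {
        private final Set<String> currentKeys = ConcurrentHashMap.newKeySet();

        public static void main(String[] args) {
            CurrentKeysSketch sketch = new CurrentKeysSketch();
            // add() and contains() are safe to call from multiple threads.
            sketch.currentKeys.add("my-bucket/file-1.csv");
            System.out.println(sketch.currentKeys.contains("my-bucket/file-1.csv")); // true
        }
    }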
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)