[jira] [Commented] (NIFI-4715) ListS3 list duplicate files when incoming file throughput to S3 is high

Joseph Witt (JIRA) Wed, 20 Dec 2017 05:57:44 -0800

    [ 
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298534#comment-16298534
 ]


Joseph Witt commented on NIFI-4715:
-----------------------------------

I'm not familiar enough with this specific processor, but I am definitely 
familiar with the get/fetch pattern and its implementations.  There may well be 
a bug, or an opportunity to tighten the handling of edge conditions such as 
time boundaries or files with the same name being placed twice.  My primary 
reason for replying was to steer the effort away from treating this as a 
threading problem, since the processor is intentionally designed to operate 
with a single thread.
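
To make the "time boundary" edge condition concrete, here is a minimal sketch 
of timestamp-based listing state. It is not the ListS3 source: the class 
ListingStateSketch, the Entry type, and the listNew method are hypothetical 
names used only for illustration. The idea is to remember the newest 
last-modified time seen so far, plus the keys seen at exactly that time, so 
that an object landing on the boundary timestamp is neither skipped nor listed 
twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of single-threaded listing state; not the NiFi implementation.
class ListingStateSketch {
    // Newest last-modified time emitted so far, and the keys emitted at exactly that time.
    private long latestTimestamp = 0L;
    private Set<String> keysAtLatest = new HashSet<>();

    // Hypothetical listing entry: an object key and its last-modified time in millis.
    static final class Entry {
        final String key;
        final long lastModified;

        Entry(String key, long lastModified) {
            this.key = key;
            this.lastModified = lastModified;
        }
    }

    // Returns the keys not yet emitted, then advances the stored state.
    List<String> listNew(List<Entry> bucket) {
        List<String> emitted = new ArrayList<>();
        long maxTs = latestTimestamp;
        Set<String> keysAtMax = new HashSet<>(keysAtLatest);

        for (Entry e : bucket) {
            // Strictly older than the boundary: already handled in an earlier pass.
            if (e.lastModified < latestTimestamp) {
                continue;
            }
            // On the boundary: skip only if this exact key was already emitted.
            if (e.lastModified == latestTimestamp && keysAtLatest.contains(e.key)) {
                continue;
            }
            emitted.add(e.key);
            if (e.lastModified > maxTs) {
                // A newer timestamp starts a fresh boundary set.
                maxTs = e.lastModified;
                keysAtMax = new HashSet<>();
            }
            if (e.lastModified == maxTs) {
                keysAtMax.add(e.key);
            }
        }

        latestTimestamp = maxTs;
        keysAtLatest = keysAtMax;
        return emitted;
    }

    public static void main(String[] args) {
        ListingStateSketch state = new ListingStateSketch();
        List<Entry> bucket = new ArrayList<>(Arrays.asList(
                new Entry("a", 100L), new Entry("b", 200L)));
        System.out.println(state.listNew(bucket));
        // A new object that shares the boundary timestamp with "b".
        bucket.add(new Entry("c", 200L));
        System.out.println(state.listNew(bucket));
    }
}
```

In this sketch, a key that is re-uploaded with a newer timestamp is emitted 
again, which is one way to treat "the same name placed twice"; whether that 
counts as a duplicate depends on what the flow expects.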

> ListS3 list duplicate files when incoming file throughput to S3 is high
> -----------------------------------------------------------------------
>
>                 Key: NIFI-4715
>                 URL: https://issues.apache.org/jira/browse/NIFI-4715
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>         Environment: All
>            Reporter: Milan Das
>         Attachments: List-S3-dup-issue.xml, screenshot-1.png
>
>
> ListS3 state is implemented using a HashSet, which is not thread-safe. When 
> ListS3 operates in multi-threaded mode, it sometimes tries to list the same 
> file from the S3 bucket. It seems the HashSet data is getting corrupted.
> currentKeys = new HashSet<>(); // needs a thread-safe implementation, such as:
> currentKeys = ConcurrentHashMap.newKeySet();
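
For reference, the replacement proposed in the description can be sketched as 
below. This is a standalone illustration, not a patch against ListS3: the class 
KeySetSketch and the method addConcurrently are hypothetical. It only shows 
that a set from ConcurrentHashMap.newKeySet() tolerates concurrent add calls; 
it does not show that threading is the cause of the duplicate listings.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are illustrative and not taken from the NiFi source.
class KeySetSketch {
    // Every thread adds the same keysPerThread keys; returns the final set size.
    static int addConcurrently(Set<String> keys, int threads, int keysPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    keys.add("object-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Thread-safe set view backed by a ConcurrentHashMap.
        Set<String> currentKeys = ConcurrentHashMap.newKeySet();
        System.out.println(addConcurrently(currentKeys, 4, 10_000)); // prints 10000
    }
}
```

Passing a plain HashSet to the same method is unsafe and may produce a wrong 
size or a corrupted table, which is the failure mode the description suspects.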



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
