[jira] [Commented] (NIFI-4715) ListS3 list duplicate files when incoming file throughput to S3 is high

Milan Das (JIRA) Wed, 20 Dec 2017 05:51:21 -0800

    [ 
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298529#comment-16298529
 ]


Milan Das commented on NIFI-4715:
---------------------------------

[~joewitt] I am now able to work around the problem using DetectDuplicate, but 
this still looks like a bug in ListS3. 
I think the only reason is that the following code block is failing. 
{code:title=ListS3.java|borderStyle=solid}

                if (lastModified < currentTimestamp
                        || lastModified == currentTimestamp
                        && currentKeys.contains(versionSummary.getKey())) {
                    continue;
                }
{code}
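
For readers following along, here is a minimal stand-alone sketch of the skip
condition quoted above. The class ListingFilter and the method shouldSkip are
hypothetical names used only for illustration; they are not part of
ListS3.java. The parentheses make explicit that && binds tighter than ||, so
an entry is skipped either when it is older than the stored timestamp, or when
it has the same timestamp and its key was already recorded.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not taken from ListS3.java.
final class ListingFilter {
    private ListingFilter() {}

    // Mirrors the quoted condition: skip entries older than the stored
    // timestamp, or entries with the same timestamp whose key is already known.
    static boolean shouldSkip(long lastModified, String key,
                              long currentTimestamp, Set<String> currentKeys) {
        return lastModified < currentTimestamp
                || (lastModified == currentTimestamp && currentKeys.contains(key));
    }

    public static void main(String[] args) {
        Set<String> currentKeys = new HashSet<>();
        currentKeys.add("a.txt");
        long currentTimestamp = 1000L;

        System.out.println(shouldSkip(999L, "old.txt", currentTimestamp, currentKeys));  // true: older
        System.out.println(shouldSkip(1000L, "a.txt", currentTimestamp, currentKeys));   // true: same time, known key
        System.out.println(shouldSkip(1000L, "b.txt", currentTimestamp, currentKeys));   // false: same time, new key
        System.out.println(shouldSkip(1001L, "a.txt", currentTimestamp, currentKeys));   // false: newer
    }
}
```

Under this condition, an entry with the same timestamp is listed again whenever
its key is missing from currentKeys, which is why the contents of that set
matter for the duplicates reported here.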


                

> ListS3 list  duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4715
>                 URL: https://issues.apache.org/jira/browse/NIFI-4715
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>         Environment: All
>            Reporter: Milan Das
>         Attachments: List-S3-dup-issue.xml, screenshot-1.png
>
>
> ListS3 state is implemented using a HashSet, which is not thread safe. When 
> ListS3 operates in multi-threaded mode, it sometimes tries to list the same 
> file from the S3 bucket. It seems the HashSet data is getting corrupted.
> currentKeys = new HashSet<>(); // needs to be thread safe, for example:
> currentKeys = ConcurrentHashMap.newKeySet();
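
A minimal sketch of the change suggested in the description, assuming the set
really is touched by several threads at once. The class ConcurrentKeysSketch
and the method addKeysConcurrently are hypothetical and exist only for this
example. ConcurrentHashMap.newKeySet() is a standard JDK call (Java 8+) that
returns a Set safe for concurrent adds. Whether this removes the duplicates
seen in ListS3 is not confirmed here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of NiFi.
final class ConcurrentKeysSketch {
    private ConcurrentKeysSketch() {}

    // Adds distinct keys from several threads and returns the final set size.
    static int addKeysConcurrently(int threads, int keysPerThread) {
        // Thread-safe replacement for "new HashSet<>()".
        Set<String> currentKeys = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < keysPerThread; i++) {
                    currentKeys.add("thread-" + id + "/key-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return currentKeys.size();
    }

    public static void main(String[] args) {
        // Every key is distinct, so no add may be lost.
        System.out.println(addKeysConcurrently(4, 1000)); // 4000
    }
}
```

With a plain HashSet the same test can lose entries or corrupt the set's
internal state, because HashSet makes no guarantees under unsynchronized
concurrent modification.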



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
