[ 
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298473#comment-16298473
 ] 

Milan Das commented on NIFI-4715:
---------------------------------

[~joewitt]

Steps to reproduce (I can reproduce this and upload the flow template if needed):
1. Keep the ListS3 flow running.
2. Start loading files into the S3 bucket. My files follow a naming convention 
like: ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part165.txt
3. ListS3 is followed by FetchS3 and PutFile.

Here is the error:
2017-12-19 13:21:10,491 WARN [Timer-Driven Process Thread-9] 
o.a.nifi.processors.standard.PutFile 
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing 
StandardFlowFileRecord[uuid=3d7c66c4-3f46-43af-bf4f-0a4952cee85c,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default, 
section=1], offset=10616, 
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part165.txt,size=16]
 and routing to failure as configured because file with the same name already 
exists
2017-12-19 13:21:10,526 WARN [Timer-Driven Process Thread-2] 
o.a.nifi.processors.standard.PutFile 
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing 
StandardFlowFileRecord[uuid=1f4121af-7eab-4793-8736-c4c79dbec609,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default, 
section=1], offset=10632, 
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part166.txt,size=16]
 and routing to failure as configured because file with the same name already 
exists
2017-12-19 13:21:10,610 WARN [Timer-Driven Process Thread-2] 
o.a.nifi.processors.standard.PutFile 
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing 
StandardFlowFileRecord[uuid=b7087a30-1a9e-4f47-b431-18cd5a8f90da,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default, 
section=1], offset=10648, 
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part167.txt,size=16]
 and routing to failure as configured because file with the same name already 
exists
2017-12-19 13:21:10,643 WARN [Timer-Driven Process Thread-6] 
o.a.nifi.processors.standard.PutFile 
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing 
StandardFlowFileRecord[uuid=f0bb0d4a-5f83-4776-8ccd-e6c603590a2b,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default, 
section=1], offset=10664, 
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part168.txt,size=16]
 and routing to failure as configured because file with the same name already 
exists
 
 


> ListS3 list  duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4715
>                 URL: https://issues.apache.org/jira/browse/NIFI-4715
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>         Environment: All
>            Reporter: Milan Das
>
> ListS3 state is implemented using a HashSet, which is not thread safe. When 
> ListS3 runs in multi-threaded mode, it sometimes lists the same file from 
> the S3 bucket more than once. It seems the HashSet data is getting corrupted.
> currentKeys = new HashSet<>(); // needs a thread-safe implementation, such as:
> // currentKeys = ConcurrentHashMap.newKeySet();
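
The suggestion in the description can be illustrated with a small standalone sketch. It is not the ListS3 source and not a confirmed fix: the class ListedKeysSketch and the methods markListed and countFirstSeen are hypothetical names invented for this example, and only the currentKeys field name comes from the description. The sketch shows that a set created with ConcurrentHashMap.newKeySet() lets add() succeed for exactly one thread per key, even when several threads race on the same keys.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not taken from the NiFi code base.
public class ListedKeysSketch {
    // Stand-in for the currentKeys field: a thread-safe Set backed by ConcurrentHashMap.
    private final Set<String> currentKeys = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that records this key. */
    public boolean markListed(String key) {
        return currentKeys.add(key);
    }

    /** Number of distinct keys recorded so far. */
    public int size() {
        return currentKeys.size();
    }

    /** Lets several threads race to record the same keys; returns how many add() calls won. */
    public static int countFirstSeen(int threads, int keys) {
        ListedKeysSketch state = new ListedKeysSketch();
        AtomicInteger firstSeen = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < keys; i++) {
                    // Every thread offers the same key names.
                    if (state.markListed("part" + i + ".txt")) {
                        firstSeen.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return firstSeen.get();
    }

    public static void main(String[] args) {
        System.out.println(countFirstSeen(8, 1000));
    }
}
```

With the concurrent key set, countFirstSeen(8, 1000) returns the number of distinct keys, because each key is accepted once no matter which thread offers it first. Replacing the field with a plain HashSet removes that guarantee, since concurrent add() calls on a HashSet have no defined behavior.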



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
