[
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298473#comment-16298473
]
Milan Das commented on NIFI-4715:
---------------------------------
[~joewitt]
Steps to reproduce (I can reproduce this and upload the flow template if needed):
1. Keep the ListS3 flow running.
2. Start loading files into the S3 bucket. My files follow a naming convention
like: ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part165.txt
3. ListS3 is followed by FetchS3 and PutFile.
Here is the error:
2017-12-19 13:21:10,491 WARN [Timer-Driven Process Thread-9]
o.a.nifi.processors.standard.PutFile
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing
StandardFlowFileRecord[uuid=3d7c66c4-3f46-43af-bf4f-0a4952cee85c,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default,
section=1], offset=10616,
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part165.txt,size=16]
and routing to failure as configured because file with the same name already
exists
2017-12-19 13:21:10,526 WARN [Timer-Driven Process Thread-2]
o.a.nifi.processors.standard.PutFile
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing
StandardFlowFileRecord[uuid=1f4121af-7eab-4793-8736-c4c79dbec609,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default,
section=1], offset=10632,
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part166.txt,size=16]
and routing to failure as configured because file with the same name already
exists
2017-12-19 13:21:10,610 WARN [Timer-Driven Process Thread-2]
o.a.nifi.processors.standard.PutFile
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing
StandardFlowFileRecord[uuid=b7087a30-1a9e-4f47-b431-18cd5a8f90da,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default,
section=1], offset=10648,
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part167.txt,size=16]
and routing to failure as configured because file with the same name already
exists
2017-12-19 13:21:10,643 WARN [Timer-Driven Process Thread-6]
o.a.nifi.processors.standard.PutFile
PutFile[id=cea1f31b-9865-32a2-e355-f7e07add4b17] Penalizing
StandardFlowFileRecord[uuid=f0bb0d4a-5f83-4776-8ccd-e6c603590a2b,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1513688204862-1, container=default,
section=1], offset=10664,
length=16],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part168.txt,size=16]
and routing to failure as configured because file with the same name already
exists
> ListS3 list duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
> Key: NIFI-4715
> URL: https://issues.apache.org/jira/browse/NIFI-4715
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.2.0, 1.3.0, 1.4.0
> Environment: All
> Reporter: Milan Das
>
> ListS3 state is implemented using a HashSet, which is not thread-safe. When
> ListS3 operates in multi-threaded mode, it sometimes tries to list the same
> file from the S3 bucket. It seems the HashSet data is getting corrupted.
> currentKeys = new HashSet<>(); // needs to be thread-safe, for example:
> // currentKeys = ConcurrentHashMap.newKeySet();
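
A minimal sketch of the thread-safe alternative suggested in the description,
assuming the key set only needs add/contains semantics. The class and method
names (CurrentKeysSketch, markListed, countAccepted) are hypothetical and are
not taken from the NiFi source. Whether this change alone resolves the
duplicate listings is not confirmed here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CurrentKeysSketch {
    // Hypothetical stand-in for the ListS3 key-tracking state (not the NiFi code).
    // ConcurrentHashMap.newKeySet() returns a Set that is safe for concurrent use.
    private final Set<String> currentKeys = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that records this key. */
    public boolean markListed(String key) {
        return currentKeys.add(key);
    }

    /**
     * Offers the same keys from several threads and counts how many
     * add() calls reported the key as new.
     */
    public static int countAccepted(int threads, int keys) {
        CurrentKeysSketch state = new CurrentKeysSketch();
        AtomicInteger accepted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < keys; i++) {
                    if (state.markListed("part" + i + ".txt")) {
                        accepted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return accepted.get();
    }

    public static void main(String[] args) {
        // Each distinct key should be accepted exactly once across all threads.
        System.out.println(countAccepted(4, 1000));
    }
}
```

With a plain HashSet in place of the concurrent set, the same test can accept
a key more than once or corrupt the set's internal state, which would match
the duplicate file names seen by PutFile.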