[
https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298478#comment-16298478
]
Milan Das commented on NIFI-4715:
---------------------------------
Additional logs after enabling log4j on ListS3:
AWS Last Modified Times
ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt – AWS
Last modified is Dec 8, 2017 9:34:00 AM GMT-0500
·
ls.s3.7236a098-30cf-40c9-8be7-f8df1d52b33e.2017-12-08T14.33.part104.txt – AWS
Last modified is Dec 8, 2017 9:34:35 AM GMT-0500
First time
ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt appears
in the logs.
2017-12-08 14:34:00,762 DEBUG [Timer-Driven Process Thread-6]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Setting CLUSTER State to
{key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt,
currentTimestamp=1512743602000}
Events where S3List detects new files and pulls the old one down again.
2017-12-08 14:34:28,491 INFO [Provenance Repository Rollover Thread-1]
o.a.n.p.PersistentProvenanceRepository Successfully Rolled over Provenance
Event file containing 25 records. In the past 5 minutes, 118 events have been
written to the Provenance Repository, totaling 94.96 KB
2017-12-08 14:34:29,486 INFO [Heartbeat Monitor Thread-1]
o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats in 5162
nanos
2017-12-08 14:34:29,806 INFO [Timer-Driven Process Thread-2]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed S3 bucket
amber-marvin in 1869 millis
2017-12-08 14:34:29,806 DEBUG [Timer-Driven Process Thread-2]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] No new objects in S3 bucket
amber-marvin to list. Yielding.
2017-12-08 14:34:29,806 DEBUG [Timer-Driven Process Thread-2]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] has chosen to yield its
resources; will not be scheduled to run again for 1000 milliseconds
2017-12-08 14:34:30,807 DEBUG [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Returning CLUSTER State:
StandardStateMap[version=2143,
values={key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt,
currentTimestamp=1512743640000}]
2017-12-08 14:34:31,929 INFO [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed S3 bucket
amber-marvin in 1121 millis
2017-12-08 14:34:31,929 DEBUG [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] No new objects in S3 bucket
amber-marvin to list. Yielding.
2017-12-08 14:34:31,929 DEBUG [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] has chosen to yield its
resources; will not be scheduled to run again for 1000 milliseconds
2017-12-08 14:34:32,014 INFO [Process Cluster Protocol Request-6]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
2294c28c-b82a-4b90-9e24-bbb79d478ae4 (type=HEARTBEAT, length=2095 bytes) from
iset-streaming-01.dev.secureauth.local:8085 in 3 millis
2017-12-08 14:34:32,015 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-12-08
14:34:32,009 and sent to iset-streaming-01.dev.secureauth.local:8086 at
2017-12-08 14:34:32,015; send took 5 millis
2017-12-08 14:34:32,930 DEBUG [Timer-Driven Process Thread-3]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Returning CLUSTER State:
StandardStateMap[version=2143,
values={key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt,
currentTimestamp=1512743640000}]
2017-12-08 14:34:34,213 INFO [Timer-Driven Process Thread-3]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed S3 bucket
amber-marvin in 1283 millis
2017-12-08 14:34:34,213 DEBUG [Timer-Driven Process Thread-3]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] No new objects in S3 bucket
amber-marvin to list. Yielding.
2017-12-08 14:34:34,213 DEBUG [Timer-Driven Process Thread-3]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] has chosen to yield its
resources; will not be scheduled to run again for 1000 milliseconds
2017-12-08 14:34:34,487 INFO [Heartbeat Monitor Thread-1]
o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats in 5029
nanos
2017-12-08 14:34:35,214 DEBUG [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Returning CLUSTER State:
StandardStateMap[version=2143,
values={key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt,
currentTimestamp=1512743640000}]
2017-12-08 14:34:35,950 INFO [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed 2 new files
from S3; routing to success
2017-12-08 14:34:35,951 DEBUG [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Setting CLUSTER State to
{key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.7236a098-30cf-40c9-8be7-f8df1d52b33e.2017-12-08T14.33.part104.txt,
currentTimestamp=1512743640000}
2017-12-08 14:34:36,507 INFO [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed S3 bucket
amber-marvin in 1292 millis
2017-12-08 14:34:36,507 DEBUG [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Setting CLUSTER State to
{key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.7236a098-30cf-40c9-8be7-f8df1d52b33e.2017-12-08T14.33.part104.txt,
currentTimestamp=1512743675000}
2017-12-08 14:34:36,508 DEBUG [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] No new objects in S3 bucket
amber-marvin to list. Yielding.
2017-12-08 14:34:36,508 DEBUG [Timer-Driven Process Thread-1]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] has chosen to yield its
resources; will not be scheduled to run again for 1000 milliseconds
2017-12-08 14:34:37,018 INFO [Process Cluster Protocol Request-7]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
49334dbb-b1db-4f7f-b946-862e74fb6555 (type=HEARTBEAT, length=2095 bytes) from
iset-streaming-01.dev.secureauth.local:8085 in 1 millis
2017-12-08 14:34:37,019 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2017-12-08
14:34:37,016 and sent to iset-streaming-01.dev.secureauth.local:8086 at
2017-12-08 14:34:37,019; send took 3 millis
2017-12-08 14:34:37,509 DEBUG [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Returning CLUSTER State:
StandardStateMap[version=2145,
values={key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.7236a098-30cf-40c9-8be7-f8df1d52b33e.2017-12-08T14.33.part104.txt,
currentTimestamp=1512743675000}]
2017-12-08 14:34:37,887 ERROR [Thread-144]
o.a.n.p.standard.ExecuteStreamCommand
ExecuteStreamCommand[id=a683a10f-ab2f-332d-94f9-a60bdf3b89bf] Failed to write
flow file to stdin due to java.io.IOException: Broken pipe:
java.io.IOException: Broken pipe
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
....
2017-12-08 14:34:38,204 WARN [Timer-Driven Process Thread-8]
o.a.nifi.processors.standard.PutFile
PutFile[id=df9f8593-56d9-322c-904d-1ea113d5a19c] Penalizing
StandardFlowFileRecord[uuid=ae35ff77-bf15-4716-976f-f4319eb9c228,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1512743677878-116, container=default,
section=116], offset=0,
length=5253245],offset=0,name=ls.s3.9447cb9e-0b42-4464-85d3-3117cf947f77.2017-12-08T14.33.part103.txt,size=5253245]
and routing to failure as configured because file with the same name already
exists
2017-12-08 14:34:38,836 INFO [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed S3 bucket
amber-marvin in 1327 millis
2017-12-08 14:34:38,836 DEBUG [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] No new objects in S3 bucket
amber-marvin to list. Yielding.
2017-12-08 14:34:38,836 DEBUG [Timer-Driven Process Thread-10]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] has chosen to yield its
resources; will not be scheduled to run again for 1000 milliseconds
2017-12-08 14:34:39,488 INFO [Heartbeat Monitor Thread-1]
o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats in 5477
nanos
2017-12-08 14:34:39,837 DEBUG [Timer-Driven Process Thread-8]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Returning CLUSTER State:
StandardStateMap[version=2145,
values={key-0=AMBER/MARVIN/07708c5c-2939-4dba-99b1-de2727f1c080/ls.s3.7236a098-30cf-40c9-8be7-f8df1d52b33e.2017-12-08T14.33.part104.txt,
currentTimestamp=1512743675000}]
2017-12-08 14:34:41,198 INFO [Timer-Driven Process Thread-8]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] Successfully listed S3 bucket
amber-marvin in 1360 millis
2017-12-08 14:34:41,198 DEBUG [Timer-Driven Process Thread-8]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] No new objects in S3 bucket
amber-marvin to list. Yielding.
2017-12-08 14:34:41,198 DEBUG [Timer-Driven Process Thread-8]
org.apache.nifi.processors.aws.s3.ListS3
ListS3[id=db049bf5-f3e4-3e2c-83b5-fd4388377832] has chosen to yield its
resources; will not be scheduled to run again for 1000 milliseconds
2017-12-08 14:34:42,023 INFO [Process Cluster Protocol Request-8]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
f22b6895-c726-40f4-82b8-962cf1491447 (type=HEARTBEAT, length=2095 bytes) from
iset-streaming-01.dev.secureauth.local:8085 in 1 millis
> ListS3 list duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
> Key: NIFI-4715
> URL: https://issues.apache.org/jira/browse/NIFI-4715
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.2.0, 1.3.0, 1.4.0
> Environment: All
> Reporter: Milan Das
>
> ListS3 state is implemented using HashSet. HashSet is not thread safe. When
> ListS3 operates in multi threaded mode, sometimes it tries to list same
> file from S3 bucket. Seems like HashSet data is getting corrupted.
> currentKeys = new HashSet<>(); // need to be implemented Thread Safe like
> currentKeys = //ConcurrentHashMap.newKeySet();
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)