[
https://issues.apache.org/jira/browse/HDDS-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091594#comment-18091594
]
Chi-Hsuan Huang commented on HDDS-14474:
----------------------------------------
Investigated the reported WARN flood. Two distinct findings:
_1. The reported WARN lines are not a Freon leak._ They come from
{{org.apache.ratis.grpc.GrpcUtil}} \(Ratis graceful channel shutdown timing out
under high {{\-t}} concurrency\); {{:9858}} is the datanode Ratis port. Worker
threads do close their FileSystem: {{BaseFreonGenerator.runTaskLoop\(\)}}
always calls {{taskLoopCompleted\(\)}}, which {{HadoopBaseFreonGenerator}}
overrides to call {{fileSystem.close\(\)}}. The flood is the legitimate close
path being slow/noisy, tracked separately in HDDS\-15670.
_2. There is a genuine, smaller FileSystem leak that this issue will cover._
The per\-thread FileSystem lives in a ThreadLocal, but
{{taskLoopCompleted\(\)}} only runs on the worker pool threads.
{{HadoopFsGenerator}} and {{HadoopFsValidator}} call {{getFileSystem\(\)}} on
the main/calling thread before {{runTests\(\)}} \({{HadoopFsGenerator}} for
{{mkdirs}}, {{HadoopFsValidator}} for {{open}}\), populating the main thread's
ThreadLocal, which is never closed. {{HadoopDirTreeGenerator}} and
{{HadoopNestedDirGenerator}} do not touch the FileSystem on the main thread and
are unaffected.
_Fix:_ close the main\-thread FileSystem with a {{try/finally}} in
{{HadoopFsGenerator.call\(\)}} and {{HadoopFsValidator.call\(\)}}, matching the
existing pattern in {{HsyncGenerator}} / {{OmMetadataGenerator}}. Kept local to
these two generators rather than added to the base class, because calling
{{threadLocalFileSystem.get\(\)}} on the dir generators' main thread would
create \(then close\) a spurious FileSystem due to {{ThreadLocal.withInitial}}.
Note: this removes one leaked OM\-facing client per run; it will not noticeably
reduce the {{:9858}} Ratis WARN volume \(see HDDS\-15670\).
> Freon dfsg produces a number of Orphan ManagedChannels
> ------------------------------------------------------
>
> Key: HDDS-14474
> URL: https://issues.apache.org/jira/browse/HDDS-14474
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Andrey Yarovoy
> Assignee: Chi-Hsuan Huang
> Priority: Major
>
> Freon dfsg produces a number of Orphan ManagedChannels, since it uses
> org.apache.hadoop.fs.FileSystem for interfacing Ozone and supposed to call
> FileSystem.close() after thread completion, there is a possibility that the
> same applies for all consumers of org.apache.hadoop.fs.FileSystem.
>
> command used for reproducing the issue:
> {code:java}
> ozone freon dfsg -s 268435456 --prefix beg0i3ghm2 --path
> ofs://ozone/ratis-vol/andrey -n10000 -t160 --buffer=1048576
> --copy-buffer=1048576 {code}
> Example of Orphan ManagedChannels:
> {code:java}
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=15509,
> target=10.15.25.39:9858}}.
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=58,
> target=10.15.25.26:9858}}.
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=26033,
> target=10.15.25.13:9858}}.
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=20785,
> target=10.15.25.14:9858}}.
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=8929,
> target=10.15.25.13:9858}}.
> 26/01/21 14:15:27 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=3993,
> target=10.15.25.34:9858}}.
> 26/01/21 14:15:27 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=25649,
> target=10.15.25.26:9858}}.
> 26/01/21 14:15:27 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=12989,
> target=10.15.25.24:9858}}.
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=19417,
> target=10.15.25.18:9858}}.
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=17729,
> target=10.15.25.33:9858}}.
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=20521,
> target=10.15.25.35:9858}}.
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=27817,
> target=10.15.25.34:9858}}.
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=16453,
> target=10.15.25.38:9858}}.
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=22981,
> target=10.15.25.24:9858}}.
> 26/01/21 14:15:29 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=22533,
> target=10.15.25.33:9858}}.
> 26/01/21 14:15:29 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=12441,
> target=10.15.25.14:9858}}.
> 26/01/21 14:15:29 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=14841,
> target=10.15.25.33:9858}}.
> 26/01/21 14:15:31 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=27229,
> target=10.15.25.33:9858}}.
> 26/01/21 14:15:31 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=27529,
> target=10.15.25.34:9858}}.
> 26/01/21 14:15:32 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=26717,
> target=10.15.25.35:9858}}.
> 26/01/21 14:15:34 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=25021,
> target=10.15.25.35:9858}}.
> 26/01/21 14:15:34 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=13573,
> target=10.15.25.33:9858}}.
> 26/01/21 14:15:35 WARN grpc.GrpcUtil: Timed out gracefully shutting down
> connection:
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=28033,
> target=10.15.25.34:9858}}. {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]