[ 
https://issues.apache.org/jira/browse/HDDS-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18091594#comment-18091594
 ] 

Chi-Hsuan Huang commented on HDDS-14474:
----------------------------------------

Investigated the reported WARN flood. Two distinct findings:

_1. The reported WARN lines are not a Freon leak._ They come from 
{{org.apache.ratis.grpc.GrpcUtil}} \(Ratis graceful channel shutdown timing out 
under high {{\-t}} concurrency\); {{:9858}} is the datanode Ratis port. Worker 
threads do close their FileSystem: {{BaseFreonGenerator.runTaskLoop\(\)}} 
always calls {{taskLoopCompleted\(\)}}, which {{HadoopBaseFreonGenerator}} 
overrides to call {{fileSystem.close\(\)}}. The flood is the legitimate close 
path being slow/noisy, tracked separately in HDDS\-15670.

_2. There is a genuine, smaller FileSystem leak that this issue will cover._ 
The per\-thread FileSystem lives in a ThreadLocal, but 
{{taskLoopCompleted\(\)}} only runs on the worker pool threads. 
{{HadoopFsGenerator}} and {{HadoopFsValidator}} call {{getFileSystem\(\)}} on 
the main/calling thread before {{runTests\(\)}} \({{HadoopFsGenerator}} for 
{{mkdirs}}, {{HadoopFsValidator}} for {{open}}\), populating the main thread's 
ThreadLocal, which is never closed. {{HadoopDirTreeGenerator}} and 
{{HadoopNestedDirGenerator}} do not touch the FileSystem on the main thread and 
are unaffected.

_Fix:_ close the main\-thread FileSystem with a {{try/finally}} in 
{{HadoopFsGenerator.call\(\)}} and {{HadoopFsValidator.call\(\)}}, matching the 
existing pattern in {{HsyncGenerator}} / {{OmMetadataGenerator}}. Kept local to 
these two generators rather than added to the base class, because calling 
{{threadLocalFileSystem.get\(\)}} on the dir generators' main thread would 
create \(then close\) a spurious FileSystem due to {{ThreadLocal.withInitial}}.

Note: this removes one leaked OM\-facing client per run; it will not noticeably 
reduce the {{:9858}} Ratis WARN volume \(see HDDS\-15670\).



> Freon dfsg produces a number of Orphan ManagedChannels
> ------------------------------------------------------
>
>                 Key: HDDS-14474
>                 URL: https://issues.apache.org/jira/browse/HDDS-14474
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Andrey Yarovoy
>            Assignee: Chi-Hsuan Huang
>            Priority: Major
>
> Freon dfsg produces a number of Orphan ManagedChannels, since it uses 
> org.apache.hadoop.fs.FileSystem for interfacing Ozone and supposed to call 
> FileSystem.close() after thread completion, there is a possibility that the 
> same applies for all consumers of org.apache.hadoop.fs.FileSystem.
>  
> command used for reproducing the issue:
> {code:java}
> ozone freon dfsg -s 268435456 --prefix beg0i3ghm2 --path 
> ofs://ozone/ratis-vol/andrey -n10000 -t160 --buffer=1048576 
> --copy-buffer=1048576 {code}
> Example of Orphan ManagedChannels:
> {code:java}
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=15509, 
> target=10.15.25.39:9858}}. 
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=58, 
> target=10.15.25.26:9858}}. 
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=26033, 
> target=10.15.25.13:9858}}. 
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=20785, 
> target=10.15.25.14:9858}}. 
> 26/01/21 14:15:26 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=8929, 
> target=10.15.25.13:9858}}. 
> 26/01/21 14:15:27 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=3993, 
> target=10.15.25.34:9858}}. 
> 26/01/21 14:15:27 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=25649, 
> target=10.15.25.26:9858}}. 
> 26/01/21 14:15:27 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=12989, 
> target=10.15.25.24:9858}}. 
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=19417, 
> target=10.15.25.18:9858}}. 
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=17729, 
> target=10.15.25.33:9858}}. 
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=20521, 
> target=10.15.25.35:9858}}. 
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=27817, 
> target=10.15.25.34:9858}}. 
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=16453, 
> target=10.15.25.38:9858}}. 
> 26/01/21 14:15:28 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=22981, 
> target=10.15.25.24:9858}}. 
> 26/01/21 14:15:29 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=22533, 
> target=10.15.25.33:9858}}. 
> 26/01/21 14:15:29 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=12441, 
> target=10.15.25.14:9858}}. 
> 26/01/21 14:15:29 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=14841, 
> target=10.15.25.33:9858}}. 
> 26/01/21 14:15:31 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=27229, 
> target=10.15.25.33:9858}}. 
> 26/01/21 14:15:31 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=27529, 
> target=10.15.25.34:9858}}. 
> 26/01/21 14:15:32 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=26717, 
> target=10.15.25.35:9858}}. 
> 26/01/21 14:15:34 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=25021, 
> target=10.15.25.35:9858}}. 
> 26/01/21 14:15:34 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=13573, 
> target=10.15.25.33:9858}}. 
> 26/01/21 14:15:35 WARN grpc.GrpcUtil: Timed out gracefully shutting down 
> connection: 
> ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=28033, 
> target=10.15.25.34:9858}}.  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to