[
https://issues.apache.org/jira/browse/HDFS-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701862#comment-14701862
]
Daryn Sharp commented on HDFS-8870:
-----------------------------------
A combination of errors occurred. Pipelines were frequently breaking because
the cluster erroneously "thought" it was full. Mis-accounting bugs in the RBW
reserved space and storage report contributed to the problem, but almost-full
clusters will exhibit the same problems. On each write failure, a thread leaks
and continues to renew the lease on a defunct file.
It didn't seem like a big deal until we saw it in long-running daemons. Then it
was the NMs. Consider log aggregation pipelines breaking and NMs leaking dozens
or hundreds of renewer threads, across thousands of nodes. The NN ends up with
an insane number of open connections, nearing your "this will never happen" fd
limit, and gets clogged with worthless renewals. Now it gets good. The renewer
threads won't abort until the token expires. Oh, you don't have security
enabled? Then no token ever expires, so you'd better restart your NMs, hdfs
proxies, oozies, DNs (webhdfs writes), hbase region servers, etc...
I'm swamped and if you want to wait till 2.6.2, I'm ok.
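The leak described above can be pictured with a toy model. This is only a sketch of one plausible way such a leak can arise (a failed write that never deregisters its file from the renewer); it is not the HDFS client code and not necessarily the root cause here. The names LeaseLeakSketch, ToyRenewer and renewalsAfterFailedWrite are all hypothetical.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Toy model of a client-side lease renewer. NOT the HDFS implementation. */
public class LeaseLeakSketch {

    /** Hypothetical stand-in for a renewer that tracks files under write. */
    static class ToyRenewer {
        private final Set<String> openFiles = new HashSet<>();
        private int renewals = 0;

        void register(String path) { openFiles.add(path); }
        void unregister(String path) { openFiles.remove(path); }

        /** One renewal cycle: renews only while a file is still registered. */
        boolean tick() {
            if (openFiles.isEmpty()) {
                return false; // nothing left to renew, a real thread would exit
            }
            renewals++;
            return true;
        }

        int renewals() { return renewals; }
    }

    /**
     * Simulates a write that fails, then runs the renewer for `ticks` cycles.
     * cleanupOnFailure=false models the leak (assumed mechanism);
     * cleanupOnFailure=true models a failure path that releases the file.
     */
    public static int renewalsAfterFailedWrite(boolean cleanupOnFailure, int ticks) {
        ToyRenewer renewer = new ToyRenewer();
        String path = "/tmp/example"; // hypothetical path
        renewer.register(path);
        try {
            throw new IOException("pipeline broken"); // simulated write failure
        } catch (IOException e) {
            if (cleanupOnFailure) {
                renewer.unregister(path);
            }
        }
        for (int i = 0; i < ticks; i++) {
            renewer.tick();
        }
        return renewer.renewals();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + renewalsAfterFailedWrite(false, 5));
        System.out.println("fixed: " + renewalsAfterFailedWrite(true, 5));
    }
}
```

In the leaky case the renewer keeps sending renewals for a file nobody is writing anymore; multiply that by every failed write in a long-running daemon and the picture above follows.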
> Lease is leaked on write failure
> --------------------------------
>
> Key: HDFS-8870
> URL: https://issues.apache.org/jira/browse/HDFS-8870
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: HDFS
> Affects Versions: 2.6.0
> Reporter: Rushabh S Shah
> Assignee: Daryn Sharp
>
> Creating this ticket on behalf of [~daryn]
> We've seen this in one of our clusters. When a long-running process has a
> write failure, the lease is leaked and gets renewed until the token
> expires.