[
https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083157#comment-14083157
]
Daryn Sharp commented on HDFS-6757:
-----------------------------------
Trying to catch up...
bq. I think it's fine to use INode in the lease map. However, in the meanwhile,
I'm not sure if this will increase the possibility of memory leak due to
failing to update the lease map (maybe because of buggy code).
bq. I agree-- we might get some really nasty bugs by having a bunch of
references to inodes that somehow are "really" deleted here. It would only
require a minor mistake in the code. The memory savings doesn't seem worth it
(and, of course, we're already reducing memory consumption by switching away
from strings)
Yes, but minor bugs anywhere can have devastating results. The block manager
doesn't make attempts to add indirection just in case there's a future bug.
Even if there was a leak, lease expiration will eventually clean it up?
bq. My thinking is that even though the lease manager is buggy, NN should
continue to function when replay edit logs / saving namespaces instead of
crashing. Though not ideal, but it allows NN to tolerate the bugs and to treat
them as lease recoveries. [...] I'm updating my patch to print out warnings to
help developers to catch these bugs.
I can agree to saving the namespace being tolerant. We would have been in a
world of hurt after the recent edit log corruption bugs if we couldn't save the
namespace on the active.
I'm not comfortable with being tolerant of buggy edits. If the NN cannot re-do
what it thinks it did, that's pretty serious, and should continue to be a stop
the world event. The _only_ reason we noticed the recent edit log corruptions
is because the standby died. Sadly, nobody is going to look for or notice log
messages until a major service interruption has occurred.
The idea that the NN should be tolerant of its own bugs will inevitably lead to
namespace and/or data corruption...
bq. Jing Zhao's made a good point about how we don't want to remove leases
for open files just because we deleted an under-construction inode inside a
snapshot.
Perhaps I'm understanding. What is the use case for allowing a client to
continue writing to a deleted file and have the results reflected in a
snapshot? The lease is effectively for the mutable/non-snapshot path. If it's
deleted, why wouldn't we want to revoke the lease? It's already "bad" enough
that snapshotting a UC file does not record the visible length so the file in
the snapshot continue to mutate, but continuing to modify a deleted file in a
snapshot?
bq. Imagine that a client gets SIGSTOP, and then its hard lease on its
open-for-write file expires and is removed. The client can be resumed later,
come back and then try to close. So not being able to find a lease during a
close is certainly worthy of a nasty log message, but I don't think we can call
it a bug.
I thought the close fails if the lease is expired? If yes, my question is
whether replaying edit logs should tolerate a close with no lease if the live
NN does not allow it.
bq. You're right that the new lease manager won't (and doesn't need to) update
the lease during the renames, but I'm not sure that I fully understand your
comments.
My mild concern/question is let's say this change is in 2.6. We deploy it, run
a week, find a horrible problem and back up to 2.5 because it has the same
layout. If 2.5 replays the edits, will it be in a sane state due to the
missing lease reassigns? I wish I had time to study the code and think about
it, but I don't, so I'm posing the question.
> Simplify lease manager with INodeID
> -----------------------------------
>
> Key: HDFS-6757
> URL: https://issues.apache.org/jira/browse/HDFS-6757
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch,
> HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch
>
>
> Currently the lease manager records leases based on path instead of inode
> ids. Therefore, the lease manager needs to carefully keep track of the path
> of active leases during renames and deletes. This can be a non-trivial task.
> This jira proposes to simplify the logic by tracking leases using inodeids
> instead of paths.
--
This message was sent by Atlassian JIRA
(v6.2#6252)