[ 
https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083157#comment-14083157
 ] 

Daryn Sharp commented on HDFS-6757:
-----------------------------------

Trying to catch up...
bq. I think it's fine to use INode in the lease map. However, in the meanwhile, 
I'm not sure if this will increase the possibility of memory leak due to 
failing to update the lease map (maybe because of buggy code).

bq. I agree-- we might get some really nasty bugs by having a bunch of 
references to inodes that somehow are "really" deleted here. It would only 
require a minor mistake in the code. The memory savings doesn't seem worth it 
(and, of course, we're already reducing memory consumption by switching away 
from strings)

Yes, but minor bugs anywhere can have devastating results.  The block manager 
doesn't make attempts to add indirection just in case there's a future bug.  
Even if there was a leak, lease expiration will eventually clean it up?

bq. My thinking is that even though the lease manager is buggy, NN should 
continue to function when replay edit logs / saving namespaces instead of 
crashing. Though not ideal, but it allows NN to tolerate the bugs and to treat 
them as lease recoveries. [...]  I'm updating my patch to print out warnings to 
help developers to catch these bugs.

I can agree to saving the namespace being tolerant.  We would have been in a 
world of hurt after the recent edit log corruption bugs if we couldn't save the 
namespace on the active.

I'm not comfortable with being tolerant of buggy edits.  If the NN cannot re-do 
what it thinks it did, that's pretty serious, and should continue to be a stop 
the world event.  The _only_ reason we noticed the recent edit log corruptions 
is because the standby died.  Sadly, nobody is going to look for or notice log 
messages until a major service interruption has occurred.  

The idea that the NN should be tolerant of its own bugs will inevitably lead to 
namespace and/or data corruption...

bq.   Jing Zhao's made a good point about how we don't want to remove leases 
for open files just because we deleted an under-construction inode inside a 
snapshot.

Perhaps I'm understanding.  What is the use case for allowing a client to 
continue writing to a deleted file and have the results reflected in a 
snapshot?  The lease is effectively for the mutable/non-snapshot path.  If it's 
deleted, why wouldn't we want to revoke the lease?  It's already "bad" enough 
that snapshotting a UC file does not record the visible length so the file in 
the snapshot continue to mutate, but continuing to modify a deleted file in a 
snapshot?

bq.  Imagine that a client gets SIGSTOP, and then its hard lease on its 
open-for-write file expires and is removed. The client can be resumed later, 
come back and then try to close. So not being able to find a lease during a 
close is certainly worthy of a nasty log message, but I don't think we can call 
it a bug.

I thought the close fails if the lease is expired?  If yes, my question is 
whether replaying edit logs should tolerate a close with no lease if the live 
NN does not allow it.

bq.  You're right that the new lease manager won't (and doesn't need to) update 
the lease during the renames, but I'm not sure that I fully understand your 
comments. 

My mild concern/question is let's say this change is in 2.6.  We deploy it, run 
a week, find a horrible problem and back up to 2.5 because it has the same 
layout.  If 2.5 replays the edits, will it be in a sane state due to the 
missing lease reassigns?  I wish I had time to study the code and think about 
it, but I don't, so I'm posing the question.




> Simplify lease manager with INodeID
> -----------------------------------
>
>                 Key: HDFS-6757
>                 URL: https://issues.apache.org/jira/browse/HDFS-6757
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>         Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, 
> HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch
>
>
> Currently the lease manager records leases based on path instead of inode 
> ids. Therefore, the lease manager needs to carefully keep track of the path 
> of active leases during renames and deletes. This can be a non-trivial task.
> This jira proposes to simplify the logic by tracking leases using inodeids 
> instead of paths.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to