[ https://issues.apache.org/jira/browse/HDFS-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813059#comment-15813059 ]

James Clampffer commented on HDFS-11028:
----------------------------------------

In case we needed another example of why "delete this" and tenuous resource 
management in C++ are a recipe for pain: it looks like this can leak memory if 
the FileSystem destructor runs while BulkResolve is still waiting on the asio 
DNS resolver.  The bug existed before this patch, but the cancel test 
executable in my patch provides a simple reproducer.

Situation:
In common/namenode_info.cc, BulkResolve takes a set of NamenodeInfo objects and 
does a DNS lookup on each host to get a vector of endpoints.  To be fast, the 
function kicks off an arbitrary number of async lookups in parallel and joins 
them at the end so the API stays reasonably simple to use.

To keep track of the multiple pipelines, a vector of 
std::pair<Pipeline<T>*, std::shared_ptr<std::promise<Status>>> is set up.  Each 
pair holds a continuation pipeline that's doing the resolve work and the 
std::promise that will eventually contain the result status, assuming the 
continuation runs to completion.  This seemed like a reasonable way to 
encapsulate async work using continuations that needed to be joined, but it 
turns out to be incredibly difficult to clean up if it's been interrupted.
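
A stripped-down sketch of that pattern (Pipeline, Status, and Run here are 
illustrative stand-ins, not the real libhdfs++ types; the real pipeline chains 
asio continuations rather than completing inline):

    #include <future>
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    struct Status {};  // placeholder for hdfs::Status

    struct Pipeline {
      void Run(std::shared_ptr<std::promise<Status>> result) {
        // The real pipeline chains async continuations on the io_service;
        // this version completes immediately just to show the lifetime.
        result->set_value(Status{});
        delete this;  // the continuation self-destructs when it finishes
      }
    };

    void BulkResolveSketch(const std::vector<std::string> &hosts) {
      std::vector<std::pair<Pipeline*,
                            std::shared_ptr<std::promise<Status>>>> pipelines;

      // Kick off one resolve pipeline per host; nothing bounds the
      // parallelism.
      for (size_t i = 0; i < hosts.size(); ++i) {
        auto p = std::make_pair(new Pipeline(),
                                std::make_shared<std::promise<Status>>());
        pipelines.push_back(p);
        p.first->Run(p.second);  // would really resolve hosts[i] async
      }

      // Join: wait until every continuation has produced a Status.  If the
      // FileSystem is destroyed before this point the raw Pipeline pointers
      // can no longer be touched safely.
      for (auto &p : pipelines)
        p.second->get_future().wait();
    }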

-Can't simply call delete on the Pipeline<T> pointers contained in the vector 
because the continuation may have already called "delete this"; if it has 
self-destructed, the pointer in the vector remains non-null, so deleting it 
again will break things.
-Can't loop through the vector and call cancel on all the Pipelines because 
some may have already been destroyed via "delete this".  If the malloc 
implementation is being generous the call might fail loudly with 
__cxa_pure_virtual, but it's more likely to just trash the heap.
-Can't check the status of the Pipeline because the result is wrapped in a 
promise, so that will just block.
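
Spelled out against the sketch above (deliberately all commented out, since 
none of it is safe; Cancel() stands in for whatever cancellation hook the real 
Pipeline exposes):

    // None of these work once the FileSystem is being torn down:
    // for (auto &p : pipelines) {
    //   delete p.first;                 // double delete if the continuation
    //                                   // already ran "delete this"
    //   p.first->Cancel();              // call through a possibly dangling
    //                                   // pointer: heap corruption, or a
    //                                   // __cxa_pure_virtual abort if lucky
    //   p.second->get_future().wait();  // just blocks until the resolver
    //                                   // returns, defeating the cancel
    // }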

Possible fixes:
-Add a pointer-to-a-flag to the continuation state so the pipeline can indicate 
that it self-destructed, and make sure the ResolveContinuation can actually 
deal with cancel semantics (a rough sketch of the flag idea follows this list).
-Rewrite the DNS lookup so it allocates its memory correctly and calls the asio 
resolver functions directly.
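
One possible shape for the flag idea, continuing the sketch above (hedged: a 
real fix would also have to close the race between checking the flag and the 
continuation finishing, e.g. with a lock shared by both sides):

    #include <atomic>

    // Hypothetical handle shared between the joining code and the
    // continuation.
    struct PipelineHandle {
      Pipeline *pipeline;
      std::shared_ptr<std::promise<Status>> result;
      std::shared_ptr<std::atomic<bool>> self_destructed;
    };

    // Continuation side, immediately before it destroys itself:
    //   self_destructed->store(true);
    //   delete this;

    // Cleanup side, only touch pipelines that are still alive:
    //   if (!handle.self_destructed->load())
    //     handle.pipeline->Cancel();  // Cancel() is again hypothetical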

I'll file another JIRA to rewrite the DNS resolution code, as I don't think an 
issue that's been around for this long should block this patch.  The temporary 
fix is to avoid deleting the FileSystem object immediately after cancel.  The 
pipeline will clean itself up when the resolver returns, but it risks invalid 
writes if the vector of endpoints disappears, since it's holding a 
back_inserter; i.e., it's a dangling pointer issue obfuscated by piles of 
abstraction (illustrated below).
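
The back_inserter hazard in miniature (generic illustration, not the libhdfs++ 
code):

    #include <iterator>
    #include <vector>

    // A back_insert_iterator is a thin wrapper around a pointer to its
    // container, so it dangles the moment the container is destroyed.
    std::back_insert_iterator<std::vector<int>> make_dangling() {
      std::vector<int> local;
      return std::back_inserter(local);  // 'local' dies here
    }
    // Writing through the returned iterator later (*it = 42) is the same kind
    // of invalid write: in BulkResolve the endpoint vector plays the role of
    // 'local' and the resolver callback does the write.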

> libhdfs++: FileHandleImpl::CancelOperations needs to be able to cancel 
> pending connections
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11028
>                 URL: https://issues.apache.org/jira/browse/HDFS-11028
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-11028.HDFS-8707.000.patch, 
> HDFS-11028.HDFS-8707.001.patch, HDFS-11028.HDFS-8707.002.patch
>
>
> Cancel support is now reasonably robust except in the case where a FileHandle 
> operation ends up causing the RpcEngine to try to create a new RpcConnection. 
> In HA configs it's common to have something like 10-20 failovers and a 
> 20-second failover delay (no exponential backoff just yet). This means that 
> all of the functions with synchronous interfaces can still block for many 
> minutes after an operation has been canceled, and often the cause is 
> something trivial like a bad config file.
> The current design makes this sort of thing tricky to do because the 
> FileHandles need to be individually cancelable via CancelOperations, but they 
> share the RpcEngine that does the async magic.
> Updated design:
> The original design would end up forcing lots of reconnects.  That's not a 
> huge issue on an unauthenticated cluster, but on a kerberized cluster it's a 
> recipe for Kerberos deciding we're attempting a replay attack.
> User-visible cancellation and internal resource cleanup are separable 
> issues.  The former can be implemented by atomically swapping the callback of 
> the operation to be canceled with a no-op callback.  The original callback is 
> then posted to the IoService with an OperationCanceled status and the user is 
> no longer blocked.  For RPC cancels this is sufficient: it's not expensive to 
> keep a request around a little longer, and when it's eventually invoked or 
> timed out it hits the no-op callback and is ignored (other than a trace-level 
> log notification).  Connect cancels push a flag down into the RPC engine to 
> kill the connection and make sure it doesn't attempt to reconnect.


