[ 
https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004033#comment-15004033
 ] 

Bob Hansen commented on HDFS-9144:
----------------------------------

[~wheat9]: thank you for the salient questions.

bq. What is the purpose of having multiple inheritances on FileHandle, 
FileHandleImpl, CancelHandle, CancelHandleImpl?

(I assume you mean multiple layers of inheritance, not multiple inheritance)  
The intention there was in interface/implementation separation.  The primary 
consumer C++ interface will be in interactions with FileSystem and FileHandle 
(the renamed InputStream), and I wanted to keep the public surface of those as 
small as possible.  There is no CancelHandleImpl; if you mean Cancelable vs. 
CancelHandle, the intention is that Cancelable is an interface for something 
that can be canceled.  CancelHandle encapsulates a copyable object that ensures 
that the lifecycle of the cancelable object does not exceed the lifecycle of 
the handle.  I apologize for the lack of comments in this module.

bq. Why having multi-inheritance?

The only instance that I recall (correct me if I am wrong) is in the 
MockDNConnection where I wanted to bridge the (well-written, thank you) 
MockConnection to the DNConnection abstraction.  In a perfect world, I would 
have refactored the NNCode to also use the AsyncConnection interface and had 
the base MockConnection simple implement AsyncConnection, but the timeframe for 
this patch had already dragged on too long.  I can add that additional 
refactoring to this patch, but that doesn't seem to be the way you would like 
it to go.

bq. What is the principle of life cycle management here? Why changing pointers 
to shared_ptr in RemoteBlockReader? It's deliberately left as normal pointer as 
continuations do not own the stream, but the State object own it. 

The move to shared pointers for streams and the NNEngine comes from the 
uncertain nature of the asio callbacks.  It is very hard to know when there may 
be a queued but unsatisfied call in the asio queue, and we have had numerous 
instances where we had a segfault because there was a reference to a destroyed 
instance in a handler.  Once we get real integration tests with real servers, I 
suspect we will be seeing more of them.  Using shared pointers ensures that the 
object will be valid, even if the the operation becomes a no-op (such as the 
underlying fd being closed).

bq. What is the purpose of copying the callback handlers?

Obviously, in they need to be copied eventually to keep the references that 
they contain alive.  I made them pass-by-value in the interfaces to make it 
unambiguous in the interface that the implementation was making copies of the 
callbacks.  If the public interface includes references, there could be 
uncertainty whether the consumer needed to keep the reference to the handle 
valid.  You and I both know what we eventually make copies deep in the 
implementation, but kind interface design is to remove the question entirely.  
It is possible, if there is a _lot_ of state captured directly in the callback, 
that the copy might involve heap allocation, but in the vast majority of the 
cases, it is a simple object copy.  I don't suspect that that will be on the 
critical path for performance for a long, long time.

bq. What is cancelable? Is a request cancelable or a stream cancelable? Why 
AsyncStream inherits Cancelable.

Cancelable is an aspect that can have its async operations canceled.  We have a 
strong use case for canceling AsyncStream operations in that they can have 
potentially unbounded latency (if a server has gone cataonic during a client 
read, for example).  The current implementation of DNConnection.cancel is naive 
but correct in order to respond to your request to show how AsyncReads could be 
canceled.  A more sophisticated approach would support canceling individual 
reads in addition to all reads on a DNConnection to satisfy different use 
cases.  Again, I apologize for the lack of comments in the Cancelable classes 
describing their bounds; they were included purely as an example of how 
cancellation could be implemented with the refactoring.

bq. The patch contains many unnecessary changes and can be separated into 
several patches. There are refactors in NN, DN, various changes from here and 
there. It needs to be separated, reviewed and applied independently.

That would be very difficult.  Because the structure of the trait injection and 
state management were very closely intertwined in the original code, it came 
naturally to refactor them both together.  Approximately a week and a half of 
engineering time was necessary to get them untangled and correct to this state 
and are not easily separable as can be seen from the (too verbose, sorry) 
change lot; can I ask for some flexibility in not requiring substantial 
additional engineering just to make review slightly smoother?  I am happy to 
address the substantive concerns you brought up (and I appreciate them), but I 
am very leery of spending additional weeks that add no new functionality to the 
codebase.


> Refactor libhdfs into stateful/ephemeral objects
> ------------------------------------------------
>
>                 Key: HDFS-9144
>                 URL: https://issues.apache.org/jira/browse/HDFS-9144
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: HDFS-8707
>            Reporter: Bob Hansen
>            Assignee: Bob Hansen
>         Attachments: HDFS-9144.HDFS-8707.001.patch, 
> HDFS-9144.HDFS-8707.002.patch
>
>
> In discussion for other efforts, we decided that we should separate several 
> concerns:
> * A posix-like FileSystem/FileHandle object (stream-based, positional reads)
> * An ephemeral ReadOperation object that holds the state for 
> reads-in-progress, which consumes
> * An immutable FileInfo object which holds the block map and file size (and 
> other metadata about the file that we assume will not change over the life of 
> the file)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to