[
https://issues.apache.org/jira/browse/HDFS-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004033#comment-15004033
]
Bob Hansen commented on HDFS-9144:
----------------------------------
[~wheat9]: thank you for the salient questions.
bq. What is the purpose of having multiple inheritances on FileHandle,
FileHandleImpl, CancelHandle, CancelHandleImpl?
(I assume you mean multiple layers of inheritance, not multiple inheritance)
The intention there was in interface/implementation separation. The primary
consumer C++ interface will be in interactions with FileSystem and FileHandle
(the renamed InputStream), and I wanted to keep the public surface of those as
small as possible. There is no CancelHandleImpl; if you mean Cancelable vs.
CancelHandle, the intention is that Cancelable is an interface for something
that can be canceled. CancelHandle encapsulates a copyable object that ensures
that the lifecycle of the cancelable object does not exceed the lifecycle of
the handle. I apologize for the lack of comments in this module.
bq. Why having multi-inheritance?
The only instance that I recall (correct me if I am wrong) is in the
MockDNConnection where I wanted to bridge the (well-written, thank you)
MockConnection to the DNConnection abstraction. In a perfect world, I would
have refactored the NNCode to also use the AsyncConnection interface and had
the base MockConnection simple implement AsyncConnection, but the timeframe for
this patch had already dragged on too long. I can add that additional
refactoring to this patch, but that doesn't seem to be the way you would like
it to go.
bq. What is the principle of life cycle management here? Why changing pointers
to shared_ptr in RemoteBlockReader? It's deliberately left as normal pointer as
continuations do not own the stream, but the State object own it.
The move to shared pointers for streams and the NNEngine comes from the
uncertain nature of the asio callbacks. It is very hard to know when there may
be a queued but unsatisfied call in the asio queue, and we have had numerous
instances where we had a segfault because there was a reference to a destroyed
instance in a handler. Once we get real integration tests with real servers, I
suspect we will be seeing more of them. Using shared pointers ensures that the
object will be valid, even if the the operation becomes a no-op (such as the
underlying fd being closed).
bq. What is the purpose of copying the callback handlers?
Obviously, in they need to be copied eventually to keep the references that
they contain alive. I made them pass-by-value in the interfaces to make it
unambiguous in the interface that the implementation was making copies of the
callbacks. If the public interface includes references, there could be
uncertainty whether the consumer needed to keep the reference to the handle
valid. You and I both know what we eventually make copies deep in the
implementation, but kind interface design is to remove the question entirely.
It is possible, if there is a _lot_ of state captured directly in the callback,
that the copy might involve heap allocation, but in the vast majority of the
cases, it is a simple object copy. I don't suspect that that will be on the
critical path for performance for a long, long time.
bq. What is cancelable? Is a request cancelable or a stream cancelable? Why
AsyncStream inherits Cancelable.
Cancelable is an aspect that can have its async operations canceled. We have a
strong use case for canceling AsyncStream operations in that they can have
potentially unbounded latency (if a server has gone cataonic during a client
read, for example). The current implementation of DNConnection.cancel is naive
but correct in order to respond to your request to show how AsyncReads could be
canceled. A more sophisticated approach would support canceling individual
reads in addition to all reads on a DNConnection to satisfy different use
cases. Again, I apologize for the lack of comments in the Cancelable classes
describing their bounds; they were included purely as an example of how
cancellation could be implemented with the refactoring.
bq. The patch contains many unnecessary changes and can be separated into
several patches. There are refactors in NN, DN, various changes from here and
there. It needs to be separated, reviewed and applied independently.
That would be very difficult. Because the structure of the trait injection and
state management were very closely intertwined in the original code, it came
naturally to refactor them both together. Approximately a week and a half of
engineering time was necessary to get them untangled and correct to this state
and are not easily separable as can be seen from the (too verbose, sorry)
change lot; can I ask for some flexibility in not requiring substantial
additional engineering just to make review slightly smoother? I am happy to
address the substantive concerns you brought up (and I appreciate them), but I
am very leery of spending additional weeks that add no new functionality to the
codebase.
> Refactor libhdfs into stateful/ephemeral objects
> ------------------------------------------------
>
> Key: HDFS-9144
> URL: https://issues.apache.org/jira/browse/HDFS-9144
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Affects Versions: HDFS-8707
> Reporter: Bob Hansen
> Assignee: Bob Hansen
> Attachments: HDFS-9144.HDFS-8707.001.patch,
> HDFS-9144.HDFS-8707.002.patch
>
>
> In discussion for other efforts, we decided that we should separate several
> concerns:
> * A posix-like FileSystem/FileHandle object (stream-based, positional reads)
> * An ephemeral ReadOperation object that holds the state for
> reads-in-progress, which consumes
> * An immutable FileInfo object which holds the block map and file size (and
> other metadata about the file that we assume will not change over the life of
> the file)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)