[
https://issues.apache.org/jira/browse/HDFS-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549959#comment-13549959
]
Colin Patrick McCabe commented on HDFS-4353:
--------------------------------------------
This patch refactors some code in the {{DFSClient}} and the DataNode's
{{DataXceiver}}. The refactor encapsulates connections to peers into a single
class named {{Peer}}.
Suresh, please excuse me if I'm covering things you already know, but I want to
give some context to random people reading this JIRA. Java has no standard
mechanism for setting write timeouts on blocking sockets. So we usually
wrap our sockets in {{org.apache.hadoop.net.SocketOutputStream}}. This class
sets the {{Socket}} to non-blocking mode and simulates blocking I/O with a
timeout. (There is also a parallel
{{org.apache.hadoop.net.SocketInputStream}}.) However, we can't *always* do
this, since some sockets cannot be used in non-blocking mode. For example, the
SOCKS socket classes don't support it. The other thing that we do a lot
of is wrapping output and input streams in encrypted streams.
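For readers who haven't seen the trick, here is a minimal sketch of how a write timeout can be simulated on top of a non-blocking channel. This is not the actual {{SocketOutputStream}} code; the {{TimedWriter}} class and its {{writeFully}} method are hypothetical names used only for illustration, and the timeout here applies to each wait for writability rather than to the write as a whole.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical helper for illustration; not Hadoop's SocketOutputStream. */
final class TimedWriter {
  private TimedWriter() {}

  /**
   * Writes all of buf to ch. Throws SocketTimeoutException if the channel
   * stays unwritable for timeoutMs milliseconds (timeoutMs must be > 0).
   */
  static void writeFully(SocketChannel ch, ByteBuffer buf, long timeoutMs)
      throws IOException {
    if (timeoutMs <= 0) {
      throw new IllegalArgumentException("timeoutMs must be positive");
    }
    // Non-blocking mode is what makes the timeout possible: write() returns
    // immediately instead of parking the thread inside the kernel.
    ch.configureBlocking(false);
    try (Selector selector = Selector.open()) {
      ch.register(selector, SelectionKey.OP_WRITE);
      while (buf.hasRemaining()) {
        if (ch.write(buf) > 0) {
          continue; // made progress, try to write more
        }
        // The send buffer is full: wait until the channel is writable again,
        // but give up after timeoutMs.
        if (selector.select(timeoutMs) == 0) {
          throw new SocketTimeoutException(
              timeoutMs + " ms timeout while waiting for channel to be writable");
        }
        selector.selectedKeys().clear();
      }
    }
  }
}
```

Note that this only works for sockets that have an associated {{SocketChannel}}, which is exactly the limitation described above.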
As a result, we end up passing around a lot of objects just to represent a
single connection to a peer. {{IOStreamPair}} is a good example
of this. We also end up using {{instanceof}} a lot because we're dealing
with types that don't have a common ancestor. This refactor encapsulates all
of those objects in a single object, the {{Peer}}. This avoids the need to use
{{instanceof}} to set socket timeouts and other properties.
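To make the idea concrete, here is a simplified sketch of what such an abstraction can look like. It is not the interface from the patch: {{SketchPeer}}, {{SketchTcpPeer}}, and {{PeerDemo}} are hypothetical names, and the method set is an assumption trimmed down to the minimum needed to show the point.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

/** Simplified stand-in for the Peer abstraction; method names are assumptions. */
interface SketchPeer extends Closeable {
  InputStream getInputStream() throws IOException;
  OutputStream getOutputStream() throws IOException;
  void setReadTimeout(int timeoutMs) throws IOException;
}

/** A peer backed by an ordinary java.net.Socket. */
final class SketchTcpPeer implements SketchPeer {
  private final Socket socket;

  SketchTcpPeer(Socket socket) {
    this.socket = socket;
  }

  @Override
  public InputStream getInputStream() throws IOException {
    return socket.getInputStream();
  }

  @Override
  public OutputStream getOutputStream() throws IOException {
    return socket.getOutputStream();
  }

  @Override
  public void setReadTimeout(int timeoutMs) throws IOException {
    // Each implementation knows how to apply the timeout to its own transport.
    socket.setSoTimeout(timeoutMs);
  }

  @Override
  public void close() throws IOException {
    socket.close();
  }
}

/** Caller code: no instanceof checks, regardless of the underlying transport. */
final class PeerDemo {
  private PeerDemo() {}

  static int readOneByte(SketchPeer peer, int timeoutMs) throws IOException {
    peer.setReadTimeout(timeoutMs);
    return peer.getInputStream().read();
  }
}
```

A transport that does not extend {{Socket}} would simply be another implementation of the same interface, and {{PeerDemo.readOneByte}} would not need to change.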
The main reason for doing this refactor now is that {{DomainSocket}}, which is
introduced by HDFS-4354, doesn't inherit from {{Socket}}. We made the decision
not to inherit from {{Socket}} because inheriting would require us to rely on
non-public JVM classes. There is more discussion on HDFS-347 about this
issue, if you're curious.
Specific changes:
{{PeerServer}}: a class that creates {{Peers}}. {{TcpPeerServer}} is basically
a wrapper around {{ServerSocket}}. The next patch introduces another subclass,
{{DomainPeerServer}}.
{{BlockReader#close}}: now returns the {{Peer}} to the {{PeerCache}} directly. This
replaces the multi-step process involving {{hasSentStatusCode}},
{{takeSocket}}, and {{getStreams}}.
{{SocketCache}}: was renamed to {{PeerCache}}. Now caches based on
{{DatanodeID}} rather than socket address. This is needed to prepare the way
for putting DomainSockets into the cache, since a domain socket has no
{{host:port}} socket address to use as a key. Aside from that, it should
behave much like the old {{SocketCache}}.
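As a rough illustration of the cache keying change, here is a minimal sketch of a cache of idle connections keyed by a datanode identifier. It is not the {{PeerCache}} from the patch: {{SketchPeerCache}} is a hypothetical name, a plain {{String}} stands in for {{DatanodeID}}, and capacity limits and expiry of idle connections are left out.

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache of idle connections, keyed by a datanode identifier
 * instead of a socket address. A String stands in for DatanodeID here.
 */
final class SketchPeerCache<P extends Closeable> {
  private final Map<String, Deque<P>> idle = new HashMap<>();

  /** Hands a connection back to the cache for later reuse. */
  synchronized void put(String datanodeId, P peer) {
    idle.computeIfAbsent(datanodeId, k -> new ArrayDeque<>()).push(peer);
  }

  /** Takes the most recently cached connection for this datanode, or null. */
  synchronized P get(String datanodeId) {
    Deque<P> peers = idle.get(datanodeId);
    if (peers == null) {
      return null;
    }
    P peer = peers.poll();
    if (peers.isEmpty()) {
      idle.remove(datanodeId); // don't keep empty buckets around
    }
    return peer;
  }
}
```

Because the key identifies the datanode rather than a particular transport address, connections of different kinds to the same datanode can share one bucket.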
> Encapsulate connections to peers in Peer and PeerServer classes
> ---------------------------------------------------------------
>
> Key: HDFS-4353
> URL: https://issues.apache.org/jira/browse/HDFS-4353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, hdfs-client
> Affects Versions: 2.0.3-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: _02a.patch, 02b-cumulative.patch, 02c.patch, 02c.patch,
> 02-cumulative.patch, 02d.patch, 02e.patch, 02f.patch
>
>
> Encapsulate connections to peers into the {{Peer}} and {{PeerServer}}
> classes. Since many Java classes may be involved with these connections, it
> makes sense to create a container for them. For example, a connection to a
> peer may have an input stream, output stream, {{ReadableByteChannel}}, encrypted
> output stream, and encrypted input stream associated with it.
> This makes us less dependent on the {{NetUtils}} methods which use
> {{instanceof}} to manipulate socket and stream states based on the runtime
> type. It also paves the way to introduce UNIX domain sockets, which don't
> inherit from {{java.net.Socket}}.