Shall I create the JIRA issue directly?

On Thu, Oct 26, 2017 at 12:34 PM, Xie Gang <xiegang...@gmail.com> wrote:

> Hi,
>
> We use HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient
> domain socket is disabled when the DataNode throws a block-invalid exception.
>
> The block is invalidated on the DataNode for some reason, and that by itself
> is fine. The DFSClient then tries to read this block from that DataNode via
> the domain socket, which triggers an IOException. On the DFSClient side, when
> it gets an IOException with error code 'ERROR', it disables the domain socket
> and falls back to TCP. Worse, it never seems to re-enable the domain socket.
>
> I think this is a defect: a "block invalid" exception should not disable
> the domain socket, because there is nothing wrong with the domain socket
> service itself.
>
> Any thoughts?
>
> The code:
>
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>         Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelper.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte[] buf = new byte[1];
>     FileInputStream[] fis = new FileInputStream[2];
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);
>     } catch (IOException e) {
>       // This indicates an error reading from disk, or a format error.  Since
>       // it's not a socket communication problem, we return null rather than
>       // throwing an exception.
>       LOG.warn(this + ": error creating ShortCircuitReplica.", e);
>       return null;
>     } finally {
>       if (replica == null) {
>         IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
>       }
>     }
>     return new ShortCircuitReplicaInfo(replica);
>   case ERROR_UNSUPPORTED:
>     if (!resp.hasShortCircuitAccessVersion()) {
>       LOG.warn("short-circuit read access is disabled for " +
>           "DataNode " + datanode + ".  reason: " + resp.getMessage());
>       clientContext.getDomainSocketFactory()
>           .disableShortCircuitForPath(pathInfo.getPath());
>     } else {
>       LOG.warn("short-circuit read access for the file " +
>           fileName + " is disabled for DataNode " + datanode +
>           ".  reason: " + resp.getMessage());
>     }
>     return null;
>   case ERROR_ACCESS_TOKEN:
>     String msg = "access control error while " +
>         "attempting to set up short-circuit access to " +
>         fileName + resp.getMessage();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(this + ":" + msg);
>     }
>     return new ShortCircuitReplicaInfo(new InvalidToken(msg));
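>   default:
>     // Status.ERROR has no case of its own, so a block-level failure such as
>     // an invalidated replica lands here and disables short-circuit reads for
>     // the whole domain socket path, not just for this block.
>     LOG.warn(this + ": unknown response code " + resp.getStatus() +
>         " while attempting to set up short-circuit access. " +
>         resp.getMessage());
>     clientContext.getDomainSocketFactory()
>         .disableShortCircuitForPath(pathInfo.getPath());
>     return null;
>   }
> }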
>
>
>
> --
> Xie Gang
>
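A minimal sketch of the behavior proposed above, written as a simplified, self-contained model rather than a patch against HDFS. Everything in it is hypothetical: `ShortCircuitFallbackSketch`, `handleResponse`, and `isDisabled` are illustrative names, `Status` mirrors only some of the response codes seen in the quoted switch (with `OTHER` standing in for any unexpected code), and the `disabledPaths` set stands in for `DomainSocketFactory.disableShortCircuitForPath()`. The idea it shows: give `ERROR` its own case so a per-block failure only makes that one read fall back to TCP, while genuinely unknown codes still disable the path as before.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model of the status handling in
 * requestFileDescriptors(). Not HDFS code.
 */
public class ShortCircuitFallbackSketch {

  /** Simplified response codes; OTHER stands in for any unexpected code. */
  enum Status { SUCCESS, ERROR, ERROR_UNSUPPORTED, ERROR_ACCESS_TOKEN, OTHER }

  /** Stand-in for the client's set of paths with short-circuit disabled. */
  private final Set<String> disabledPaths = new HashSet<>();

  /** Returns true if this read may use short-circuit access. */
  boolean handleResponse(Status status, String path, boolean hasShortCircuitAccessVersion) {
    switch (status) {
      case SUCCESS:
        return true;
      case ERROR:
        // Proposed: a block-level error (e.g. replica invalidated on the
        // DataNode) falls back to TCP for this read only; path stays enabled.
        return false;
      case ERROR_UNSUPPORTED:
        // Same rule as the quoted code: disable only if the DataNode does not
        // report a short-circuit access version.
        if (!hasShortCircuitAccessVersion) {
          disabledPaths.add(path);
        }
        return false;
      case ERROR_ACCESS_TOKEN:
        // Simplified: the real code returns an InvalidToken result instead.
        return false;
      default:
        // Unexpected codes still disable short-circuit for the path.
        disabledPaths.add(path);
        return false;
    }
  }

  boolean isDisabled(String path) {
    return disabledPaths.contains(path);
  }

  public static void main(String[] args) {
    ShortCircuitFallbackSketch client = new ShortCircuitFallbackSketch();
    String path = "/hypothetical/dn_socket";  // hypothetical socket path
    client.handleResponse(Status.ERROR, path, true);
    System.out.println("after ERROR: disabled=" + client.isDisabled(path));
    client.handleResponse(Status.OTHER, path, true);
    System.out.println("after OTHER: disabled=" + client.isDisabled(path));
  }
}
```

Running `main` prints `after ERROR: disabled=false` followed by `after OTHER: disabled=true`. Keying the decision on the response status keeps the change small; distinguishing "block invalid" from other `ERROR` causes would need more information from the DataNode than the status code alone.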



-- 
Xie Gang
