Shall I create the JIRA directly?

On Thu, Oct 26, 2017 at 12:34 PM, Xie Gang <xiegang...@gmail.com> wrote:
> Hi,
>
> We use HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient
> domain socket is disabled when the datanode throws a "block invalid"
> exception.
>
> The block is invalidated for some reason on the datanode, which is OK.
> Then DFSClient tries to access this block on this datanode via the domain
> socket. This triggers an IOException. On the DFSClient side, when it gets
> an IOException and error code 'ERROR', it disables the domain socket and
> falls back to TCP. Worst of all, it never seems to re-enable the socket.
>
> I think this is a defect. With such a "block invalid" exception, we should
> not disable the domain socket, because there is nothing wrong with the
> domain socket service.
>
> Any thoughts?
>
> The code:
>
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>     Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelper.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte buf[] = new byte[1];
>     FileInputStream fis[] = new FileInputStream[2];
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);
>     } catch (IOException e) {
>       // This indicates an error reading from disk, or a format error. Since
>       // it's not a socket communication problem, we return null rather than
>       // throwing an exception.
>       LOG.warn(this + ": error creating ShortCircuitReplica.", e);
>       return null;
>     } finally {
>       if (replica == null) {
>         IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
>       }
>     }
>     return new ShortCircuitReplicaInfo(replica);
>   case ERROR_UNSUPPORTED:
>     if (!resp.hasShortCircuitAccessVersion()) {
>       LOG.warn("short-circuit read access is disabled for " +
>           "DataNode " + datanode + ". reason: " + resp.getMessage());
>       clientContext.getDomainSocketFactory()
>           .disableShortCircuitForPath(pathInfo.getPath());
>     } else {
>       LOG.warn("short-circuit read access for the file " +
>           fileName + " is disabled for DataNode " + datanode +
>           ". reason: " + resp.getMessage());
>     }
>     return null;
>   case ERROR_ACCESS_TOKEN:
>     String msg = "access control error while " +
>         "attempting to set up short-circuit access to " +
>         fileName + resp.getMessage();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug(this + ":" + msg);
>     }
>     return new ShortCircuitReplicaInfo(new InvalidToken(msg));
>   default:
>     // A plain ERROR status (e.g. the "block invalid" case) falls through to
>     // here, so the whole domain socket path gets disabled.
>     LOG.warn(this + ": unknown response code " + resp.getStatus() +
>         " while attempting to set up short-circuit access. " +
>         resp.getMessage());
>     clientContext.getDomainSocketFactory()
>         .disableShortCircuitForPath(pathInfo.getPath());
>     return null;
>   }
> }
>
> --
> Xie Gang

--
Xie Gang
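
For illustration only, here is a minimal, self-contained sketch of the behavior proposed above: a block-level failure skips short-circuit access for that one block, while any other unexpected status still disables the socket path. This is not the Hadoop code. The class ShortCircuitPolicySketch, its Status and Action enums, and the blockSpecificFailure flag are all hypothetical; how the client would actually recognize a "block invalid" response is left open here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the proposed policy; none of these names exist in Hadoop. */
final class ShortCircuitPolicySketch {

    /** Simplified stand-in for the response status the client switches on. */
    enum Status { SUCCESS, ERROR, ERROR_UNSUPPORTED }

    /** What the client does after seeing a response. */
    enum Action { USE_REPLICA, SKIP_THIS_BLOCK_ONLY, DISABLE_SOCKET_PATH }

    /** Socket paths for which short-circuit reads have been turned off. */
    private final Set<String> disabledPaths = new HashSet<>();

    /**
     * Decides how to react to a response.
     *
     * @param socketPath           domain socket path the request went over
     * @param status               status reported by the datanode
     * @param blockSpecificFailure assumption: true when the failure concerns only
     *                             the requested block (e.g. it was invalidated)
     */
    Action handle(String socketPath, Status status, boolean blockSpecificFailure) {
        if (status == Status.SUCCESS) {
            return Action.USE_REPLICA;
        }
        if (status == Status.ERROR && blockSpecificFailure) {
            // The socket itself worked; only this block is unusable, so the
            // path stays enabled for other blocks.
            return Action.SKIP_THIS_BLOCK_ONLY;
        }
        // Anything else is treated as a problem with the socket service.
        disabledPaths.add(socketPath);
        return Action.DISABLE_SOCKET_PATH;
    }

    boolean isDisabled(String socketPath) {
        return disabledPaths.contains(socketPath);
    }

    public static void main(String[] args) {
        ShortCircuitPolicySketch policy = new ShortCircuitPolicySketch();
        String path = "/tmp/example_dn_socket"; // made-up path for the demo
        System.out.println(policy.handle(path, Status.ERROR, true));
        System.out.println("disabled after block error: " + policy.isDisabled(path));
        System.out.println(policy.handle(path, Status.ERROR, false));
        System.out.println("disabled after generic error: " + policy.isDisabled(path));
    }
}
```

The point of the sketch is only the separation of the two cases. In the quoted method, both would currently reach the default branch and disable the path.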