Please go ahead.

On Thu, Oct 26, 2017 at 6:12 PM, Xie Gang <xiegang...@gmail.com> wrote:
> Shall I create the JIRA directly?
>
> On Thu, Oct 26, 2017 at 12:34 PM, Xie Gang <xiegang...@gmail.com> wrote:
>
> > Hi,
> >
> > We use HDFS 2.4 and 2.6, and recently hit an issue where the DFSClient
> > domain socket is disabled when the datanode throws a block invalid
> > exception.
> >
> > The block is invalidated for some reason on the datanode, and that is OK.
> > Then DFSClient tries to access this block on this datanode via the domain
> > socket. This triggers an IOException. On the DFSClient side, when it gets
> > an IOException and error code 'ERROR', it disables the domain socket and
> > falls back to TCP. Worst of all, it seems to never recover the socket.
> >
> > I think this is a defect: with such a "block invalid" exception, we
> > should not disable the domain socket, because there is nothing wrong
> > with the domain socket service.
> >
> > Any thoughts?
> >
> > The code:
> >
> > private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
> >     Slot slot) throws IOException {
> >   ShortCircuitCache cache = clientContext.getShortCircuitCache();
> >   final DataOutputStream out =
> >       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
> >   SlotId slotId = slot == null ? null : slot.getSlotId();
> >   new Sender(out).requestShortCircuitFds(block, token, slotId, 1);
> >   DataInputStream in = new DataInputStream(peer.getInputStream());
> >   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
> >       PBHelper.vintPrefixed(in));
> >   DomainSocket sock = peer.getDomainSocket();
> >   switch (resp.getStatus()) {
> >   case SUCCESS:
> >     byte buf[] = new byte[1];
> >     FileInputStream fis[] = new FileInputStream[2];
> >     sock.recvFileInputStreams(fis, buf, 0, buf.length);
> >     ShortCircuitReplica replica = null;
> >     try {
> >       ExtendedBlockId key =
> >           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
> >       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
> >           Time.monotonicNow(), slot);
> >     } catch (IOException e) {
> >       // This indicates an error reading from disk, or a format error. Since
> >       // it's not a socket communication problem, we return null rather than
> >       // throwing an exception.
> >       LOG.warn(this + ": error creating ShortCircuitReplica.", e);
> >       return null;
> >     } finally {
> >       if (replica == null) {
> >         IOUtils.cleanup(DFSClient.LOG, fis[0], fis[1]);
> >       }
> >     }
> >     return new ShortCircuitReplicaInfo(replica);
> >   case ERROR_UNSUPPORTED:
> >     if (!resp.hasShortCircuitAccessVersion()) {
> >       LOG.warn("short-circuit read access is disabled for " +
> >           "DataNode " + datanode + ". reason: " + resp.getMessage());
> >       clientContext.getDomainSocketFactory()
> >           .disableShortCircuitForPath(pathInfo.getPath());
> >     } else {
> >       LOG.warn("short-circuit read access for the file " +
> >           fileName + " is disabled for DataNode " + datanode +
> >           ". reason: " + resp.getMessage());
> >     }
> >     return null;
> >   case ERROR_ACCESS_TOKEN:
> >     String msg = "access control error while " +
> >         "attempting to set up short-circuit access to " +
> >         fileName + resp.getMessage();
> >     if (LOG.isDebugEnabled()) {
> >       LOG.debug(this + ":" + msg);
> >     }
> >     return new ShortCircuitReplicaInfo(new InvalidToken(msg));
> >   default:
> >     LOG.warn(this + ": unknown response code " + resp.getStatus() +
> >         " while attempting to set up short-circuit access. " +
> >         resp.getMessage());
> >     clientContext.getDomainSocketFactory()
> >         .disableShortCircuitForPath(pathInfo.getPath());
> >     return null;
> >   }
> > }
> >
> > --
> > Xie Gang
>
> --
> Xie Gang

--
John
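
For anyone following the thread, the proposal can be illustrated with a minimal standalone sketch. This is not the HDFS code or an actual patch: the class ShortCircuitErrorPolicy, its Status and Action enums, and the socket path in main are hypothetical stand-ins for the cases in the quoted switch. It also assumes that a plain ERROR response signals a block-level failure (such as an invalidated block) and not a broken domain socket, which is the premise of the report and would need to be confirmed against the datanode side.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed handling: a plain ERROR skips
 * short-circuit for the current read only and leaves the socket path enabled.
 */
final class ShortCircuitErrorPolicy {
  /** Stand-in for the response statuses handled in the quoted switch. */
  public enum Status { SUCCESS, ERROR, ERROR_UNSUPPORTED, ERROR_ACCESS_TOKEN, OTHER }

  /** What the client should do after seeing a given status. */
  public enum Action {
    USE_SHORT_CIRCUIT, FALL_BACK_FOR_THIS_READ, REPORT_INVALID_TOKEN, DISABLE_PATH
  }

  // Socket paths for which short-circuit reads have been switched off.
  private final Set<String> disabledPaths = new HashSet<>();

  public Action handle(Status status, boolean hasAccessVersion, String socketPath) {
    switch (status) {
      case SUCCESS:
        return Action.USE_SHORT_CIRCUIT;
      case ERROR:
        // Assumption: block-level failure, so the socket itself is healthy.
        // Only this read falls back; the path stays enabled.
        return Action.FALL_BACK_FOR_THIS_READ;
      case ERROR_UNSUPPORTED:
        if (!hasAccessVersion) {
          // Mirrors the quoted code: the datanode does not offer
          // short-circuit access at all, so disable the path.
          disabledPaths.add(socketPath);
          return Action.DISABLE_PATH;
        }
        return Action.FALL_BACK_FOR_THIS_READ;
      case ERROR_ACCESS_TOKEN:
        return Action.REPORT_INVALID_TOKEN;
      default:
        // Unknown response code: keep the existing conservative behavior.
        disabledPaths.add(socketPath);
        return Action.DISABLE_PATH;
    }
  }

  public boolean isDisabled(String socketPath) {
    return disabledPaths.contains(socketPath);
  }

  public static void main(String[] args) {
    ShortCircuitErrorPolicy policy = new ShortCircuitErrorPolicy();
    String path = "/var/run/example/dn_socket"; // hypothetical socket path
    System.out.println(policy.handle(Status.ERROR, true, path));
    System.out.println("disabled after ERROR: " + policy.isDisabled(path));
    System.out.println(policy.handle(Status.OTHER, true, path));
    System.out.println("disabled after OTHER: " + policy.isDisabled(path));
  }
}
```

The point of the sketch is only the separate ERROR branch: in the quoted code that status falls through to default and disables the path, whereas here the next read can still try the domain socket.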