[
https://issues.apache.org/jira/browse/HDFS-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490768#comment-16490768
]
Sean Mackrory commented on HDFS-13121:
--------------------------------------
I had an offline conversation with [~zvenczel] about the lack of tests - testing
this would require a pretty unreasonable amount of refactoring and/or new
dependencies just to do the mocking. One piece of feedback, though: I'd like to
see a more helpful error message. The stack trace can show that one of those
fields was null, but the text we pass to the exception should include something
like "This is often because Hadoop has exceeded the allowed number of open file
descriptors", so that it hints at the likely root cause and a possible solution.
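A minimal sketch of the kind of check and message being suggested, assuming the
guard sits in requestFileDescriptors right after the recvFileInputStreams call
quoted below (the exact wording and placement are illustrative, not the final
patch):

    // Sketch only: guard the fds handed back by libhadoop before using them.
    // When the process has exhausted its file descriptors,
    // recvFileInputStreams can return without populating fis, and the code
    // currently hits an NPE later on.
    sock.recvFileInputStreams(fis, buf, 0, buf.length);
    if (fis[0] == null || fis[1] == null) {
      throw new IOException("Failed to receive file descriptors for block "
          + block + " from the DataNode. This is often because Hadoop has "
          + "exceeded the allowed number of open file descriptors.");
    }

That way the exception itself points the user at the fd limit instead of
leaving them with a bare NullPointerException.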
> NPE when requesting file descriptors during a short-circuit read
> -----------------------------------------------------------------
>
> Key: HDFS-13121
> URL: https://issues.apache.org/jira/browse/HDFS-13121
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 3.0.0
> Reporter: Gang Xie
> Assignee: Zsolt Venczel
> Priority: Minor
> Attachments: HDFS-13121.01.patch
>
>
> Recently we hit an issue where the DFSClient throws an NPE. In this case the
> application process exceeds its limit on open file descriptors. libhadoop
> never throws an exception but returns null for the requested fds, and
> requestFileDescriptors then uses the returned fds directly, without any
> check, which results in the NPE.
>
> We need to add a null check here.
>
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>     Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1,
>       failureInjector.getSupportsReceiptVerification());
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelperClient.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   failureInjector.injectRequestFileDescriptorsFailure();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte buf[] = new byte[1];
>     FileInputStream[] fis = new FileInputStream[2];
>     // Highlighted in the report: when fds are exhausted, libhadoop returns
>     // here without filling fis instead of throwing.
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       if (buf[0] == USE_RECEIPT_VERIFICATION.getNumber()) {
>         LOG.trace("Sending receipt verification byte for slot {}", slot);
>         sock.getOutputStream().write(0);
>       }
>       // Highlighted in the report: fis[0]/fis[1] can be null here, which
>       // leads to the NPE.
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);