[ https://issues.apache.org/jira/browse/HDFS-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805094#comment-13805094 ]
Jing Zhao commented on HDFS-5364:
---------------------------------

Some comments so far:

# OpenFileCtxCache#scan holds the monitor of OpenFileCtxCache. Meanwhile, it calls OpenFileCtx#streamCleanup, which may block for a relatively long time in some I/O ops. This can prevent new OpenFileCtx instances from being added to the cache. (See the first sketch below.)
# It may be better to move the class StreamMonitor from WriteManager to OpenFileCtxCache.
# In WriteManager.java,
{code}
FileHandle handle = request.getHandle();
if (LOG.isDebugEnabled()) {
  LOG.debug("handleWrite fileId: " + handle.getFileId() + " offset: "
-      + offset + " length:" + count + " stableHow:" + stableHow.getValue());
+      + offset + " length:" + count + " stableHow:" + stableHow.name());
}
{code}
Do you want to use the newly added WRITE3Request#toString here? (See the second sketch below.)
# Looks like DFSClient#append does not unwrap the AlreadyBeingCreatedException, so we may not be able to catch the AlreadyBeingCreatedException directly in the following code. (See the third sketch below.)
{code}
+      } catch (AlreadyBeingCreatedException e) {
+        LOG.warn("Can't append file:" + fileIdPath
+            + ". Possibly the file is being closed. Drop the request:"
+            + request + ", wait for the client to retry...");
+        return;
{code}
# In the following code, we first create fos for appending data (through an NN RPC), then check whether we can add a new OpenFileCtx into the cache; if not, we close fos (through another NN RPC). So can we first check the cache availability here, and create fos only when the cache has an available spot? (See the last sketch below.)
{code}
+    if (!addOpenFileStream(fileHandle, openFileCtx)) {
+      LOG.info("Can't add new stream. Close it. Tell client to retry.");
+      try {
+        fos.close();
+      } catch (IOException e) {
+        LOG.error("Can't close stream for fileId:" + handle.getFileId());
+      }
{code}

> Add OpenFileCtx cache
> ---------------------
>
>                 Key: HDFS-5364
>                 URL: https://issues.apache.org/jira/browse/HDFS-5364
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: nfs
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>         Attachments: HDFS-5364.001.patch, HDFS-5364.002.patch
>
>
> NFS gateway can run out of memory when the stream timeout is set to a relatively long period (e.g., >1 minute) and a user uploads thousands of files in parallel. For each stream, DFSClient creates a DataStreamer thread, and the gateway will eventually run out of memory by creating too many threads.
> The NFS gateway should have an OpenFileCtx cache to limit the total number of opened files.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
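For #1 above, a minimal sketch of one way to avoid holding the cache monitor across the slow cleanup: pick the evictable contexts while holding the lock, then run the I/O-bound streamCleanup outside the synchronized block. All names here (openFileMap, isEvictable, the no-arg streamCleanup, the stand-in classes) are illustrative assumptions to make the sketch self-contained, not the patch's actual code:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;

// Stand-ins so the sketch compiles on its own; the real types live in the
// NFS gateway code and have richer APIs.
class FileHandle {}
class OpenFileCtx {
  boolean isEvictable(long streamTimeout) { return true; } // assumed helper
  void streamCleanup() {} // stands in for the real (slow, I/O-bound) cleanup
}

class OpenFileCtxCacheSketch {
  private final ConcurrentHashMap<FileHandle, OpenFileCtx> openFileMap =
      new ConcurrentHashMap<FileHandle, OpenFileCtx>();

  void scan(long streamTimeout) {
    List<OpenFileCtx> toClean = new ArrayList<OpenFileCtx>();
    // Phase 1: pick the victims while holding the monitor (cheap, no I/O).
    synchronized (this) {
      Iterator<Entry<FileHandle, OpenFileCtx>> it =
          openFileMap.entrySet().iterator();
      while (it.hasNext()) {
        Entry<FileHandle, OpenFileCtx> entry = it.next();
        if (entry.getValue().isEvictable(streamTimeout)) {
          it.remove();
          toClean.add(entry.getValue());
        }
      }
    }
    // Phase 2: do the slow stream cleanup without the monitor, so new
    // OpenFileCtx instances can still be added to the cache in the meantime.
    for (OpenFileCtx ctx : toClean) {
      ctx.streamCleanup();
    }
  }
}
{code}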
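For #3 above, the debug statement could then shrink to something like the following, assuming the newly added WRITE3Request#toString covers the handle, offset, count, and stableHow fields:
{code}
if (LOG.isDebugEnabled()) {
  // request.toString() is implicit in the string concatenation
  LOG.debug("handleWrite: " + request);
}
{code}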
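For #4 above, since the NN-side exception arrives wrapped in a RemoteException, one option is the usual Hadoop unwrap pattern: catch RemoteException and test the unwrapped type. This is a sketch; the append call shape and the surrounding variables (fos, dfsClient, fileIdPath, bufferSize, request) are assumed from the quoted diff, not copied from the patch:
{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException;
import org.apache.hadoop.ipc.RemoteException;

try {
  fos = dfsClient.append(fileIdPath, bufferSize, null, null);
} catch (RemoteException e) {
  // DFSClient#append does not unwrap the exception for us, so test the
  // unwrapped type instead of catching AlreadyBeingCreatedException directly.
  IOException ioe = e.unwrapRemoteException(AlreadyBeingCreatedException.class);
  if (ioe instanceof AlreadyBeingCreatedException) {
    LOG.warn("Can't append file:" + fileIdPath
        + ". Possibly the file is being closed. Drop the request:"
        + request + ", wait for the client to retry...");
    return;
  }
  throw ioe;
}
{code}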
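For #5 above, note that a plain "check size, then append, then add" would still race with concurrent requests filling the cache between the check and the add. A minimal sketch of one way to do the reordering safely is to reserve a slot before paying the NN RPC; reserveSlot and commitSlot are hypothetical names for whatever primitive the cache would expose, not patch code:
{code}
// Reserve capacity first: if the cache is full, we have opened nothing,
// so a rejection costs neither the append RPC nor a close RPC.
if (!openFileCtxCache.reserveSlot(fileHandle)) {  // reserveSlot: assumed API
  LOG.info("Cache is full. Tell the client to retry.");
  return;
}
// Only now pay the NN RPC to open the append stream.
HdfsDataOutputStream fos = dfsClient.append(fileIdPath, bufferSize, null, null);
// Publish the new context into the reserved slot (constructor args elided).
OpenFileCtx openFileCtx = new OpenFileCtx(fos);
openFileCtxCache.commitSlot(fileHandle, openFileCtx); // commitSlot: assumed API
{code}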