[ https://issues.apache.org/jira/browse/HDFS-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309780#comment-14309780 ]
Colin Patrick McCabe commented on HDFS-7694:
--------------------------------------------
bq. CanUnbuffer ain't too pretty. Unbufferable is about as ugly. Its fine I
suppose as is.
It's consistent with our other "input stream extension" interfaces such as
{{Syncable}}, {{CanSetReadahead}}, etc. The problem is that we can't add the
new APIs to {{FSInputStream}}, or else we'd break a bunch of non-HDFS streams
(in and out of the tree) that don't implement the new API. Java 8 adds default
methods for interfaces, which would solve this, but we can't depend on Java 8
yet.
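To illustrate the "optional interface" pattern described above, here is a minimal sketch. The names {{CanUnbufferSketch}}, {{BufferedSketchStream}}, and {{UnbufferSupport}} are hypothetical and are not taken from the patch; the point is only that callers probe for the capability with {{instanceof}}, so streams that never implemented it keep compiling and working.

```java
import java.io.InputStream;

/** Hypothetical optional capability, modeled on the idea of CanUnbuffer. */
interface CanUnbufferSketch {
    void unbuffer();
}

/** A toy stream that opts in to the capability. */
class BufferedSketchStream extends InputStream implements CanUnbufferSketch {
    // Stand-in for readahead buffers and sockets held by a real stream.
    private byte[] readahead = new byte[4096];

    @Override
    public int read() {
        return -1; // always at end of stream; enough for this sketch
    }

    @Override
    public void unbuffer() {
        readahead = null; // release the buffered resources
    }

    boolean hasBuffer() {
        return readahead != null;
    }
}

final class UnbufferSupport {
    private UnbufferSupport() {}

    /** Returns true if the stream supports the capability and was unbuffered. */
    static boolean tryUnbuffer(InputStream in) {
        if (in instanceof CanUnbufferSketch) {
            ((CanUnbufferSketch) in).unbuffer();
            return true;
        }
        return false; // stream does not implement the optional interface
    }
}
```

A stream that does not implement the interface, such as a plain {{java.io.ByteArrayInputStream}}, simply makes {{tryUnbuffer}} return false. A wrapper could instead throw {{UnsupportedOperationException}} in that case; which behavior is preferable is a design choice this sketch does not settle.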
bq. In DFSIS#unbuffer, should we be resetting data members back to zero, etc?
I'm not sure what else we'd reset. This isn't changing the {{closed}} state,
it's not a seek so the {{pos}} is not affected, it's not changing the
{{cachingStrategy}} or {{fileEncryptionInfo}}... we certainly don't want to
clear the block location info because then we need to do an RPC to the NN to
get it again...
Actually I do see one thing we should change. We should set {{blockEnd}} to
-1. Otherwise, {{seek}} may attempt to use {{blockReader}} even though it's
{{null}}. It seems like this is also a problem in {{closeCurrentBlockReader}}.
And let me add a {{seek}} after the unbuffer in {{testUnbufferClosesSockets}}
to make sure that this doesn't regress.
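The {{blockEnd}} issue can be shown with a simplified model. This is a sketch under stated assumptions, not the {{DFSInputStream}} code: the class {{BlockStreamSketch}}, the fixed {{BLOCK_SIZE}}, and the {{resetBlockEnd}} flag are all hypothetical and exist only to contrast the two behaviors. The model assumes {{seek}} has a fast path that reuses the current reader whenever the target lies between {{pos}} and {{blockEnd}}.

```java
/** Simplified, hypothetical model of a block-based stream. */
class BlockStreamSketch {
    static final long BLOCK_SIZE = 128; // assumed block size for the sketch

    private long pos = 0;
    private long blockEnd = -1;  // last offset served by the current reader, or -1
    private Object blockReader;  // stand-in for a real block reader

    private void openReaderAt(long target) {
        blockReader = new Object();
        long blockStart = (target / BLOCK_SIZE) * BLOCK_SIZE;
        blockEnd = blockStart + BLOCK_SIZE - 1;
    }

    /** Reads one byte's worth of position, opening a reader if needed. */
    void read() {
        if (blockReader == null || pos > blockEnd) {
            openReaderAt(pos);
        }
        pos++;
    }

    void seek(long target) {
        if (target >= pos && target <= blockEnd) {
            // Fast path: skip forward inside the current reader.
            if (blockReader == null) {
                throw new IllegalStateException("fast path taken with no reader");
            }
            pos = target;
            return;
        }
        // Slow path: drop the reader; the next read() reopens one.
        pos = target;
        blockReader = null;
        blockEnd = -1;
    }

    /** Drops the reader; the flag only exists to contrast the two behaviors. */
    void unbuffer(boolean resetBlockEnd) {
        blockReader = null;
        if (resetBlockEnd) {
            blockEnd = -1; // forces seek() onto the slow path
        }
    }

    boolean hasReader() {
        return blockReader != null;
    }
}
```

In this model, calling {{unbuffer(false)}} after a read and then seeking within the old block range takes the fast path with no reader, which the sketch reports as an {{IllegalStateException}}. With {{unbuffer(true)}}, the same seek falls through to the slow path and the next read reopens a reader.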
bq. In testOpenManyFilesViaTcp, we assert we can read but is there a reason why
we would not be able to that unbuffer enables? (pardon if dumb question)
Not a dumb question at all. What I was testing here was that opening a lot of
files didn't consume too many resources. In my local test environment, I
increased {{NUM_OPENS}} to be a really big number... I didn't want to burden
Jenkins too much, though. {{testUnbufferClosesSockets}} is a more "direct" and
straightforward test than {{testOpenManyFilesViaTcp}}... the latter is perhaps
more of a stress test.
> FSDataInputStream should support "unbuffer"
> -------------------------------------------
>
> Key: HDFS-7694
> URL: https://issues.apache.org/jira/browse/HDFS-7694
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7694.001.patch, HDFS-7694.002.patch
>
>
> For applications that have many open HDFS (or other Hadoop filesystem) files,
> it would be useful to have an API to clear readahead buffers and sockets.
> This could be added to the existing APIs as an optional interface, in much
> the same way as we added setReadahead / setDropBehind / etc.