[ 
https://issues.apache.org/jira/browse/HDFS-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309780#comment-14309780
 ] 

Colin Patrick McCabe commented on HDFS-7694:
--------------------------------------------

bq. CanUnbuffer ain't too pretty. Unbufferable is about as ugly. It's fine I 
suppose as is.

It's consistent with our other "input stream extension" interfaces such as 
{{Syncable}}, {{CanSetReadahead}}, etc.  The problem is that we can't add the 
new APIs to {{FSInputStream}}, or else we'd break a bunch of non-HDFS streams 
(in and out of the tree) that don't implement the new API.  Java 8 adds 
default implementations for interface methods, which would solve this... too 
bad we're not there yet.
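
To illustrate the "optional interface" pattern described above, here is a minimal, self-contained sketch.  It does not use the real Hadoop classes: {{CanUnbufferSketch}}, {{BufferedSketchStream}} and {{SketchDataInputStream}} are hypothetical stand-ins, and throwing {{UnsupportedOperationException}} for streams that don't opt in is an assumption of this sketch, not a statement about what the patch does.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-in for an optional "extension" interface such as CanUnbuffer.
interface CanUnbufferSketch {
    void unbuffer();
}

// A stream that opts in to the extension by implementing the interface.
class BufferedSketchStream extends InputStream implements CanUnbufferSketch {
    private final byte[] data;
    private int pos = 0;
    private byte[] buffer; // simulated readahead buffer; null once released

    BufferedSketchStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        if (pos >= data.length) {
            return -1;
        }
        if (buffer == null) {
            buffer = new byte[4096]; // lazily re-acquire the buffer on the next read
        }
        return data[pos++] & 0xff;
    }

    @Override
    public void unbuffer() {
        buffer = null; // release resources; position and open/closed state are untouched
    }

    boolean hasBuffer() {
        return buffer != null;
    }
}

// Wrapper that forwards unbuffer() only when the wrapped stream supports it.
// Streams that predate the interface keep compiling because the base class is unchanged.
class SketchDataInputStream {
    private final InputStream in;

    SketchDataInputStream(InputStream in) {
        this.in = in;
    }

    void unbuffer() {
        if (in instanceof CanUnbufferSketch) {
            ((CanUnbufferSketch) in).unbuffer();
        } else {
            // Assumption of this sketch: unsupported streams fail loudly.
            throw new UnsupportedOperationException(
                in.getClass().getName() + " does not support unbuffer");
        }
    }
}

class OptionalInterfaceDemo {
    public static void main(String[] args) {
        BufferedSketchStream s = new BufferedSketchStream(new byte[] {1, 2, 3});
        s.read();
        new SketchDataInputStream(s).unbuffer();
        System.out.println("buffer held after unbuffer: " + s.hasBuffer());
        try {
            new SketchDataInputStream(new ByteArrayInputStream(new byte[0])).unbuffer();
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```

The point of the design is that the base stream class never changes, so existing subclasses are unaffected; the cost is a runtime {{instanceof}} check in the wrapper.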

bq. In DFSIS#unbuffer, should we be resetting data members back to zero, etc?

I'm not sure what else we'd reset.  This isn't changing the {{closed}} state, 
it's not a seek so the {{pos}} is not affected, it's not changing the 
{{cachingStrategy}} or {{fileEncryptionInfo}}... we certainly don't want to 
clear the block location info because then we need to do an RPC to the NN to 
get it again...

Actually, I do see one thing we should change.  We should set {{blockEnd}} to 
-1.  Otherwise, {{seek}} may attempt to use {{blockReader}} even though it's 
{{null}}.  It seems like this is also a problem in {{closeCurrentBlockReader}}.  
Let me also add a {{seek}} after the unbuffer in {{testUnbufferClosesSockets}} 
to make sure that this doesn't regress.
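
The {{blockEnd}} issue can be shown with a small model.  This is a simplified, hypothetical sketch rather than the actual {{DFSInputStream}} code: {{BlockStreamSketch}}, {{ReaderSketch}}, the fixed 100-byte block size and the {{resetBlockEnd}} flag are all made up for illustration.  The only thing it borrows from the discussion is the idea that {{seek}} takes a fast path through {{blockReader}} when the target falls inside the current block.

```java
// Stand-in for a block reader; skip() is a no-op in this sketch.
class ReaderSketch {
    void skip(long n) {
        // A real reader would discard n bytes from the current block here.
    }
}

// Simplified, hypothetical model of a block-oriented input stream.
class BlockStreamSketch {
    static final long BLOCK_SIZE = 100;

    private long pos = 0;
    private long blockEnd = -1;          // last offset of the current block, -1 if none
    private ReaderSketch blockReader = null;
    private int opens = 0;               // how many times a block reader was opened

    private void openBlockAt(long target) {
        blockReader = new ReaderSketch();
        blockEnd = (target / BLOCK_SIZE + 1) * BLOCK_SIZE - 1;
        opens++;
    }

    // Reads one byte's worth of position, opening a reader if needed.
    void read() {
        if (blockReader == null || pos > blockEnd) {
            openBlockAt(pos);
        }
        pos++;
    }

    void seek(long target) {
        if (target >= pos && target <= blockEnd) {
            // Fast path: trusts that blockEnd >= 0 implies a live blockReader.
            blockReader.skip(target - pos);
            pos = target;
            return;
        }
        // Slow path: drop the reader; the next read() reopens at the new position.
        pos = target;
        blockReader = null;
        blockEnd = -1;
    }

    // resetBlockEnd = false models the bug; true models the proposed fix.
    void unbuffer(boolean resetBlockEnd) {
        blockReader = null;
        if (resetBlockEnd) {
            blockEnd = -1;
        }
    }

    long position() {
        return pos;
    }

    int opens() {
        return opens;
    }
}

class BlockEndDemo {
    public static void main(String[] args) {
        BlockStreamSketch fixed = new BlockStreamSketch();
        fixed.read();
        fixed.unbuffer(true);
        fixed.seek(50);   // takes the slow path because blockEnd is -1
        fixed.read();     // reopens a reader lazily
        System.out.println("pos=" + fixed.position() + " opens=" + fixed.opens());
    }
}
```

With {{unbuffer(false)}}, the same {{seek(50)}} would hit the fast path and dereference a {{null}} reader, which is the kind of regression a {{seek}} after the unbuffer in the test is meant to catch.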

bq. In testOpenManyFilesViaTcp, we assert we can read but is there a reason why 
we would not be able to that unbuffer enables? (pardon if dumb question)

Not a dumb question at all.  What I was testing here was that opening a lot of 
files didn't consume too many resources.  In my local test environment, I 
increased {{NUM_OPENS}} to be a really big number... I didn't want to burden 
Jenkins too much, though.  {{testUnbufferClosesSockets}} is a more "direct" and 
straightforward test than {{testOpenManyFilesViaTcp}}... the latter is perhaps 
more of a stress test.

> FSDataInputStream should support "unbuffer"
> -------------------------------------------
>
>                 Key: HDFS-7694
>                 URL: https://issues.apache.org/jira/browse/HDFS-7694
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7694.001.patch, HDFS-7694.002.patch
>
>
> For applications that have many open HDFS (or other Hadoop filesystem) files, 
> it would be useful to have an API to clear readahead buffers and sockets.  
> This could be added to the existing APIs as an optional interface, in much 
> the same way as we added setReadahead / setDropBehind / etc.



