[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

Steve Loughran (JIRA) Sun, 03 Aug 2014 04:22:26 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083949#comment-14083949
 ]


Steve Loughran commented on HDFS-6803:
--------------------------------------

This is fun: stack's just opened up a whole new bag of inconsistencies.

h2. Consistency with actual file data & metadata

We should state that changes to a file (length, contents, existence, perms) may 
not be visible to an open stream; if they do become visible, there are no 
guarantees as to when. That could include partway through a readFully 
operation, which cannot be guaranteed to be atomic.


h2. Isolation of pread operations

When a pread is in progress, should the temporary change in position be visible in {{getPos()}}? 

# If not, the method will need to be made {{synchronized}} on all 
implementations (it isn't right now; I checked).
# If it can be visible, then we could pull the {{synchronized}} marker off 
some implementations and remove that as a lock point.
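
To make option 1 concrete, here is a minimal sketch, not taken from any Hadoop implementation: a toy in-memory stream whose seek-based pread and {{getPos()}} share one monitor, so another thread never observes the temporary position. The class {{IsolatedPosStream}} and its methods are hypothetical names used only for illustration.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream; not a Hadoop class. */
class IsolatedPosStream {
  private final byte[] data;
  private long pos;

  IsolatedPosStream(byte[] data) { this.data = data; }

  /** Option 1: getPos() takes the same lock as the pread below. */
  public synchronized long getPos() { return pos; }

  public synchronized void seek(long newPos) throws IOException {
    if (newPos < 0 || newPos > data.length) {
      throw new EOFException("bad seek: " + newPos);
    }
    pos = newPos;
  }

  public synchronized int read() {
    return pos < data.length ? (data[(int) pos++] & 0xff) : -1;
  }

  /** Seek-based positioned read: moves pos, reads, then restores pos. */
  public synchronized int read(long position, byte[] buf, int off, int len)
      throws IOException {
    long saved = pos;
    try {
      seek(position);
      int n = 0;
      while (n < len) {
        int b = read();
        if (b < 0) break;
        buf[off + n++] = (byte) b;
      }
      return (n == 0 && len > 0) ? -1 : n;
    } finally {
      pos = saved; // plain assignment, so the restore itself cannot throw
    }
  }
}
```

Option 2 would amount to dropping {{synchronized}} from {{getPos()}} here, at which point another thread could observe the temporary position while a pread is running.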

h2. Failure Modes in concurrent/serialized operations

One problem with concurrency on read+pread is something I hadn't thought of 
before: on any failure of a pread, the pos value must be reset to the previous 
one. Everything appears to do this; the test would be

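{code}
read()                      // plain read; advances pos
try {
  read(EOF+2)               // pread starting beyond the end of the file
} catch (IOException e) {
  // expected; pos must be back where it was
}
assertTrue(getPos() <= EOF)
read()                      // must succeed or return -1, not throw
{code}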

The final {{read()}} would succeed or return -1, depending on the position, 
and not throw an {{EOFException}}. The same outcome must happen for a pread 
attempted at a negative position.

If someone were to add this to {{AbstractContractSeekTest}}, it'd get picked up 
by all the implementations and we could see what happens.
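
As a rough, self-contained sketch of what such a check could look like outside the contract test framework: the {{PosStream}} interface, {{FailedPreadCheck}} and {{ArrayPosStream}} below are hypothetical stand-ins, not Hadoop classes, and the check asserts the stricter "pos is unchanged" condition rather than just {{getPos() <= EOF}}.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical subset of the stream API under test; not a Hadoop interface. */
interface PosStream {
  int read() throws IOException;
  void readFully(long position, byte[] buf) throws IOException; // pread
  long getPos() throws IOException;
}

final class FailedPreadCheck {
  /** Returns the result of the final read(); throws if pos was not restored. */
  static int check(PosStream in, long len) throws IOException {
    in.read();
    long before = in.getPos();
    try {
      in.readFully(len + 2, new byte[1]); // pread starting past EOF
      throw new AssertionError("expected an IOException");
    } catch (IOException expected) {
      // the failure itself is fine; the position afterwards is what matters
    }
    if (in.getPos() != before) {
      throw new AssertionError("pos moved from " + before + " to " + in.getPos());
    }
    return in.read(); // must not throw EOFException
  }
}

/** Toy in-memory stream used only to exercise the check. */
final class ArrayPosStream implements PosStream {
  private final byte[] data;
  private int pos;

  ArrayPosStream(byte[] data) { this.data = data; }

  public int read() { return pos < data.length ? (data[pos++] & 0xff) : -1; }

  public void readFully(long position, byte[] buf) throws IOException {
    if (position < 0 || position + buf.length > data.length) {
      throw new EOFException("pread out of range at " + position);
    }
    // Reads straight from the array, so pos is never touched.
    System.arraycopy(data, (int) position, buf, 0, buf.length);
  }

  public long getPos() { return pos; }
}
```

In the real contract test the stream would come from the filesystem under test rather than a byte array, and the same check could be repeated with a negative position.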

Looking at the standard impl, it does seek() back in a finally block, but if 
there is an exception in the read(), then a subsequent exception in the final 
seek() would replace it and the original failure would be lost. I think it 
should be reworked to catch any IOE in the read operation and do an 
exception-swallowing seek-back in this case. Or just do it for EOFException, 
now that Hadoop 2.5+ has all the standard filesystems throwing EOFException 
consistently.
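
A minimal sketch of the difference, assuming a toy stream rather than the actual Hadoop code: {{SeekBackStream}}, its {{broken}} flag and both pread methods are hypothetical, and the flag exists only to force the seek-back to fail after a failed read.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical seek-based stream; not a Hadoop class. */
class SeekBackStream {
  private final byte[] data;
  private long pos;
  private boolean broken; // set once a read fails; later seeks then fail too

  SeekBackStream(byte[] data) { this.data = data; }

  public synchronized long getPos() { return pos; }

  public synchronized void seek(long target) throws IOException {
    if (broken) throw new IOException("seek on broken stream");
    pos = target;
  }

  private int readAtPos(byte[] buf) throws IOException {
    if (pos < 0 || pos + buf.length > data.length) {
      broken = true;
      throw new EOFException("read past EOF at " + pos);
    }
    System.arraycopy(data, (int) pos, buf, 0, buf.length);
    pos += buf.length;
    return buf.length;
  }

  /** finally-based version: a failing seek-back replaces the read's exception. */
  public synchronized int preadFinally(long position, byte[] buf) throws IOException {
    long saved = pos;
    try {
      seek(position);
      return readAtPos(buf);
    } finally {
      seek(saved);
    }
  }

  /** Reworked version: swallow seek-back failures when the read already failed. */
  public synchronized int preadReworked(long position, byte[] buf) throws IOException {
    long saved = pos;
    int n;
    try {
      seek(position);
      n = readAtPos(buf);
    } catch (IOException readFailure) {
      try {
        seek(saved);
      } catch (IOException ignored) {
        // swallowed so that readFailure is what the caller sees
      }
      throw readFailure;
    }
    seek(saved);
    return n;
  }
}
```

Note that when the swallowed seek-back really did fail, pos is still wrong afterwards; the sketch only addresses which exception reaches the caller.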



> Documenting DFSClient#DFSInputStream expectations reading and preading in 
> concurrent context
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6803
>                 URL: https://issues.apache.org/jira/browse/HDFS-6803
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 2.4.1
>            Reporter: stack
>         Attachments: DocumentingDFSClientDFSInputStream (1).pdf
>
>
> Reviews of the patch posted to the parent task suggest that we be more 
> explicit about how DFSIS is expected to behave when being read by contending 
> threads. It is also suggested that presumptions made internally be made 
> explicit by documenting expectations.
> Before we put up a patch we've made a document of assertions we'd like to 
> make into tenets of DFSInputStream.  If there is agreement, we'll attach to 
> this issue a patch that weaves the assumptions into DFSIS as javadoc and 
> class comments. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
