[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083949#comment-14083949 ]
Steve Loughran commented on HDFS-6803:
--------------------------------------
This is fun: stack has just opened up a whole new bag of inconsistencies.
h2. Consistency with actual file data & metadata
We should state that changes to a file (length, contents, existence, permissions)
may not be visible to an open stream, and that if they do become visible, there
is no guarantee of when. That could include partway through a readFully
operation: readFully cannot be guaranteed to be atomic.
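To illustrate why readFully cannot be atomic, here is a minimal, hypothetical Java model (not Hadoop code). The "file" is a shared byte array, `readFully()` is a loop of bounded reads, and a `betweenReads` hook stands in for another client rewriting the file between two of those reads. The class and method names are illustrative assumptions only.

{code}
import java.util.Arrays;

/**
 * Hypothetical model, not Hadoop code: the "file" is a shared byte array that
 * another client may replace at any time, and readFully() is a loop of bounded reads.
 */
class NonAtomicReadDemo {
    static byte[] file = {1, 1, 1, 1, 1, 1, 1, 1};

    /** One bounded read of at most chunk bytes from the file's current contents. */
    static int read(long pos, byte[] buf, int off, int len, int chunk) {
        int n = (int) Math.min(Math.min(len, chunk), file.length - pos);
        if (n <= 0) return -1;
        System.arraycopy(file, (int) pos, buf, off, n);
        return n;
    }

    /** readFully() as a loop of reads; betweenReads simulates a concurrent change to the file. */
    static byte[] readFully(long pos, int len, int chunk, Runnable betweenReads) {
        byte[] buf = new byte[len];
        int done = 0;
        while (done < len) {
            int n = read(pos + done, buf, done, len - done, chunk);
            // Unchecked stand-in for the EOFException a real stream would throw.
            if (n < 0) throw new IllegalStateException("EOF before " + len + " bytes were read");
            done += n;
            betweenReads.run();
        }
        return buf;
    }

    public static void main(String[] args) {
        // After the first 4-byte chunk, another client rewrites the whole file.
        byte[] got = readFully(0, 8, 4, () -> file = new byte[]{2, 2, 2, 2, 2, 2, 2, 2});
        System.out.println(Arrays.toString(got)); // [1, 1, 1, 1, 2, 2, 2, 2]
    }
}
{code}

The caller receives a buffer mixing old and new contents, which is the kind of state the specification would have to permit.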
h2. Isolation of pread operations
When a pread is in progress, should its temporary change of position be visible
through {{getPos()}}?
# If not, the method will need to be made {{synchronized}} on all
implementations (it isn't right now; I checked).
# If it can be visible, then we could pull the {{synchronized}} marker off
some implementations and remove it as a lock point.
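A sketch of the first option, assuming a seek-based pread. The class `LockedPreadStream` and its methods are hypothetical, not the Hadoop API. Because `pread()` and `getPos()` take the same lock, a concurrent `getPos()` can only ever see the position from before or after a pread, never the transient one.

{code}
/**
 * Hypothetical toy stream (not DFSInputStream): pread is built from
 * seek + read + seek-back, and every method that touches pos holds the same lock.
 */
class LockedPreadStream {
    private final byte[] data;
    private long pos; // guarded by "this"

    LockedPreadStream(byte[] data) { this.data = data; }

    synchronized long getPos() { return pos; }

    synchronized void seek(long newPos) { pos = newPos; }

    synchronized int read(byte[] buf, int off, int len) {
        int n = (int) Math.min(len, data.length - pos);
        if (n <= 0) return -1;
        System.arraycopy(data, (int) pos, buf, off, n);
        pos += n;
        return n;
    }

    /** Positioned read: the temporary seek happens entirely while holding the lock. */
    synchronized int pread(long position, byte[] buf, int off, int len) {
        long oldPos = pos;
        try {
            seek(position);
            return read(buf, off, len);
        } finally {
            seek(oldPos);
        }
    }

    /** Runs preads at offset 900 in one thread while this thread polls getPos(). */
    static long maxObservedPos(int iterations) {
        LockedPreadStream in = new LockedPreadStream(new byte[1024]);
        Thread preader = new Thread(() -> {
            byte[] buf = new byte[16];
            for (int i = 0; i < iterations; i++) in.pread(900, buf, 0, buf.length);
        });
        preader.start();
        long max = 0;
        while (preader.isAlive()) max = Math.max(max, in.getPos());
        try {
            preader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return max;
    }

    public static void main(String[] args) {
        // getPos() can only run between preads, so it never reports offset 900 or beyond.
        System.out.println(maxObservedPos(100_000)); // 0
    }
}
{code}

The cost is that every {{getPos()}} call contends with in-flight preads; the second option drops that lock and accepts that {{getPos()}} may briefly report the pread's offset.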
h2. Failure Modes in concurrent/serialized operations
One issue with concurrent read and pread is something I hadn't thought of
before: on any failure of a pread, the position must be reset to its previous
value. Every implementation appears to do this; the test would be:
{code}
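// in: an open FSDataInputStream; eof: the length of the file
byte[] buf = new byte[1];
in.read();
try {
  in.read(eof + 2, buf, 0, 1);  // pread past EOF: may return -1 or throw
} catch (IOException expected) {
  // a failed pread must not move the stream position
}
assertTrue(in.getPos() <= eof);
in.read();  // returns a byte or -1; must not throw EOFException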
{code}
The second {{read()}} should either succeed or return -1, depending on the
position; it must not throw an {{EOFException}}. The same outcome must hold for
a pread at a negative offset.
If someone were to add this to {{AbstractContractSeekTest}}, it would get picked
up by all the filesystem implementations and we could see what happens.
Looking at the standard implementation, it does seek() back in a finally block,
but if the read() throws an exception, a subsequent exception in the final
seek() would replace it and the original would be lost. I think it should be
reworked to catch any IOE in the read operation and, in that case, do a
seek-back that swallows its own exceptions. Or just do it for EOFException, now
that Hadoop 2.5+ has all the standard filesystems throwing EOFException
consistently.
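A minimal sketch of that rework, written against a hypothetical abstract stream (`RestoringPreadStream` and `ByteArrayPreadStream` are illustrative names, not Hadoop classes). If the read fails, the seek-back still runs, but any exception it throws is attached to the original with `addSuppressed()` instead of replacing it. On the success path a failing seek-back is the only error, so it propagates. The `main` method replays the EOF+2 test against an in-memory stream.

{code}
import java.io.EOFException;
import java.io.IOException;

/**
 * Hypothetical sketch (not Hadoop's FSInputStream): a seek-based positioned read that
 * restores the old position without letting a failed seek-back hide the read failure.
 */
abstract class RestoringPreadStream {
    abstract long getPos() throws IOException;
    abstract void seek(long pos) throws IOException;
    abstract int read(byte[] buf, int off, int len) throws IOException;

    synchronized int pread(long position, byte[] buf, int off, int len) throws IOException {
        long oldPos = getPos();
        int n;
        try {
            seek(position);
            n = read(buf, off, len);
        } catch (IOException readFailure) {
            // Best-effort seek-back: record its failure instead of letting it replace readFailure.
            try {
                seek(oldPos);
            } catch (IOException seekFailure) {
                readFailure.addSuppressed(seekFailure);
            }
            throw readFailure;
        }
        // Success path: a seek-back failure is the only error, so let it propagate.
        seek(oldPos);
        return n;
    }
}

/** In-memory stream whose seek() rejects negative offsets and offsets past EOF. */
class ByteArrayPreadStream extends RestoringPreadStream {
    private final byte[] data;
    private long pos;

    ByteArrayPreadStream(byte[] data) { this.data = data; }

    @Override long getPos() { return pos; }

    @Override void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > data.length) throw new EOFException("Cannot seek to " + newPos);
        pos = newPos;
    }

    @Override int read(byte[] buf, int off, int len) {
        int n = (int) Math.min(len, data.length - pos);
        if (n <= 0) return -1;
        System.arraycopy(data, (int) pos, buf, off, n);
        pos += n;
        return n;
    }

    public static void main(String[] args) {
        ByteArrayPreadStream in = new ByteArrayPreadStream(new byte[10]); // EOF is 10
        in.read(new byte[1], 0, 1);                                     // pos is now 1
        try {
            in.pread(10 + 2, new byte[1], 0, 1);                        // pread at EOF+2
        } catch (IOException expected) {
            System.out.println("pread failed: " + expected.getMessage());
        }
        System.out.println(in.getPos());                // 1: position restored
        System.out.println(in.read(new byte[1], 0, 1)); // 1: the next read() still works
    }
}
{code}

{{addSuppressed()}} (Java 7+) keeps the seek-back failure visible in stack traces without masking the read failure that caused it.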
> Documenting DFSClient#DFSInputStream expectations reading and preading in
> concurrent context
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-6803
> URL: https://issues.apache.org/jira/browse/HDFS-6803
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Affects Versions: 2.4.1
> Reporter: stack
> Attachments: DocumentingDFSClientDFSInputStream (1).pdf
>
>
> Reviews of the patch posted to the parent task suggest that we be more explicit
> about how DFSIS is expected to behave when being read by contending threads.
> It is also suggested that presumptions made internally be made explicit by
> documenting expectations.
> Before we put up a patch, we've made a document of assertions we'd like to
> make into tenets of DFSInputStream. If there is agreement, we'll attach to this
> issue a patch that weaves the assumptions into DFSIS as javadoc and class comments.