[
https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082965#comment-14082965
]
Colin Patrick McCabe commented on HDFS-6803:
--------------------------------------------
I agree with 2.1 (positional and non-positional reads can run concurrently) and
2.2 (two or more positional reads can run concurrently).
2.3 seems both too strict and too loose at the same time, if that makes any
sense. Too strict, because it talks about some internal details of HDFS (a
file's length will not change if lastBlock is complete). When using a POSIX
filesystem like Ceph or Lustre, a file's length can change at any time. We
should try to accommodate the existence of those systems, even though we don't
plan on adding random write support to the HCFS. Others may modify the files
we're reading from outside of Hadoop. This problem exists with LocalFileSystem
as well, of course.
2.3 is too loose because it doesn't specify HOW getFileLength interacts with
read, pread, and other calls. If we are using the new HDFS-6633 feature (HDFS
tail) and new data is coming in, does getFileLength return that new length all
the time? Or does it keep returning the old length? Can getFileLength run
concurrently with any other functions?
I would argue that {{getFileLength}} should be able to run concurrently with
{{read}} and {{pread}}. I would also argue that it should be allowed to change
over time, and even get shorter. (Of course it will never get shorter in the
specific case of HDFS, but for LocalFileSystem... it can.) For HDFS,
{{getFileLength}} should be able to return the last known file length without
blocking or waiting for anything, i.e. check an AtomicLong or take a mutex on
something smaller than the whole stream.
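To make the AtomicLong idea concrete, here is a minimal sketch. The class name
{{LastKnownLength}} and its {{update}} method are hypothetical and are not part
of DFSInputStream; the sketch only shows that a reader of the length never has
to take the stream-wide lock, and that the value may go down as well as up.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; not an existing HDFS class.
class LastKnownLength {
    // Last length we learned about, readable without any lock.
    private final AtomicLong length;

    LastKnownLength(long initialLength) {
        this.length = new AtomicLong(initialLength);
    }

    // Called by whichever code path learns a new length. The new value may
    // be smaller than the old one (e.g. a local file that was truncated).
    void update(long newLength) {
        length.set(newLength);
    }

    // Returns the last known length; never waits on I/O or on other callers.
    long getFileLength() {
        return length.get();
    }
}
```

Whether the value is refreshed eagerly or only when a read runs into the old
end of file is a separate question that the document would still need to answer.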
Also, it would be nice to add a section specifying that when we do two
non-positional reads at the same time, they may wait for each other to complete
before proceeding. And {{getPos}}, {{seek}}, and {{skip}} may wait for
non-positional reads to complete before running.
Basically, what this looks like is grouping the functions into two sets:
Group P: read, getPos, seek, skip, zero-copy read, releaseBuffer
Group N: pread, getFileLength, setReadahead, setDropBehind, getReadStatistics
Functions in group P can all block each other (probably they grab the same
mutex, although this isn't guaranteed).
Functions in group N do not ever block each other or functions in group P for a
long time (although they may take a mutex or two for a very short amount of
time, it's not the same mutex as for group P, and they don't hang on to it
while doing I/O.)
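A rough sketch of the two groups, assuming an in-memory byte array stands in
for the file. The class {{TwoGroupStream}}, the field {{positionLock}}, and the
helper {{copy}} are all made up for illustration; this is not how
DFSInputStream is implemented, only one way the grouping could look.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stream for illustration only; not an existing HDFS class.
class TwoGroupStream {
    private final byte[] data;                        // stands in for the file
    private final Object positionLock = new Object(); // shared by group P only
    private long pos = 0;                             // guarded by positionLock
    private final AtomicLong fileLength;              // group N state, lock-free

    TwoGroupStream(byte[] data) {
        this.data = data;
        this.fileLength = new AtomicLong(data.length);
    }

    // Group P: these serialize on positionLock and may wait for each other.
    int read(byte[] buf, int off, int len) {
        synchronized (positionLock) {
            int n = copy(pos, buf, off, len);
            if (n > 0) {
                pos += n;
            }
            return n;
        }
    }

    long getPos() {
        synchronized (positionLock) {
            return pos;
        }
    }

    void seek(long target) {
        synchronized (positionLock) {
            if (target < 0 || target > fileLength.get()) {
                throw new IllegalArgumentException("bad seek target: " + target);
            }
            pos = target;
        }
    }

    // Group N: these never take positionLock, so they do not wait on group P.
    int pread(long position, byte[] buf, int off, int len) {
        return copy(position, buf, off, len);
    }

    long getFileLength() {
        return fileLength.get();
    }

    // Copies up to len bytes starting at 'from'; returns -1 at end of file.
    private int copy(long from, byte[] buf, int off, int len) {
        long limit = fileLength.get();
        if (from >= limit) {
            return -1;
        }
        int n = (int) Math.min(len, limit - from);
        System.arraycopy(data, (int) from, buf, off, n);
        return n;
    }
}
```

In this sketch {{pread}} touches no shared mutable state beyond the length, so
it needs no lock at all; a real implementation would more likely take a small
lock briefly, as described above, and release it before doing I/O.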
> Documenting DFSClient#DFSInputStream expectations reading and preading in
> concurrent context
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-6803
> URL: https://issues.apache.org/jira/browse/HDFS-6803
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Affects Versions: 2.4.1
> Reporter: stack
> Attachments: DocumentingDFSClientDFSInputStream (1).pdf
>
>
> Reviews of the patch posted on the parent task suggest that we be more explicit
> about how DFSIS is expected to behave when being read by contending threads.
> It is also suggested that presumptions made internally be made explicit by
> documenting expectations.
> Before we put up a patch we've made a document of assertions we'd like to
> make into tenets of DFSInputStream. If there is agreement, we'll attach to this
> issue a patch that weaves the assumptions into DFSIS as javadoc and class comments.
--
This message was sent by Atlassian JIRA
(v6.2#6252)