[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

Colin Patrick McCabe (JIRA) Fri, 01 Aug 2014 14:10:41 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082965#comment-14082965
 ]


Colin Patrick McCabe commented on HDFS-6803:
--------------------------------------------

I agree with 2.1 (positional read and non-positional can run concurrently), and 
2.2 (Two or more positional reads can run concurrently.)

2.3 seems both too strict and too loose at the same time, if that makes any
sense.  It is too strict because it relies on an internal detail of HDFS (a
file's length will not change if lastBlock is complete).  When using a POSIX
filesystem like Ceph or Lustre, a file's length can change at any time.  We
should try to accommodate the existence of those systems, even though we don't
plan on adding random write support to the HCFS.  Others may modify the files
we're reading from outside of Hadoop.  This problem exists with LocalFileSystem
as well, of course.
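
To illustrate the local-file case, here is a minimal sketch using plain
java.nio rather than any Hadoop API.  The class and method names are made up
for the example, and the second channel only stands in for "someone outside
of Hadoop" truncating the file.  It assumes a POSIX-like platform where a
file can be truncated while another handle has it open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Hadoop.
class ShrinkingFileDemo {
    // Returns {length before, length after} as seen by a reader that
    // keeps its channel open while another handle truncates the file.
    static long[] observeLengths() {
        try {
            Path p = Files.createTempFile("shrink-demo", ".bin");
            try {
                Files.write(p, new byte[1024]);
                try (FileChannel reader =
                         FileChannel.open(p, StandardOpenOption.READ)) {
                    long before = reader.size();
                    // Stand-in for an external writer shrinking the file.
                    try (FileChannel writer =
                             FileChannel.open(p, StandardOpenOption.WRITE)) {
                        writer.truncate(100);
                    }
                    long after = reader.size();
                    return new long[] {before, after};
                }
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The open reader sees the shorter length without reopening the file, which is
the behavior a length guarantee phrased in terms of lastBlock cannot cover.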

2.3 is too loose because it doesn't specify HOW getFileLength interacts with 
read, pread, and other calls.  If we are using the new HDFS-6633 feature (HDFS 
tail) and new data is coming in, does getFileLength return that new length all 
the time?  Or does it keep returning the old length?  Can getFileLength run 
concurrently with any other functions?

I would argue that {{getFileLength}} should be able to run concurrently with 
{{read}} and {{pread}}.  I would also argue that it should be allowed to change 
over time, and even get shorter.  (Of course it will never get shorter in the 
specific case of HDFS, but for LocalFileSystem... it can.)  For HDFS, 
{{getFileLength}} should be able to return the last known file length without 
blocking or waiting for anything, e.g. by checking an AtomicLong or taking a
mutex on something smaller than the whole stream.
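
A rough sketch of the AtomicLong variant, assuming a simplified stand-in
class rather than the real DFSInputStream.  The class name, the updateLength
hook, and the fake read are all hypothetical; only the locking shape matters
here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a stream; this is NOT the real DFSInputStream.
class LengthSketchStream {
    // Last length we learned about. Updated without holding streamLock.
    private final AtomicLong lastKnownLength;
    // Guards pos; held for the whole duration of a non-positional read.
    private final Object streamLock = new Object();
    private long pos;

    LengthSketchStream(long initialLength) {
        lastKnownLength = new AtomicLong(initialLength);
    }

    // Assumed hook: called when fresh length information arrives.
    // The new value may be larger or smaller than the old one.
    void updateLength(long newLength) {
        lastKnownLength.set(newLength);
    }

    // Never touches streamLock, so it cannot wait behind a slow read.
    long getFileLength() {
        return lastKnownLength.get();
    }

    // Non-positional read: pretends to fill buf and advances pos.
    int read(byte[] buf) {
        synchronized (streamLock) {
            long remaining = lastKnownLength.get() - pos;
            if (remaining <= 0) {
                return -1; // at or past the current end of file
            }
            int n = (int) Math.min(buf.length, remaining);
            pos += n;
            return n;
        }
    }
}
```

Because getFileLength only reads the AtomicLong, a caller gets the last known
length even while another thread is inside read, and the value is free to
grow or shrink between calls.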

Also, it would be nice to add a section specifying that when we do two 
non-positional reads at the same time, they may wait for each other to complete 
before proceeding.  And {{getPos}}, {{seek}}, and {{skip}} may wait for 
non-positional reads to complete before running.

Basically, this amounts to grouping the functions into two sets:
Group P: read, getPos, seek, skip, zero-copy read, releaseBuffer
Group N: pread, getFileLength, setReadahead, setDropBehind, getReadStatistics

Functions in group P can all block each other (probably they grab the same 
mutex, although this isn't guaranteed).

Functions in group N do not ever block each other or functions in group P for a 
long time (although they may take a mutex or two for a very short amount of 
time, it's not the same mutex as for group P, and they don't hang on to it 
while doing I/O.)
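
The two groups could be sketched roughly as follows.  This is an in-memory
toy under stated assumptions, not DFSInputStream: the class name, the byte
array standing in for block data, and getBytesRead (a stand-in for
getReadStatistics) are all hypothetical, and only a few functions from each
group are shown.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy stream illustrating the Group P / Group N split.
class TwoGroupStream {
    private final byte[] data; // stands in for remote block data
    // Group P functions serialize on this lock.
    private final ReentrantLock positionLock = new ReentrantLock();
    // Statistics are kept lock-free so Group N never needs positionLock.
    private final AtomicLong bytesRead = new AtomicLong();
    private long pos; // guarded by positionLock

    TwoGroupStream(byte[] data) {
        this.data = data;
    }

    // ---- Group P: may block each other ----
    int read(byte[] buf, int off, int len) {
        positionLock.lock();
        try {
            int n = copy(pos, buf, off, len);
            if (n > 0) {
                pos += n;
            }
            return n;
        } finally {
            positionLock.unlock();
        }
    }

    void seek(long newPos) {
        positionLock.lock();
        try {
            if (newPos < 0 || newPos > data.length) {
                throw new IllegalArgumentException("bad position " + newPos);
            }
            pos = newPos;
        } finally {
            positionLock.unlock();
        }
    }

    long getPos() {
        positionLock.lock();
        try {
            return pos;
        } finally {
            positionLock.unlock();
        }
    }

    // ---- Group N: never take positionLock ----
    int pread(long position, byte[] buf, int off, int len) {
        return copy(position, buf, off, len);
    }

    long getFileLength() {
        return data.length;
    }

    long getBytesRead() {
        return bytesRead.get();
    }

    // Shared copy helper; touches no stream position state.
    private int copy(long from, byte[] buf, int off, int len) {
        if (from < 0) {
            throw new IllegalArgumentException("negative position " + from);
        }
        if (from >= data.length) {
            return -1; // end of file
        }
        int n = (int) Math.min(len, data.length - from);
        System.arraycopy(data, (int) from, buf, off, n);
        bytesRead.addAndGet(n);
        return n;
    }
}
```

In this sketch pread leaves the stream position untouched and never waits on
positionLock, so a long-running read, seek, or getPos cannot delay it.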

> Documenting DFSClient#DFSInputStream expectations reading and preading in 
> concurrent context
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6803
>                 URL: https://issues.apache.org/jira/browse/HDFS-6803
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 2.4.1
>            Reporter: stack
>         Attachments: DocumentingDFSClientDFSInputStream (1).pdf
>
>
> Reviews of the patch posted to the parent task suggest that we be more
> explicit about how DFSIS is expected to behave when being read by contending
> threads.  It is also suggested that presumptions made internally be made
> explicit by documenting expectations.
> Before we put up a patch we've made a document of assertions we'd like to
> make into tenets of DFSInputStream.  If there is agreement, we'll attach to
> this issue a patch that weaves the assumptions into DFSIS as javadoc and
> class comments.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
