[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053396#comment-15053396
 ] 

Daryn Sharp commented on HDFS-7163:
-----------------------------------

Regarding use of client-side block locations, I did that internally for 0.23 
and it wasn't as easy as it seemed.  The excluded nodes support is actually 
much cleaner for the client.  The NN also issues the redirect based on the 
block location of the offset.

Code:
# We may want to defer the open until a read occurs.  Otherwise, when 
immediately seeking (e.g. splits), the file will unnecessarily start streaming 
and consuming bandwidth; the client will close the connection, then reopen at 
the seek offset.
# {{runnerState}} is initialized as CLOSED (terminal state) although reassigned 
as DISCONNECTED in the ctor.  Feels like it should just be initialized as 
DISCONNECTED.
# If {{read(...)}} throws an IOE due to an explicitly closed stream, will 
retries occur?
# In {{connect(URL)}}, {{conn}} is assigned from {{connection}}, re-assigned 
if null, and returned, but {{connection}} itself remains null.  This seems 
confusing and wrong at first.  Renaming it {{cachedConnection}} would clarify 
its purpose.
# In {{getResponse}}:
##  Should {{initializeInputStream}} be unconditionally invoked inside the 
prior null check on connection?  I.e., is there ever a case when {{in}} 
shouldn't be initialized when a new connection is made?
## I think the logic should be {{if (conn != cachedConnection) { 
cachedConnection = conn; in = initializeInputStream(cachedConnection) } }}
## Should use {{URL#getAuthority}} instead of explicitly extracting and joining 
the host and port.
# {{ReadRunner#initializeInputStream}} has a misspelled "performznt".
# In {{closeInputStream}}, I'd use {{IOUtils.closeStream}} to ensure the close 
doesn't throw which would prevent the stream state from being updated.
# In general, the state management isn't clear.  DISCONNECTED vs. SEEK appear 
to be the same, except that SEEK allows the connection to be reopened.  When 
errors occur and the stream is DISCONNECTED, are you sure it will 
retry/recover in all cases?
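To make the {{getResponse}} suggestion in point 5 concrete, here is a minimal, self-contained sketch of the proposed shape.  It is not the actual {{WebHdfsInputStream}} code: a plain {{Object}} stands in for {{HttpURLConnection}}, {{CachedConnectionSketch}} and its event list are illustrative names, and {{cachedConnection}} is the rename proposed in point 4.

```java
// Hedged sketch, not the real patch: models the suggested logic
//   if (conn != cachedConnection) { cachedConnection = conn; in = initializeInputStream(cachedConnection); }
// so the input stream is (re)initialized only when the connection changed.
import java.util.ArrayList;
import java.util.List;

public class CachedConnectionSketch {
    // Last connection we initialized a stream for (the proposed rename of "connection").
    private Object cachedConnection;
    private final List<String> events = new ArrayList<>();

    // Stands in for ReadRunner#initializeInputStream.
    private String initializeInputStream(Object conn) {
        events.add("init");
        return "stream-for-" + conn;
    }

    // Suggested shape of getResponse: update the cache and initialize the
    // stream together, exactly when a new connection shows up.
    public String getResponse(Object conn) {
        String in = null;
        if (conn != cachedConnection) {
            cachedConnection = conn;
            in = initializeInputStream(cachedConnection);
        }
        return in;
    }

    public int initCount() { return events.size(); }

    public static void main(String[] args) {
        CachedConnectionSketch s = new CachedConnectionSketch();
        Object c1 = new Object();
        s.getResponse(c1);           // new connection: stream initialized
        s.getResponse(c1);           // same connection: no re-initialization
        s.getResponse(new Object()); // different connection: initialized again
        System.out.println(s.initCount()); // prints 2
    }
}
```

The point of the pattern is that the cache update and the stream initialization can never drift apart, which also answers point 5a: {{in}} is set iff a new connection was made.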

Tests:
# In {{testWebHdfsReadRetries}}
## A 5m timeout seems overly generous for something that will hopefully fail 
much faster if there's a problem.
## Why the 5s safemode extension?  Seems like it will unnecessarily slow down 
the test?
## The healthy check on {{dfs}} appears redundant, since {{cluster#waitActive()}} 
already checks it.
#  The client shouldn't just give up and do nothing for InvalidToken.  It's 
supposed to try to get another token and retry the read.  It's unclear whether 
that actually happens, or whether the position is retained correctly.
# Consider adding more mockito verifies.
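On the position-retention question: the invariant to test is that a failed attempt must not move the stream position, so a retry (e.g. after refetching a token for InvalidToken) resumes at the same offset.  A minimal sketch of that invariant, with all names illustrative rather than taken from the patch:

```java
// Hedged sketch: a fake stream whose first N reads fail (simulating, say,
// InvalidToken). A failed attempt restores the position; the retry loop
// then resumes at the same offset.
public class RetryPositionSketch {
    private long pos = 0;
    private int failuresLeft;

    RetryPositionSketch(int failures) { this.failuresLeft = failures; }

    public int read() throws Exception {
        long before = pos;
        try {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new Exception("simulated InvalidToken");
            }
            pos++;              // a successful read advances the position
            return 0;           // dummy byte
        } catch (Exception e) {
            pos = before;       // a failed attempt must not move the position
            throw e;
        }
    }

    public int readWithRetry(int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return read();
            } catch (Exception e) {
                if (attempt >= maxRetries) throw e;
                // a real client would fetch a fresh delegation token here
            }
        }
    }

    public long getPos() { return pos; }

    public static void main(String[] args) throws Exception {
        RetryPositionSketch s = new RetryPositionSketch(2); // fail twice, then succeed
        s.readWithRetry(3);
        System.out.println(s.getPos()); // prints 1: position advanced exactly once
    }
}
```

A mockito {{verify}} on the token-renewal path plus an assertion on {{getPos()}} after recovery would pin down both halves of the concern above.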


> WebHdfsFileSystem should retry reads according to the configured retry policy.
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-7163
>                 URL: https://issues.apache.org/jira/browse/HDFS-7163
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 3.0.0, 2.5.1
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: HDFS-7163-branch-2.003.patch, 
> HDFS-7163-branch-2.004.patch, HDFS-7163-branch-2.7.003.patch, 
> HDFS-7163-branch-2.7.004.patch, HDFS-7163.001.patch, HDFS-7163.002.patch, 
> HDFS-7163.003.patch, HDFS-7163.004.patch, WebHDFS Read Retry.pdf
>
>
> In the current implementation of WebHdfsFileSystem, opens are retried 
> according to the configured retry policy, but not reads. Therefore, if a 
> connection goes down while data is being read, the read will fail and the 
> read will have to be retried by the client code.
> Also, after a connection has been established, the next read (or seek/read) 
> will fail and the read will have to be restarted by the client code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
