[ 
https://issues.apache.org/jira/browse/HDFS-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294337#comment-15294337
 ] 

Bob Hansen commented on HDFS-10441:
-----------------------------------

Notes in chronological order; most of these are minor, nitpicky points.

* Should Options::ha_namenodes_ be a map<string,HANamenodeInfo> to make it 
easier to look up?

* Rather than have a server_exception_type, I would like to see the explicit 
error types as peers to our other error types.  The fact that it came from a 
Java exception on the server side is an implementation detail of the 
communications.  The kExceptionType should really be an "UnexpectedException" 
type, and all of the others should be mapped to semantically informative types 
(using an awesome lookup table, like you did)

* Not super-critical, but it's a little bit cleaner and (possibly) faster to 
use strstr in HDFSConfiguration::SplitOnComma.  Once you know the start and 
end, you can use std::string::string(const char *, int) to construct the string.

* Instead of trying to manually parse the URI in ParsePartialURI, would it be 
easier to do a quick check for "://", and if it's not there, prepend 
DEFAULT_SCHEME (hdfs://), then pass it to the full-featured URI parser?

* By design the Configuration class is really a stupid container.  If we can, 
all the logic should be in the HDFSConfigurationLoader.

* It seems that the nameservice lookups aren't actually URIs; they're just 
server:port addresses.  Perphaps rather than storing a URI object, the 
HANamenodeInfo should just have a server and port field.  The rest of the URI 
semantic space (scheme, user, password, path, fragment, etc.) really doesn't 
apply here, and would be an error if it appeared in the config file.

* I would prefer "FormatURI" or "ToString" over "FmtURI" as a method name

* I've noticed that we've grown used to "if (!stat.ok) { LOG_XXX(..., <Some 
fixed string>)".  In trying to debug problems, I'm unhappy with me for not 
including the error message.  As you touch any logging messages, << out the 
status.ToString() also

* As an extension of that: LOG_WARN(kRPC, << "Tried to connect to standby!");  
What node was I trying to connect to?  

* EnqueueRequestsWithLock: I think elsewhere in the codebase we have used 
EnqueueRequests_locked; let's keep that standard

* is_locked: we already have a lock_held method in util.h (that works with 
reentrant mutexes, too)

* I know we're going to add more-than-one HA NN later, but a good idiom is to 
keep them in a sorted vector, and have an index pointing to the current one.  
Then make next = (curr + 1) % size.

* Name lookup in ResoloveAllEndpoints needs to be async.  This can be follow-up 
if necessary.

* In general, having a recursive_mutex (HA NN Tracker) is a bad smell that 
we're not thinking about concurrency tightly enough.  I'm not sure that's the 
case here, but let's think about it.

* Make sure to clean up TODOs and anything referring to internal server names

* Unset port should be a WARNING, if we're going to default and move on

* In RpcEngine::Start, if resolution fails and we clear out ha_persisted_info_, 
what will happen when we run?

* Better logging than "----" and "Caught this early"

* In RpcEngine::RpcCommsError, we should try for symmetry in retry count and 
failure count.  Let's pull the IncrementRetryCount out of being inline, and 
just use both getters in the call to ShouldRetry.

* You have a comment for eventually failing out messages if we're bouncing 
between standby nodes, but it looks like we only actually check if it's not a 
kStandbyException.  If I'm mis-reading, let me know.

* We don't need the first line here, do we?
+          last_endpoints_.clear();
+          last_endpoints_ = new_active_nn_info.endpoints;

* In RpcConnection::HandleRpcResponse, I think we should be checking status and 
erroring out before calling RemoveFromRunningQueue.  We should look at this 
closely, and then we could get rid of the EnqueueReqeustsWithLock method.

> libhdfs++: HA namenode support
> ------------------------------
>
>                 Key: HDFS-10441
>                 URL: https://issues.apache.org/jira/browse/HDFS-10441
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>         Attachments: HDFS-10441.HDFS-8707.000.patch
>
>
> If a cluster is HA enabled then do proper failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to