[
https://issues.apache.org/jira/browse/HDFS-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294337#comment-15294337
]
Bob Hansen commented on HDFS-10441:
-----------------------------------
Notes in chronological order; most of these are minor, nitpicky points.
* Should Options::ha_namenodes_ be a map<string,HANamenodeInfo> to make it
easier to look up?
* Rather than have a server_exception_type, I would like to see the explicit
error types as peers to our other error types. The fact that it came from a
Java exception on the server side is an implementation detail of the
communications. The kExceptionType should really be an "UnexpectedException"
type, and all of the others should be mapped to semantically informative types
(using an awesome lookup table, like you did)
* Not super-critical, but it's a little bit cleaner and (possibly) faster to
use strstr in HDFSConfiguration::SplitOnComma. Once you know the start and
end, you can use std::string::string(const char *, int) to construct the string.
* Instead of trying to manually parse the URI in ParsePartialURI, would it be
easier to do a quick check for "://", and if it's not there, prepend
DEFAULT_SCHEME (hdfs://), then pass it to the full-featured URI parser?
* By design the Configuration class is really a stupid container. If we can,
all the logic should be in the HDFSConfigurationLoader.
* It seems that the nameservice lookups aren't actually URIs; they're just
server:port addresses. Perphaps rather than storing a URI object, the
HANamenodeInfo should just have a server and port field. The rest of the URI
semantic space (scheme, user, password, path, fragment, etc.) really doesn't
apply here, and would be an error if it appeared in the config file.
* I would prefer "FormatURI" or "ToString" over "FmtURI" as a method name
* I've noticed that we've grown used to "if (!stat.ok) { LOG_XXX(..., <Some
fixed string>)". In trying to debug problems, I'm unhappy with me for not
including the error message. As you touch any logging messages, << out the
status.ToString() also
* As an extension of that: LOG_WARN(kRPC, << "Tried to connect to standby!");
What node was I trying to connect to?
* EnqueueRequestsWithLock: I think elsewhere in the codebase we have used
EnqueueRequests_locked; let's keep that standard
* is_locked: we already have a lock_held method in util.h (that works with
reentrant mutexes, too)
* I know we're going to add more-than-one HA NN later, but a good idiom is to
keep them in a sorted vector, and have an index pointing to the current one.
Then make next = (curr + 1) % size.
* Name lookup in ResoloveAllEndpoints needs to be async. This can be follow-up
if necessary.
* In general, having a recursive_mutex (HA NN Tracker) is a bad smell that
we're not thinking about concurrency tightly enough. I'm not sure that's the
case here, but let's think about it.
* Make sure to clean up TODOs and anything referring to internal server names
* Unset port should be a WARNING, if we're going to default and move on
* In RpcEngine::Start, if resolution fails and we clear out ha_persisted_info_,
what will happen when we run?
* Better logging than "----" and "Caught this early"
* In RpcEngine::RpcCommsError, we should try for symmetry in retry count and
failure count. Let's pull the IncrementRetryCount out of being inline, and
just use both getters in the call to ShouldRetry.
* You have a comment for eventually failing out messages if we're bouncing
between standby nodes, but it looks like we only actually check if it's not a
kStandbyException. If I'm mis-reading, let me know.
* We don't need the first line here, do we?
+ last_endpoints_.clear();
+ last_endpoints_ = new_active_nn_info.endpoints;
* In RpcConnection::HandleRpcResponse, I think we should be checking status and
erroring out before calling RemoveFromRunningQueue. We should look at this
closely, and then we could get rid of the EnqueueReqeustsWithLock method.
> libhdfs++: HA namenode support
> ------------------------------
>
> Key: HDFS-10441
> URL: https://issues.apache.org/jira/browse/HDFS-10441
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Attachments: HDFS-10441.HDFS-8707.000.patch
>
>
> If a cluster is HA enabled then do proper failover.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]