[
https://issues.apache.org/jira/browse/HDFS-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396495#comment-15396495
]
Wei-Chiu Chuang commented on HDFS-8897:
---------------------------------------
bq. If we add the code to start the balancer with that namenodes, you will see
Balancer: namenodes = [hdfs://sandbox/, hdfs://sandbox] without the fix.
I see. Please move that code to a point before the balancer is started.
bq. This case is tricky because {{fs.defaultFS}} is valid but with a trailing
slash, everything else works fine except Balancer, otherwise users would have
detected the problem much earlier in other ways.
It looks to me that whenever the URI fs.defaultFS is used, the consumer of the
URI takes either the scheme, the authority, the host or the port of the URI,
but the path component is ignored, at least when the scheme is hdfs.
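To illustrate the point about the path component, here is a minimal sketch using only java.net.URI. The class name TrailingSlashDemo and the helpers stripPath and dedupe are hypothetical and are not taken from the patch. The sketch shows that the two URIs from the Balancer log share a scheme and an authority but are not equal as URI objects, and that rebuilding each URI from scheme and authority alone collapses them into one entry.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TrailingSlashDemo {

  // Hypothetical helper: keep only the scheme and authority, dropping the path.
  static URI stripPath(URI uri) {
    try {
      return new URI(uri.getScheme(), uri.getAuthority(), null, null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  // Hypothetical helper: normalize every URI before adding it to the set,
  // so URIs that differ only in their path end up as a single entry.
  static Set<URI> dedupe(List<URI> uris) {
    Set<URI> result = new LinkedHashSet<>();
    for (URI u : uris) {
      result.add(stripPath(u));
    }
    return result;
  }

  public static void main(String[] args) {
    URI withSlash = URI.create("hdfs://sandbox/");
    URI withoutSlash = URI.create("hdfs://sandbox");

    // Same scheme and authority ...
    System.out.println(withSlash.getScheme().equals(withoutSlash.getScheme()));
    System.out.println(withSlash.getAuthority().equals(withoutSlash.getAuthority()));
    // ... but URI.equals also compares the path ("/" versus "").
    System.out.println(withSlash.equals(withoutSlash));
    // After dropping the path, only one namenode URI remains.
    System.out.println(dedupe(Arrays.asList(withSlash, withoutSlash)));
  }
}
```

Whether the actual fix should normalize at this level or elsewhere is a separate question; the sketch only shows why a set keyed on the raw URI ends up with two entries.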
> Balancer should handle fs.defaultFS with trailing slashes in HA
> ---------------------------------------------------------------
>
> Key: HDFS-8897
> URL: https://issues.apache.org/jira/browse/HDFS-8897
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Affects Versions: 2.7.1
> Environment: Centos 6.6
> Reporter: LINTE
> Assignee: John Zhuge
> Attachments: HDFS-8897.001.patch
>
>
> When balancer is launched, it should test if there is already a
> /system/balancer.id file in HDFS.
> When the file doesn't exist, the balancer still refuses to run:
> 15/08/14 16:35:12 INFO balancer.Balancer: namenodes = [hdfs://sandbox/,
> hdfs://sandbox]
> 15/08/14 16:35:12 INFO balancer.Balancer: parameters =
> Balancer.Parameters[BalancingPolicy.Node, threshold=10.0, max idle iteration
> = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
> Time Stamp Iteration# Bytes Already Moved Bytes Left To Move
> Bytes Being Moved
> 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs,
> 30mins, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Block token params received from
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/08/14 16:35:14 INFO block.BlockTokenSecretManager: Setting block keys
> 15/08/14 16:35:14 INFO balancer.KeyManager: Update block keys every 2hrs,
> 30mins, 0sec
> java.io.IOException: Another Balancer is running.. Exiting ...
> Aug 14, 2015 4:35:14 PM Balancing took 2.408 seconds
> Looking at the audit log file when trying to run the balancer, the balancer
> creates /system/balancer.id and then deletes it on exit:
> 2015-08-14 16:37:45,844 INFO FSNamesystem.audit: allowed=true
> [email protected] (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo
> src=/system/balancer.id dst=null perm=null proto=rpc
> 2015-08-14 16:37:45,900 INFO FSNamesystem.audit: allowed=true
> [email protected] (auth:KERBEROS) ip=/x.x.x.x cmd=create
> src=/system/balancer.id dst=null perm=hdfs:hadoop:rw-r-----
> proto=rpc
> 2015-08-14 16:37:45,919 INFO FSNamesystem.audit: allowed=true
> [email protected] (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo
> src=/system/balancer.id dst=null perm=null proto=rpc
> 2015-08-14 16:37:46,090 INFO FSNamesystem.audit: allowed=true
> [email protected] (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo
> src=/system/balancer.id dst=null perm=null proto=rpc
> 2015-08-14 16:37:46,112 INFO FSNamesystem.audit: allowed=true
> [email protected] (auth:KERBEROS) ip=/x.x.x.x cmd=getfileinfo
> src=/system/balancer.id dst=null perm=null proto=rpc
> 2015-08-14 16:37:46,117 INFO FSNamesystem.audit: allowed=true
> [email protected] (auth:KERBEROS) ip=/x.x.x.x cmd=delete
> src=/system/balancer.id dst=null perm=null proto=rpc
> The error seems to be located in
> org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
> The function checkAndMarkRunning returns null even if /system/balancer.id
> doesn't exist before entering this function; if it exists, then it is deleted
> and the balancer exits with the same error.
> ----
> private OutputStream checkAndMarkRunning() throws IOException {
>   try {
>     if (fs.exists(idPath)) {
>       // try appending to it so that it will fail fast if another balancer is
>       // running.
>       IOUtils.closeStream(fs.append(idPath));
>       fs.delete(idPath, true);
>     }
>     final FSDataOutputStream fsout = fs.create(idPath, false);
>     // mark balancer idPath to be deleted during filesystem closure
>     fs.deleteOnExit(idPath);
>     if (write2IdFile) {
>       fsout.writeBytes(InetAddress.getLocalHost().getHostName());
>       fsout.hflush();
>     }
>     return fsout;
>   } catch(RemoteException e) {
>     if(AlreadyBeingCreatedException.class.getName().equals(e.getClassName())){
>       return null;
>     } else {
>       throw e;
>     }
>   }
> }
> ----
> Regards
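One possible reading of the report, sketched as a toy model rather than as Hadoop code: if the balancer connects once per entry in the namenodes list, and both hdfs://sandbox/ and hdfs://sandbox resolve to the same namespace, then the second connection attempts an exclusive create of /system/balancer.id while the first still holds it. The class BalancerLockModel and its methods are hypothetical, and the assumption that both URIs reach one namespace is mine, based on the log line above.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BalancerLockModel {

  // Stand-in for a single HDFS namespace: paths currently held open for write.
  private final Set<String> openForWrite = new HashSet<>();

  // Models an exclusive create: returns false if the path is already held.
  boolean createExclusive(String path) {
    return openForWrite.add(path);
  }

  // Assumption: every URI in the list reaches the same namespace, so all
  // connections compete for the same id file.
  static List<String> connectAll(List<URI> nameNodes) {
    BalancerLockModel namespace = new BalancerLockModel();
    List<String> outcome = new ArrayList<>();
    for (URI nn : nameNodes) {
      boolean marked = namespace.createExclusive("/system/balancer.id");
      outcome.add(nn + " -> "
          + (marked ? "marked running" : "Another Balancer is running"));
    }
    return outcome;
  }

  public static void main(String[] args) {
    List<URI> nameNodes = new ArrayList<>();
    nameNodes.add(URI.create("hdfs://sandbox/"));
    nameNodes.add(URI.create("hdfs://sandbox"));
    for (String line : connectAll(nameNodes)) {
      System.out.println(line);
    }
  }
}
```

Under this model the first entry succeeds and the second fails, which would be consistent with the error appearing even though no other balancer process exists.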