[ https://issues.apache.org/jira/browse/HDFS-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793857#comment-16793857 ]
Ayush Saxena edited comment on HDFS-14316 at 3/15/19 6:49 PM:
--------------------------------------------------------------
Thanx [~elgoiri] for the patch. Had a quick look at this!
* I guess we are retrying here for all exceptions encountered? Maybe we should restrict retrying to certain cases and let the genuine ones fail, such as AccessControlException, which is supposed to fail for all subclusters (see the first sketch below).
* {code:java}
final List<RemoteLocation> locations = new ArrayList<>();
for (RemoteLocation loc : rpcServer.getLocationsForPath(src, true)) {
  if (!loc.equals(createLocation)) {
    locations.add(loc);
  }
}
{code}
I guess this isn't working as intended: if the namenode is in StandbyState and isn't able to give the block locations, the exception is thrown at:
{code:java}
createLocation = rpcServer.getCreateLocation(src);
{code}
createLocation then stays null, so the loop above compares every location against null and filters out nothing (see the second sketch below). Got the log from the UT too:
{noformat}
2019-03-15 23:57:47,751 [IPC Server handler 6 on default port 38833] ERROR router.RouterClientProtocol (RouterClientProtocol.java:create(253)) - Cannot create /HASH_ALL-failsubcluster/dir100/file5.txt in null: No namenode available to invoke getBlockLocations [/HASH_ALL-failsubcluster/dir100/file5.txt, 0, 1]
{noformat}
* {code:java}
// Check if this file already exists in other subclusters
LocatedBlocks existingLocation = getBlockLocations(src, 0, 1);
{code}
If we suppress the exception here, is there a chance we may end up creating a file that already exists in the other subcluster? (See the third sketch below.)
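First sketch, for the retry-restriction point: a minimal, self-contained illustration of classifying which create() failures are worth retrying on another subcluster. The helper name and the message-based check are assumptions for illustration, not code from the patch:
{code:java}
import java.io.IOException;

import org.apache.hadoop.security.AccessControlException;

/**
 * Sketch only: classify which create() failures justify retrying on
 * another subcluster. The helper name and the message-based check are
 * assumptions, not code from the patch.
 */
public final class RetrySketch {

  /** True if the failure is subcluster-local and worth retrying elsewhere. */
  static boolean shouldRetryOnOtherSubcluster(IOException ioe) {
    if (ioe instanceof AccessControlException) {
      // Permissions are enforced the same way in every subcluster, so the
      // create would fail everywhere; surface the failure immediately.
      return false;
    }
    // Placeholder for a proper exception-type check: treat "no namenode
    // available" style failures (unreachable/standby subcluster) as retriable.
    String msg = ioe.getMessage();
    return msg != null && msg.contains("No namenode available");
  }

  private RetrySketch() {
  }
}
{code}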
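Second sketch, for the null createLocation point: reuses the names from the quoted snippet (rpcServer, createLocation, src), but the surrounding method and the choice to retry every location when createLocation is null are assumptions about the intended fix:
{code:java}
// Sketch: handle the null case explicitly instead of falling through to
// a filter that compares against null and never matches anything.
final List<RemoteLocation> locations = new ArrayList<>();
if (createLocation == null) {
  // getCreateLocation() failed, so there is nothing to exclude; decide
  // explicitly to retry every location (assumption about the fix).
  locations.addAll(rpcServer.getLocationsForPath(src, true));
} else {
  for (RemoteLocation loc : rpcServer.getLocationsForPath(src, true)) {
    if (!loc.equals(createLocation)) {
      locations.add(loc);
    }
  }
}
{code}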
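Third sketch, for the suppressed-exception point: one conservative option is to propagate the probe failure instead of swallowing it, since a failed probe cannot prove the file is absent elsewhere. The control flow and the FileAlreadyExistsException handling are assumptions about the desired behavior, not the patch's actual logic:
{code:java}
// Sketch: if the existence probe itself fails, fail the create rather
// than risk two copies of the same path in different subclusters.
LocatedBlocks existingLocation;
try {
  // Check if this file already exists in other subclusters.
  existingLocation = getBlockLocations(src, 0, 1);
} catch (IOException ioe) {
  // Suppressing this could let us create a second copy of a file that
  // the unreachable subcluster already holds; propagate the failure.
  throw ioe;
}
if (existingLocation != null) {
  throw new FileAlreadyExistsException(src + " already exists");
}
{code}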
> RBF: Support unavailable subclusters for mount points with multiple destinations
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-14316
>                 URL: https://issues.apache.org/jira/browse/HDFS-14316
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>         Attachments: HDFS-14316-HDFS-13891.000.patch, HDFS-14316-HDFS-13891.001.patch, HDFS-14316-HDFS-13891.002.patch, HDFS-14316-HDFS-13891.003.patch, HDFS-14316-HDFS-13891.004.patch, HDFS-14316-HDFS-13891.005.patch, HDFS-14316-HDFS-13891.006.patch, HDFS-14316-HDFS-13891.007.patch
>
> Currently mount points with multiple destinations (e.g., HASH_ALL) fail writes when the destination subcluster is down. We need an option to allow writing in other subclusters when one is down.