[
https://issues.apache.org/jira/browse/HBASE-20668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Elser updated HBASE-20668:
-------------------------------
Description:
The exception from FileSystem operation in the finally block of
ExportSnapshot#doWork may hide the real exception from FileUtil.copy call.
I was debugging the following error [~romil.choksi] saw during testing
ExportSnapshot :
{code:java}
2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|2018-06-01 02:40:52,358 ERROR
[main] util.AbstractHBaseTool: Error running command-line tool
2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|java.io.FileNotFoundException:
Directory/File does not exist /apps/
hbase/data/.hbase-snapshot/.tmp/snapshot_table_334546
2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.
checkOwner(FSDirectory.java:1777)
2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.
setOwner(FSDirAttrOp.java:82)
{code}
Here is corresponding code (with extra log added):
{code:java}
try {
LOG.info("Copy Snapshot Manifest from " + snapshotDir + " to " +
initialOutputSnapshotDir);
boolean ret = FileUtil.copy(inputFs, snapshotDir, outputFs,
initialOutputSnapshotDir, false,
false, conf);
LOG.info("return val = " + ret);
} catch (IOException e) {
LOG.warn("Failed to copy the snapshot directory: from=" +
snapshotDir + " to=" + initialOutputSnapshotDir, e);
throw new ExportSnapshotException("Failed to copy the snapshot directory:
from=" +
snapshotDir + " to=" + initialOutputSnapshotDir, e);
} finally {
if (filesUser != null || filesGroup != null) {
LOG.warn((filesUser == null ? "" : "Change the owner of " +
needSetOwnerDir + " to "
+ filesUser)
+ (filesGroup == null ? "" : ", Change the group of " +
needSetOwnerDir + " to "
+ filesGroup));
setOwner(outputFs, needSetOwnerDir, filesUser, filesGroup, true);
}
{code}
"return val = " was not seen in rerun of the test.
This is what the additional log revealed:
{code:java}
2018-06-01 09:22:54,247|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|2018-06-01 09:22:54,241 WARN
[main] snapshot.ExportSnapshot: Failed to copy the snapshot directory:
from=hdfs://ns1/apps/hbase/data/.hbase-snapshot/snapshot_table_157842
to=hdfs://ns3/apps/hbase/data/.hbase-snapshot/.tmp/snapshot_table_157842
2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|org.apache.hadoop.security.AccessControlException:
Permission denied: user=hbase, access=WRITE,
inode="/apps/hbase/data/.hbase-snapshot/.tmp":hrt_qa:hadoop:dr----x-wT
2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
check(FSPermissionChecker.java:399)
2018-06-01 09:22:54,249|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
checkPermission(FSPermissionChecker.java:255)
{code}
It turned out that the exception from {{setOwner}} call in the finally block
eclipsed the real exception.
was:
I was debugging the following error [~romil.choksi] saw during testing
ExportSnapshot :
{code}
2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|2018-06-01 02:40:52,358 ERROR
[main] util.AbstractHBaseTool: Error running command-line tool
2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|java.io.FileNotFoundException:
Directory/File does not exist /apps/
hbase/data/.hbase-snapshot/.tmp/snapshot_table_334546
2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.
checkOwner(FSDirectory.java:1777)
2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 -
run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at
org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.
setOwner(FSDirAttrOp.java:82)
{code}
Here is corresponding code (with extra log added):
{code}
try {
LOG.info("Copy Snapshot Manifest from " + snapshotDir + " to " +
initialOutputSnapshotDir);
boolean ret = FileUtil.copy(inputFs, snapshotDir, outputFs,
initialOutputSnapshotDir, false,
false, conf);
LOG.info("return val = " + ret);
} catch (IOException e) {
LOG.warn("Failed to copy the snapshot directory: from=" +
snapshotDir + " to=" + initialOutputSnapshotDir, e);
throw new ExportSnapshotException("Failed to copy the snapshot directory:
from=" +
snapshotDir + " to=" + initialOutputSnapshotDir, e);
} finally {
if (filesUser != null || filesGroup != null) {
LOG.warn((filesUser == null ? "" : "Change the owner of " +
needSetOwnerDir + " to "
+ filesUser)
+ (filesGroup == null ? "" : ", Change the group of " +
needSetOwnerDir + " to "
+ filesGroup));
setOwner(outputFs, needSetOwnerDir, filesUser, filesGroup, true);
}
{code}
"return val = " was not seen in rerun of the test.
This is what the additional log revealed:
{code}
2018-06-01 09:22:54,247|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|2018-06-01 09:22:54,241 WARN
[main] snapshot.ExportSnapshot: Failed to copy the snapshot directory:
from=hdfs://ns1/apps/hbase/data/.hbase-snapshot/snapshot_table_157842
to=hdfs://ns3/apps/hbase/data/.hbase-snapshot/.tmp/snapshot_table_157842
2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|org.apache.hadoop.security.AccessControlException:
Permission denied: user=hbase, access=WRITE,
inode="/apps/hbase/data/.hbase-snapshot/.tmp":hrt_qa:hadoop:dr----x-wT
2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
check(FSPermissionChecker.java:399)
2018-06-01 09:22:54,249|INFO|MainThread|machine.py:167 -
run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
checkPermission(FSPermissionChecker.java:255)
{code}
It turned out that the exception from {{setOwner}} call in the finally block
eclipsed the real exception.
> Avoid permission change if ExportSnapshot's copy fails
> ------------------------------------------------------
>
> Key: HBASE-20668
> URL: https://issues.apache.org/jira/browse/HBASE-20668
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Priority: Major
> Attachments: 20668.v1.txt, 20668.v2.txt
>
>
> The exception from FileSystem operation in the finally block of
> ExportSnapshot#doWork may hide the real exception from FileUtil.copy call.
> I was debugging the following error [~romil.choksi] saw during testing
> ExportSnapshot :
> {code:java}
> 2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 -
> run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|2018-06-01 02:40:52,358
> ERROR [main] util.AbstractHBaseTool: Error running command-line tool
> 2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 -
> run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|java.io.FileNotFoundException:
> Directory/File does not exist /apps/
> hbase/data/.hbase-snapshot/.tmp/snapshot_table_334546
> 2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 -
> run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.
> checkOwner(FSDirectory.java:1777)
> 2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 -
> run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.
> setOwner(FSDirAttrOp.java:82)
> {code}
> Here is corresponding code (with extra log added):
> {code:java}
> try {
> LOG.info("Copy Snapshot Manifest from " + snapshotDir + " to " +
> initialOutputSnapshotDir);
> boolean ret = FileUtil.copy(inputFs, snapshotDir, outputFs,
> initialOutputSnapshotDir, false,
> false, conf);
> LOG.info("return val = " + ret);
> } catch (IOException e) {
> LOG.warn("Failed to copy the snapshot directory: from=" +
> snapshotDir + " to=" + initialOutputSnapshotDir, e);
> throw new ExportSnapshotException("Failed to copy the snapshot
> directory: from=" +
> snapshotDir + " to=" + initialOutputSnapshotDir, e);
> } finally {
> if (filesUser != null || filesGroup != null) {
> LOG.warn((filesUser == null ? "" : "Change the owner of " +
> needSetOwnerDir + " to "
> + filesUser)
> + (filesGroup == null ? "" : ", Change the group of " +
> needSetOwnerDir + " to "
> + filesGroup));
> setOwner(outputFs, needSetOwnerDir, filesUser, filesGroup, true);
> }
> {code}
> "return val = " was not seen in rerun of the test.
> This is what the additional log revealed:
> {code:java}
> 2018-06-01 09:22:54,247|INFO|MainThread|machine.py:167 -
> run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|2018-06-01 09:22:54,241 WARN
> [main] snapshot.ExportSnapshot: Failed to copy the snapshot directory:
> from=hdfs://ns1/apps/hbase/data/.hbase-snapshot/snapshot_table_157842
> to=hdfs://ns3/apps/hbase/data/.hbase-snapshot/.tmp/snapshot_table_157842
> 2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 -
> run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|org.apache.hadoop.security.AccessControlException:
> Permission denied: user=hbase, access=WRITE,
> inode="/apps/hbase/data/.hbase-snapshot/.tmp":hrt_qa:hadoop:dr----x-wT
> 2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 -
> run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> check(FSPermissionChecker.java:399)
> 2018-06-01 09:22:54,249|INFO|MainThread|machine.py:167 -
> run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
> checkPermission(FSPermissionChecker.java:255)
> {code}
> It turned out that the exception from {{setOwner}} call in the finally block
> eclipsed the real exception.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)