[
https://issues.apache.org/jira/browse/HBASE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417670#comment-15417670
]
Ted Yu commented on HBASE-14450:
--------------------------------
From the same test:
{code}
2016-08-10 15:46:24,236 WARN [Thread-2327] mapred.LocalJobRunner$Job(560): job_local1643260168_0005
java.lang.Exception: java.io.IOException: File copy failed: hdfs://localhost:56218/user/tyu/test-data/f650797e-abdc-4669-aac9-39b68914fcf9/WALs/10.22.16.34,56226,1470869103454/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161 --> hdfs://localhost:56218/backupUT/backup_1470869176664/WALs/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: File copy failed: hdfs://localhost:56218/user/tyu/test-data/f650797e-abdc-4669-aac9-39b68914fcf9/WALs/10.22.16.34,56226,1470869103454/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161 --> hdfs://localhost:56218/backupUT/backup_1470869176664/WALs/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:285)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:253)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://localhost:56218/user/tyu/test-data/f650797e-abdc-4669-aac9-39b68914fcf9/WALs/10.22.16.34,56226,1470869103454/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161 to hdfs://localhost:56218/backupUT/backup_1470869176664/WALs/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:282)
... 11 more
Caused by: java.io.IOException: Mismatch in length of source:hdfs://localhost:56218/user/tyu/test-data/f650797e-abdc-4669-aac9-39b68914fcf9/WALs/10.22.16.34,56226,1470869103454/10.22.16.34%2C56226%2C1470869103454.regiongroup-1.1470869108161 and target:hdfs://localhost:56218/backupUT/backup_1470869176664/WALs/.distcp.tmp.attempt_local1643260168_0005_m_000000_0
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareFileLengths(RetriableFileCopyCommand.java:193)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:126)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 12 more
2016-08-10 15:46:24,391 INFO [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@5c64f59] blockmanagement.BlockManager(3488): BLOCK* BlockManager: ask 127.0.0.1:56219 to delete [blk_1073741892_1068, blk_1073741893_1069]
2016-08-10 15:46:24,622 INFO [ProcedureExecutor-5] mapreduce.MapReduceBackupCopyService$BackupDistCp(242): Progress: 10.0%
2016-08-10 15:46:24,622 DEBUG [ProcedureExecutor-5] impl.BackupSystemTable(122): update backup status in hbase:backup for: backup_1470869176664 set status=RUNNING
2016-08-10 15:46:24,625 DEBUG [sync.0] wal.FSHLog$SyncRunner(1275): syncing writer hdfs://localhost:56218/user/tyu/test-data/f650797e-abdc-4669-aac9-39b68914fcf9/WALs/10.22.16.34,56228,1470869104167/10.22.16.34%2C56228%2C1470869104167.regiongroup-1.1470869110496
2016-08-10 15:46:24,626 DEBUG [ProcedureExecutor-5] mapreduce.MapReduceBackupCopyService(135): Backup progress data "10%" has been updated to hbase:backup for backup_1470869176664
2016-08-10 15:46:24,626 DEBUG [ProcedureExecutor-5] mapreduce.MapReduceBackupCopyService$BackupDistCp(250): Backup progress data updated to hbase:backup: "Progress: 10.0% - 514 bytes copied."
2016-08-10 15:46:24,626 DEBUG [ProcedureExecutor-5] mapreduce.MapReduceBackupCopyService$BackupDistCp(262): DistCp job-id: job_local1643260168_0005
{code}
Note that the DistCp job failed at 10% progress.
No other DistCp job was launched after the failed one.
I ran the test with Hadoop 2.7.3 RC and got the same result.
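
For reference, the innermost exception in the trace above is raised from RetriableFileCopyCommand.compareFileLengths. The following is a minimal standalone sketch of that kind of post-copy length check, not Hadoop's actual code. The class name LengthCheckSketch, the method signature, and the scenario in main (a source file that grows after it has been copied) are assumptions for illustration only; the log does not establish why the lengths differed in this test.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of a post-copy length check; not Hadoop code.
public class LengthCheckSketch {

    // Re-reads the source length and fails if it differs from the number
    // of bytes that were written to the target.
    public static void compareFileLengths(Path source, Path target, long copiedLen)
            throws IOException {
        long sourceLen = Files.size(source);
        if (sourceLen != copiedLen) {
            throw new IOException("Mismatch in length of source:" + source
                    + " and target:" + target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("wal", ".log");
        Path target = Files.createTempFile("wal-copy", ".tmp");
        try {
            Files.write(source, "edit-1\n".getBytes(StandardCharsets.UTF_8));
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            long copiedLen = Files.size(target);

            // Lengths agree right after the copy, so this call returns normally.
            compareFileLengths(source, target, copiedLen);

            // Assumed scenario: the source is appended to after the copy finished.
            Files.write(source, "edit-2\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
            try {
                compareFileLengths(source, target, copiedLen);
                System.out.println("no mismatch");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```

In this sketch, any change in the source length between the copy and the check is enough to fail the attempt, which is the same class of failure reported in the trace.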
> HBase Backup/Restore Phase 3: Multiwal support
> ----------------------------------------------
>
> Key: HBASE-14450
> URL: https://issues.apache.org/jira/browse/HBASE-14450
> Project: HBase
> Issue Type: Task
> Affects Versions: 2.0.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Labels: backup
> Fix For: 2.0.0
>
> Attachments: 14450.v2.txt,
> org.apache.hadoop.hbase.backup.TestIncrementalBackup-output.txt
>
>
> We need to support multiwal configurations.