[ https://issues.apache.org/jira/browse/HDFS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo Nicholas Sze resolved HDFS-8341.
---------------------------------------
    Resolution: Cannot Reproduce

> HDFS mover stuck in loop trying to move corrupt block with no other valid replicas, doesn't move rest of other data blocks
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8341
>                 URL: https://issues.apache.org/jira/browse/HDFS-8341
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>            Priority: Minor
>
> HDFS mover gets stuck looping on a block that fails to move and never
> migrates the rest of the blocks.
> This is preventing recovery of data from a decommissioning external storage
> tier used for archive. (We've had problems with that proprietary "hyperscale"
> storage product, which is why a couple of blocks here and there have checksum
> problems or a premature EOF, as shown below.) However, one bad block should
> not prevent the mover from moving all the other blocks so that we can recover
> our data:
> {code}hdfs mover -p /apps/hive/warehouse/<custom_scrubbed>
> 15/05/07 14:52:50 INFO mover.Mover: namenodes = {hdfs://nameservice1=[/apps/hive/warehouse/<custom_scrubbed>]}
> 15/05/07 14:52:51 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 15/05/07 14:52:51 INFO block.BlockTokenSecretManager: Setting block keys
> 15/05/07 14:52:51 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
> 15/05/07 14:52:52 INFO block.BlockTokenSecretManager: Setting block keys
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:52:52 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:52:52 WARN balancer.Dispatcher: Failed to move blk_1075156654_1438349 with size=134217728 from <ip>:1019:ARCHIVE to <ip>:1019:DISK through <ip>:1019: block move is failed: opReplaceBlock BP-120244285-<ip>-1417023863606:blk_1075156654_1438349 received exception java.io.EOFException: Premature EOF: no length prefix available
> <NOW IT STARTS LOOPING ON SAME BLOCK>
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:53:31 INFO net.NetworkTopology: Adding a new node: /default-rack/<ip>:1019
> 15/05/07 14:53:31 WARN balancer.Dispatcher: Failed to move blk_1075156654_1438349 with size=134217728 from <ip>:1019:ARCHIVE to <ip>:1019:DISK through <ip>:1019: block move is failed: opReplaceBlock BP-120244285-<ip>-1417023863606:blk_1075156654_1438349 received exception java.io.EOFException: Premature EOF: no length prefix available
> ...<repeat indefinitely>...
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)