[
https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069874#comment-16069874
]
fanqiushi commented on HDFS-7915:
---------------------------------
Hi Colin P. McCabe, I found a similar error on Hadoop 2.5.2 and I wonder whether it is
the same issue as this JIRA:
When I run HBase on Hadoop, reads and writes on HDFS sometimes become very slow,
and errors show up in both the HBase log and the DataNode log.
HBase log:
2017-06-28 04:59:16,206 WARN org.apache.hadoop.hdfs.BlockReaderFactory:
BlockReaderFactory(fileName=/hyperbase1/data/default/DW_TB_PROTOCOL_201726/693313b31f0db7ffaa7b818a82f75f05/d/baa8cfbf62ac49578829a90c1f8b9603,
block=BP-673187784-166.0.8.10-1489391820899:blk_1265649740_191912424): I/O
error requesting file descriptors. Disabling domain socket
DomainSocket(fd=3830,path=/var/run/hdfs1/dn_socket)
java.net.SocketTimeoutException: read(2) error: Resource temporarily unavailable
    at org.apache.hadoop.net.unix.DomainSocket.readArray0(Native Method)
    at org.apache.hadoop.net.unix.DomainSocket.access$000(DomainSocket.java:45)
    at org.apache.hadoop.net.unix.DomainSocket$DomainInputStream.read(DomainSocket.java:532)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1998)
    at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.requestNewShm(DfsClientShmManager.java:169)
    at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:261)
    at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:432)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1015)
    at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:450)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:396)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:304)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:609)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:832)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:881)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:563)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1215)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1430)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1312)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:387)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:642)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.next(HFileReaderV2.java:1102)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:273)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:257)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:697)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:223)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:76)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:109)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1131)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1528)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:494)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:59:16,207 WARN
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache:
ShortCircuitCache(0x20a5dc77): failed to load
1265649740_BP-673187784-166.0.8.10-1489391820899
2017-06-28 04:59:16,208 DEBUG
org.apache.hadoop.hbase.regionserver.compactions.Compactor: Compaction
progress: 129941931/6104366 (2128.67%), rate=2293.77 kB/sec
2017-06-28 04:59:22,875 INFO org.apache.hadoop.hbase.util.JvmPauseMonitor:
Detected pause in JVM or host machine (eg GC): pause of approximately 1680ms
GC pool 'ParNew' had collection(s): count=1 time=1846ms
2017-06-28 04:59:31,969 ERROR
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache:
ShortCircuitCache(0x20a5dc77): failed to release short-circuit shared memory
slot Slot(slotIdx=56, shm=DfsClientShm(36b4864aeb6f612be0f3420acc48c2af)) by
sending ReleaseShortCircuitAccessRequestProto to /var/run/hdfs1/dn_socket.
Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment
registered with shmId 36b4864aeb6f612be0f3420acc48c2af
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:59:31,969 WARN
org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager:
EndpointShmManager(166.0.8.33:50010, parent=ShortCircuitShmManager(0993730b)):
error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
    at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
    at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:59:32,368 ERROR
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache:
ShortCircuitCache(0x20a5dc77): failed to release short-circuit shared memory
slot Slot(slotIdx=27, shm=DfsClientShm(c59bc3c02472ae764b297761a88984c7)) by
sending ReleaseShortCircuitAccessRequestProto to /var/run/hdfs1/dn_socket.
Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment
registered with shmId c59bc3c02472ae764b297761a88984c7
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:208)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:59:32,368 WARN
org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager:
EndpointShmManager(166.0.8.33:50010, parent=ShortCircuitShmManager(0993730b)):
error shutting down shm: got IOException calling shutdown(SHUT_RDWR)
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.util.CloseableReferenceCount.reference(CloseableReferenceCount.java:57)
    at org.apache.hadoop.net.unix.DomainSocket.shutdown(DomainSocket.java:387)
    at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.shutdown(DfsClientShmManager.java:378)
    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache$SlotReleaser.run(ShortCircuitCache.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:59:36,948 INFO org.mortbay.log:
org.mortbay.io.nio.SelectorManager$SelectSet@21941784 JVM BUG(s) - injecting
delay2 times
2017-06-28 04:59:36,948 INFO org.mortbay.log:
org.mortbay.io.nio.SelectorManager$SelectSet@21941784 JVM BUG(s) - recreating
selector 2 times, canceled keys 52 times
DataNode log:
2017-06-28 04:48:27,157 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception for BP-673187784-166.0.8.10-1489391820899:blk_1266878404_193141813
java.net.SocketTimeoutException: 120000 millis timeout while waiting for
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/166.0.8.33:50010 remote=/166.0.8.33:52712]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:453)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:734)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:741)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:234)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:48:27,158 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder:
BP-673187784-166.0.8.10-1489391820899:blk_1266878404_193141813,
type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2000)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1115)
    at java.lang.Thread.run(Thread.java:745)
2017-06-28 04:48:27,208 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception for BP-673187784-166.0.8.10-1489391820899:blk_1266878380_193141789
I have two questions:
(1) How did this error happen?
(2) Can the patch on this JIRA solve this problem?
Looking forward to your reply.
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell
> the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7915
> URL: https://issues.apache.org/jira/browse/HDFS-7915
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Colin P. McCabe
> Assignee: Colin P. McCabe
> Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1, 3.0.0-alpha1
>
> Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch,
> HDFS-7915.004.patch, HDFS-7915.005.patch, HDFS-7915.006.patch,
> HDFS-7915.branch-2.6.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell
> the DFSClient about it because of a network error. In
> {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first
> part (mark the slot as used) and fail at the second part (tell the DFSClient
> what it did). The "try" block for unregistering the slot only covers a
> failure in the first part, not the second part. As a result, the DFSClient's and
> the DataNode's views of which slots are allocated can diverge.
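To make the description above concrete, here is a minimal sketch of the control-flow shape it refers to. It is not the real DataXceiver#requestShortCircuitFds source: ShmManager, Peer, Slot and the method names are hypothetical stand-ins. The buggy shape leaves the response to the DFSClient outside the cleanup scope; the fixed shape widens that scope so a network failure also releases the slot.

{code:java}
import java.io.IOException;

public class SlotAllocationSketch {

    // Hypothetical stand-in types; not the real Hadoop classes.
    interface Slot {}

    interface ShmManager {
        Slot allocSlot(String blockId) throws IOException;  // part 1: mark a shm slot as used
        void unregisterSlot(Slot slot);                      // undo part 1
    }

    interface Peer {
        void sendResponse(Slot slot) throws IOException;     // part 2: tell the DFSClient
    }

    /** Buggy shape: the guarded region ends before the response is sent. */
    static void requestSlotBuggy(ShmManager shm, Peer peer, String blockId)
            throws IOException {
        Slot slot = null;
        try {
            slot = shm.allocSlot(blockId);    // part 1 succeeds
        } catch (IOException e) {
            // Only a failure of part 1 is visible here.
            throw e;
        }
        // Part 2 is outside the guarded region: if this write fails (e.g. a
        // timeout on the domain socket), the DataNode still considers the slot
        // allocated while the DFSClient never learns about it.
        peer.sendResponse(slot);
    }

    /** Fixed shape: any failure after allocation also releases the slot. */
    static void requestSlotFixed(ShmManager shm, Peer peer, String blockId)
            throws IOException {
        Slot slot = null;
        boolean success = false;
        try {
            slot = shm.allocSlot(blockId);    // part 1
            peer.sendResponse(slot);          // part 2, now inside the guarded region
            success = true;
        } finally {
            if (!success && slot != null) {
                shm.unregisterSlot(slot);     // keep both views consistent
            }
        }
    }
}
{code}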