Jianfeng Jia created ASTERIXDB-2081:
---------------------------------------
Summary: Failed to restart after hit an OOM issue
Key: ASTERIXDB-2081
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2081
Project: Apache AsterixDB
Issue Type: Bug
Components: STO - Storage
Environment: master
Reporter: Jianfeng Jia
One of the node was failed due to the OOM error. Then when we try to restart
the service, the node couldn't be recovered and the logs is shown as below:
{code}
WARNING: Error in application message delivery!
java.lang.IllegalStateException: Failed to redo
at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
at
org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
at
org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
at
org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
at
org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
at
org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
at
org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
at
org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
at
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot
allocate dataset 245 memory since memory budget would be exceeded.
at
org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
at
org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
at
org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
at
org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
at
org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
at
org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
at
org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
at
org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
at
org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
... 8 more
Sep 05, 2017 3:37:46 PM
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
WARNING: Exception while executing ApplicationMessage: nodeID: 4
java.lang.RuntimeException: java.lang.IllegalStateException: Failed to redo
at
org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:60)
at
org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
Caused by: java.lang.IllegalStateException: Failed to redo
at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
at
org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
at
org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
at
org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
at
org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
at
org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
at
org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
at
org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
... 1 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot
allocate dataset 245 memory since memory budget would be exceeded.
at
org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
at
org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
at
org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
at
org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
at
org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
at
org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
at
org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
at
org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
at
org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
... 8 more
{code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)