[
https://issues.apache.org/jira/browse/ASTERIXDB-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227418#comment-16227418
]
ASF subversion and git services commented on ASTERIXDB-2081:
------------------------------------------------------------
Commit 7ea84894b055289d46a3c4761748411574906f25 in asterixdb's branch
refs/heads/master from [~mhubail]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=7ea8489 ]
[ASTERIXDB-2081][STO] Introduce DatasetMemoryManager
- user model changes: no
- storage format changes: no
- interface changes: yes
Added IDatasetMemoryManager to manage datasets memory
reservation and allocation.
Details:
- Reserve metadata datasets memory to allow them to be opened
when needed.
- Add UngracefulShutdownNCApplication to force recovery
to run on AsterixHyracksIntegrationUtil.
- Refactor the use of firstAvilableUserDatasetID to check
for metadata datasets.
- Add ThreadSafe annotation.
- Add test case for RecoveryManager after creating multiple
datasets.
Change-Id: Ica76b3c8eca6f7d2ad1d962fb5ef84267c258571
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2112
Sonar-Qube: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
Contrib: Jenkins <[email protected]>
Reviewed-by: Michael Blow <[email protected]>
Integration-Tests: Jenkins <[email protected]>
> Failed to restart after hit an OOM issue
> ----------------------------------------
>
> Key: ASTERIXDB-2081
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2081
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: STO - Storage
> Environment: master
> Reporter: Jianfeng Jia
> Assignee: Murtadha Hubail
>
> One of the node was failed due to the OOM error. Then when we try to restart
> the service, the node couldn't be recovered and the logs is shown as below:
> {code}
> WARNING: Error in application message delivery!
> java.lang.IllegalStateException: Failed to redo
> at
> org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
> at
> org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
> at
> org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
> at
> org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
> at
> org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
> at
> org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
> at
> org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
> at
> org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
> at
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot
> allocate dataset 245 memory since memory budget would be exceeded.
> at
> org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
> at
> org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
> at
> org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
> at
> org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
> ... 8 more
> Sep 05, 2017 3:37:46 PM
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> WARNING: Exception while executing ApplicationMessage: nodeID: 4
> java.lang.RuntimeException: java.lang.IllegalStateException: Failed to redo
> at
> org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:60)
> at
> org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: java.lang.IllegalStateException: Failed to redo
> at
> org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
> at
> org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
> at
> org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
> at
> org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
> at
> org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
> at
> org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:53)
> at
> org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
> at
> org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:54)
> ... 1 more
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot
> allocate dataset 245 memory since memory budget would be exceeded.
> at
> org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:566)
> at
> org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
> at
> org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:602)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:386)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:417)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:364)
> at
> org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
> at
> org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
> ... 8 more
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)