[ https://issues.apache.org/jira/browse/ASTERIXDB-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219636#comment-16219636 ]
Chen Li commented on ASTERIXDB-2145:
------------------------------------
Agreed; we should recover the datasets one by one.
> Recovery process fails on 100 datasets
> --------------------------------------
>
> Key: ASTERIXDB-2145
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2145
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Ian Maxon
>
> The Cloudberry DB currently has 112 datasets in one dataverse. When that
> instance was restarted, the NC reported the following error and stopped.
> java.lang.IllegalStateException: Failed to redo
> at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:712)
> at org.apache.asterix.app.nc.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:378)
> at org.apache.asterix.app.nc.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:187)
> at org.apache.asterix.app.nc.RecoveryManager.startLocalRecovery(RecoveryManager.java:179)
> at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:43)
> at org.apache.asterix.app.replication.message.StartupTaskResponseMessage.handle(StartupTaskResponseMessage.java:56)
> at org.apache.asterix.messaging.NCMessageBroker.receivedMessage(NCMessageBroker.java:92)
> at org.apache.hyracks.control.nc.work.ApplicationMessageWork.run(ApplicationMessageWork.java:51)
> at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Cannot allocate dataset 191 memory since memory budget would be exceeded.
> at org.apache.asterix.common.context.DatasetLifecycleManager.allocateMemory(DatasetLifecycleManager.java:568)
> at org.apache.hyracks.storage.common.buffercache.ResourceHeapBufferAllocator.reserveAllocation(ResourceHeapBufferAllocator.java:53)
> at org.apache.hyracks.storage.am.lsm.common.impls.VirtualBufferCache.open(VirtualBufferCache.java:307)
> at org.apache.hyracks.storage.am.lsm.common.impls.MultitenantVirtualBufferCache.open(MultitenantVirtualBufferCache.java:119)
> at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.allocateMemoryComponent(LSMBTree.java:611)
> at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.allocateMemoryComponents(AbstractLSMIndex.java:389)
> at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:421)
> at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:368)
> at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceUpsert(LSMTreeIndexAccessor.java:181)
> at org.apache.asterix.app.nc.RecoveryManager.redo(RecoveryManager.java:707)
> ... 8 more
> So, I increased the storage.memorycomponent.globalbudget parameter from 3GB
> to 5GB. Still, the NC showed the following error and the recovery process
> could not finish.
> ... similar log records ...
> Oct 25, 2017 9:33:44 AM org.apache.asterix.transaction.management.resource.PersistentLocalResourceRepository loadDataverse
> INFO: Loading dataverse:berry
> Oct 25, 2017 9:33:44 AM org.apache.asterix.transaction.management.resource.PersistentLocalResourceRepository loadIndex
> INFO: Loading index:meta_idx_meta
> Oct 25, 2017 9:33:44 AM org.apache.asterix.transaction.management.resource.PersistentLocalResourceRepository loadIndex
> INFO: Resource loaded 161:storage/partition_1/berry/meta_idx_meta
> Oct 25, 2017 9:34:09 AM org.apache.hyracks.util.ExitUtil$ExitThread run
> INFO: JVM exiting with status 2; bye!
> So, I checked the parameter information page and found that the default
> value of storage.memorycomponent.numpages is 1/16 of the global
> component budget. I therefore decreased this parameter so that more
> datasets fit in memory, and the instance was finally able to start. It
> seems that the recovery process tries to load and keep all datasets in
> memory, and this needs to be checked.
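For reference, the arithmetic behind the workaround in the description can be sketched roughly as below. This is only a back-of-the-envelope model, not AsterixDB code: it assumes each open dataset reserves one fixed-size share of the global budget and that the default share is 1/16 of it, as the description reports. It ignores secondary indexes, metadata datasets, and multiple memory components per index. The function names are hypothetical. Under these assumptions, raising the global budget alone leaves the ceiling at 16 datasets, which would be consistent with the 5GB attempt failing.

```python
GIB = 1024 ** 3

def default_per_dataset_bytes(global_budget_bytes: int, divisor: int = 16) -> int:
    # Assumption from the description: the default per-dataset memory
    # component budget is 1/16 of the global budget.
    return global_budget_bytes // divisor

def max_resident_datasets(global_budget_bytes: int, per_dataset_bytes: int) -> int:
    # How many fixed-size per-dataset shares fit in the global budget.
    return global_budget_bytes // per_dataset_bytes

def per_dataset_bytes_for(global_budget_bytes: int, dataset_count: int) -> int:
    # Largest per-dataset share that still lets dataset_count datasets fit.
    return global_budget_bytes // dataset_count

for budget in (3 * GIB, 5 * GIB):
    share = default_per_dataset_bytes(budget)
    fit = max_resident_datasets(budget, share)
    needed = per_dataset_bytes_for(budget, 112)
    print(f"global={budget // GIB}GiB default share={share} bytes "
          f"-> {fit} datasets; share needed for 112 datasets <= {needed} bytes")
```

In this model the only way to hold all 112 datasets at once is to shrink the per-dataset share, which is what lowering storage.memorycomponent.numpages did; recovering the datasets one by one would avoid needing them all resident at the same time.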