2MB is too small for memory components (it will result in many merges unless you are using the no-merge policy). You also need to set your checkpoint threshold (txn.log.checkpoint.lsnthreshold) to a small size if you haven't; it has a default value of 64MB. In your case, the default will result in a large number of log records being unnecessarily spilled during the recovery analysis phase, which will also slow down the REDO phase.
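For example, something like this in asterix-configuration.xml (the storage.memorycomponent.* property names below are from memory, so verify them against your build; the values are only illustrative, and only txn.log.checkpoint.lsnthreshold is the setting I mentioned above):

    <property>
      <name>storage.memorycomponent.pagesize</name>
      <value>131072</value>
      <description>128KB pages for in-memory components.</description>
    </property>
    <property>
      <name>storage.memorycomponent.numpages</name>
      <value>256</value>
      <description>256 x 128KB = 32MB per memory component instead of 2MB.</description>
    </property>
    <property>
      <name>txn.log.checkpoint.lsnthreshold</name>
      <value>16777216</value>
      <description>Checkpoint once 16MB of log accumulates, instead of the 64MB default.</description>
    </property>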
I will also investigate the memory swapping to make sure that there isn't unexpected behavior.

-Murtadha

> On May 20, 2016, at 6:02 PM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
>
> The memory component size is very small: 16 pages, and 16 * 128K = 2M.
> The checkpoint poll frequency is 120s.
>
> Are these the files you are talking about?
>
> jianfeng~/.asterix$ l nc1/txnLogs/recovery_temp/33/
> total 57M
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49447149112
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49502047857
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49556805925
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49611565445
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49666469761
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49721493802
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49776490438
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49831651500
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49886809324
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49941694010
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49996847434
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50051741552
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50106633253
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50161512493
> -rw-r--r-- 1 root 721K May 20 16:38 33_50171087795
>
> The memory usage is between 800M and 1.3G, while the JVM is configured with 3G.
>
> I will try the new master later. My instance is based on
> commit 1defc92ae43cb36e1c06eb884c5359d5c1d6b0e5
> Date: Thu May 12 18:31:47 2016 -0700
>
>> On May 20, 2016, at 5:22 PM, Murtadha Hubail <hubail...@gmail.com> wrote:
>>
>> How large are the memory component size and the checkpoint interval? If you have a large data size to be recovered, there will be many small recovery partition files written to disk under the logs directory. You might want to check how many files/directories are there. Also, if the JVM memory is almost fully consumed, then every time a recovery partition is about to be loaded into memory, any other partition already in memory will be evicted, which causes it to be reloaded when it is needed again. This might explain the busy behavior.
>>
>> Also, if you are on the current master, I merged a patch earlier this week to make the spilled partition size configurable. You might want to increase it and see if the recovery time improves.
>>
>> Cheers,
>> Murtadha
>>
>>> On May 20, 2016, at 5:00 PM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
>>>
>>> I am able to reproduce the issue. The thread dump is pasted here:
>>> http://pastebin.com/KKvLVVg9
>>> It looks pretty busy with loadPartitionToMemory.
>>> >>> "main" #1 prio=5 os_prio=0 tid=0x00007f33f400d800 nid=0x14 runnable >>> [0x00007f33fc27a000] >>> java.lang.Thread.State: RUNNABLE >>> at java.io.FileInputStream.read(Native Method) >>> at >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.loadPartitionToMemory(RecoveryManager.java:1033) >>> at >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.serachPartition(RecoveryManager.java:983) >>> at >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.containsEntityCommitForTxnId(RecoveryManager.java:947) >>> at >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:334) >>> - locked <0x00000006c03dc1c0> (a >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager) >>> at >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:203) >>> - locked <0x00000006c03dc1c0> (a >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager) >>> at >>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager.startRecovery(RecoveryManager.java:195) >>> at >>> org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.start(NCApplicationEntryPoint.java:142) >>> at >>> org.apache.hyracks.control.nc.NodeControllerService.startApplication(NodeControllerService.java:324) >>> at >>> org.apache.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:259) >>> at org.apache.hyracks.control.nc.NCDriver.main(NCDriver.java:48) >>> >>> >>> >>>> On May 17, 2016, at 12:37 AM, Jianfeng Jia <jianfeng....@gmail.com >>>> <mailto:jianfeng....@gmail.com> <mailto:jianfeng....@gmail.com >>>> <mailto:jianfeng....@gmail.com>>> wrote: >>>> >>>> I’m trying to reproduce the issue and will dump it then. >>>> >>>>> On May 16, 2016, at 10:57 PM, Michael Blow <mblow.apa...@gmail.com >>>>> <mailto:mblow.apa...@gmail.com> <mailto:mblow.apa...@gmail.com >>>>> <mailto:mblow.apa...@gmail.com>> <mailto:mblow.apa...@gmail.com >>>>> <mailto:mblow.apa...@gmail.com> <mailto:mblow.apa...@gmail.com >>>>> <mailto:mblow.apa...@gmail.com>>>> wrote: >>>>> >>>>> It would good to get thread dumps if this happens again. >>>>> On Mon, May 16, 2016 at 10:56 PM Jianfeng Jia <jianfeng....@gmail.com >>>>> <mailto:jianfeng....@gmail.com> <mailto:jianfeng....@gmail.com >>>>> <mailto:jianfeng....@gmail.com>> <mailto:jianfeng....@gmail.com >>>>> <mailto:jianfeng....@gmail.com> <mailto:jianfeng....@gmail.com >>>>> <mailto:jianfeng....@gmail.com>>>> >>>>> wrote: >>>>> >>>>>> I revisited the logs, and luckily it hasn’t been cleared. 
>>>>>> Here is part of the nc1's log:
>>>>>>
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 14 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 13 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
>>>>>> INFO: Creating file: /nc1/iodevice1/storage/partition_0/hackathon/log_device_idx_log_device/2016-05-15-12-56-48-712_2016-05-15-12-23-31-225_f in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 15 in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> ——————————————————————————————————————
>>>>>> /// I shut down the cluster here and started the server right away.
>>>>>> ——————————————————————————————————————
>>>>>> May 15, 2016 1:43:12 PM org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecoveryRedoPhase
>>>>>> INFO: Logs REDO phase completed. Redo logs count: 1197
>>>>>> May 15, 2016 1:43:12 PM org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness flush
>>>>>> INFO: Started a flush operation for index: LSMBTree [/nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/] ...
>>>>>> May 15, 2016 1:43:12 PM org.apache.hyracks.storage.common.buffercache.BufferCache createFile
>>>>>> INFO: Creating file: /nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/2016-05-15-13-43-12-680_2016-05-15-13-43-12-680_f in cache: org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>>
>>>>>> No logs were generated in those 43 minutes. During that time one CPU was fully busy, and I remember that no file was touched or generated in the asterix folder. So it may not be a problem of the buffer cache in the recovery phase?
>>>>>>
>>>>>>> On May 16, 2016, at 9:28 PM, Mike Carey <dtab...@gmail.com> wrote:
>>>>>>>
>>>>>>> Agreed and agreed. But is the spinning on recovery?
>>>>>>>
>>>>>>> (What's the role of the buffer cache during recovery?)
>>>>>>>
>>>>>>> On 5/17/16 2:10 AM, Jianfeng Jia wrote:
>>>>>>>> I think the BufferCache is the core issue; the recovery process may just run into the same spin trap in which it was stopped.
>>>>>>>> I also created another issue: we should be able to abort a task so that we don't need to restart the server.
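>>>>>>>>
>>>>>>>> To illustrate the kind of spin trap I mean (a toy sketch only, not the actual BufferCache code): a page request scans for an unpinned page to evict, and if every page in the bounded cache is pinned, the scan just retries forever and burns a CPU:
>>>>>>>>
>>>>>>>>     import java.util.concurrent.atomic.AtomicInteger;
>>>>>>>>
>>>>>>>>     final class ToyPageCache {
>>>>>>>>         private final AtomicInteger[] pinCounts; // one pin count per cached page
>>>>>>>>
>>>>>>>>         ToyPageCache(int numPages) {
>>>>>>>>             pinCounts = new AtomicInteger[numPages];
>>>>>>>>             for (int i = 0; i < numPages; i++) {
>>>>>>>>                 pinCounts[i] = new AtomicInteger();
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>
>>>>>>>>         // Pick a page to evict: CAS 0 -> 1 claims an unpinned page.
>>>>>>>>         int findVictim() {
>>>>>>>>             while (true) {
>>>>>>>>                 for (int i = 0; i < pinCounts.length; i++) {
>>>>>>>>                     if (pinCounts[i].compareAndSet(0, 1)) {
>>>>>>>>                         return i;
>>>>>>>>                     }
>>>>>>>>                 }
>>>>>>>>                 // All pages pinned: no victim, no wait, no timeout.
>>>>>>>>                 // The loop spins at 100% CPU until a page is unpinned,
>>>>>>>>                 // matching the busy, no-I/O behavior described above.
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }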
>>>>>>>>
>>>>>>>>> On May 16, 2016, at 7:24 AM, Michael Blow <mblow.apa...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> This might be related: (ASTERIXDB-1438) BufferCache spins indefinitely when cache is exceeded.
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1438
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -MDB
>>>>>>>>>
>>>>>>>>>> On Mon, May 16, 2016 at 1:52 AM Mike Carey <dtab...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Glad it worked out - can someone also capture the core issue in JIRA? Thx!
>>>>>>>>>>
>>>>>>>>>>> On May 15, 2016 11:40 PM, "Jianfeng Jia" <jianfeng....@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Great! The server is back now. Thanks a lot!
>>>>>>>>>>>
>>>>>>>>>>>> On May 15, 2016, at 2:26 PM, Murtadha Hubail <hubail...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> You can delete the existing log files and create new empty ones with an incremented log file number, but it is very important that you don't delete the checkpoint file. (A sketch of the procedure follows the quoted message below.)
>>>>>>>>>>>> Of course any data in the old log files will be lost, but the data already on disk will be available.
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 15, 2016, at 1:23 PM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We submitted a long-running join+insert query and stopped the cluster to kill it. However, when the cluster restarted, it ran recovery forever; the logs show that it is creating a lot of buffer cache files.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to bring the cluster back to answer queries, are there any hack solutions, such as removing the recovery txn logs? I'm worried that this would ruin the cluster somehow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We are in a contest, so any early help is really appreciated! Thanks!
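>>>>>>>>>>>>
>>>>>>>>>>>> A rough sketch of the reset procedure (the txnLogs path and the transaction_log_ prefix below are assumptions for illustration; check the actual file names in your txnLogs directory first, and leave the checkpoint file alone):
>>>>>>>>>>>>
>>>>>>>>>>>>     import java.io.IOException;
>>>>>>>>>>>>     import java.nio.file.DirectoryStream;
>>>>>>>>>>>>     import java.nio.file.Files;
>>>>>>>>>>>>     import java.nio.file.Path;
>>>>>>>>>>>>     import java.nio.file.Paths;
>>>>>>>>>>>>     import java.util.ArrayList;
>>>>>>>>>>>>     import java.util.List;
>>>>>>>>>>>>
>>>>>>>>>>>>     public class ResetTxnLogs {
>>>>>>>>>>>>         public static void main(String[] args) throws IOException {
>>>>>>>>>>>>             Path logDir = Paths.get("/home/jianfeng/.asterix/nc1/txnLogs"); // assumed location
>>>>>>>>>>>>             String prefix = "transaction_log_"; // assumed naming convention
>>>>>>>>>>>>             List<Path> logs = new ArrayList<>();
>>>>>>>>>>>>             try (DirectoryStream<Path> ds = Files.newDirectoryStream(logDir, prefix + "*")) {
>>>>>>>>>>>>                 ds.forEach(logs::add);
>>>>>>>>>>>>             }
>>>>>>>>>>>>             long maxSeq = -1;
>>>>>>>>>>>>             for (Path p : logs) {
>>>>>>>>>>>>                 String name = p.getFileName().toString();
>>>>>>>>>>>>                 maxSeq = Math.max(maxSeq, Long.parseLong(name.substring(prefix.length())));
>>>>>>>>>>>>                 Files.delete(p); // data in this log is lost; data already on disk survives
>>>>>>>>>>>>             }
>>>>>>>>>>>>             // One new, empty log file with an incremented number; the
>>>>>>>>>>>>             // checkpoint file in the same directory is deliberately untouched.
>>>>>>>>>>>>             Files.createFile(logDir.resolve(prefix + (maxSeq + 1)));
>>>>>>>>>>>>         }
>>>>>>>>>>>>     }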
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine