2MB is too small for the memory components (it will result in many merges unless 
you are using the no-merge policy). You also need to set your checkpoint threshold 
(txn.log.checkpoint.lsnthreshold) to a smaller size if you haven't; its default 
value is 64MB. In your case, the large threshold will result in a large number of 
log records being unnecessarily spilled to disk during the recovery analysis phase, 
which will also slow down the REDO phase.
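
For what it's worth, here is a rough sketch of how those two knobs could look in 
asterix-configuration.xml. The storage.memorycomponent.* property names are an 
assumption on my part, so please double-check them against the configuration docs, 
and treat the values as illustrative only:

  <property>
    <name>storage.memorycomponent.numpages</name>
    <!-- e.g. 256 pages * 128KB page size = 32MB per memory component (illustrative) -->
    <value>256</value>
  </property>
  <property>
    <name>txn.log.checkpoint.lsnthreshold</name>
    <!-- e.g. 8MB instead of the 64MB default (illustrative) -->
    <value>8388608</value>
  </property>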

I will also investigate the memory swapping to make sure that there isn’t 
unexpected behavior.

-Murtadha

> On May 20, 2016, at 6:02 PM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
> 
> The memory component size is very small: 16 pages. 16 * 128K = 2M.
> The checkpoint poll frequency is 120s.
> 
> Are these the files you are talking about? 
> jianfeng~/.asterix$ l nc1/txnLogs/recovery_temp/33/
> total 57M
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49447149112
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49502047857
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49556805925
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49611565445
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49666469761
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49721493802
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49776490438
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49831651500
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49886809324
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49941694010
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_49996847434
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50051741552
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50106633253
> -rw-r--r-- 1 root 4.1M May 20 16:38 33_50161512493
> -rw-r--r-- 1 root 721K May 20 16:38 33_50171087795
> 
> The memory usage is between 800M and 1.3G, while the JVM is configured with 3G. 
> 
> I will try the new master later.
> My instance is based on 
> commit 1defc92ae43cb36e1c06eb884c5359d5c1d6b0e5
> Date:   Thu May 12 18:31:47 2016 -0700
> 
> 
>> On May 20, 2016, at 5:22 PM, Murtadha Hubail <hubail...@gmail.com> wrote:
>> 
>> How large are the memory component size and the checkpoint interval? If you 
>> have a large amount of data to be recovered, there will be many small recovery 
>> partition files written to disk under the logs directory. You might want to 
>> check how many files/directories there are. Also, if the JVM memory is almost 
>> fully consumed, then every time a recovery partition is loaded into memory, any 
>> other partition already in memory will be evicted and will have to be reloaded 
>> when it is needed again. This might explain the busy behavior.
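>> 
>> For example, a quick check along these lines shows how many spilled recovery 
>> partition files there are and how much data they add up to (assuming the node's 
>> working directory is ~/.asterix and the spilled files land under 
>> txnLogs/recovery_temp; adjust the path to your setup):
>> 
>>   find nc1/txnLogs/recovery_temp -type f | wc -l   # number of spilled recovery partition files
>>   du -sh nc1/txnLogs/recovery_temp                  # total size that recovery has to re-read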
>> 
>> Also, if you are on the current master, I merged a patch earlier this week 
>> to make the spilled partition size configurable. You might want to increase 
>> it and see if the recovery time improves.
>> 
>> Cheers,
>> Murtadha
>> 
>>> On May 20, 2016, at 5:00 PM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
>>> 
>>> I am able to reproduce the issue. The thread dump is pasted here: 
>>> http://pastebin.com/KKvLVVg9
>>> It looks pretty busy with loadPartitionToMemory.
>>> 
>>> "main" #1 prio=5 os_prio=0 tid=0x00007f33f400d800 nid=0x14 runnable 
>>> [0x00007f33fc27a000]
>>> java.lang.Thread.State: RUNNABLE
>>>  at java.io.FileInputStream.read(Native Method)
>>>  at 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.loadPartitionToMemory(RecoveryManager.java:1033)
>>>  at 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.serachPartition(RecoveryManager.java:983)
>>>  at 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager$JobEntityCommits.containsEntityCommitForTxnId(RecoveryManager.java:947)
>>>  at 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager.startRecoveryRedoPhase(RecoveryManager.java:334)
>>>  - locked <0x00000006c03dc1c0> (a 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
>>>  at 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager.replayPartitionsLogs(RecoveryManager.java:203)
>>>  - locked <0x00000006c03dc1c0> (a 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager)
>>>  at 
>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager.startRecovery(RecoveryManager.java:195)
>>>  at 
>>> org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.start(NCApplicationEntryPoint.java:142)
>>>  at 
>>> org.apache.hyracks.control.nc.NodeControllerService.startApplication(NodeControllerService.java:324)
>>>  at 
>>> org.apache.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:259)
>>>  at org.apache.hyracks.control.nc.NCDriver.main(NCDriver.java:48)
>>> 
>>> 
>>> 
>>>> On May 17, 2016, at 12:37 AM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
>>>> 
>>>> I’m trying to reproduce the issue and will dump it then. 
>>>> 
>>>>> On May 16, 2016, at 10:57 PM, Michael Blow <mblow.apa...@gmail.com> wrote:
>>>>> 
>>>>> It would be good to get thread dumps if this happens again.
>>>>> On Mon, May 16, 2016 at 10:56 PM Jianfeng Jia <jianfeng....@gmail.com> wrote:
>>>>> 
>>>>>> I revisited the logs, and luckily they haven't been cleared. Here is part of
>>>>>> nc1's log:
>>>>>> 
>>>>>> May 15, 2016 1:04:10 PM
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 14 in cache:
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 13 in cache:
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache createFile
>>>>>> INFO: Creating file:
>>>>>> /nc1/iodevice1/storage/partition_0/hackathon/log_device_idx_log_device/2016-05-15-12-56-48-712_2016-05-15-12-23-31-225_f
>>>>>> in cache: 
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> May 15, 2016 1:04:10 PM
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache openFile
>>>>>> INFO: Opening file: 15 in cache:
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> ——————————————————————————————————————
>>>>>> /// I shut down the cluster at this point and started the server right away.
>>>>>> ——————————————————————————————————————
>>>>>> May 15, 2016 1:43:12 PM
>>>>>> org.apache.asterix.transaction.management.service.recovery.RecoveryManager
>>>>>> startRecoveryRedoPhase
>>>>>> INFO: Logs REDO phase completed. Redo logs count: 1197
>>>>>> May 15, 2016 1:43:12 PM 
>>>>>> org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness
>>>>>> flush
>>>>>> INFO: Started a flush operation for index: LSMBTree
>>>>>> [/nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/] ...
>>>>>> May 15, 2016 1:43:12 PM
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache createFile
>>>>>> INFO: Creating file:
>>>>>> /nc1/iodevice1/storage/partition_0/Metadata/Dataset_idx_Dataset/2016-05-15-13-43-12-680_2016-05-15-13-43-12-680_f
>>>>>> in cache: 
>>>>>> org.apache.hyracks.storage.common.buffercache.BufferCache@2a7f1f10
>>>>>> 
>>>>>> No logs were generated in those 43 minutes. During that time one CPU was 
>>>>>> fully utilized, and I remember that no file was touched or generated in the 
>>>>>> asterix folder. So it may not be a problem with the buffer cache in the 
>>>>>> recovery phase?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On May 16, 2016, at 9:28 PM, Mike Carey <dtab...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Agreed and agreed.  But is the spinning on recovery?
>>>>>>> 
>>>>>>> (What's the role of the buffer cache during recovery?)
>>>>>>> 
>>>>>>> 
>>>>>>> On 5/17/16 2:10 AM, Jianfeng Jia wrote:
>>>>>>>> I think the BufferCache is the core issue; the recovery process may just
>>>>>>>> run into the same spin trap where it was stopped.
>>>>>>>> And I created another issue: we should be able to abort the task so
>>>>>>>> that we don't need to restart the server.
>>>>>>>> 
>>>>>>>>> On May 16, 2016, at 7:24 AM, Michael Blow <mblow.apa...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> This might be related: (ASTERIXDB-1438) BufferCache spins indefinitely
>>>>>>>>> when cache is exceeded.
>>>>>>>>> 
>>>>>>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1438
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> -MDB
>>>>>>>>> 
>>>>>>>>> On Mon, May 16, 2016 at 1:52 AM Mike Carey <dtab...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Glad it worked out - can someone also capture the core issue in JIRA?  Thx!
>>>>>>>>>> On May 15, 2016 11:40 PM, "Jianfeng Jia" <jianfeng....@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Great! The server is back now. Thanks a lot!
>>>>>>>>>>>> On May 15, 2016, at 2:26 PM, Murtadha Hubail <hubail...@gmail.com> wrote:
>>>>>>>>>>>> You can delete the existing log files and create new empty ones with an
>>>>>>>>>>>> incremented log file number, but it is very important that you don't
>>>>>>>>>>>> delete the checkpoint file.
>>>>>>>>>>>> Of course, any data in the old log files will be lost, but the data
>>>>>>>>>>>> already on disk will be available.
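>>>>>>>>>>>> 
>>>>>>>>>>>> Roughly like this (just a sketch -- the transaction_log_<n> naming is an
>>>>>>>>>>>> assumption based on the default log file prefix, so first check what is
>>>>>>>>>>>> actually in your txnLogs directory and leave the checkpoint file alone):
>>>>>>>>>>>> 
>>>>>>>>>>>>   cd ~/.asterix/nc1/txnLogs
>>>>>>>>>>>>   ls                        # note the checkpoint file and the highest log number, e.g. transaction_log_3
>>>>>>>>>>>>   rm transaction_log_*      # remove only the old transaction log files
>>>>>>>>>>>>   touch transaction_log_4   # new empty log file with an incremented number
>>>>>>>>>>>> 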
>>>>>>>>>>>>> On May 15, 2016, at 1:23 PM, Jianfeng Jia <jianfeng....@gmail.com> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> We submitted a long-running join+insert query and stopped the cluster
>>>>>>>>>>>>> to stop running it. However, when it restarted, it ran the recovery
>>>>>>>>>>>>> forever, and the logs show a lot of buffer cache activity.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In order to bring the cluster back to answer the query, are there any
>>>>>>>>>>>>> hacks, such as removing the recovery txn logs? I'm worried that it will
>>>>>>>>>>>>> ruin the cluster somehow.
>>>>>>>>>>>>> We are in a contest, so any early help is really appreciated! Thanks!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jianfeng Jia
>>>>>>>>>>>>> PhD Candidate of Computer Science
>>>>>>>>>>>>> University of California, Irvine
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Jianfeng Jia
>>>>>>>>>>> PhD Candidate of Computer Science
>>>>>>>>>>> University of California, Irvine
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Jianfeng Jia
>>>>>>>> PhD Candidate of Computer Science
>>>>>>>> University of California, Irvine
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Jianfeng Jia
>>>>>> PhD Candidate of Computer Science
>>>>>> University of California, Irvine
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Jianfeng Jia
>>>> PhD Candidate of Computer Science
>>>> University of California, Irvine
>>>> 
>>> 
>>> 
>>> 
>>> Best,
>>> 
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>> 
> 
> 
> 
> Best,
> 
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
