You’re adding the new node as rac3. The rack-aware policy is going to make sure you get the rack diversity you asked for by placing one replica of each partition in rac3, which is going to blow up that instance.
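[Editor's note: Jeff's point can be sketched with a back-of-the-envelope calculation. This is an illustration only, assuming RF=3 and the per-rack node counts visible in the nodetool status output below; the total dataset size is made up.]

```python
# Toy illustration (not Cassandra code): with NetworkTopologyStrategy and
# RF equal to the number of racks, every rack stores one complete replica
# of the dataset, so a rack's share is divided only among its own nodes.

def per_node_load_gb(dataset_gb, nodes_in_rack):
    # each rack holds one full replica, split across the nodes in that rack
    return dataset_gb / nodes_in_rack

racks = {"RAC1": 6, "RAC2": 7, "RAC3": 1}  # node counts from nodetool status
dataset_gb = 2100  # hypothetical total logical dataset size

for rack, nodes in racks.items():
    print(rack, per_node_load_gb(dataset_gb, nodes), "GB per node")
# RAC1 350.0 GB per node
# RAC2 300.0 GB per node
# RAC3 2100.0 GB per node
```

The lone RAC3 node must hold roughly 6-7x what a RAC1/RAC2 node holds, which matches the "6/7 times the disk space" observation later in this thread.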
-- 
Jeff Jirsa

> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
> 
> Hi Jeff,
> 
> This is my third attempt at bootstrapping the node, so I have tried several
> tricks that might partially explain the output I am posting:
> 
> * To make the bootstrap incremental, I have been throttling the streams on
>   all nodes to 1 Mbit/s, selectively unthrottling one node at a time in the
>   hope that this would unlock some routines compacting away redundant data
>   (you'll see that nodetool netstats reports back fewer nodes than nodetool
>   status).
> * Since compactions have had a tendency to get stuck (hundreds pending but
>   none executing) in previous bootstraps, I've tried issuing a manual
>   "nodetool compact" on the bootstrapping node.
> 
> Having said that, this is the output of the commands.
> 
> Thanks a lot,
> Stefano
> 
> nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns  Host ID                               Rack
> UN  X.Y.33.8   342.4 GB   256     ?     afaae414-30cc-439d-9785-1b7d35f74529  RAC1
> UN  X.Y.81.4   325.98 GB  256     ?     00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
> UN  X.Y.33.4   348.81 GB  256     ?     1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
> UN  X.Y.33.5   384.99 GB  256     ?     13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
> UN  X.Y.81.5   336.27 GB  256     ?     aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
> UN  X.Y.33.6   377.22 GB  256     ?     43a393ba-6805-4e33-866f-124360174b28  RAC1
> UN  X.Y.81.6   329.61 GB  256     ?     4c3c64ae-ef4f-4986-9341-573830416997  RAC2
> UN  X.Y.33.7   344.25 GB  256     ?     03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
> UN  X.Y.81.7   324.93 GB  256     ?     24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
> UN  X.Y.81.1   323.8 GB   256     ?     26244100-0565-4567-ae9c-0fc5346f5558  RAC2
> UJ  X.Y.177.2  724.5 GB   256     ?     e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
> UN  X.Y.81.2   337.83 GB  256     ?     09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
> UN  X.Y.81.3   326.4 GB   256     ?     feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
> UN  X.Y.33.3   350.4 GB   256     ?     cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
> 
> 
> nodetool netstats -H | grep "Already received" -B 1
> /X.Y.81.4
>     Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
> --
> /X.Y.81.7
>     Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
> --
> /X.Y.81.5
>     Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
> --
> /X.Y.81.2
>     Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
> --
> /X.Y.81.3
>     Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
> --
> /X.Y.81.1
>     Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
> --
> /X.Y.81.6
>     Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
> --
> /X.Y.33.5
>     Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
> 
> nodetool tpstats
> Pool Name                  Active  Pending  Completed  Blocked  All time blocked
> MutationStage                   0        0  828367015        0                 0
> ViewMutationStage               0        0          0        0                 0
> ReadStage                       0        0          0        0                 0
> RequestResponseStage            0        0         13        0                 0
> ReadRepairStage                 0        0          0        0                 0
> CounterMutationStage            0        0          0        0                 0
> MiscStage                       0        0          0        0                 0
> CompactionExecutor              1        1      12150        0                 0
> MemtableReclaimMemory           0        0       7368        0                 0
> PendingRangeCalculator          0        0         14        0                 0
> GossipStage                     0        0     599329        0                 0
> SecondaryIndexManagement        0        0          0        0                 0
> HintsDispatcher                 0        0          0        0                 0
> MigrationStage                  0        0         27        0                 0
> MemtablePostFlush               0        0       8112        0                 0
> ValidationExecutor              0        0          0        0                 0
> Sampler                         0        0          0        0                 0
> MemtableFlushWriter             0        0       7368        0                 0
> InternalResponseStage           0        0         25        0                 0
> AntiEntropyStage                0        0          0        0                 0
> CacheCleanupExecutor            0        0          0        0                 0
> 
> Message type        Dropped
> READ                      0
> RANGE_SLICE               0
> _TRACE                    0
> HINT                      0
> MUTATION                  1
> COUNTER_MUTATION          0
> BATCH_STORE               0
> BATCH_REMOVE              0
> REQUEST_RESPONSE          0
> PAGED_RANGE               0
> READ_REPAIR               0
> 
> nodetool compactionstats -H
> pending tasks: 776
>                                   id  compaction type    keyspace    table  completed    total  unit  progress
> 24d039f2-b1e6-11e7-ac57-3d25e38b2f5c       Compaction  keyspace_1  table_1    4.85 GB  7.67 GB  bytes    63.25%
> Active compaction remaining time : n/a
> 
> 
>> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> Can you post (anonymize as needed) nodetool status, nodetool netstats,
>> nodetool tpstats, and nodetool compactionstats?
>> 
>> -- 
>> Jeff Jirsa
>> 
>>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>> 
>>> Hi Jeff,
>>> 
>>> That would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>> 
>>> Stefano
>>> 
>>>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> What version?
>>>> 
>>>> Single disk or JBOD?
>>>> 
>>>> Vnodes?
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so
>>>>> far. Based on the source code, it seems this option doesn't affect
>>>>> compactions while bootstrapping.
>>>>> 
>>>>> I am getting quite confused: it seems I cannot bootstrap a node unless I
>>>>> have at least 6-7 times the disk space used by the other nodes. This is
>>>>> weird. The host I am bootstrapping uses an SSD, compaction throughput is
>>>>> unthrottled (set to 0), and the compacting threads are set to 8.
>>>>> Nevertheless, while primary ranges from other nodes are being streamed,
>>>>> the data is never compacted away.
>>>>> 
>>>>> Does anybody know anything else I could try?
>>>>> 
>>>>> Cheers,
>>>>> Stefano
>>>>> 
>>>>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>> Another little update: at the same time I see the number of pending
>>>>>> tasks stuck (in this case at 1847); restarting the node doesn't help,
>>>>>> so I can't really force the node to "digest" all those compactions.
>>>>>> Meanwhile, the disk space occupied is already twice the average load
>>>>>> I have on the other nodes.
>>>>>> 
>>>>>> Feeling more and more puzzled here :S
>>>>>> 
>>>>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>>> I have been trying to add another node to the cluster (after upgrading
>>>>>>> to 3.0.15), and I just noticed through "nodetool netstats" that all
>>>>>>> nodes have been streaming to the joining node approximately 1/3 of
>>>>>>> their SSTables, basically their whole primary range (using RF=3).
>>>>>>> 
>>>>>>> Is this expected/normal?
>>>>>>> I was under the impression only the necessary SSTables were going to
>>>>>>> be streamed...
>>>>>>> 
>>>>>>> Thanks for the help,
>>>>>>> Stefano
>>>>>>> 
>>>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>> But if it also streams, it means I'd still be under pressure, if I
>>>>>>>>> am not mistaken. I am under the assumption that the compactions are
>>>>>>>>> a by-product of streaming too many SSTables at the same time, and
>>>>>>>>> not of my current write load.
>>>>>>>> 
>>>>>>>> Ah, yeah, I wasn't thinking about the capacity problem, more of the
>>>>>>>> performance impact from the node being backed up with compactions.
>>>>>>>> If you haven't already, you should try disabling STCS in L0 on the
>>>>>>>> joining node. You will likely still need to do a lot of compactions,
>>>>>>>> but generally they should be smaller. The option is
>>>>>>>> -Dcassandra.disable_stcs_in_l0=true
>>>>>>>> 
>>>>>>>>> I just noticed you were mentioning L1 tables too. Why would that
>>>>>>>>> affect the disk footprint?
>>>>>>>> 
>>>>>>>> If you've been doing a lot of STCS in L0, you generally end up with
>>>>>>>> some large SSTables. These will eventually have to be compacted with
>>>>>>>> L1. You could also be suffering the problem of streamed SSTables
>>>>>>>> causing large cross-level compactions in the higher levels as well.
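[Editor's note: kurt's explanation of why STCS in L0 inflates the disk footprint can be sketched with a toy model. This is not Cassandra code, and the 50 GB / 10 GB figures are made up for illustration: a compaction must keep its input SSTables on disk until the output is fully written, so each large cross-level compaction transiently needs roughly input-plus-output space.]

```python
# Toy model of transient disk headroom during a cross-level compaction.
# Inputs stay on disk until the output is complete, so peak usage is
# approximately sum(inputs) + sum(outputs).

def peak_disk(l0_sstable_gb, overlapping_l1_gb):
    inputs = l0_sstable_gb + overlapping_l1_gb
    output = inputs  # worst case: little data is actually compacted away
    return inputs + output

# One huge STCS-produced L0 SSTable overlapping a 10 GB L1 needs
# ~120 GB of headroom in a single operation:
print(peak_disk(50, 10))  # 120

# With STCS disabled in L0, the same data moves up as many small
# streamed SSTables, each needing only ~22 GB of headroom at a time:
print(peak_disk(1, 10))   # 22
```

The total compaction work is similar either way (as kurt notes, "you will likely still need to do a lot of compactions"), but the per-operation headroom, and thus the peak footprint on the bootstrapping node, is far smaller when the large STCS output SSTables are avoided.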
>>>>>>>> >>>>>>> >>>>>> >>>>> >>> >