You’re adding the new node as rac3. The rack-aware policy is going to make sure you get the rack diversity you asked for by placing one replica of each partition in rac3, which is going to blow up that instance.
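[Editor's note: Jeff's point can be sketched with a back-of-the-envelope calculation. This is an illustration only, assuming RF=3 and the per-rack node counts visible in the nodetool status output below; the total dataset size is made up.]

```python
# Toy illustration (not Cassandra code): with NetworkTopologyStrategy and
# RF equal to the number of racks, every rack stores one complete replica
# of the dataset, so a rack's share is divided only among its own nodes.

def per_node_load_gb(dataset_gb, nodes_in_rack):
    # each rack holds one full replica, split across the nodes in that rack
    return dataset_gb / nodes_in_rack

racks = {"RAC1": 6, "RAC2": 7, "RAC3": 1}  # node counts from nodetool status
dataset_gb = 2100  # hypothetical total logical dataset size

for rack, nodes in racks.items():
    print(rack, per_node_load_gb(dataset_gb, nodes), "GB per node")
# RAC1 350.0 GB per node
# RAC2 300.0 GB per node
# RAC3 2100.0 GB per node
```

The lone RAC3 node must hold roughly 6-7x what a RAC1/RAC2 node holds, which matches the "6/7 times the disk space" observation later in this thread.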
-- 
Jeff Jirsa

> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
> 
> Hi Jeff,
> 
> This is my third attempt at bootstrapping the node, so I have tried several
> tricks that might partially explain the output I am posting:
> 
> * To make the bootstrap incremental, I have been throttling the streams on
>   all nodes to 1 Mbit/s, selectively unthrottling one node at a time in the
>   hope that this would unlock some routines compacting away redundant data
>   (you'll see that nodetool netstats reports back fewer nodes than nodetool
>   status).
> * Since compactions have had a tendency to get stuck (hundreds pending but
>   none executing) in previous bootstraps, I've tried issuing a manual
>   "nodetool compact" on the bootstrapping node.
> 
> Having said that, this is the output of the commands.
> 
> Thanks a lot,
> Stefano
> 
> nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns  Host ID                               Rack
> UN  X.Y.33.8   342.4 GB   256     ?     afaae414-30cc-439d-9785-1b7d35f74529  RAC1
> UN  X.Y.81.4   325.98 GB  256     ?     00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
> UN  X.Y.33.4   348.81 GB  256     ?     1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
> UN  X.Y.33.5   384.99 GB  256     ?     13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
> UN  X.Y.81.5   336.27 GB  256     ?     aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
> UN  X.Y.33.6   377.22 GB  256     ?     43a393ba-6805-4e33-866f-124360174b28  RAC1
> UN  X.Y.81.6   329.61 GB  256     ?     4c3c64ae-ef4f-4986-9341-573830416997  RAC2
> UN  X.Y.33.7   344.25 GB  256     ?     03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
> UN  X.Y.81.7   324.93 GB  256     ?     24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
> UN  X.Y.81.1   323.8 GB   256     ?     26244100-0565-4567-ae9c-0fc5346f5558  RAC2
> UJ  X.Y.177.2  724.5 GB   256     ?     e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
> UN  X.Y.81.2   337.83 GB  256     ?     09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
> UN  X.Y.81.3   326.4 GB   256     ?     feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
> UN  X.Y.33.3   350.4 GB   256     ?     cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
> 
> 
> nodetool netstats -H | grep "Already received" -B 1
> /X.Y.81.4
>     Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
> --
> /X.Y.81.7
>     Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
> --
> /X.Y.81.5
>     Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
> --
> /X.Y.81.2
>     Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
> --
> /X.Y.81.3
>     Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
> --
> /X.Y.81.1
>     Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
> --
> /X.Y.81.6
>     Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
> --
> /X.Y.33.5
>     Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
> 
> nodetool tpstats
> Pool Name                  Active  Pending  Completed  Blocked  All time blocked
> MutationStage                   0        0  828367015        0                 0
> ViewMutationStage               0        0          0        0                 0
> ReadStage                       0        0          0        0                 0
> RequestResponseStage            0        0         13        0                 0
> ReadRepairStage                 0        0          0        0                 0
> CounterMutationStage            0        0          0        0                 0
> MiscStage                       0        0          0        0                 0
> CompactionExecutor              1        1      12150        0                 0
> MemtableReclaimMemory           0        0       7368        0                 0
> PendingRangeCalculator          0        0         14        0                 0
> GossipStage                     0        0     599329        0                 0
> SecondaryIndexManagement        0        0          0        0                 0
> HintsDispatcher                 0        0          0        0                 0
> MigrationStage                  0        0         27        0                 0
> MemtablePostFlush               0        0       8112        0                 0
> ValidationExecutor              0        0          0        0                 0
> Sampler                         0        0          0        0                 0
> MemtableFlushWriter             0        0       7368        0                 0
> InternalResponseStage           0        0         25        0                 0
> AntiEntropyStage                0        0          0        0                 0
> CacheCleanupExecutor            0        0          0        0                 0
> 
> Message type        Dropped
> READ                      0
> RANGE_SLICE               0
> _TRACE                    0
> HINT                      0
> MUTATION                  1
> COUNTER_MUTATION          0
> BATCH_STORE               0
> BATCH_REMOVE              0
> REQUEST_RESPONSE          0
> PAGED_RANGE               0
> READ_REPAIR               0
> 
> nodetool compactionstats -H
> pending tasks: 776
>                                   id  compaction type    keyspace    table  completed    total  unit  progress
> 24d039f2-b1e6-11e7-ac57-3d25e38b2f5c       Compaction  keyspace_1  table_1    4.85 GB  7.67 GB  bytes    63.25%
> Active compaction remaining time : n/a
> 
> 
>> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> Can you post (anonymize as needed) nodetool status, nodetool netstats,
>> nodetool tpstats, and nodetool compactionstats?
>> 
>> -- 
>> Jeff Jirsa
>> 
>>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>> 
>>> Hi Jeff,
>>> 
>>> That would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>> 
>>> Stefano
>>> 
>>>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> What version?
>>>> 
>>>> Single disk or JBOD?
>>>> 
>>>> Vnodes?
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so
>>>>> far. Based on the source code, it seems this option doesn't affect
>>>>> compactions while bootstrapping.
>>>>> 
>>>>> I am getting quite confused: it seems I cannot bootstrap a node unless I
>>>>> have at least 6-7 times the disk space used by the other nodes. This is
>>>>> weird. The host I am bootstrapping uses an SSD, compaction throughput is
>>>>> unthrottled (set to 0), and the compacting threads are set to 8.
>>>>> Nevertheless, while primary ranges from other nodes are being streamed,
>>>>> the data is never compacted away.
>>>>> 
>>>>> Does anybody know anything else I could try?
>>>>> 
>>>>> Cheers,
>>>>> Stefano
>>>>> 
>>>>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>> Another little update: at the same time I see the number of pending
>>>>>> tasks stuck (in this case at 1847); restarting the node doesn't help,
>>>>>> so I can't really force the node to "digest" all those compactions.
>>>>>> Meanwhile, the disk space occupied is already twice the average load
>>>>>> I have on the other nodes.
>>>>>> 
>>>>>> Feeling more and more puzzled here :S
>>>>>> 
>>>>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>>>>> I have been trying to add another node to the cluster (after upgrading
>>>>>>> to 3.0.15), and I just noticed through "nodetool netstats" that all
>>>>>>> nodes have been streaming to the joining node approximately 1/3 of
>>>>>>> their SSTables, basically their whole primary range (using RF=3).
>>>>>>> 
>>>>>>> Is this expected/normal?
>>>>>>> I was under the impression only the necessary SSTables were going to
>>>>>>> be streamed...
>>>>>>> 
>>>>>>> Thanks for the help,
>>>>>>> Stefano
>>>>>>> 
>>>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <k...@instaclustr.com> wrote:
>>>>>>>>> But if it also streams, it means I'd still be under pressure, if I
>>>>>>>>> am not mistaken. I am under the assumption that the compactions are
>>>>>>>>> a by-product of streaming too many SSTables at the same time, and
>>>>>>>>> not of my current write load.
>>>>>>>> 
>>>>>>>> Ah, yeah, I wasn't thinking about the capacity problem, more of the
>>>>>>>> performance impact from the node being backed up with compactions.
>>>>>>>> If you haven't already, you should try disabling STCS in L0 on the
>>>>>>>> joining node. You will likely still need to do a lot of compactions,
>>>>>>>> but generally they should be smaller. The option is
>>>>>>>> -Dcassandra.disable_stcs_in_l0=true
>>>>>>>> 
>>>>>>>>> I just noticed you were mentioning L1 tables too. Why would that
>>>>>>>>> affect the disk footprint?
>>>>>>>> 
>>>>>>>> If you've been doing a lot of STCS in L0, you generally end up with
>>>>>>>> some large SSTables. These will eventually have to be compacted with
>>>>>>>> L1. You could also be suffering the problem of streamed SSTables
>>>>>>>> causing large cross-level compactions in the higher levels as well.
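[Editor's note: kurt's explanation of why STCS in L0 inflates the disk footprint can be sketched with a toy model. This is not Cassandra code, and the 50 GB / 10 GB figures are made up for illustration: a compaction must keep its input SSTables on disk until the output is fully written, so each large cross-level compaction transiently needs roughly input-plus-output space.]

```python
# Toy model of transient disk headroom during a cross-level compaction.
# Inputs stay on disk until the output is complete, so peak usage is
# approximately sum(inputs) + sum(outputs).

def peak_disk(l0_sstable_gb, overlapping_l1_gb):
    inputs = l0_sstable_gb + overlapping_l1_gb
    output = inputs  # worst case: little data is actually compacted away
    return inputs + output

# One huge STCS-produced L0 SSTable overlapping a 10 GB L1 needs
# ~120 GB of headroom in a single operation:
print(peak_disk(50, 10))  # 120

# With STCS disabled in L0, the same data moves up as many small
# streamed SSTables, each needing only ~22 GB of headroom at a time:
print(peak_disk(1, 10))   # 22
```

The total compaction work is similar either way (as kurt notes, "you will likely still need to do a lot of compactions"), but the per-operation headroom, and thus the peak footprint on the bootstrapping node, is far smaller when the large STCS output SSTables are avoided.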
>>>>>>>> >>>>>>> >>>>>> >>>>> >>> >