Re: [EXTERNAL] Re: sstableloader & num_tokens change
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots from 4 nodes to 4 nodes (RF 3 on both ends) and did not notice a spike. That's not to say that it didn't happen, but I think I'd have noticed as I was loading approx 250GB x 4 (although sequentially rather than 4x sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and sstableloader; appreciate it.

On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R wrote:
> I would suggest to be aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.
RE: [EXTERNAL] Re: sstableloader & num_tokens change
I would suggest being aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), each copy will also be written to every replica in the new cluster (3 more times). So you could see data expansion of 9x the original data size (origin RF * target RF) until compaction can run.

Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change

> No, there isn't. It will work as designed so you're good to go. Cheers!
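The arithmetic behind the "origin RF * target RF" estimate can be sketched as follows. This is only an illustration of the reasoning in the message above, not a measurement: the function name and the 100 GB figure are made up for the example, and the real peak depends on how quickly compaction merges the duplicate rows.

```python
def transient_load_size_gb(unique_data_gb, origin_rf, target_rf):
    """Worst-case data written to the target cluster before compaction runs."""
    # Snapshots taken from every origin node together hold origin_rf copies of each row.
    streamed_gb = unique_data_gb * origin_rf
    # Each streamed copy is written to every target replica.
    return streamed_gb * target_rf


# Illustrative numbers only: 100 GB of unique data, RF 3 on both clusters.
unique_gb = 100.0
peak_gb = transient_load_size_gb(unique_gb, origin_rf=3, target_rf=3)
steady_gb = unique_gb * 3  # what the target holds once duplicates are compacted away
print(f"peak ~{peak_gb:.0f} GB, steady state ~{steady_gb:.0f} GB")
```

Loading one origin node's snapshot at a time, with compaction running in between, would keep the footprint below this worst case, which may be why a sequential load does not show an obvious spike.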
Re: sstableloader & num_tokens change
Hello,

Concerning the original question, I agree with @eric_ramirez: sstableloader is transparent to the number of tokens per node.

Just for info, @voytek, check this post out:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

You may be interested to know whether your cluster is well balanced with 32 tokens. 32 tokens seems to be the future default value, but changing the default number of vnode tokens seems not to be so straightforward.

Cheers,
Jean Carlo

On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez wrote:
> On the subject of DSBulk, sstableloader is the tool of choice for this scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!
Re: sstableloader & num_tokens change
On the subject of DSBulk, sstableloader is the tool of choice for this scenario. +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!
Re: sstableloader & num_tokens change
> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!
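One way to see why the source cluster's num_tokens does not matter: where a partition lands is decided by its token and the target ring alone. The sketch below is a simplified model, assuming random token assignment and SimpleStrategy-like placement (the first RF distinct nodes clockwise from the token). The node names, seed, and helper functions are hypothetical and are not Cassandra APIs.

```python
import bisect
import random


def build_ring(nodes, num_tokens, seed):
    """Give each node num_tokens random tokens; return sorted (token, node) pairs."""
    rng = random.Random(seed)
    return sorted(
        (rng.randrange(-2**63, 2**63), node)
        for node in nodes
        for _ in range(num_tokens)
    )


def replicas_for(ring, partition_token, rf):
    """Walk the ring clockwise from the token; collect the first rf distinct nodes."""
    tokens = [token for token, _ in ring]
    # A node token owns the range (previous token, token], so find the first token >= ours.
    start = bisect.bisect_left(tokens, partition_token)
    owners = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


# Hypothetical target cluster: 4 nodes, 32 tokens each, RF 3.
target_nodes = ["t1", "t2", "t3", "t4"]
target_ring = build_ring(target_nodes, num_tokens=32, seed=7)

# A partition's token comes from hashing its key, not from the node that used to hold it.
partition_token = 1234567890
print(replicas_for(target_ring, partition_token, rf=3))
```

Note that the source ring never appears as an input: the same lookup works whether the snapshot came from a 256-token or a 4-token node.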
Re: sstableloader & num_tokens change
If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

On Fri, Jan 24, 2020 at 11:15 AM Sergio wrote:
> https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html
>
> Just skimming through the docs
>
> I see examples by loading from CSV / JSON
>
> Maybe there is some other command or doc page that I am missing
Re: sstableloader & num_tokens change
https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html

Just skimming through the docs, I see examples of loading from CSV / JSON. Maybe there is some other command or doc page that I am missing.

On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth wrote:
> Dsbulk works same as sstableloader.
Re: sstableloader & num_tokens change
Dsbulk works the same as sstableloader.

Regards,
Nitan

On Jan 24, 2020, at 10:40 AM, Sergio wrote:
> I was wondering if that improvement for token allocation would work even with just one rack. It should but I am not sure.
>
> Does Dsbulk support migration cluster to cluster without CSV or JSON export?
>
> Thanks and Regards
Re: sstableloader & num_tokens change
Why? Seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new Cassandra are unnecessary steps in my case.

On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth wrote:
> Instead of sstableloader consider dsbulk by datastax.
Re: sstableloader & num_tokens change
I was wondering if that improvement for token allocation would work even with just one rack. It should but I am not sure.

Does Dsbulk support migration cluster to cluster without CSV or JSON export?

Thanks and Regards

On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth wrote:
> Instead of sstableloader consider dsbulk by datastax.
Re: sstableloader & num_tokens change
Instead of sstableloader consider dsbulk by datastax.

On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback wrote:
> Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at:
>
> https://www.youtube.com/watch?v=swL7bCnolkU
>
> You might want to check that out. Also I think the amount of effort you put into evening out the token distribution increases as vnode count shrinks. The caveats are explored at:
>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
Re: sstableloader & num_tokens change
Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at:

https://www.youtube.com/watch?v=swL7bCnolkU

You might want to check that out. Also I think the amount of effort you put into evening out the token distribution increases as vnode count shrinks. The caveats are explored at:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

From: Voytek Jarnot
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 24, 2020 at 10:39 AM
To: "user@cassandra.apache.org"
Subject: sstableloader & num_tokens change

> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 node RF=3 cluster.
>
> I've read that 256 is not an optimal default num_tokens value, and that 32 is likely a better option.
>
> We have the "opportunity" to switch, as we're migrating environments and will likely be using sstableloader to do so. I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token nodes (otherwise same # of nodes and same RF).
>
> Thanks in advance.
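The point that evening out the distribution takes more effort as the vnode count shrinks can be illustrated with a rough simulation. It assumes purely random token assignment (not Cassandra's token allocation algorithm) and measures only primary-range ownership, ignoring replication. The function name and seed are made up for the example.

```python
import random

RING_SIZE = 2**64  # number of tokens in the Murmur3 token space


def ownership_fractions(num_nodes, num_tokens, seed):
    """Fraction of the token ring each node owns as primary, with random tokens."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        previous_token = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] += (token - previous_token) % RING_SIZE
    return [size / RING_SIZE for size in owned]


# Compare the spread between the least and most loaded node on a 4-node cluster.
for num_tokens in (4, 32, 256):
    fractions = ownership_fractions(num_nodes=4, num_tokens=num_tokens, seed=42)
    print(num_tokens, f"min={min(fractions):.3f}", f"max={max(fractions):.3f}")
```

With perfectly even ownership every node would sit at 0.25. Random assignment tends to drift further from that as the token count drops, which is the gap the linked post's allocation approach is meant to close.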
sstableloader & num_tokens change
Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 node RF=3 cluster. I've read that 256 is not an optimal default num_tokens value, and that 32 is likely a better option. We have the "opportunity" to switch, as we're migrating environments and will likely be using sstableloader to do so. I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token nodes (otherwise same # of nodes and same RF). Thanks in advance.
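For the restore itself, a minimal sketch of assembling one sstableloader command per table is shown below. It only builds and prints the commands; nothing is executed. The staging path, keyspace and table names, and host addresses are hypothetical. It also assumes the snapshot files have been copied into a directory whose last two path components are the keyspace and table names, which is how sstableloader works out where the data belongs.

```python
from pathlib import PurePosixPath


def sstableloader_argv(table_dir, target_hosts):
    """Build (but do not run) one sstableloader command line."""
    # -d / --nodes takes a comma-separated list of initial contact hosts in the target cluster.
    return ["sstableloader", "-d", ",".join(target_hosts), str(table_dir)]


# Hypothetical staging layout and host names, for illustration only.
staging_root = PurePosixPath("/restore/node1")
tables = [("my_keyspace", "events"), ("my_keyspace", "users")]
target_hosts = ["10.0.0.1", "10.0.0.2"]

for keyspace, table in tables:
    argv = sstableloader_argv(staging_root / keyspace / table, target_hosts)
    print(" ".join(argv))
```

The same command shape applies regardless of the num_tokens setting on either cluster; only the contact hosts and the staged directory change.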